Odroid N2 speedup is not as expected - MPI - mpich

katonar · July 15, 2020

I am trying to build a compute cluster from Odroid N2 boards, but it seems that cores 5 and 6 do not speed up the application at all. The speedup is also not linear in the number of cores.

Here are some measurements from my MPI-based application. I measured the total computation time of one function.

1 core: 0.838052 s
2 cores: 0.438483 s
3 cores: 0.405501 s
4 cores: 0.416391 s
5 cores: 0.514472 s
6 cores: 0.435128 s

12 cores (4 cores on each of 3 N2 boards): 0.06867 s
18 cores (6 cores on each of 3 N2 boards): 0.152759 s
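To read these timings as speedup S(p) = T(1) / T(p) and parallel efficiency E(p) = S(p) / p, a few lines of C++ are enough. This is only a sketch: the values come from the list above, p is taken to be the total number of MPI processes (12 and 18 for the three-board runs), and the names speedup, efficiency, Run and print_scaling are illustrative.

```cpp
#include <cstdio>
#include <vector>

// Speedup relative to the single-process run: S(p) = T(1) / T(p).
double speedup(double t1, double tp) { return t1 / tp; }

// Parallel efficiency: E(p) = S(p) / p (1.0 means perfectly linear scaling).
double efficiency(double t1, double tp, int p) { return speedup(t1, tp) / p; }

struct Run {
  int procs;       // total number of MPI processes
  double seconds;  // measured time of the timed section
};

// Prints p, T(p), S(p) and E(p) for every run, using runs[0] as the T(1) baseline.
void print_scaling(const std::vector<Run>& runs) {
  const double t1 = runs.front().seconds;
  for (const Run& r : runs) {
    std::printf("%2d procs  %.6f s  S = %6.3f  E = %5.3f\n", r.procs, r.seconds,
                speedup(t1, r.seconds), efficiency(t1, r.seconds, r.procs));
  }
}
```

Called from main() as print_scaling({{1, 0.838052}, {2, 0.438483}, {6, 0.435128}, {12, 0.06867}, {18, 0.152759}}), this gives S(2) ≈ 1.91, S(6) ≈ 1.93, S(12) ≈ 12.2 and S(18) ≈ 5.49.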

I am using the Armbian Focal image with the mainline 5.6.y kernel.

Does the dual-core Cortex-A53 CPU need any special configuration to be used?

Is there any response time or synchronization overhead that has to be taken into account when using the dual-core Cortex-A53 CPU with MPI?

nproc reports 6 available cores.
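One hypothesis worth testing (an assumption, not something the numbers above prove) is that the two Cortex-A53 cores run MyFun more slowly than the four Cortex-A73 cores of the N2, so with an even split every rank waits at MPI_Allgather for the slowest one. A minimal sketch of an uneven split follows: it divides the elements among ranks in proportion to per-rank weights. The function name split_by_weight and any weights you feed it are hypothetical; real weights would have to be measured.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: splits `total` elements into one count per rank,
// proportional to `weights` (e.g. relative speed of the core each rank runs on).
// The counts always add up to `total`; elements lost to rounding down go to
// the first ranks, one each.
std::vector<int> split_by_weight(int total, const std::vector<double>& weights) {
  if (weights.empty()) return {};
  const double sum = std::accumulate(weights.begin(), weights.end(), 0.0);
  std::vector<int> counts(weights.size());
  int assigned = 0;
  for (std::size_t r = 0; r < weights.size(); ++r) {
    counts[r] = static_cast<int>(total * (weights[r] / sum));  // round down
    assigned += counts[r];
  }
  // Distribute the remainder round-robin so nothing is dropped.
  for (std::size_t r = 0; assigned < total; r = (r + 1) % counts.size()) {
    ++counts[r];
    ++assigned;
  }
  return counts;
}
```

For example, if ranks 4 and 5 were (hypothetically) bound to the A53 cores and assumed to be half as fast, split_by_weight(num_elements, {2, 2, 2, 2, 1, 1}) would give them half as many elements as the other ranks. An even MPI_Scatter would then have to become something like MPI_Scatterv, with displacements computed as running sums of the counts. Which rank lands on which core depends on process binding, which the post does not show.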

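// Sums array[i] / 1000 (integer division) over the first num_elements entries.
// The third parameter j (the caller passes world_rank) is not used.
int MyFun(int *array, int num_elements, int j)
{
  int result_overall = 0;

  for (int i = 0; i < num_elements; i++)
  {
    result_overall += array[i] / 1000;
  }
  return result_overall;
}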
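To check whether the two core types differ in speed, MyFun can be timed on its own, without MPI, while the binary is pinned to one CPU at a time (for example with taskset -c from util-linux). The sketch below is a stand-in under assumptions: my_fun and time_my_fun are hypothetical names, the unused third parameter is dropped, and the array contents and repetition count are up to you.

```cpp
#include <chrono>
#include <vector>

// Same computation as MyFun above, without the unused third parameter.
int my_fun(const int* array, int num_elements) {
  int result_overall = 0;
  for (int i = 0; i < num_elements; i++) {
    result_overall += array[i] / 1000;
  }
  return result_overall;
}

// Runs my_fun `reps` times over `data` and returns the elapsed wall time in seconds.
// The accumulated result is written to `checksum` so it has an observable use.
// Note: an optimizing compiler may still hoist the loop-invariant call out of the
// loop; if the time does not grow with `reps`, that has happened.
double time_my_fun(const std::vector<int>& data, int reps, long long& checksum) {
  checksum = 0;
  const auto start = std::chrono::steady_clock::now();
  for (int j = 0; j < reps; j++) {
    checksum += my_fun(data.data(), static_cast<int>(data.size()));
  }
  const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
  return elapsed.count();
}
```

Calling time_my_fun from a small main() and running the binary once per CPU would show whether some cores take noticeably longer for the same work.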

// Adds up the per-rank partial sums collected by MPI_Allgather.
int compute_sum(int *sub_sums, int num_of_cpu)
{
  int sum = 0;
  for (int i = 0; i < num_of_cpu; i++)
  {
    sum += sub_sums[i];
  }
  return sum;
}

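// Performance measurement (excerpt from main()):
  // Only rank 0 records the start time. There is no MPI_Barrier in this
  // excerpt, so the other ranks do not start in step with this clock.
  if (world_rank == 0)
  {
    startTime = std::chrono::high_resolution_clock::now();
  }
  // Compute the sum of this rank's subset, repeated 1000 times
  int sub_sum = 0;
  for (int j = 0; j < 1000; j++)
  {
    sub_sum += MyFun(sub_intArray, num_elements_per_proc, world_rank);
  }

  // Every rank receives every other rank's sub_sum; the collective cannot
  // complete until all ranks have contributed.
  MPI_Allgather(&sub_sum, 1, MPI_INT, sub_sums, 1, MPI_INT, MPI_COMM_WORLD);

  int total_sum = compute_sum(sub_sums, num_of_cpu);
  // Stop the clock on rank 0 after the collective
  if (world_rank == 0)
  {
    elapsedTime = std::chrono::high_resolution_clock::now() - startTime;
    timer = elapsedTime.count();
  }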

Edited July 15, 2020 by katonar