I am trying to build a compute cluster from Odroid N2 boards, but it seems that cores 5 and 6 do not speed up the application at all. The speed-up is also not linear in the number of cores.
Here are some measurements from my MPI-based application. I measured the total calculation time of a function:
- 1 core: 0.838052 s
- 2 cores: 0.438483 s
- 3 cores: 0.405501 s
- 4 cores: 0.416391 s
- 5 cores: 0.514472 s
- 6 cores: 0.435128 s
- 12 cores (4 cores on each of 3 N2 boards): 0.06867 s
- 18 cores (6 cores on each of 3 N2 boards): 0.152759 s
I am using the Armbian Focal image with the mainline 5.6.y kernel.
Does the Dual-Core Cortex-A53 CPU need any special configuration to be used?
Is there any response time or synchronization time that has to be considered when using the Dual-Core Cortex-A53 CPU together with MPI?
nproc reports 6 available cores.
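```cpp
#include <chrono>  // std::chrono timers used in main()
#include <mpi.h>

// Sums array[i] / 1000 (integer division) over this rank's elements.
// The third parameter j is not used.
int MyFun(int *array, int num_elements, int j)
{
    int result_overall = 0;
    for (int i = 0; i < num_elements; i++)
    {
        result_overall += array[i] / 1000;
    }
    return result_overall;
}

// Adds up the partial sums gathered from all num_of_cpu ranks.
int compute_sum(int* sub_sums, int num_of_cpu)
{
    int sum = 0;
    for (int i = 0; i < num_of_cpu; i++)
    {
        sum += sub_sums[i];
    }
    return sum;
}
```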
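```cpp
// Measuring performance in main(); world_rank, startTime, elapsedTime, timer,
// sub_intArray, num_elements_per_proc, sub_sums and num_of_cpu are declared
// elsewhere in main().
if (world_rank == 0)
{
    startTime = std::chrono::high_resolution_clock::now();
}

// Compute the sum of this rank's subset (the same pass repeated 1000 times)
int sub_sum = 0;
for (int j = 0; j < 1000; j++)
{
    sub_sum += MyFun(sub_intArray, num_elements_per_proc, world_rank);
}

// Share every rank's partial sum with all ranks, then total them
MPI_Allgather(&sub_sum, 1, MPI_INT, sub_sums, 1, MPI_INT, MPI_COMM_WORLD);
int total_sum = compute_sum(sub_sums, num_of_cpu);

if (world_rank == 0)
{
    elapsedTime = std::chrono::high_resolution_clock::now() - startTime;
    timer = elapsedTime.count();
}
```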