katonar, posted July 15, 2020 (edited July 15, 2020)

I am trying to build a compute cluster from Odroid N2 boards, but cores 5 and 6 do not seem to speed up the application at all, and the speedup is not linear in the number of cores either. Here are some measurements from my MPI-based application. Each value is the total calculation time for one function.

1 core: 0.838052 s
2 cores: 0.438483 s
3 cores: 0.405501 s
4 cores: 0.416391 s
5 cores: 0.514472 s
6 cores: 0.435128 s
12 cores (4 cores from each of 3 N2 boards): 0.06867 s
18 cores (6 cores from each of 3 N2 boards): 0.152759 s

I am using the Armbian Focal image with the mainline kernel 5.6.y. Does it need any special configuration to use the dual-core Cortex-A53 CPU? Is there any response time or synchronization time that should be taken into account when using the dual-core Cortex-A53 CPU together with MPI? nproc reports 6 available cores.

    // Work function: sums array[i] / 1000 over this process's chunk.
    // The parameter j is not used.
    int MyFun(int *array, int num_elements, int j)
    {
        int result_overall = 0;
        for (int i = 0; i < num_elements; i++)
        {
            result_overall += array[i] / 1000;
        }
        return result_overall;
    }

    // Adds up the partial sums gathered from all processes.
    int compute_sum(int* sub_sums, int num_of_cpu)
    {
        int sum = 0;
        for (int i = 0; i < num_of_cpu; i++)
        {
            sum += sub_sums[i];
        }
        return sum;
    }

    // Measuring performance from main():
    if (world_rank == 0)
    {
        startTime = std::chrono::high_resolution_clock::now();
    }

    // Compute the sum of this process's subset, repeated 1000 times
    int sub_sum = 0;
    for (int j = 0; j < 1000; j++)
    {
        sub_sum += MyFun(sub_intArray, num_elements_per_proc, world_rank);
    }

    // Every process receives the partial sums of all processes
    MPI_Allgather(&sub_sum, 1, MPI_INT, sub_sums, 1, MPI_INT, MPI_COMM_WORLD);
    int total_sum = compute_sum(sub_sums, num_of_cpu);

    if (world_rank == 0)
    {
        elapsedTime = std::chrono::high_resolution_clock::now() - startTime;
        timer = elapsedTime.count();
    }
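To put numbers on "not linear", the timings above can be turned into speedup S(p) = T(1) / T(p) and parallel efficiency E(p) = S(p) / p. The following is only a minimal sketch for reading the table: the names Sample, speedup, efficiency, single_board_samples and print_scaling are made up for this illustration and are not part of the original program.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// One row of the timing table: process count and wall time in seconds.
// (Hypothetical helper type, not from the original application.)
struct Sample {
    int processes;
    double seconds;
};

// Speedup relative to the single-process run: S(p) = T(1) / T(p).
double speedup(double t1, double tp) {
    assert(tp > 0.0);
    return t1 / tp;
}

// Parallel efficiency: E(p) = S(p) / p. A value of 1.0 means linear scaling.
double efficiency(double t1, double tp, int processes) {
    assert(processes > 0);
    return speedup(t1, tp) / processes;
}

// Single-board measurements copied from the post (1 to 6 processes).
std::vector<Sample> single_board_samples() {
    return {{1, 0.838052}, {2, 0.438483}, {3, 0.405501},
            {4, 0.416391}, {5, 0.514472}, {6, 0.435128}};
}

// Prints one speedup/efficiency line per sample, using the first
// sample as the single-process baseline T(1).
void print_scaling(const std::vector<Sample>& samples) {
    assert(!samples.empty());
    const double t1 = samples.front().seconds;
    for (const Sample& s : samples) {
        std::printf("%2d processes: speedup %.2f, efficiency %.2f\n",
                    s.processes,
                    speedup(t1, s.seconds),
                    efficiency(t1, s.seconds, s.processes));
    }
}
```

Calling print_scaling(single_board_samples()) from a main() shows the pattern at a glance: two processes get close to a 2x speedup, while three to six processes on one board stay at roughly the same level.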