ejolson

Reputation Activity

  1. Like
    ejolson got a reaction from FossCoder in Rock64 Focal Fossa Memory Frequency   
    Here is a performance comparison between the original 786MHz memory clock and the reduced 333MHz clock when running John McCalpin's STREAM memory bandwidth benchmark.
     
    [attached image: STREAM benchmark bandwidth comparison]

    The performance reduction for the scale operation was measured to be about 2.26 times, slightly less than the 786/333 ≈ 2.36 factor expected from simply dividing out the clocks.
    In real-world applications much of this performance reduction might be mitigated by cache memory; however, a 600MHz setting might be a better compromise between super slow and super unstable.
     
    At any rate, I'll be performing some stress tests over the next few days to determine whether the system is really stable now.  For reference, I also have a thread running on the Pine64 forum
     
    https://forum.pine64.org/showthread.php?tid=11209&pid=76363#pid76363
     
    and will update both threads as resolved as soon as I verify the system is now stable.
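    The expected slowdown factor quoted above can be checked with a couple of lines of shell (my addition, not part of the original post; the 2.26 figure is the measured scale-operation slowdown reported above):

```shell
#!/bin/sh
# Expected slowdown obtained by dividing out the memory clocks.
old_mhz=786
new_mhz=333
expected=$(awk -v a="$old_mhz" -v b="$new_mhz" 'BEGIN { printf "%.2f", a / b }')
# 2.26 is the measured scale-operation slowdown from the STREAM run above.
echo "expected ${expected}x slowdown, measured 2.26x"
```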
  2. Like
    ejolson got a reaction from FossCoder in Rock64 Focal Fossa Memory Frequency   
    It seems patching the SD card is not very difficult.  I downloaded
     
    rk3328_ddr_333MHz_v1.16.bin
    rk3328_miniloader_v2.46.bin
     
    Then I followed the instructions on
     
    http://opensource.rock-chips.com/wiki_Boot_option
     
    and typed the following as root:

    # mkimage -n rk3328 -T rksd -d rk3328_ddr_333MHz_v1.16.bin idbloader16.img
    # cat rk3328_miniloader_v2.46.bin >>idbloader16.img
    # dd if=idbloader16.img of=/dev/mmcblk0 seek=64 conv=notrunc
    # sync
    and then rebooted.
     
    I'll report back later on how much slower everything is and whether the segmentation faults and kernel oopses are gone.  Thanks for all your work on Armbian!
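    In case anyone wonders about the seek=64 in the dd command: according to the Rockchip wiki linked above, the ID block loader lives at sector 64, which with the default 512-byte block size is a 32KiB byte offset. A quick sanity check (my addition):

```shell
#!/bin/sh
# The dd command above uses seek=64 with the default 512-byte block size,
# placing idbloader16.img at byte offset 64 * 512 = 32 KiB on the SD card.
sector=64
sector_size=512
offset=$((sector * sector_size))
echo "idbloader written at byte offset ${offset} (sector ${sector})"
```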
  3. Like
    ejolson got a reaction from FossCoder in Rock64 Focal Fossa Memory Frequency   
    My understanding is that this is a common problem with the Rock64 v2 boards and simply the result of some optimistic overclocking that should never have been done in the first place.  While such overclocking appears to be necessary to meet the minimum performance needed for playing back certain high-definition video, my use for the Rock64 is not watching television but functioning as a computer.
     
    This is essentially a new board that sat in storage for six months due to certain shelter-at-home rules.  As returning it is not an option, I would instead like to reduce the memory frequency to a rate that runs reliably.  I was having difficulty converting the instructions I read for Arch Linux to Armbian and thought I'd ask here.  Before adding more waste to the landfill, I'd like to give the 333MHz option a thorough test.
     
    I understand your reluctance to assist.  It must be irritating to continually deal with manufacturers that cut corners in unexpected ways in later revisions of their hardware.  Any guidance on how to get started rebuilding the U-Boot initialization code for Armbian with the desired memory frequency would be appreciated.
  5. Like
    ejolson got a reaction from tkaiser in [NanoPi M3] Cheap 8 core (35$)   
    I can partially confirm your claim that 12 Gflops is possible with a more efficient heatsink+fan.  In particular, I'm currently testing the M3's bigger brother, the FriendlyArm NanoPi T3, which has the same 8-core SoC but a different heatsink.  Following the same build instructions, I obtained 12.49 Gflops with version 2.2 of the Linpack benchmark linked against version 0.2.19 of the OpenBLAS library.  My cooling arrangement looks like this.
     
    [attached image: NanoPi T3 cooling arrangement]
    With the cover on, the heat is trapped and the system throttles; with the cover removed (and thanks to the giraffe) the system runs at full speed.  The Raspberry Pi 3 does about 6.2 Gflops with a similar cooling system.
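    For the record, the two Linpack figures quoted above work out to roughly a factor of two between the boards; a one-liner to check (my addition, using only the numbers reported in the post):

```shell
#!/bin/sh
# Linpack results quoted above: NanoPi T3 vs Raspberry Pi 3, in Gflops.
t3_gflops=12.49
rpi3_gflops=6.2
ratio=$(awk -v a="$t3_gflops" -v b="$rpi3_gflops" 'BEGIN { printf "%.1f", a / b }')
echo "NanoPi T3 is about ${ratio}x the Raspberry Pi 3"
```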
  6. Like
    ejolson got a reaction from Igor in [NanoPi M3] Cheap 8 core (35$)   
    It is definitely possible to introduce parallel processing with a quad core. My thread on the Raspberry Pi forum discusses compiling new versions of gcc with support for the MIT/Intel Cilk parallel programming extensions on ARM devices. The compiler was tested with parallel algorithms for sorting, prime number sieves, fast Fourier transforms, computing fractal basins of attraction for complex Newton methods, numerical quadrature and approximating solutions to high-dimensional systems of ODEs.  
    It is of practical interest how well the implementation of each algorithm scales on physical hardware. Due to constraints of shared memory and I/O, very few algorithms scale linearly with the number of cores over a very large range. This leads to an engineering problem that may trade off algorithmic efficiency for parallel scalability to achieve the fastest execution times on multi-core CPUs.
     
    With a quad-core CPU the maximum theoretical scaling is fourfold, while an eightfold increase is possible with an octa-core. Modern compute nodes have 16 to 48 CPU cores and thousands of GPU cores. Thus, while it is possible to consider parallel optimization for a quad core, the problem is more interesting with eight cores.
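    The scaling limits described above are usually quantified with Amdahl's law (my addition, not from the original post): if a fraction p of the work parallelizes, the speedup on n cores is 1/((1-p)+p/n), so even highly parallel code falls short of the ideal fourfold or eightfold gain. A small awk sketch:

```shell
#!/bin/sh
# Amdahl's law: speedup on n cores when a fraction p of the work is parallel.
speedup() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.2f", 1 / ((1 - p) + p / n) }'
}
# Even at p = 0.95 the eight-core speedup falls well short of the ideal 8x.
echo "p=0.95, 4 cores: $(speedup 0.95 4)x"
echo "p=0.95, 8 cores: $(speedup 0.95 8)x"
```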
