tkaiser Posted June 24, 2016 Posted June 24, 2016 (edited) Check the following:
cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
find /sys -iname "*temp*"
Also: this is a Cortex-A53 SoC, so 'armv7l' is already an indication that something is wrong. This is one of the few cases where running 'sysbench' might be interesting, just to see whether this device finishes within a few seconds (like ODROID-C2, Pine64 or other Cortex-A53 implementations that execute ARMv8 code do) or is as slow as an RPi 3 (also Cortex-A53, but used in 32-bit mode only). And while you run sysbench you should monitor /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq constantly. You can install RPi-Monitor this way:
apt-get install perl librrds-perl libhttp-daemon-perl libwww-perl libjson-perl libipc-sharelite-perl libfile-which-perl
wget -O /tmp/rpimonitor_2.10-1_all.deb https://github.com/XavierBerger/RPi-Monitor-deb/blob/master/packages/rpimonitor_2.10-1_all.deb?raw=true
dpkg -i /tmp/rpimonitor_2.10-1_all.deb
This will at least enable monitoring of CPU clockspeed (and therefore throttling). And if you find a temperature node somewhere below /sys/ you can adjust the relevant template to monitor thermal values too. Lowering temperatures is a complex process that requires deep knowledge, a huge amount of testing and a device worth the effort (based on what's known so far, this is not the case for this NanoPi M3). Edited July 8, 2016 by tkaiser Removed misleading Armbian lalala
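A minimal way to keep an eye on the clockspeed while sysbench runs (assuming the sysfs node above exists on this kernel) is a simple loop in a second shell:
# print the current clockspeed once per second; stop with Ctrl-C
while true; do
    cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
    sleep 1
done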
wildcat_paris Posted June 24, 2016 Posted June 24, 2016 (edited)
I bought a NanoPi M3, but the CPU is very hot and I cannot measure its temperature (lm-sensors does not work). Usually Armbian could make its temperature lower, as was done in the case of the Orange Pi PC.
I bought a NanoPi M1. The CPU is also hot, so I put a copper heatsink on it. Idle: 56°C, external copper heatsink 45°C (infrared sensor). Don't buy FriendlyARM boards with no proper power regulator (as tkaiser would put it very well). Try (may or may not work):
cat /sys/class/thermal/thermal_zone?/temp
Edited June 24, 2016 by wildcat_paris Brexit vote bug
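If those nodes exist they usually report millidegrees; a small sketch to print them as °C (note that they may not exist on the stock NanoPi M3 kernel -- later in this thread the temperature is read from /sys/class/hwmon/hwmon0/device/temp_label in plain degrees instead):
# convert millidegrees to °C for every thermal zone that exists
for z in /sys/class/thermal/thermal_zone*/temp; do
    awk '{ printf "%s: %.1f°C\n", FILENAME, $1/1000 }' "$z"
done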
hatahata Posted July 3, 2016 Posted July 3, 2016
About the NanoPi M3: cross-compiling is possible on DebianDog64 (probably on any 64-bit Intel host running Debian or a similar 64-bit distribution). See -> http://akita-8core.blogspot.jp/2016/06/nano-pi-m3.html
These are my kernel config changes:
diff .config .config-ori
53c53
< CONFIG_SWAP=y
---
> # CONFIG_SWAP is not set
96d95
< # CONFIG_CGROUP_MEM_RES_CTLR_SWAP is not set
1801,1802c1800
< CONFIG_THERMAL=y
< CONFIG_THERMAL_HWMON=y
---
> # CONFIG_THERMAL is not set
Enabling CONFIG_THERMAL and CONFIG_THERMAL_HWMON may expose the CPU temperature, but that is only a possibility. --- regards
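A sketch of making the same changes from the kernel source tree without editing .config by hand (assuming FriendlyARM's 3.4 tree ships the usual scripts/config helper found in kernel trees of that era):
# run from the top of the kernel source tree; option names are given without the CONFIG_ prefix
scripts/config --enable SWAP --enable THERMAL --enable THERMAL_HWMON
make oldconfig   # answer any new prompts, then rebuild the uImage as usual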
hatahata Posted July 4, 2016 Posted July 4, 2016
This morning I tried to boot the NanoPi M3 with the rebuilt uImage. The NanoPi M3 boots, but:
swapon /SWAP
swapon: /SWAP: swapon failed: Function not implemented
--- regards
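'Function not implemented' usually means the running kernel was still built without swap support. A quick check of which config the booted kernel actually uses (a sketch, assuming either /proc/config.gz or an installed config file is available on this image):
# works only if the kernel was built with CONFIG_IKCONFIG_PROC
zgrep -E 'CONFIG_SWAP=|CONFIG_THERMAL=' /proc/config.gz 2>/dev/null
# otherwise look at the config shipped next to the installed kernel (path may differ)
grep -E 'CONFIG_SWAP=|CONFIG_THERMAL=' /boot/config-$(uname -r) 2>/dev/null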
tkaiser Posted July 7, 2016 Posted July 7, 2016
About the NanoPi M3: cross-compiling is possible on DebianDog64 (probably on any 64-bit Intel host running Debian or a similar 64-bit distribution)
So what? Cross-compiling on a 64-bit Ubuntu system is what we do all the time for all our 32-bit kernels / OS images. It doesn't mean anything whether you can use a 64-bit host or not. NanoPi M3 has only a 32-bit kernel, which is bad. The rest of your posting (the thermal stuff) I didn't understand. Fortunately others provide information on how to read out temperatures and how horribly this M3 overheats without a HUGE heatsink + ventilation: http://climbers.net/sbc/40-core-arm-cluster-nanopc-t3/ 1
hatahata Posted July 7, 2016 Posted July 7, 2016
I am surprised by your vast knowledge. I only followed 'Install Cross Compiler' from http://wiki.friendlyarm.com/wiki/index.php/NanoPi_M3#Boot_NanoPi_M3_from_SD_Card using a 64-bit CPU and 64-bit DebianDog. I finally understand that the NanoPi M3 is 64-bit hardware but the present OS is 32-bit. The NanoPi M3 indeed gets hotter and hotter and then suddenly reboots, so I made a device that carries the heat away; now there is no more rebooting. Because it doesn't work easily, it's fascinating. Because there is a mystery, a detective story gets read. Because there is a mountain, it gets climbed. So I think. --- regards http://akita-8core.blogspot.jp/2016/06/nano-pi-m3.html
wildcat_paris Posted July 7, 2016 Posted July 7, 2016
@hatahata
I am surprised by your vast knowledge. I made a device that carries the heat away; now there is no more rebooting.
TK is surely straight to the point and quite knowledgeable, don't be surprised. You made quite a remarkable DIY heatsink. Have you put thermal paste (even a cheap silicone one is enough) between the layers of copper / copper coins / aluminium?
hatahata Posted July 7, 2016 Posted July 7, 2016
Of course. In Japan the 10 yen coin (about $0.10) is made of copper and is the cheapest material; the best would be a silver coin. Are there coins made of copper in the USA? An acrylic plate is easily drilled through with a metal drill, and the aluminium cap presses the coins against the heatsink and so fixes the coins and the heatsink together. I use
tkaiser Posted July 8, 2016 Posted July 8, 2016
I learned recently that there are three different types of heatsinks regarding the fins on top:
Large distance between fins means: convection does the job.
Small distance between fins (like in your case) means: forced airflow is necessary.
Putting coins on top means: ineffective, since the air trapped there no longer dissipates heat but acts as an insulator.
Please run
sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=8
sysbench --test=cpu --cpu-max-prime=200000 run --num-threads=8
and report back the times they take, and monitor CPU clockspeed (throttling!) and /sys/class/hwmon/hwmon0/device/temp_label at the same time (as already outlined, you could simply install RPi-Monitor and then exchange the one line below /etc/rpimonitor/templates that reads out /sys/class/thermal/thermal_zone0/temp)
hatahata Posted July 8, 2016 Posted July 8, 2016
date ; sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=8 ; date
Fri Jul 8 15:24:54 JST 2016
sysbench 0.4.12: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 8
Doing CPU performance benchmark
Threads started!
Done.
Maximum prime number checked in CPU test: 20000
Test execution summary:
    total time: 57.0156s
    total number of events: 10000
    total time taken by event execution: 455.8727
    per-request statistics:
        min: 45.43ms
        avg: 45.59ms
        max: 77.53ms
        approx. 95 percentile: 45.67ms
Threads fairness:
    events (avg/stddev): 1250.0000/0.87
    execution time (avg/stddev): 56.9841/0.02
Fri Jul 8 15:25:51 JST 2016
And while sysbench --test=cpu --cpu-max-prime=200000 run --num-threads=8 was running:
Fri Jul 8 15:32:58 JST 2016
87
fa@NanoPi3:~$ date ; cat /sys/class/hwmon/hwmon0/device/temp_label
Fri Jul 8 15:33:03 JST 2016
87
fa@NanoPi3:~$ date ; cat /sys/class/hwmon/hwmon0/device/temp_label
Fri Jul 8 15:33:06 JST 2016
88
fa@NanoPi3:~$ date ; cat /sys/class/hwmon/hwmon0/device/temp_label
Fri Jul 8 15:33:09 JST 2016
85
fa@NanoPi3:~$ date ; cat /sys/class/hwmon/hwmon0/device/temp_label
Fri Jul 8 15:33:14 JST 2016
88
fa@NanoPi3:~$ date ; cat /sys/class/hwmon/hwmon0/device/temp_label
Fri Jul 8 15:33:18 JST 2016
87
fa@NanoPi3:~$ date ; cat /sys/class/hwmon/hwmon0/device/temp_label
Fri Jul 8 15:33:23 JST 2016
88
fa@NanoPi3:~$ date ; cat /sys/class/hwmon/hwmon0/device/temp_label
Fri Jul 8 15:33:25 JST 2016
85
fa@NanoPi3:~$ date ; cat /sys/class/hwmon/hwmon0/device/temp_label
Fri Jul 8 15:33:35 JST 2016
86
fa@NanoPi3:~$ date ; cat /sys/class/hwmon/hwmon0/device/temp_label
The max is 88. Thanks for the advice on detecting the NanoPi M3's CPU temperature!
tkaiser Posted July 8, 2016 Posted July 8, 2016
execution time (avg/stddev): 56.9841/0.02
fa@NanoPi3:~$ date ; cat /sys/class/hwmon/hwmon0/device/temp_label
Fri Jul 8 15:33:14 JST 2016
88
To translate this: an ODROID-C2 (quad-core Cortex-A53) is able to finish the same test in 3.x seconds (so now you know it can make a huge difference to be able to execute ARMv8 code on ARMv8 cores -- NanoPi M3 is here as bad as RPi 3). And exceeding 88°C means this:
your heatsink approach doesn't work (compare with the measurements here -- 85° are also reached without any heatsink!)
throttling occurs and we don't know how far the clockspeed has been lowered. Without also monitoring the cpufreq stuff (as already outlined) your results are worthless.
Still: why don't you simply install RPi-Monitor, adjust one single line in a template (temperature) and start to test like an adult?
tkaiser Posted July 8, 2016 Posted July 8, 2016
Thanks for the advice on detecting the NanoPi M3's CPU temperature!
You never follow the advice you get (why?!). Read above, in the first post on this page:
find /sys -iname "*temp*"
Time to stop since it gets boring repeating the same stuff over and over again.
hatahata Posted July 8, 2016 Posted July 8, 2016
I imagine:
Banana Pi, 2 cores -> 25 degrees
ODROID-C2, 4 cores -> 55 degrees
NanoPi M3, 8 cores -> 85 degrees
so an ARM board with 16 cores -> 115 degrees
This is only imagination; in fact ODROID's 8-core board ($79) has active cooling. --- regards
PS: the NanoPi M3 has 3 problems:
1) temperature: tkaiser solved it
2) cannot use swap
3) cannot watch YouTube
mattelacchiato Posted July 16, 2016 Posted July 16, 2016
Hi! I'm thinking about buying this board as a mini NAS. Could somebody provide the max write performance (USB HDD) and network stats?
Write performance:
time (dd if=/dev/zero of=/path/to/usb-drive bs=1M count=1K && sync)
Network performance:
On the Pi: iperf -s
On your PC (wired): iperf -c <pi-IP>
Thanks a lot! Matthias
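A variant of the dd command above that isn't inflated by the page cache (a sketch, assuming the drive is mounted at /path/to/usb-drive and you write to a test file on it):
# flush data to the drive before dd reports the throughput
dd if=/dev/zero of=/path/to/usb-drive/testfile bs=1M count=1024 conv=fdatasync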
hatahata Posted July 20, 2016 Posted July 20, 2016
I tried it partially.
1)
# date ; dd if=/dev/zero of=/ma1/k-test bs=1M count=1K && sync ; date
Thu Jul 21 04:32:26 JST 2016
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 37.0172 s, 29.0 MB/s
Thu Jul 21 04:33:08 JST 2016
The hard disk is an old IDE one.
2)
iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
I only use 5 mm thick heat-conducting material and the aluminium cap. Then:
Thu Jul 21 04:46:25 JST 2016
temperature -> 60
             total       used       free     shared    buffers     cached
Mem:        872328     821296      51032      17908      21068     626484
When additionally using active cooling with a Raspberry Pi cooling fan:
Thu Jul 21 05:03:42 JST 2016
42
             total       used       free     shared    buffers     cached
Mem:        872328     823560      48768      17908      23628     626752
I think 8 cores are like an 8-cylinder engine: harder to control than a 2-cylinder one. But as more and more experience piles up, someday a breakthrough will occur, I believe. ---- regards
hatahata Posted July 22, 2016 Posted July 22, 2016
Hi all. When running sysbench --test=cpu --cpu-max-prime=200000 run --num-threads=8:
case 1) fan blowing from above: CPU temperature is 85~86 degrees
case 2) fan blowing from the lateral side: CPU temperature is 75~76 degrees
Lateral is much better.
Without fan:
Fri Jul 22 11:42:19 JST 2016
58
After starting the lateral side fan:
Fri Jul 22 11:47:39 JST 2016
41
fnecboy Posted July 26, 2016 Posted July 26, 2016
Thank you very much for your interest in FriendlyARM's product. As far as the M3's overheat issue is concerned, I suggest trying the M3's cooling package which includes a specifically designed heat sink and a cooling fan:
With these two accessories applied to the M3 the overheat issue will be greatly relieved.
tkaiser Posted July 26, 2016 Posted July 26, 2016
As far as the M3's overheat issue is concerned, I suggest trying the M3's cooling package which includes a specifically designed heat sink and a cooling fan
Thanks for pointing this out. Since a lot of people discussed the NanoPi M3 here I think it's ok to inform (potential) customers that FriendlyARM designed a specific heatsink + fan and has it in stock. Also good to see that you implemented a sane mounting solution and that you clearly speak of an 'overheat issue', so customers know that improved heat dissipation is necessary when thinking about sustained higher loads.
It would be interesting to get some numbers regarding efficiency with and without the fan, as with the NanoPC-T3 here: http://climbers.net/sbc/40-core-arm-cluster-nanopc-t3/
Apart from that you should be careful with product announcements here. Posts that look like spam trigger deletion and a blocked account pretty fast (Armbian mods try hard to keep the forums free from spam and normally act within minutes)
hatahata Posted July 27, 2016 Posted July 27, 2016
I found a page about the heatsink of the Parallella (http://www.rs-online.com/designspark/electronics/jpn/blog/content-1032). It also has a lateral side fan.
tkaiser Posted August 10, 2016 Posted August 10, 2016
As far as the M3's overheat issue is concerned, I suggest trying the M3's cooling package which includes a specifically designed heat sink and a cooling fan:
Thanks for adding this to your support package (arrived yesterday). I'm currently looking around a bit -- it's quite easy to adapt RPi-Monitor templates since SoC temperature and dvfs settings are available through sysfs:
root@NanoPi3:/# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_voltages
1275000 1225000 1175000 1125000 1100000 1075000 1050000 1025000 1000000 1000000 1000000
root@NanoPi3:/# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
1400000 1300000 1200000 1100000 1000000 900000 800000 700000 600000 500000 400000
root@NanoPi3:/# cat /sys/class/hwmon/hwmon0/device/temp_label
87
These 87°C are with only the heatsink applied and no fan, while running the sysbench cpu test and letting the SoC throttle down to 400 MHz. With the fan active cpufreq increases again and the result is only a 15 percent loss compared to the full performance when no throttling occurs. In other words: yes, both heatsink and fan really help, but NanoPi M3 has to be mounted vibration-free, otherwise the small fan sounds pretty annoying. Apart from that it seems you did a tremendous job regarding the kernel; almost everything is exposed through sysfs. I really start to like this board even if it doesn't match my normal use cases (due to lack of IO bandwidth)
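For reference, adapting the RPi-Monitor template mentioned earlier in the thread boils down to pointing it at this hwmon node instead of the usual thermal_zone path -- a rough sketch (the exact template directory/file name depends on the RPi-Monitor version, and note that temp_label reports plain degrees, so any divide-by-1000 scaling in the template has to be removed as well):
# adjust the template path to whatever your RPi-Monitor version installed
sed -i 's|/sys/class/thermal/thermal_zone0/temp|/sys/class/hwmon/hwmon0/device/temp_label|' \
    /etc/rpimonitor/template*/*.conf
# then restart the daemon so the change takes effect
/etc/init.d/rpimonitor restart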
tkaiser Posted August 10, 2016 Posted August 10, 2016
Some performance numbers: As we already know, NanoPi M3's SoC is prone to overheating, therefore throttling is an issue in case heavy workloads last longer ('longer' as in 'more than approx. 60 seconds'). So I ran a few tests again using sysbench since NanoPi M3 is the other Cortex-A53 board that makes no use of ARMv8 instruction sets, since it comes with a 32-bit only kernel + userland just like RPi 3 (sysbench cannot be used to compare different architectures since with ARMv8 code execution might be 15 times faster -- but we're producing numbers for ARMv7 code and can therefore compare with other 32-bit platforms)
Without a heatsink: 455.5495 seconds and 79°C when running sysbench on a single CPU core. With subsequent runs adding more threads the throttling threshold of 85°C is always hit while execution time is around ~260 seconds (should be at ~230 secs without throttling when running with 2 threads) and heavy throttling occurs, so not even 2 cores can run heavy workloads at the same time without performance impacts. A heatsink is therefore mandatory if you want to do anything heavier with this SBC.
With FriendlyARM's heatsink but the fan inactive: When utilizing only 1, 2 or 3 CPU cores no throttling occurred, but with 4 CPU cores fully active slight throttling started (test execution should finish within 114 seconds but the 3rd run with --num-threads=4 already took 122 seconds and the throttling threshold temperature of 85°C had been reached). All subsequent runs with --num-threads=5/6/7/8 took around ~122 seconds, so with this heatsink and without the fan being active you get pretty much the same ARMv7 performance as an RPi 3 (which maxes out at 120 seconds while remaining at 80°C, so that no throttling occurs if you use a small heatsink). But for shorter load bursts (lasting less than 3 minutes) or with active cooling NanoPi M3 is over twice as fast as RPi 3, so whether its CPU performance outperforms other SBCs depends on your workload and on whether you're willing to use an annoying fan.
Attached is the log of the different runs, each adding one more thread to the sysbench execution. I queried /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state before each new run so it's obvious how NanoPi M3's kernel throttles (interactive governor): at idle it remains at 400 MHz, when load increases it immediately jumps to 1.4 GHz, but when the SoC temperature hits 85°C cpufreq will be lowered down to 800 MHz or lower if that's not sufficient. 1000-1300 MHz are not used. So there's some room for improvements in FA's kernel, or maybe I'm just too inexperienced with this Nexell platform to know the tweaks.
Format is: 2 newlines, then in a single line time stamp and current SoC temperature, followed by time_in_state to compare before/after and then the execution times of three sysbench runs each thereby increasing the --num-threads= by one: tk@NanoPi3:~$ cat /var/log/performance.log Fri Jan 1 17:45:34 CST 2016 55C 1400000 216 1300000 0 1200000 1 1100000 0 1000000 0 900000 4 800000 4 700000 9 600000 0 500000 15 400000 1039 Now testing 1 thread: tk@NanoPi3:~$ cat /var/log/performance.log Fri Jan 1 17:45:34 CST 2016 55C 1400000 216 1300000 0 1200000 1 1100000 0 1000000 0 900000 4 800000 4 700000 9 600000 0 500000 15 400000 1039 Now testing 1 thread: execution time (avg/stddev): 19181615.5700/0.00 execution time (avg/stddev): 455.5304/0.00 execution time (avg/stddev): 455.4859/0.00 Wed Aug 10 18:14:21 CST 2016 64C 1400000 136900 1300000 0 1200000 1 1100000 0 1000000 0 900000 4 800000 4 700000 9 600000 0 500000 27 400000 1066 Now testing 2 threads: execution time (avg/stddev): 227.7513/0.00 execution time (avg/stddev): 227.7390/0.02 execution time (avg/stddev): 227.7543/0.00 Wed Aug 10 18:25:45 CST 2016 72C 1400000 205236 1300000 0 1200000 1 1100000 0 1000000 0 900000 4 800000 4 700000 9 600000 0 500000 27 400000 1066 Now testing 3 threads: execution time (avg/stddev): 151.9069/0.01 execution time (avg/stddev): 151.8217/0.02 execution time (avg/stddev): 151.8324/0.01 Wed Aug 10 18:33:20 CST 2016 79C 1400000 250804 1300000 0 1200000 1 1100000 0 1000000 0 900000 4 800000 4 700000 9 600000 0 500000 27 400000 1066 Now testing 4 threads: execution time (avg/stddev): 113.8653/0.01 execution time (avg/stddev): 115.5463/0.01 execution time (avg/stddev): 122.3739/0.02 Wed Aug 10 18:39:12 CST 2016 84C 1400000 283720 1300000 0 1200000 1 1100000 0 1000000 0 900000 4 800000 1828 700000 459 600000 0 500000 27 400000 1066 Now testing 5 threads: execution time (avg/stddev): 116.5424/0.02 execution time (avg/stddev): 122.9468/0.01 execution time (avg/stddev): 122.7339/0.02 Wed Aug 10 18:45:15 CST 2016 85C 1400000 302574 1300000 0 1200000 1 1100000 0 1000000 0 900000 4 800000 6922 700000 5669 600000 6614 500000 491 400000 1066 Now testing 6 threads: execution time (avg/stddev): 121.1502/0.01 execution time (avg/stddev): 122.4795/0.01 execution time (avg/stddev): 121.6814/0.02 Wed Aug 10 18:51:20 CST 2016 84C 1400000 315553 1300000 0 1200000 1 1100000 0 1000000 0 900000 4 800000 11274 700000 10020 600000 10964 500000 4926 400000 7143 Now testing 7 threads: execution time (avg/stddev): 121.2647/0.02 execution time (avg/stddev): 122.3777/0.03 execution time (avg/stddev): 123.4143/0.03 Wed Aug 10 18:57:27 CST 2016 86C 1400000 324586 1300000 0 1200000 1 1100000 0 1000000 1 900000 4 800000 14956 700000 13695 600000 14640 500000 8596 400000 20132 Now testing 8 threads: execution time (avg/stddev): 121.1366/0.03 execution time (avg/stddev): 122.2647/0.05 execution time (avg/stddev): 121.9314/0.03 Small fan active: Now with fan active slight throttling happens when running 8 threads since then throttling temperature (85°C) is reached. So in case you want to run really heavy stuff (cpuminer for example) or are able to utilize GPU cores too, this combination of heatsink + fan is not enough. 
Fri Jan 1 17:06:30 CST 2016 39C 1400000 239 1300000 0 1200000 0 1100000 0 1000000 0 900000 0 800000 4 700000 0 600000 8 500000 59 400000 1342 Now testing 1 thread: execution time (avg/stddev): 19198877.6893/0.00 execution time (avg/stddev): 455.4465/0.00 execution time (avg/stddev): 455.5070/0.00 Wed Aug 10 22:23:01 CST 2016 49C 1400000 137019 1300000 0 1200000 0 1100000 0 1000000 0 900000 0 800000 4 700000 0 600000 14 500000 84 400000 1376 Now testing 2 threads: execution time (avg/stddev): 227.7335/0.01 execution time (avg/stddev): 227.7450/0.02 execution time (avg/stddev): 227.7162/0.00 Wed Aug 10 22:34:24 CST 2016 54C 1400000 205349 1300000 0 1200000 0 1100000 0 1000000 0 900000 0 800000 4 700000 0 600000 14 500000 84 400000 1378 Now testing 3 threads: execution time (avg/stddev): 151.8363/0.00 execution time (avg/stddev): 151.8241/0.02 execution time (avg/stddev): 151.8284/0.02 Wed Aug 10 22:41:59 CST 2016 58C 1400000 250909 1300000 0 1200000 0 1100000 0 1000000 0 900000 0 800000 4 700000 0 600000 14 500000 84 400000 1379 Now testing 4 threads: execution time (avg/stddev): 113.8751/0.01 execution time (avg/stddev): 113.9169/0.01 execution time (avg/stddev): 113.8720/0.01 Wed Aug 10 22:47:41 CST 2016 64C 1400000 285084 1300000 0 1200000 0 1100000 0 1000000 0 900000 0 800000 5 700000 0 600000 14 500000 84 400000 1380 Now testing 5 threads: execution time (avg/stddev): 91.1338/0.01 execution time (avg/stddev): 91.1068/0.01 execution time (avg/stddev): 91.1270/0.02 Wed Aug 10 22:52:15 CST 2016 68C 1400000 312430 1300000 0 1200000 0 1100000 0 1000000 0 900000 2 800000 5 700000 0 600000 14 500000 84 400000 1380 Now testing 6 threads: execution time (avg/stddev): 75.9181/0.02 execution time (avg/stddev): 75.9481/0.02 execution time (avg/stddev): 75.9080/0.01 Wed Aug 10 22:56:03 CST 2016 74C 1400000 335219 1300000 0 1200000 0 1100000 0 1000000 0 900000 2 800000 5 700000 0 600000 14 500000 84 400000 1380 Now testing 7 threads: execution time (avg/stddev): 65.0871/0.01 execution time (avg/stddev): 65.0932/0.01 execution time (avg/stddev): 65.0849/0.01 Wed Aug 10 22:59:18 CST 2016 81C 1400000 354756 1300000 0 1200000 0 1100000 0 1000000 0 900000 2 800000 5 700000 0 600000 14 500000 84 400000 1380 Now testing 8 threads: execution time (avg/stddev): 57.6927/0.01 execution time (avg/stddev): 60.0639/0.02 execution time (avg/stddev): 61.0930/0.02 Wed Aug 10 23:02:17 CST 2016 84C 1400000 370834 1300000 0 1200000 0 1100000 0 1000000 0 900000 2 800000 1825 700000 0 600000 14 500000 84 400000 1380 Adding 2nd fan: Last try using a 2nd fan blowing air directly from the side towards M3's heatsink (through the fins) and therefore helping the small fan on the heatsink. It's the same that can be seen in post #10 here. 
root@NanoPi3:~# cat /var/log/performance_2_fans.log Fri Jan 1 16:00:15 CST 2016 38C 1400000 244 1300000 0 1200000 0 1100000 0 1000000 0 900000 0 800000 4 700000 0 600000 0 500000 26 400000 1291 Now testing 1 thread: execution time (avg/stddev): 19198878.1969/0.00 execution time (avg/stddev): 455.4915/0.00 execution time (avg/stddev): 455.5227/0.00 Wed Aug 10 21:16:45 CST 2016 39C 1400000 136944 1300000 0 1200000 0 1100000 0 1000000 0 900000 0 800000 4 700000 0 600000 0 500000 26 400000 1309 Now testing 2 threads: execution time (avg/stddev): 227.7456/0.00 execution time (avg/stddev): 227.7520/0.00 execution time (avg/stddev): 227.7420/0.01 Wed Aug 10 21:28:08 CST 2016 45C 1400000 205278 1300000 0 1200000 0 1100000 0 1000000 0 900000 0 800000 4 700000 0 600000 0 500000 26 400000 1310 Now testing 3 threads: execution time (avg/stddev): 151.9040/0.02 execution time (avg/stddev): 151.8259/0.02 execution time (avg/stddev): 151.8836/0.01 Wed Aug 10 21:35:44 CST 2016 49C 1400000 250853 1300000 0 1200000 0 1100000 0 1000000 0 900000 0 800000 4 700000 0 600000 0 500000 26 400000 1310 Now testing 4 threads: execution time (avg/stddev): 113.8652/0.01 execution time (avg/stddev): 113.9173/0.02 execution time (avg/stddev): 113.8786/0.01 Wed Aug 10 21:41:25 CST 2016 54C 1400000 285029 1300000 0 1200000 0 1100000 0 1000000 0 900000 1 800000 4 700000 0 600000 0 500000 26 400000 1310 Now testing 5 threads: execution time (avg/stddev): 91.1168/0.02 execution time (avg/stddev): 91.0957/0.02 execution time (avg/stddev): 91.1245/0.02 Wed Aug 10 21:45:59 CST 2016 55C 1400000 312376 1300000 0 1200000 0 1100000 0 1000000 0 900000 1 800000 4 700000 0 600000 0 500000 26 400000 1310 Now testing 6 threads: execution time (avg/stddev): 75.9273/0.02 execution time (avg/stddev): 75.9355/0.01 execution time (avg/stddev): 75.9105/0.02 Wed Aug 10 21:49:47 CST 2016 60C 1400000 335165 1300000 0 1200000 0 1100000 1 1000000 0 900000 1 800000 4 700000 0 600000 0 500000 26 400000 1310 Now testing 7 threads: execution time (avg/stddev): 65.0825/0.01 execution time (avg/stddev): 65.0925/0.01 execution time (avg/stddev): 65.0917/0.01 Wed Aug 10 21:53:02 CST 2016 64C 1400000 354703 1300000 0 1200000 0 1100000 1 1000000 0 900000 1 800000 4 700000 0 600000 0 500000 26 400000 1310 Now testing 8 threads: execution time (avg/stddev): 56.9910/0.01 execution time (avg/stddev): 57.0447/0.02 execution time (avg/stddev): 57.0200/0.01 Wed Aug 10 21:55:54 CST 2016 69C 1400000 371821 1300000 0 1200000 0 1100000 2 1000000 2 900000 1 800000 4 700000 0 600000 0 500000 26 400000 1489 No throttling occured and even with full load on 8 CPU cores SoC temp not exceeding 70°C. So this is good news since you could combine heatpad + heatsink (even FA's own) with one large/silent fan that blows enough air over the heatsink's surface (through the fins!) and are able to make use of the full octa-core power without annoying noise. Look at the picture with the cardboard roll above to get the idea (tested with small fan removed -- while running sysbench on all 8 cores SoC temperature never exceeded 78°C so using a large fan with controlled airflow you can remove the small fan on the heatsink and remain below throttling tresholds) BTW: M3 was powered through the 4-pin header (using FriendlyARM's convenient PSU-ONECOM board). 
Do not even think about powering it through Micro USB since this won't work with these workloads. The script used to do these measurements (called from /etc/rc.local) looks like this:
root@NanoPi3:~# cat /usr/local/bin/check-perf.sh
#!/bin/bash
for i in 1 2 3 4 5 6 7 8 ; do
	if [ -f /var/log/performance.log ]; then
		echo -e "\n\n$(date) $(cat /sys/class/hwmon/hwmon0/device/temp_label)C\n$(cat /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state)\nNow testing $i threads:" >>/var/log/performance.log
	else
		echo -e "$(date) $(cat /sys/class/hwmon/hwmon0/device/temp_label)C\n$(cat /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state)\nNow testing 1 thread:" >>/var/log/performance.log
	fi
	sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=${i} | grep "execution time" >>/var/log/performance.log
	sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=${i} | grep "execution time" >>/var/log/performance.log
	sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=${i} | grep "execution time" >>/var/log/performance.log
done
echo -e "\n\n$(date) $(cat /sys/class/hwmon/hwmon0/device/temp_label)C\n$(cat /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state)" >>/var/log/performance.log
tkaiser Posted August 28, 2016 Posted August 28, 2016 Since I like this board more and more another round of tests. NanoPi M3 is equipped with a GBit Ethernet interface obviously using the stmmac implementation combining an internal GbE MAC implementation in the SoC with RTL8211E external GBit PHY. First turn FriendlyARM's Debian distro into a server OS: sudo systemctl disable lightdm (that's all, if you want to reclaim space on the SD card you might want to deinstall the GUI stuff but that's not important for performance behaviour) Let's use iperf first to do some passive benchmarking. M3 and Client (x86 host capable of maxing out its GBit interfaces connected to lab switch): M3 --> Client: 730 Mbits/sec Client --> M3: 640 Mbits/sec root@armbian:/var/git/Armbian# iperf -s ------------------------------------------------------------ Server listening on TCP port 5001 TCP window size: 85.3 KByte (default) ------------------------------------------------------------ [ 4] local 192.168.83.115 port 5001 connected with 192.168.83.113 port 52681 [ ID] Interval Transfer Bandwidth [ 4] 0.0-10.0 sec 845 MBytes 706 Mbits/sec [ 5] local 192.168.83.115 port 5001 connected with 192.168.83.113 port 52682 [ 5] 0.0-10.0 sec 841 MBytes 702 Mbits/sec [ 4] local 192.168.83.115 port 5001 connected with 192.168.83.113 port 52683 [ 4] 0.0-10.0 sec 862 MBytes 721 Mbits/sec [ 5] local 192.168.83.115 port 5001 connected with 192.168.83.113 port 52684 [ 5] 0.0-10.1 sec 869 MBytes 726 Mbits/sec [ 4] local 192.168.83.115 port 5001 connected with 192.168.83.113 port 52685 [ 4] 0.0-10.0 sec 849 MBytes 710 Mbits/sec [ 5] local 192.168.83.115 port 5001 connected with 192.168.83.113 port 52686 [ 5] 0.0-300.0 sec 25.5 GBytes 730 Mbits/sec root@NanoPi3:~# iperf -s ------------------------------------------------------------ Server listening on TCP port 5001 TCP window size: 85.3 KByte (default) ------------------------------------------------------------ [ 4] local 192.168.83.113 port 5001 connected with 192.168.83.115 port 60970 [ ID] Interval Transfer Bandwidth [ 4] 0.0-10.0 sec 736 MBytes 616 Mbits/sec [ 5] local 192.168.83.113 port 5001 connected with 192.168.83.115 port 60972 [ 5] 0.0-10.0 sec 745 MBytes 624 Mbits/sec [ 4] local 192.168.83.113 port 5001 connected with 192.168.83.115 port 60974 [ 4] 0.0-10.0 sec 761 MBytes 637 Mbits/sec [ 5] local 192.168.83.113 port 5001 connected with 192.168.83.115 port 60976 [ 5] 0.0-10.0 sec 724 MBytes 606 Mbits/sec [ 4] local 192.168.83.113 port 5001 connected with 192.168.83.115 port 60978 [ 4] 0.0-10.0 sec 748 MBytes 626 Mbits/sec [ 5] local 192.168.83.113 port 5001 connected with 192.168.83.115 port 60980 [ 5] 0.0-300.0 sec 22.4 GBytes 643 Mbits/sec Now let's take a closer look what happened by using htop, limiting maximum cpufreq to the lower frequency so CPU cores remain at 400 MHz all the time and looking at /proc/interrupts and monitoring cpu clockspeeds: while true ; do cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq; sleep 1; done 5 minutes later: cpufreq matters, when running only at 400 MHz instead of 1400 MHz throughput is affected (so we must look at cpufreq scaling behaviour in the next step!) eth0 IRQs are spread accross all 8 CPU cores with this kernel (not that good, it's known that a fixed IRQ affinity helps on SMP systems with Ethernet loads) When testing client --> M3 performance iperf runs single threaded and maxes out 1 CPU core. This is obviously a limiting factor When testing M3 --> client performance iperf activity is spread accross all CPU cores. 
This leads to cpufreq remaining most of the times on the lowest CPU clockspeed (400 MHz) with interactive cpufreq governor which is obviously a limiting factor To address the last issue switching to performance governor would be a 'solution' or looking into interactive taking notice of this sort of activity and switching to maximum clockspeed. So let's try to improve Ethernet IRQ handling and also create an artificial bottleneck and see what happens. Only change made is this: echo 2 >/sys/class/net/eth0/queues/rx-0/rps_cpus Now iperf and iperf3 performance increases a lot. We're exceeding already 900 Mbits/sec in both directions: root@NanoPi3:~# iperf3 -c 192.168.83.115 -w 512k -l 512k Connecting to host 192.168.83.115, port 5201 [ 4] local 192.168.83.113 port 60061 connected to 192.168.83.115 port 5201 [ ID] Interval Transfer Bandwidth Retr Cwnd [ 4] 0.00-1.00 sec 108 MBytes 904 Mbits/sec 0 273 KBytes [ 4] 1.00-2.00 sec 110 MBytes 924 Mbits/sec 0 273 KBytes [ 4] 2.00-3.00 sec 108 MBytes 909 Mbits/sec 0 273 KBytes [ 4] 3.00-4.02 sec 109 MBytes 896 Mbits/sec 0 273 KBytes [ 4] 4.02-5.01 sec 105 MBytes 891 Mbits/sec 0 273 KBytes [ 4] 5.01-6.02 sec 110 MBytes 912 Mbits/sec 0 273 KBytes [ 4] 6.02-7.00 sec 106 MBytes 903 Mbits/sec 0 273 KBytes [ 4] 7.00-8.00 sec 109 MBytes 917 Mbits/sec 0 273 KBytes [ 4] 8.00-9.00 sec 108 MBytes 909 Mbits/sec 0 273 KBytes [ 4] 9.00-10.00 sec 105 MBytes 883 Mbits/sec 0 273 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bandwidth Retr [ 4] 0.00-10.00 sec 1.05 GBytes 905 Mbits/sec 0 sender [ 4] 0.00-10.00 sec 1.05 GBytes 905 Mbits/sec receiver root@armbian:/var/git/Armbian# iperf3 -s ----------------------------------------------------------- Server listening on 5201 ----------------------------------------------------------- Accepted connection from 192.168.83.113, port 60058 [ 5] local 192.168.83.115 port 5201 connected to 192.168.83.113 port 60059 [ ID] Interval Transfer Bandwidth [ 5] 0.00-1.00 sec 102 MBytes 851 Mbits/sec [ 5] 1.00-2.00 sec 107 MBytes 896 Mbits/sec [ 5] 2.00-3.00 sec 112 MBytes 937 Mbits/sec [ 5] 3.00-4.00 sec 106 MBytes 893 Mbits/sec [ 5] 4.00-5.00 sec 111 MBytes 927 Mbits/sec [ 5] 5.00-6.00 sec 106 MBytes 891 Mbits/sec [ 5] 6.00-7.00 sec 107 MBytes 901 Mbits/sec [ 5] 7.00-8.00 sec 105 MBytes 885 Mbits/sec [ 5] 8.00-9.00 sec 110 MBytes 925 Mbits/sec [ 5] 9.00-10.00 sec 110 MBytes 922 Mbits/sec [ 5] 10.00-10.06 sec 7.11 MBytes 941 Mbits/sec - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bandwidth Retr [ 5] 0.00-10.06 sec 1.06 GBytes 904 Mbits/sec 0 sender [ 5] 0.00-10.06 sec 1.06 GBytes 903 Mbits/sec receiver Now the most important thing to notice: 900 Mbits/sec reported by iperf are enough if we start to think about how synthetic benchmarks correlate with reality. Using iperf with default window sizes is a joke (way too small) so by further tuning this stuff also iperf performance numbers will improve. Are these better numbers important or even good? Not at all since real world applications behave differently (see here for an example how Windows' Explorer or OS X Finder tune their settings dynamically to get the idea how wrong it is to use iperf with default window sizes). Also iperf is limited by being bound to one CPU core when running as server task. 
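Side note: besides RPS, the Ethernet receive IRQ itself can be pinned to a single core -- a minimal sketch, assuming the IRQ number still has to be looked up first on this kernel (replace <irq> with the number shown by /proc/interrupts):
# find the IRQ line(s) used by eth0 (the interrupt name may differ on this stmmac kernel)
grep -i eth0 /proc/interrupts
# pin that IRQ to CPU1 (bitmask 2)
echo 2 > /proc/irq/<irq>/smp_affinity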
So what did we achieve with this single line added to /etc/rc.local: echo 2 > /sys/class/net/eth0/queues/rx-0/rps_cpus We let all Ethernet RX IRQs be processed on a single CPU core (better performance) and also created a bottleneck which let the interactive governor behave 'better' and let cpufreq immediately jump from 400 MHz to 1400 Mhz which helps with benchmark numbers. What does this mean for real workloads running a web server or a NAS daemon? There the overall higher CPU activity would for sure lead to cpufreq scaling jumping to the maximum 1400 MHz and inventing a bottleneck as above might even negatively affect performance. This is something that has to be tested with real world workloads. Just better benchmark scores are not sufficient. What do we learn from that? Passive benchmarking as it's done by most people does only create numbers without meaning and is crap. Always. By looking at what's happening we are able to identify the bottlenecks that prevent synthetic benchmarks producing nice numbers. We now know how to improve numbers for this specific benchmark, we know that the tool in question sucks somehow (being maxed out by acting single threaded in a specific mode) and that the 'fix' for better benchmark scores might be counterproductive for real world workloads. So now we know that Gbit Ethernet on this board performs very well, we know that by influencing CPU affinity of Ethernet RX IRQs we affect performance in 2 different ways (better IRQ processing and better cpufreq scaling of interactive governor) and most importantly we know where to look at when more serious network testing starts with real world workloads and not silly stuff like iperf/iperf3 with default settings. BTW: Openend issue at FriendlyARM to let them know: https://github.com/friendlyarm/linux-3.4.y/issues/5 Update: I tested static delivery of web pages with nginx, this testfile with weighttp using this command line from my x86 box weighttp -n 100000 -c 20 -t 4 -k http://$target:80/testfile.js and with/without tweaking /sys/class/net/eth0/queues/rx-0/rps_cpus: No differences (result variation below 5 percent difference can be regarded identical). It's 4330 req/s on average (Pine64+ with BSP kernel, no network tuning at all and GbE connected to the same Gbit switch gets a +4800 req/s score, Pine64 testing my x86 host gets 4950 req/s but results are constant 496 req/s when one of the hosts is forced to use 100 MBits/sec network -- ethtool -s eth0 speed 100 duplex full -- so this test is bullshit anyway since it's not a webserver test but a throughput test similar to iperf) root@armbian:/usr/local/src/weighttp/build/default# ./weighttp -n 100000 -c 20 -t 4 -k 192.168.83.113:80/testfile.js weighttp 0.4 - a lightweight and simple webserver benchmarking tool starting benchmark... 
spawning thread #1: 5 concurrent requests, 25000 total requests spawning thread #2: 5 concurrent requests, 25000 total requests spawning thread #3: 5 concurrent requests, 25000 total requests spawning thread #4: 5 concurrent requests, 25000 total requests progress: 10% done progress: 20% done progress: 30% done progress: 40% done progress: 50% done progress: 60% done progress: 70% done progress: 80% done progress: 90% done progress: 100% done finished in 23 sec, 28 millisec and 421 microsec, 4342 req/s, 100135 kbyte/s requests: 100000 total, 100000 started, 100000 done, 100000 succeeded, 0 failed, 0 errored status codes: 100000 2xx, 0 3xx, 0 4xx, 0 5xx traffic: 2361295050 bytes total, 25295050 bytes http, 2336000000 bytes data root@armbian:/usr/local/src/weighttp/build/default# ./weighttp -n 100000 -c 20 -t 4 -k 192.168.83.113:80/testfile.js weighttp 0.4 - a lightweight and simple webserver benchmarking tool starting benchmark... spawning thread #1: 5 concurrent requests, 25000 total requests spawning thread #2: 5 concurrent requests, 25000 total requests spawning thread #3: 5 concurrent requests, 25000 total requests spawning thread #4: 5 concurrent requests, 25000 total requests progress: 10% done progress: 20% done progress: 30% done progress: 40% done progress: 50% done progress: 60% done progress: 70% done progress: 80% done progress: 90% done progress: 100% done finished in 23 sec, 208 millisec and 782 microsec, 4308 req/s, 99356 kbyte/s requests: 100000 total, 100000 started, 100000 done, 100000 succeeded, 0 failed, 0 errored status codes: 100000 2xx, 0 3xx, 0 4xx, 0 5xx traffic: 2361295050 bytes total, 25295050 bytes http, 2336000000 bytes data 1
tkaiser Posted August 29, 2016 Posted August 29, 2016
Some more performance numbers. As we've already seen, increasing Gbit Ethernet throughput to the max was a simple echo 2 > /sys/class/net/eth0/queues/rx-0/rps_cpus added to /etc/rc.local since cpufreq governor behaviour negatively influenced throughput numbers. It will be interesting what has to be tweaked to also get the lowest latency since one possible use case for this beefy board is to run cluster workloads.
Let's try cpuminer first: grab https://sourceforge.net/projects/cpuminer/files/pooler-cpuminer-2.4.5.tar.gz/download and then do
sudo apt-get install libcurl4-gnutls-dev
./configure CFLAGS="-O3 -mfpu=neon"
make
./minerd --benchmark
I was not able to run the benchmark with my setup at 1.4 GHz; 1.3 GHz was the maximum since 85°C is already reached at that point and the M3's kernel's somewhat inefficient throttling starts (only switching between very low clockspeeds and 1400 MHz, leading to lower khash/s values compared to fixing the max clockspeed at 1.3 GHz). Running at 1300 MHz on 8 cores I got a whopping 9.96 khash/s score with this NEON optimized cpuminer version. Increasing the maximum cpufreq slowly was necessary since otherwise the board simply deadlocked (I would suspect the PSU is simply overloaded when cpuminer starts on 8 cores at 1400 MHz). So with better cooling +10.6 khash/s should be possible. As a comparison: with quad-core H3 (Orange Pi PC) at 1296 MHz we get 2.35 khash/s and with quad-core A64 (Pine64+ overclocked/overvolted to 1296 MHz) we get 3.9 khash/s.
Now Linpack/OpenBLAS with NEON optimizations: I followed these instructions: https://github.com/deater/performance_results/tree/master/build_instructions
With a freshly built Linpack with NEON optimizations I thought I'd start at 800 MHz cpufreq: 7.476 GFLOPS. Using 900 MHz I got 8.227 GFLOPS, and at 1000 MHz the M3 simply deadlocked -- most probably a sign that my PSU is too weak, since at 900 MHz the SoC temperature only reached 70°C so it was not a thermal issue (this Linpack version is known to power off SBCs with insufficient power supply 100 percent reliably). That means in case one uses a better PSU than mine and a more efficient heatsink+fan, exceeding 12 GFLOPS should be possible, but at the cost of insanely high consumption and huge efforts for cooling and power supply. As a comparison: with quad-core H3 (Orange Pi PC) at 1296 MHz we get 1.73 GFLOPS and with quad-core A64 (Pine64+ overclocked/overvolted to 1296 MHz) we get 3.4 GFLOPS, while an RPi 3 running at just 1.2 GHz achieves 3.6 GFLOPS.
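For reference, capping the maximum clockspeed as described above is just a sysfs write -- a sketch, assuming the cpufreq layout shown earlier in this thread (some kernels expose writable per-core nodes, others share one policy, hence the loop and the ignored errors):
# cap cpufreq at 1.3 GHz to stay below the 85°C throttling point under cpuminer
for cpu in /sys/devices/system/cpu/cpu[0-7]; do
    echo 1300000 > $cpu/cpufreq/scaling_max_freq 2>/dev/null
done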
tkaiser Posted August 30, 2016 Posted August 30, 2016 Today a look at IO and disk performance / features / capabilities The SoC used on NanoPi M3 (applies to M2 and NanoPC-T2/T3 too) has one USB OTG port available through the Micro USB connector and one host port connected to an internal USB hub (Genesys Logic GL852G). All ports are USB 2.0 and that means that all USB receptacles and the 2 USB ports available on a pin header are behind the internal USB hub and have to share bandwidth A quick test with an ordinary notebook disk confirms that USB performance on the host ports (behind the hub) is not that good, I got sequential speed results at around ~27 MB/s. Also the kernel only supports USB mass storage mode and not UASP so better no expectations to get high random IO numbers when connecting a SSD in an UASP capable enclosure. So let's try out the OTG port available through the Micro USB jack (obviously you need to power the board then through the 4 pin header next to the 40 pin GPIO header which is recommended anyway!). The OTG port is in device mode by default so let's take a short adapter cable and test the disk again. Good news, sequential USB performance on the Micro USB port is excellent (I used only hdparm since on the disk are HFS+ filesystems so with a more serious benchmark performance will be somewhere between 35 and 37 MB/s), also it's possible to query the disk there with SMART and even fire up SMART selftests, but using hdparm to control standby/sleeping behaviour didn't work and trying to spin the disk down immediately (hdparm -Y /dev/sda) ended with a kernel panic root@NanoPi3:~# lsusb Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub Bus 001 Device 002: ID 05e3:0610 Genesys Logic, Inc. 4-port hub Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub Bus 002 Device 003: ID 13fd:1840 Initio Corporation INIC-1608 SATA bridge Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub root@NanoPi3:~# cat /sys/bus/platform/devices/dwc_otg/otg_mode device root@NanoPi3:~# hdparm -t /dev/sda /dev/sda: Timing buffered disk reads: 118 MB in 3.03 seconds = 39.01 MB/sec root@NanoPi3:~# smartctl -t short /dev/sda smartctl 6.4 2014-10-07 r4002 [armv7l-linux-3.4.39-s5p6818] (local build) Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org === START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION === Sending command: "Execute SMART Short self-test routine immediately in off-line mode". Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful. Testing has begun. Please wait 2 minutes for test to complete. Test will complete after Tue Aug 30 16:28:16 2016 Use smartctl -X to abort test. root@NanoPi3:~# smartctl -a /dev/sda smartctl 6.4 2014-10-07 r4002 [armv7l-linux-3.4.39-s5p6818] (local build) Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Family: Seagate Momentus SpinPoint M8 (AF) Device Model: ST1000LM024 HN-M101MBB Serial Number: S2RXJ9BDA05287 LU WWN Device Id: 5 0004cf 20b5c5a31 Firmware Version: 2AR10002 User Capacity: 1,000,204,886,016 bytes [1.00 TB] Sector Sizes: 512 bytes logical, 4096 bytes physical Rotation Rate: 5400 rpm Form Factor: 2.5 inches Device is: In smartctl database [for details use: -P show] ATA Version is: ATA8-ACS T13/1699-D revision 6 SATA Version is: SATA 3.0, 3.0 Gb/s (current: 1.5 Gb/s) Local Time is: Tue Aug 30 16:29:38 2016 CST SMART support is: Available - device has SMART capability. 
SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART Status command failed: scsi error medium or hardware error (serious) SMART overall-health self-assessment test result: PASSED Warning: This result is based on an Attribute check. General SMART Values: Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: (13380) seconds. Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 2) minutes. Extended self-test routine recommended polling time: ( 223) minutes. SCT capabilities: (0x003f) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported. SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 0 2 Throughput_Performance 0x0026 252 252 000 Old_age Always - 0 3 Spin_Up_Time 0x0023 086 086 025 Pre-fail Always - 4457 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 203 5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0 7 Seek_Error_Rate 0x002e 252 252 051 Old_age Always - 0 8 Seek_Time_Performance 0x0024 252 252 015 Old_age Offline - 0 9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 91 10 Spin_Retry_Count 0x0032 252 252 051 Old_age Always - 0 11 Calibration_Retry_Count 0x0032 252 252 000 Old_age Always - 0 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 40 191 G-Sense_Error_Rate 0x0022 252 252 000 Old_age Always - 0 192 Power-Off_Retract_Count 0x0022 252 252 000 Old_age Always - 0 194 Temperature_Celsius 0x0002 064 058 000 Old_age Always - 30 (Min/Max 21/42) 195 Hardware_ECC_Recovered 0x003a 100 100 000 Old_age Always - 0 196 Reallocated_Event_Count 0x0032 252 252 000 Old_age Always - 0 197 Current_Pending_Sector 0x0032 252 252 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0030 252 252 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x0036 200 200 000 Old_age Always - 0 200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 3 223 Load_Retry_Count 0x0032 252 252 000 Old_age Always - 0 225 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 501 SMART Error Log Version: 1 No Errors Logged SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Short offline Completed without error 00% 90 - SMART Selective self-test log data structure revision number 0 Note: revision number not 1 implies that no selective self-test has ever been run SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Completed [00% left] (0-65535) 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. 
If Selective self-test is pending on power-up, resume after 0 minute delay.
Please note: USB IRQs are distributed across all CPU cores by default but love cpu0 for unknown reasons (so there is a chance that further performance improvements can be made by controlling IRQ distribution):
81: 7366 1527 1764 2337 3630 2880 2980 3620 GIC dwc_otg, dwc_otg_pcd, dwc_otg_hcd:usb2
82: 1038 576 387 944 617 472 410 658 GIC ehci_hcd:usb1, ohci_hcd:usb3
I also used the performance cpufreq governor since I only tested with hdparm (which should not be used as a benchmark since it's only able to read and 'benchmark' execution is way too short!) and when using interactive the benchmark results would have been tampered with too much by switching from the lowest clockspeed to the upper one. For real world workloads interactive is pretty fine since within ~0.5 seconds cpufreq will be increased to the max. Using 2 USB disks in a RAID-0 or RAID-1 should work when one disk is connected to Micro USB and the other to a normal host port. IMO only RAID-0 to increase performance (expect ~50MB/s which is ok-ish given the GbE performance) would make some sense since disk redundancy is not that useful on an SBC. Setting up a RAID-0 when connecting two disks to both USB host ports is useless of course since the ports have to share bandwidth.
TL;DR: USB performance on the host ports is somewhat limited (to be avoided anyway since all ports have to share bandwidth) but using disks connected to the Micro USB connector works by default, with high performance and a rather full feature set (at least SMART is possible if the USB enclosure also supports it!). Since UASP support is missing due to a horribly outdated kernel version you shouldn't expect wonders regarding random IO speeds (but this is something where only a few sunxi SoCs can shine since H3, A20 and A64 support UASP with USB 2.0 when running on mainline kernel)
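For completeness, a quick way to see which driver a USB disk is actually bound to (on this 3.4 kernel it will always be usb-storage since UASP support is missing, while a mainline kernel with a UASP capable enclosure would show uas instead):
# lists the USB topology including the driver bound to each device;
# look for 'Driver=usb-storage' vs 'Driver=uas' next to the disk enclosure
lsusb -t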
constantius Posted August 30, 2016 Posted August 30, 2016
Hi, a short question: will you make Armbian for the NanoPi M2 and M3 or not?
best regards Radek
tkaiser Posted August 30, 2016 Posted August 30, 2016
a short question: will you make Armbian for the NanoPi M2 and M3 or not?
The short answer was already given yesterday -- but in the wrong thread: http://forum.armbian.com/index.php/topic/1917-armbian-running-on-pine64-and-other-a64h5-devices/#entry14698
I don't know what Igor thinks (he also has 2 M3s on his desk) and at least I won't start with the M3. I only had a look into a few performance tunables, and here the board looks quite nice, so we know what to expect from an Armbian port or from this SBC in general. But IMO FriendlyARM's OS images are pretty nice, support all their peripherals (displays, Matrix stuff) out of the box in a perfect way, and all you have to do to get a headless operation mode is disable the start of the lightdm daemon (and a few performance tweaks are listed in this thread already). So there's not much to gain from an Armbian port anyway. And I fear if we started supporting M2/M3 (and therefore also NanoPC-T2/T3) we would have to focus on headless usage since I would find it frustrating to release desktop images that suck compared to FriendlyARM's because not all of the stuff is working (for example pairing one of the touchscreen LCDs from FA with M2/M3 is just connecting the ribbon cable, everything else works out of the box)
constantius Posted August 30, 2016 Posted August 30, 2016
"but FriendlyARM's OS images are pretty nice"
NanoPi M2: the Ubuntu 15.04 image does not work after updating to 15.10, Kali Linux has no sound, and Debian 8 only supports 720p resolution. To change the resolution to 1080p you are supposed to compile the kernel -- it does not work. Android works. NanoPi M3: the same as the M2, but without Kali and Ubuntu, which do not exist.
eternalWalker Posted August 30, 2016 Author Posted August 30, 2016
...and additionally, "Matrix" is not working (on the M3), and the kernel is only 32-bit.
tkaiser Posted August 30, 2016 Posted August 30, 2016
"but FriendlyARM's OS images are pretty nice" NanoPi M2: the Ubuntu 15.04 image does not work after updating to 15.10, Kali Linux has no sound, and Debian 8 only supports 720p resolution. To change the resolution to 1080p you are supposed to compile the kernel -- it does not work. Android works. NanoPi M3: the same as the M2, but without Kali and Ubuntu, which do not exist.
Well, then you won't gain that much from headless Armbian images anyway. I really don't care about display output, and if I did I would use the LCDs since I've never seen a board + LCD working that flawlessly out of the box. After dealing too long with OS images from other Chinese vendors (especially those from 'Team BPi') my expectations weren't that high, but dealing with their Debian images for M1 and M3 so far was impressive. Everything worked nearly perfectly (only upgrading from Jessie to Stretch on their M3 image failed, but hey, that's the 'testing' distribution, so why should I expect that to work without problems?)
"Matrix" is not working (on the M3), the kernel is only 32-bit
I was talking about hardware for which they provide comprehensive tutorials and the necessary code to get up and running. What are you talking about?
eternalWalker Posted August 30, 2016 Author Posted August 30, 2016
The software... A problem from the FriendlyARM forum. The question: "Hi Devs, please test the M3 with the latest Matrix from git. I am getting failures on initialising PWM & GPIO. Regards" And the answer (FATechsupport): "Unfortunately the Matrix code may not work with M2 for now and we haven't tested the code yet. We only tested the code for M1/NEO and Pi2/Fire/M2/T2" Isn't that a great answer?