[NanoPi M3] Cheap 8-core ($35)


eternalWalker

Code excerpted from the original build script, just to show that you could try to dd the blobs to /dev/sdX ($IMG_NAME); I just don't know what the parameters would be.

I think we need someone to step in here and give some guidance. Maybe @ejolson or @igor or some experienced ODROID XU4 user here. Sorry.
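The build-script excerpt itself isn't included here, so the offsets are unknown. As a hedged sketch of what "dd the blobs" amounts to: each blob is written into the card at a fixed byte offset without truncating anything, the same as `dd if=BLOB of=/dev/sdX bs=512 seek=N conv=notrunc`. The function name is hypothetical, and the correct seek values for the M3's boot blobs are exactly the missing parameters.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy all of blob_path into dev_path starting at byte `offset`, leaving
 * the rest of dev_path untouched. Roughly what
 *     dd if=BLOB of=/dev/sdX bs=512 seek=N conv=notrunc
 * does, with offset = N * 512. Double-check the device node before use.
 * Returns 0 on success, -1 on any error. */
int write_blob_at(const char *blob_path, const char *dev_path, off_t offset)
{
    assert(offset >= 0);
    int src = open(blob_path, O_RDONLY);
    if (src < 0)
        return -1;
    int dst = open(dev_path, O_WRONLY);   /* no O_CREAT/O_TRUNC: like conv=notrunc */
    if (dst < 0) {
        close(src);
        return -1;
    }

    char buf[64 * 1024];
    int rc = 0;
    ssize_t got = 0;
    while (rc == 0 && (got = read(src, buf, sizeof buf)) > 0) {
        ssize_t done = 0;
        while (done < got) {
            ssize_t w = pwrite(dst, buf + done, (size_t)(got - done), offset);
            if (w < 0) {
                rc = -1;
                break;
            }
            done += w;
            offset += w;
        }
    }
    if (got < 0)
        rc = -1;
    if (rc == 0 && fsync(dst) != 0)       /* make sure the data reached the card */
        rc = -1;
    close(src);
    close(dst);
    return rc;
}
```

Usage would be something like `write_blob_at("bl1.bin", "/dev/sdX", 512 * N)`, where `bl1.bin` is a placeholder name and N has to come from the original build script.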


@lex

At least I managed to get a full image generated from the info and repositories mentioned in the thread (for the ARTIK 710, though: "./release.sh -c config/artik710.cfg -m").

I mean I successfully compiled and generated u-boot, the kernel and the rootfs, all combined into an image specifically for SD. The procedure is described in the driver developer's guide.

But when booted it did nothing, not even a u-boot prompt on the M3's tty/UART port.

Either way, that means the necessary components are there, just sitting and waiting for some Armbian team guru (@Igor, @tkaiser) to come and see how this can be tailored for the M3. Just saying.

Christos


It just occurred to me: if you created a new compatible (64-bit) initrd and wrote the new u-boot and kernel to the SD card using dd, with the same blobs as the original 32-bit kernel, could this work?

Any thoughts?

I purchased a NanoPi M3 to play with a cheap 8-core ARM board and figured I would share my experiences. Much of what I found matches what has been posted in this thread, but not all of it.

 

I have not set up a dedicated SBC bench yet, so I bring boards up in my mobile area since it is the only place with a spare HDMI cable. I use a good power supply with a long, crappy micro USB cable. A Pine64+ lives there under full load in a constant state of thermal limit; let's see how long it takes to burn it out. But that is a different post; this one is about the M3.

 

Initial bring-up was a breeze: mount the optional HSF, flash the SD card and plug it all in. As usual, it took three tries to get the fan plug on the right way. There are only two ways to plug it in, but it always takes three tries. Not sure why.

 

The board came up just fine and idle temperatures were fine. Everything looked good. So down comes multi-miner for a bit of a stress test. I fire it up, temperatures quickly peak and it goes into thermal limit. Less than a minute later it crashes. I repeat this a few times before I see a pattern. Not needing the HDMI any more, I move the board to my SBC pile with an Anker 60 W, 6-port USB charger and short, high-power USB charging cables. I insert a USB doctor between the charger and the cable, start it up and connect via SSH. Idle power consumption is a bit higher than on the other SBCs. I start the miner, temperatures quickly peak and the board goes into thermal limit. The USB doctor reports 1.5 A of current draw.

 

Three and a half days later it is still running. I have no problems running the NanoPi M3 from the micro USB cable when I use a sensible power supply and a sensible cable. It crashes and burns, about as you would expect, if you use a silly, wimpy little cable from your sock drawer that you love because it is nice and long and reaches from the wall wart behind your bed to the nightstand where you charge your phone.
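A rough calculation shows why the cable matters at the 1.5 A the USB doctor reported. The sketch below is plain Ohm's law (V = I × R); the 1 m length and the per-metre resistances for 28 AWG and 24 AWG copper are assumed illustrative values, not measurements of the cables above.

```c
#include <assert.h>

/* Approximate resistance of one copper conductor, ohms per metre.
 * Assumed textbook values for illustration; real cables vary. */
#define OHM_PER_M_28AWG 0.213
#define OHM_PER_M_24AWG 0.084

/* Voltage lost in a USB cable carrying `amps`. Current goes out on VBUS
 * and back on GND, so the resistive path is twice the cable length.
 * Connector and contact resistance are ignored. */
double cable_drop_volts(double amps, double cable_m, double ohm_per_m)
{
    assert(amps >= 0.0 && cable_m >= 0.0 && ohm_per_m >= 0.0);
    return amps * 2.0 * cable_m * ohm_per_m;
}

/* Voltage actually reaching the board for a given supply voltage. */
double volts_at_board(double supply_v, double amps, double cable_m,
                      double ohm_per_m)
{
    return supply_v - cable_drop_volts(amps, cable_m, ohm_per_m);
}
```

For 1 m of 28 AWG, `cable_drop_volts(1.5, 1.0, OHM_PER_M_28AWG)` comes to about 0.64 V, so a 5.0 V supply arrives at the board as roughly 4.36 V; 1 m of 24 AWG loses only about 0.25 V.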

root@NanoPi3:~# w
 05:06:33 up 3 days, 16:51,  2 users,  load average: 8.87, 8.88, 9.02
USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
fa       :0       :0               Wed12   ?xdm?   3days  0.24s /usr/bin/lxsession -s LXDE -e LXDE
root     pts/0  x.x.x.x       Wed12    1.00s  0.39s  0.01s w
root@NanoPi3:~# cat /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state
1400000 29541406
1300000 0
1200000 0
1100000 12
1000000 0
900000 3
800000 1226558
700000 1218579
600000 352
500000 239
400000 9521
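The raw counters above are easier to read as percentages. The kernel reports each frequency in kHz and the time spent there in 10 ms units (USER_HZ ticks). A minimal C sketch, with illustrative struct and function names not taken from any existing tool: with the numbers above, about 92% of the recorded time is at 1400 MHz and roughly 4% each at 800 and 700 MHz, and the total of about 320,000 s is consistent with the 3 days 16 hours of uptime shown by `w`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One line of cpufreq's stats/time_in_state: a frequency in kHz and the
 * time spent there, in 10 ms units (USER_HZ clock ticks). */
struct freq_row {
    unsigned long khz;
    unsigned long long ticks;
};

/* Parse up to `max` rows from an open time_in_state stream.
 * Returns the number of rows read. */
size_t read_time_in_state(FILE *f, struct freq_row *rows, size_t max)
{
    assert(f != NULL);
    size_t n = 0;
    while (n < max && fscanf(f, "%lu %llu", &rows[n].khz, &rows[n].ticks) == 2)
        n++;
    return n;
}

/* Fraction (0..1) of the total recorded time spent at `khz`. */
double share_at(const struct freq_row *rows, size_t n, unsigned long khz)
{
    unsigned long long total = 0, hit = 0;
    for (size_t i = 0; i < n; i++) {
        total += rows[i].ticks;
        if (rows[i].khz == khz)
            hit += rows[i].ticks;
    }
    return total ? (double)hit / (double)total : 0.0;
}

/* Print each frequency's share, e.g. for rows read from
 * /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state. */
void print_shares(const struct freq_row *rows, size_t n)
{
    for (size_t i = 0; i < n; i++)
        printf("%5lu MHz  %6.2f%%\n", rows[i].khz / 1000,
               100.0 * share_at(rows, n, rows[i].khz));
}
```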

My main concern with this board is that it fails to slow down gracefully. It alternates between 700, 800 and 1400 MHz, while the Pine64 I am abusing has settled down to alternating between 960 and 1080 MHz, which feels a bit more sensible than what the M3 is doing. I know it is only software to fix, so I have started down the path of building my own image. I have set up the cross toolchain and built everything, so now all I have to do is tweak it.
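One way to make the slow-down more graceful is a policy that moves one table entry at a time and uses a hysteresis band. The C sketch below is hypothetical, not taken from the FriendlyARM kernel; the trip point and band width are assumed tuning parameters. Under a steady load it should tend to settle at the highest step the cooling can sustain rather than jumping between 1400 MHz and 700/800 MHz.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical step-wise thermal policy. Levels index a frequency table
 * sorted from slowest (0) to fastest (n_levels - 1), e.g. 400 ... 1400 MHz
 * in 100 MHz steps. Temperatures are in millidegrees Celsius, the unit
 * used by /sys/class/thermal/thermal_zoneN/temp.
 *
 * - at or above trip_mc: drop exactly one level
 * - at or below trip_mc - hyst_mc: raise exactly one level
 * - in between: hold, so the clock doesn't bounce between extremes */
size_t next_level(size_t cur, size_t n_levels,
                  int temp_mc, int trip_mc, int hyst_mc)
{
    assert(n_levels > 0 && cur < n_levels && hyst_mc >= 0);
    if (temp_mc >= trip_mc)
        return cur > 0 ? cur - 1 : 0;
    if (temp_mc <= trip_mc - hyst_mc)
        return cur + 1 < n_levels ? cur + 1 : cur;
    return cur;
}
```

It would be called once per polling interval with the current level and the current CPU temperature; a wider hysteresis band trades a little throughput for fewer frequency changes.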
 
I am a bit disappointed with the FA-provided HSF. The fan is loud and ineffective, and the processor overheats. I suppose this is better than the HSF offered with the Pine64 (none). The heat sinks that come with the ODROIDs are much better. My C2 does hit thermal limit a bit running the miner, but that is with the board in the optional plastic box.
 
Probably my biggest annoyance is the lack of a 64-bit OS. I really wanted the 64-bit extensions for mining/BOINC. Though, given the weak cooling, I suspect the lower-performing kernel is a blessing in disguise.

Sure, but my question was about the M3.

 

Then an extra question: is the fan spinning at full speed on the NanoPi A64 too, or is it controlled based on the temperature?

Edited by tkaiser
Moved from NanoPi M64 thread here since off-topic there

@zikzak

For the M3, you have to connect the fan power cable to the I2S connector: Black->pin1, Red->pin2 (FA's M3 fan pictures are wrong on this).

This way the fan runs on 3.3V at speeds controlled by the CPU temperature, and works much better.


Hi,

Regarding how to connect the fan to the M3, here is a photo:

M3-heat-sink_en_08.jpg


@friendlyarm

I have to tell you that this image is misleading. There is no populated 2-pin header right behind the spk/mic jack, and those two pins are not power.

The image you show has errors as I said before.

You can try it yourself.

 

For the fan to work, it has to be connected either to I2S pins 1+2 (for optimal speed/noise) or to the main 40-pin connector.

 

@friendlyarm

You need to correct your image, as it has caused quite some trouble for many M3 heatsink users.

P.S.

I made those remarks based on the nickname and the assumption that it indicates some direct FA affiliation. If not, please ignore the comments.


Indeed I never got the fan working on these pins behind the jack connector.

At some point I used the GPIO (pins 4 and 6, if I remember correctly), but the fan was blowing at full speed and made way too much noise.

Eventually I destroyed the fan; it doesn't spin any more.


@tkaiser

Thanks for the housekeeping here.

 

 

Attached are a couple of pictures showing how I connected the fan to the board, supplying it with 3.3V and thus getting a somewhat lower noise level than when connecting it to 5V.

 

FriendlyARM's images show the fan connected to a pair of unpopulated pins behind the audio jack, whereas those pins are just GPIO-related and provide no power at all. That is the error I was talking about, and I hope FriendlyARM will eventually figure it out and correct their photos.

 

 

 

 

post-2589-0-14280300-1484664783_thumb.jpg

post-2589-0-65561700-1484664802_thumb.jpg


FriendlyARM's images show the fan connected to a pair of unpopulated pins behind the audio jack, whereas those pins are just GPIO-related and provide no power at all.

 

Nope, their instructions/image are for PCB rev 1605 or higher (check the bottom side of the PCB for your board revision). There, VDD_5V is available on the 2-pin header. It's still possible to use 3.3V on the I2S header as well (but why? The fan is not sufficient even with 5V when the M3 runs under full load). On the most recent PCB revision, the one that ruined WiFi by dropping the IPX connector, it's perfectly OK to use the 2-pin header (for whatever purpose; AFAIK DC-IN is routed directly).


Thanks for mentioning that, Thomas.

 

BTW my board shows 1604 underneath.

 

Yet FriendlyARM's wiki pages do not mention any 1605 revision, and for the latest revision (as of today, 17 Jan 2017) they have only the 1604 schematic!

-> http://wiki.friendlyarm.com/wiki/index.php/NanoPi_M3

How did you come across the info you refer to? Is it on their product or wiki pages?


I have two 1605 boards and can confirm the two pins behind the headphone jack are perfect for the fan.

 

I'm having an issue getting CAM500A to work on the M3. I think I had it working on another M3 (that I no longer have contact with) but cannot get it working now on a different M3.

 

Does anyone know if it requires CAM500B or is it possible for CAM500A to work?

 

Is it possible that Ubuntu Core would differ from Debian in terms of setting up the camera, despite using an identical kernel for both? I also had trouble getting it to detect my temperature sensor on GPIO 104.


I tested the CAM500A long ago and it worked with Android on the M2 and M3, but FA has this on their wiki for the M3:

Check this link [2] for details about the camera CAM500B we tested in this case

link: http://wiki.friendlyarm.com/wiki/index.php/Matrix_-_CAM500B

 

Maybe they changed something in the new board revision so that all boards work with the CAM500B now; better check with them.

I have halted camera testing on Ubuntu due to a linking problem I had with the encode/decode blob; the problem is on my side.


I tested long ago CAM500A and it worked with Android on M2 and M3, but FA has this on their wiki for the M3:

Check this link [2] for details about the camera CAM500B we tested in this case

 

That's what I thought. I made a customised driver for CAM500A on Debian to support more resolutions and it was working fine in Android.

 

This camera (also a CAM500A) is not working in Android so I assume it is faulty. It takes a second to give a response and it is just a purple image with a few white dots.


Thanks all for a lot of good information in this thread; you saved me a lot of time and possible future headaches. For example, I happened to stumble across this thread and saw the lack of swap file support in the FA kernel, which I was about to try after a Qt 5.7 compile failed due to exhausted memory. As a result, I've had to retroactively make my Qt 5.7 apps compatible with Qt 5.3.2, which I was able to compile successfully without running out of RAM.

 

Just a quick question: searching here for "nanopi m3" & 1080p, I couldn't find anything. Just wondering if anyone has compiled a kernel for the M3 that defaults to 1080p HDMI and has an image they can share. Bonus points if it has Qt 5.7 and/or swap file support so I can enable a swap file to compile Qt on my own, which I would then share of course :)

 

Am I correct in assuming, after reading all 4 pages of this thread, that Armbian was never ported to this board?

 

Anything would be greatly appreciated. 


On 5/11/2017 at 0:36 PM, cbartik said:

Just a quick question: searching here for ""nanopi m3" & 1080p" I couldn't find anything... Just wondering if anyone has compiled a kernel for the M3 that defaults to 1080p HDMI and has an image they can share. Bonus points if it has Qt 5.7 and/or swap file support so I can enable a swap file to compile Qt on my own, which I would then share of course :)

Here you go: nanopi-m3_1920x1080P.png

 

Lubuntu at 1080p, no Qt, just a raw image. If you are interested and want to get your hands dirty, I can provide my POC image; I will try to do it over the weekend. Be aware that it is set up in a very unusual way: no fastboot.

 


I got one of these running last night; it just edges out the XU4 in absolute throughput on my workload and takes a fairly clear lead on perf/$. The kernel comes without CIFS support, so I had to compile CIFS and MD4 as kernel modules before I could mount my filespace.

 

The board I received was marked 1610, had the fan header populated (and labelled), had the socket for an external WiFi antenna and came with one supplied in the box, so it looks like FA are listening to feedback. I really hope that a 64-bit kernel is possible on this board, as that would add to its usefulness as a test mule. The board is so small it's tempting to build a little farm box with between 8 and 16 of them stacked together, just for the sheer cuteness factor.


29 minutes ago, James Kingdon said:

it looks like FA are listening to feedback

 

They definitely do. A few weeks ago we had a lot of conversation about IC choices on a specific product (the NAS HAT: it took them just one month to phase out the first version in favour of a much better chip, starting with hardware revision 1.2) and about kernel choices. Now they have started to rely on the mainline kernel for their H3 and H5 boards (an easy choice, since almost all the necessary patches were already floating around), so let's hope @friendlyarm moves to the next level: actively supporting their own hardware, sending patches for initial support upstream and also becoming maintainers of their own hardware.

 

I did not pay that much attention to kernel support for the SoC here, but AFAIK some months ago Samsung provided something based on a more recent kernel for similar IoT modules using the same engine?


I did a little more with the board last night. The biggest problem for my use case is the 1 GB of RAM, so I was a bit disturbed to find that 150M is reserved for the GPU. I tried disabling ION and CMA but couldn't get the kernel to compile, so as a fallback I cut the size of the reserved block. The value is in the config as CONFIG_ION_NXP_CONTIGHEAP_SIZE. I've reduced it to 64M, which at least boots and runs, but without any HDMI output. X11 crashes if you try to start it (and for some reason lightdm goes into a busy loop for a couple of minutes until the whole graphics stack shuts down), so I configured the board to start up in text mode. Since I'm not getting anything out of the HDMI port now, I plan on reducing the reserved space further until I find the minimum needed to start up, but this is already looking a lot better than the default:

 

KiB Mem:    961856 total,   205952 used,   755904 free,    16952 buffers
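To track the effect of each CONFIG_ION_NXP_CONTIGHEAP_SIZE change without eyeballing top, MemTotal can be read straight from /proc/meminfo (top's "KiB Mem ... total" should be the same figure). For reference, 961856 KiB is about 85 MiB short of a full 1 GiB (1048576 KiB); how much of that gap is the ION heap depends on how the kernel reserves it. A small C helper, with an illustrative name:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the value of `key` (e.g. "MemTotal") from a /proc/meminfo-style
 * stream, in kB, or -1 if the key isn't present. Rewinds the stream first,
 * so the same FILE can be queried repeatedly. */
long meminfo_kib(FILE *f, const char *key)
{
    assert(f != NULL && key != NULL);
    char line[256], name[64];
    long value;
    rewind(f);
    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "%63[^:]: %ld", name, &value) == 2 &&
            strcmp(name, key) == 0)
            return value;
    }
    return -1;
}
```

Typical use: open `/proc/meminfo` with `fopen(..., "r")` and compare `meminfo_kib(f, "MemTotal")` before and after a kernel change.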
 

I didn't like the look of the factory HSF, so I added my normal set of heat sinks and a 40mm 12V fan. That brought the idle temperature down from about 50C to 31C and kept the full (non-SIMD, non-GPU) load temperature in the low 50s. Feeling pleased with that, I thought I'd try enabling the higher frequencies by editing the dfs_freq_table in arch/arm/plat-s5p6818/nanopi3/device.c to match the values in s5p6818-cpufreq.h. That gave me a nice stable 1.6GHz with only a small increase in load temperature (again, this may not apply to all boards, or if you are loading the GPU/NEON extensions).

 

This is from the end of a short (2-minute) run:

1.60 GHz ???V 50.0C fan n/a
1.60 GHz ???V 51.0C fan n/a
1.60 GHz ???V 51.0C fan n/a
1.60 GHz ???V 51.0C fan n/a
400 MHz  ???V 44.0C fan n/a
400 MHz  ???V 41.0C fan n/a
400 MHz  ???V 40.0C fan n/a

 

I haven't figured out where to read the voltage from on this board yet: none of the regulators has an obvious name for vdd_arm. Also, the fan is running continuously, so it's not hooked up to a GPIO at the moment. That might be this weekend's job, depending on the weather.
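One way to hunt for the CPU supply is to dump every regulator the kernel has registered, together with its voltage. The regulator class exposes `name` and, for regulators that report a voltage, `microvolts` under /sys/class/regulator/regulator.N/. A small C sketch with illustrative function names; whether the CPU rail is registered there at all on the FriendlyARM kernel is an assumption, but a regulator whose microvolts changes between 400 MHz and 1.6 GHz would be the likely candidate.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Read the first line of a small sysfs attribute into buf, without the
 * trailing newline. Returns 0 on success, -1 if it can't be read. */
static int read_attr(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = fgets(buf, (int)len, f) != NULL;
    fclose(f);
    if (!ok)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Print name and current voltage of every regulator.N entry under root
 * (normally /sys/class/regulator). Returns how many entries were seen,
 * or -1 if root can't be opened. */
int list_regulators(const char *root)
{
    assert(root != NULL);
    DIR *d = opendir(root);
    if (d == NULL)
        return -1;

    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strncmp(e->d_name, "regulator.", 10) != 0)
            continue;
        char path[512], name[128], uv[32];

        snprintf(path, sizeof path, "%s/%s/name", root, e->d_name);
        if (read_attr(path, name, sizeof name) != 0)
            strcpy(name, "?");

        snprintf(path, sizeof path, "%s/%s/microvolts", root, e->d_name);
        if (read_attr(path, uv, sizeof uv) == 0)
            printf("%-14s %-24s %s uV\n", e->d_name, name, uv);
        else   /* not every regulator reports a voltage */
            printf("%-14s %-24s (no voltage)\n", e->d_name, name);
        count++;
    }
    closedir(d);
    return count;
}
```

Running `list_regulators("/sys/class/regulator")` once at idle and once under load, then diffing the output, is a quick way to spot the rail that tracks the CPU clock.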


On 2016-07-08 at 3:05 AM, tkaiser said:

To translate this: An ODROID-C2 (quad core Cortex-A53) is able to finish the same test in 3.x seconds (so now you know that it might make a huge difference to be able to execute ARMv8 code on ARMv8 cores -- NanoPi M3 is here as bad as RPi 3)

I was curious about the large difference in performance on this test, so I took a look. The code uses 64-bit longs and double-precision floating point, and the majority of the time is spent in a small loop, so it's a candidate for a higher-than-average delta, but I would still have expected something closer to 2x than 10x or more. Here's the key part:

 

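  #include <math.h>

  /* Prime-counting loop from the sysbench cpu test (excerpt, wrapped in a
     standalone function here; in sysbench it lives in cpu_execute_event). */
  static unsigned long long count_primes(unsigned long long max_prime)
  {
    unsigned long long c;
    unsigned long long l;
    double t;
    unsigned long long n=0;

    for(c=3; c < max_prime; c++)
    {
      t = sqrt((double)c);
      for(l = 2; l <= t; l++)
        if (c % l == 0)
          break;
      if (l > t )
        n++;
    }
    return n;
  }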

The 64-bit code looks fairly clean; here's the inner loop:

 

  f4:	91000442 	add	x2, x2, #0x1                // l++    <--- top of l loop
  f8:	9e630042 	ucvtf	d2, x2                    // convert l to double
  fc:	1e622010 	fcmpe	d0, d2                    // compare with t
 100:	54fffe4b 	b.lt	c8 <cpu_execute_event+0x28> // jump to top of c loop ^^^
 104:	9ac20a61 	udiv	x1, x19, x2               // c / l -> x1
 108:	9b02cc23 	msub	x3, x1, x2, x19           // x19 - x1 * x2, i.e. c - (c/l)*l
 10c:	b5ffff43 	cbnz	x3, f4 <cpu_execute_event+0x54>   // branch to top of l loop ^^^

That's nice and straightforward, so what's going on with the 32-bit version? (Unnecessary detail: the following was compiled to ARM instructions, whereas the default is usually Thumb. It doesn't make much difference to the performance or the analysis; I'm just more used to reading 32-bit ARM than Thumb.)

 118:	e2944001 	adds	r4, r4, #1                 // l++            <-------- top of l loop
 11c:	e2a55000 	adc	r5, r5, #0                   // carry into the high word
 120:	e1a00004 	mov	r0, r4                       // copy l into r0/1
 124:	e1a01005 	mov	r1, r5
 128:	ebfffffe 	bl	0 <__aeabi_ul2d>             // and call helper to convert to double (ouch again)
 12c:	e1a02004 	mov	r2, r4                       // copy l into r2/3
 130:	e1a03005 	mov	r3, r5
 134:	ec410b17 	vmov	d7, r0, r1                 // double(l) -> d7
 138:	e1a00006 	mov	r0, r6                       // copy c into r0/1
 13c:	e1a01007 	mov	r1, r7
 140:	eeb48bc7 	vcmpe.f64	d8, d7                 // l <= t
 144:	eef1fa10 	vmrs	APSR_nzcv, fpscr
 148:	baffffdc 	blt	c0 <cpu_execute_event+0x2c>  // branch to top of c loop ^^^
 14c:	ebfffffe 	bl	0 <__aeabi_uldivmod>         // call library for c % l
 150:	e1923003 	orrs	r3, r2, r3                 // look for any non-zero bit in the r2/3 remainder
 154:	1affffef 	bne	118 <cpu_execute_event+0x84> // branch to top of l loop ^^^

Obviously, not so nice. The 64-bit longs have to be carried in pairs of registers (e.g. l is in r4 and r5), so a simple l++ takes two instructions to cover both the low and high words. That's where the factor of 2x I was expecting comes from. The pain comes in when we have to convert a long to a double. There is no ucvtf instruction available here, so we have to make a helper call to __aeabi_ul2d. That means copying l (in r4 and r5) into the parameter registers r0 and r1, making the call and then moving the result from r0/r1 into d7 where we want it. The same problem arises when we want to compute c % l: without an instruction available we have to make a helper call (__aeabi_uldivmod), which this time means copying 4 registers into the parameter registers before making the call. Having to make these calls to helper functions is what explains the large performance delta.
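Conversely, when the values fit in 32 bits (as they do for max_prime=20000), the helper calls can be avoided on 32-bit ARM. The sketch below is a swapped-in variant, not the sysbench code: it uses 32-bit unsigned ints and replaces the sqrt() bound with the integer test l * l <= c, so there is no integer-to-double conversion at all. Whether c % l then becomes a hardware divide or a call to the cheaper 32-bit helper __aeabi_uidivmod depends on whether the compiler targets a core with integer divide, which is an assumption about your toolchain flags. Since it changes the benchmark, its timings wouldn't be comparable with sysbench results; the function name is illustrative.

```c
#include <assert.h>

/* Same prime-counting logic as the sysbench loop, but with 32-bit
 * unsigned ints and an integer bound (l * l <= c) instead of sqrt().
 * Counts primes p with 3 <= p < max_prime. */
unsigned int count_primes_u32(unsigned int max_prime)
{
    assert(max_prime <= 0x80000000u);   /* keeps l * l from overflowing */
    unsigned int n = 0;
    for (unsigned int c = 3; c < max_prime; c++) {
        unsigned int l;
        for (l = 2; l * l <= c; l++)
            if (c % l == 0)
                break;                  /* found a divisor: not prime */
        if (l * l > c)                  /* ran past sqrt(c) without a divisor */
            n++;
    }
    return n;
}
```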

 

This is rather a neat example of why 64-bit on ARM is about more than addressing a large memory space. As well as twice as many registers, each twice the size, you also get the newer instruction set, which has been optimized for more modern workloads. It's still going to be rare to see this big a delta, but 10 to 40% probably isn't unusual on CPU-intensive applications.

 

Just for fun, results from my (sadly still 32-bit) M3. I grabbed sysbench from GitHub, so I had to change the command line a bit, but hopefully this is comparable to previous results:

./sysbench cpu run --cpu-max-prime=20000 --threads=8 --events=10000 --time=100

CPU speed:
    events per second:   197.90

General statistics:
    total time:                          50.5295s
    total number of events:              10000

 

Temperature seems to be stable at 52C during max-prime=200000 (it's probably about 18 or 19C in the basement today)

 

1.60 GHz ???V 52.0C fan n/a
1.60 GHz ???V 52.0C fan n/a
1.60 GHz ???V 52.0C fan n/a
1.60 GHz ???V 52.0C fan n/a


(readings at 10s intervals, with a couple of minutes worth of the same value before that)

 

 


This topic is now closed to further replies.