Christos

Armbian for OrangePi PC2, AllWinner H5


11 hours ago, znoxx said:

My armbian "cluster" works 24x7 with full load

Znoxx, what is your Armbian cluster, and what does it do?

I have been seriously thinking about the FUN of building a cluster, but I cannot think of a use or purpose for one.

PS: have you benchmarked your Orange Pi PC2 and/or the cluster?

 

Seasalt


@Seasalt, not sure I got the question "What is your cluster". The link I posted does have some pics :) and there is a "translate" button in the menu (better than nothing).

About the cluster's purpose: currently it is mining some Verium crypto.

Before that I tried Apache Spark, for example, but it was not really successful due to faulty microSDs and the large number of reads/writes.

Regarding benchmarks: in mining (hashrate) it is comparable to e.g. a Core i5..i7, but it consumes much less energy, e.g. 60..70 watts vs 150..200.

I was inspired by this guy http://picocluster.com, but my setup is much cooler :)). I'm using a custom PCB to drive the ATX PSU and fans.

 


@znoxx

Nice job! I like how it looks, handmade devices have their own charm! :)

So the debugging is still in progress. I tried u-boot 2017.11 but unfortunately had no success.

Would you please check "dmesg" on one of your OPi PC2s? Thank you in advance.

The problem is that it's almost impossible to notice the crash when there's no serial cable or display attached, because the system recovers from it. The only trace it leaves is in dmesg.


[  219.123436] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000088
[  219.132368] Mem abort info:
[  219.135292]   ESR = 0x96000004
[  219.138419]   Exception class = DABT (current EL), IL = 32 bits
[  219.144427]   SET = 0, FnV = 0
[  219.147569]   EA = 0, S1PTW = 0
[  219.150830] Data abort info:
[  219.153734]   ISV = 0, ISS = 0x00000004
[  219.157667]   CM = 0, WnR = 0
[  219.160744] user pgtable: 4k pages, 48-bit VAs, pgdp = 0000000022abd91e
[  219.167460] [0000000000000088] pgd=0000000000000000
[  219.172458] Internal error: Oops: 96000004 [#1] SMP
[  219.177346] Modules linked in: zstd zram sun8i_codec_analog snd_soc_hdmi_codec snd_soc_simple_card sun8i_adda_pr_regNmap sun4i_i2s sun4i_codec snd_soc_simple_card_utils snd_soc_core snd_pcm_dmaengine snd_pcm cpufreq_dt snd_timer sun4i_gpadc_iio industrialio thermal_sys sunxi_cir rc_core snd w1_therm soundcore w1_gpio wire lima sy8106a_regulator gpu_sched ttm dw_hdmi_i2s_audio dw_hdmi_cec uas realtek
[  219.212663] CPU: 2 PID: 2473 Comm: grep Not tainted 4.19.8-sunxi64 #5.67.181213
[  219.219971] Hardware name: Xunlong Orange Pi PC 2 (DT)
[  219.225120] pstate: 00000005 (nzcv daif -PAN -UAO)
[  219.229945] pc : _get_random_bytes+0xac/0xe8
[  219.234255] lr : _get_random_bytes+0xac/0xe8
[  219.238547] sp : ffff00000d26bbc0
[  219.241876] x29: 0000000000000000 x28: ffff80002a676100
[  219.247203] x27: 0000000000000002 x26: ffff80002ff85780
[  219.252527] x25: ffff80003634f000 x24: ffff800033e2e900
[  219.257848] x23: 0000000000000010 x22: ffff000008d688c8
[  219.263176] x21: ffff00000d26bc08 x20: 0000000000000010
[  219.268494] x19: ffff00000d26bd58 x18: 0000000000000000
[  219.273811] x17: 00000000cf0ed6af x16: 000000009759a261
[  219.279135] x15: 00000000846c09cb x14: 00000000e1321b33
[  219.284464] x13: 00000000ab8ed928 x12: 00000000e3b80007
[  219.289786] x11: 000000005ceb0788 x10: 000000003ce02806
[  219.295107] x9 : 00000000c61535a9 x8 : ffff00000d26bc48
[  219.300427] x7 : 0000000000000000 x6 : ffff00000d26bc08
[  219.305757] x5 : ffff800033d5e400 x4 : 0000000000000008
[  219.311088] x3 : 0000000000000030 x2 : 0000000000000008
[  219.316408] x1 : 0000000000000000 x0 : ffff00000d26bc08
[  219.321738] Process grep (pid: 2473, stack limit = 0x000000004c1dba41)
[  219.328266] Call trace:
[  219.330732]  _get_random_bytes+0xac/0xe8
[  219.334662] Code: d2800801 aa1503e0 912322d6 940ed73a (f94047a1)
[  219.340765] ---[ end trace 71540e1e95143ba8 ]---
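
Since a crash like the one above is only visible in dmesg, a small check along these lines could be run on each node from time to time. This is only a minimal sketch: `count_oops` is a made-up helper name, and it assumes every oops ends with an "---[ end trace" line like the one in the log above.

```shell
#!/bin/bash
# Hypothetical helper (not from the original post): count kernel oopses in
# dmesg-style text read from stdin by counting the "---[ end trace" marker
# that closes the log above.
count_oops() {
  # -F: fixed string, --: the pattern starts with dashes.
  # grep -c exits non-zero when the count is 0, hence the "|| true".
  grep -cF -- '---[ end trace' || true
}

# Possible usage (commented out, needs permission to read the kernel log):
#   dmesg | count_oops
#   for i in $(seq 0 9); do ssh "node$i" dmesg | count_oops; done
```

A result other than 0 means the node has logged at least one oops since boot, even if it looks healthy.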

I tried an ATX power supply as well and got the same problem. I also soldered a 1000uF 10V capacitor directly to the power supply connector to be sure there is no power fluctuation because of long cables, but the problem still persists.


@svts

Here is dmesg from 2 nodes.

One bought a long time ago:

 

https://pastebin.com/Q0As9JTh

 

And the second from a node which was recently replaced due to dead Ethernet (still not sure that I picked the right node, but anyway, 2 dmesgs are better than one):

 

https://pastebin.com/8s91VAuU

 

Hope it will help you. I still have a strong feeling that the problem is in the hardware (RAM). Maybe it makes sense to order a new one from the official store on AliExpress? Just to be sure. I bet it will arrive faster than you will debug the thing...

 


@znoxx
Thank you. Well, actually I did just what you said. When I started suspecting hardware issues I ordered a new board, and when it arrived I confirmed that the problem persisted. Technically it's possible that the board I ordered has the same buggy parts, but... that's almost unbelievable, you know :)


@Seasalt, thanks! :) I'm working on a new version, with OpiOne+ (faster!! newer :)) and fitted into a standard microATX case. I want something more standard and reproducible.

 

@svts Hmm... You are right. I even upgraded a whole bunch of boards to the latest Armbian to check whether it is an issue with the latest builds. But it looks like it isn't. ~1 hour at 100% load, no "mayday mayday".

 

zno@cluster:~$  for i in `seq 0 9`; do ssh node$i uptime; done
 10:47:44 up  1:02,  0 users,  load average: 4.01, 4.03, 4.00
 10:47:45 up  1:01,  0 users,  load average: 4.08, 4.02, 4.01
 10:47:46 up  1:01,  0 users,  load average: 4.18, 4.06, 4.02
 10:47:47 up  1:00,  0 users,  load average: 4.00, 4.01, 4.00
 10:47:48 up  1:00,  0 users,  load average: 4.00, 4.00, 3.99
 10:47:50 up 59 min,  0 users,  load average: 4.02, 4.05, 4.00
 10:47:51 up 59 min,  0 users,  load average: 4.00, 4.00, 3.98
 10:47:52 up 58 min,  0 users,  load average: 4.16, 4.09, 4.02
 10:47:54 up 58 min,  0 users,  load average: 4.00, 4.05, 4.00
 10:47:55 up 57 min,  0 users,  load average: 4.00, 4.01, 4.00

 

Consider this "yet another lame idea": can you try to power the board via the GPIO pins instead of the barrel connector? I used 2 wires from a standard Ethernet patch cord. Just as an experiment.


@znoxx

Maybe it depends on what it is running... Would you please run my test script there, if possible?

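#!/bin/bash
# Stress test: each round starts short CPU hogs plus a grep over a large file,
# prints time, CPU usage, CPU frequency and temperature, then checks dmesg.

while true; do
  date +%H:%M:%S | tr -d '\n'
  for I in $(seq 5 10); do
    # Two CPU hogs, each killed after I/2 seconds.
    timeout $((I / 2)) cat /dev/zero > /dev/null &
    timeout $((I / 2)) cat /dev/zero > /dev/null &
    # Pipe the largest file in /sbin through grep.
    cat "/sbin/$(ls -S /sbin | head -n1)" | grep 123 > /dev/null &
  done
  top -d0 -n2 | grep 'Cpu' | tail -n1 | sed 's/%Cpu(s)://' | tr -d '\n'
  # Current CPU frequency in MHz and SoC temperature in C (0 if unreadable).
  echo -n " $(( ($(cat /sys/bus/cpu/devices/cpu0/cpufreq/scaling_cur_freq 2>/dev/null) + 0) / 1000 ))MHz"
  echo -n " $(( ($(cat /sys/class/thermal/thermal_zone0/temp 2>/dev/null) + 0) / 1000 ))C"
  echo -n " "
  # CPU core (PSR column) that each running cat is on.
  ps -eF | grep -E '[0-9]{2} cat' | awk '{print $7}' | tr -d '\n'
  wait

  # One second to type something and press Enter to stop the loop.
  echo -n " b>"
  read -t1 BREAK
  echo "B"

  # Stop when the kernel has logged an oops, or on user input.
  if dmesg | tail -n20 | grep -F -- '---[ end trace'; then break; fi
  if [ -n "$BREAK" ]; then break; fi
done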

This script usually breaks the system in about 5 to 15 minutes. Thank you!


@svts

 

See the results here:

https://pastebin.com/5CXyyuQm

 

The script ran for approx. 10 minutes, giving the messages shown in the paste (I've included only the last 3 minutes or so, because "screen" truncated the rest; hope it's enough).

 

Also, dmesg has swap/OOM errors and I see a Docker container restarting, but that happened under high load on RAM/swap/CPU.

 

Note to self: invest in better microSD cards :).

 


So the problem was solved!

sed -i -e '1imw.l 0x01C20020 0x80101810\' /boot/boot.cmd
mkimage -C none -A arm -T script -d /boot/boot.cmd /boot/boot.scr

The reason for the crashes was the DRAM PLL value, which seems to be too high for the boards I have.

The default DRAM_PLL value set by u-boot 2017.05 and later is 624MHz. It seems not all boards can handle this value.

I changed it to 600MHz by writing 0x1810 (instead of the 0x1910 set by u-boot) into the low 16 bits of the register at 0x01C20020. The full word written is 0x80101810, and there has not been a single crash since.
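
For anyone wanting to check these numbers or try another value, here is a rough sketch that decodes such a register word into a DRAM clock. The field layout is my assumption and is not stated in this thread (N in bits 12:8, K in bits 5:4, M in bits 1:0, each stored minus one, a 24MHz reference, and the DRAM clock being half the PLL output), so verify it against the SoC documentation before relying on it. `pll_to_dram_mhz` is a made-up name.

```shell
#!/bin/bash
# Hypothetical helper: decode a PLL_DDR control word into a DRAM clock in MHz.
# ASSUMED layout (not from the thread): N = bits 12:8, K = bits 5:4,
# M = bits 1:0, each stored as value-1; PLL = 24MHz * N * K / M;
# DRAM clock = PLL / 2.
pll_to_dram_mhz() {
  local reg=$(( $1 ))                      # accepts hex such as 0x80101810
  local n=$(( ((reg >> 8) & 0x1f) + 1 ))
  local k=$(( ((reg >> 4) & 0x3) + 1 ))
  local m=$(( (reg & 0x3) + 1 ))
  echo $(( 24 * n * k / m / 2 ))
}

pll_to_dram_mhz 0x80101810   # prints 600
pll_to_dram_mhz 0x80101910   # prints 624
```

Under those assumptions the two values match the 600MHz and 624MHz figures given above.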

 

The sed command inserts the line "mw.l 0x01C20020 0x80101810" at the top of boot.cmd, and mkimage then recompiles boot.scr from it.

After that, u-boot will set DRAM_PLL to a slightly lower value on every boot to avoid the crashes.
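
Note that running the sed line twice inserts the line twice. If that is a concern, something like the following could be used instead of the sed step. It is just a sketch, not what I actually ran: `add_pll_line` is a made-up helper that keeps a .bak copy and only inserts the line when it is not there yet.

```shell
#!/bin/bash
# Hypothetical helper: put the mw.l line at the top of a boot.cmd-style file,
# but only once, and keep a backup of the previous version as FILE.bak.
add_pll_line() {
  local file=$1
  local line='mw.l 0x01C20020 0x80101810'
  # -x: whole-line match, -F: fixed string. Nothing to do if already present.
  if grep -qxF -- "$line" "$file"; then
    return 0
  fi
  cp -p "$file" "$file.bak"
  { printf '%s\n' "$line"; cat "$file.bak"; } > "$file"
}

# Possible usage (commented out, needs root and the real boot files):
#   add_pll_line /boot/boot.cmd
#   mkimage -C none -A arm -T script -d /boot/boot.cmd /boot/boot.scr
```

The mkimage step stays exactly as in the post above.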

 

Thanks to everybody who helped me find a solution.

Special thanks to @znoxx for testing and logging :)

 


@svts, you dug up a great thing.

You told me that the boards were bought not from the Xunlong shop, but from some third-party guys.

 

Good thing for Xunlong: their products are being copied =) with cheaper components (which is sooo typical of AliExpress).

Bad thing for most of us: when you buy a board from a "local supplier", or cheaper than in the official store, you will most probably bump into a "fake one", because the supplier cares about margin/income/whatever, not about the quality of the product. The situation is probably better in Europe/US, but in countries like Russia it's pretty typical.

So is it time to say "beware of fake Orange Pis"?

On 12/17/2018 at 5:35 AM, svts said:

So the problem was solved!



 

Great work! I have the same problem with a PC 2 which came directly from the manufacturer.

Your script had some Unicode characters in it, so I removed them:

sed -i -e '1imw.l 0x01C20020 0x80101810\' /boot/boot.cmd
mkimage -C none -A arm -T script -d /boot/boot.cmd /boot/boot.scr

 
