
Orange pi zero random hangs


Atgeek


I have an Orange Pi Zero running the latest Armbian stable release. It only handles a Pi-hole DHCP server and runs the Transmission client, but it randomly hangs and becomes inaccessible over SSH and its web pages. Everything returns to normal after I restart it by manually pulling the power supply, but that is very inconvenient. What could be the cause? Is there a way to log system errors so I can read them after the next reboot? Thanks


Install RPi-Monitor (via armbianmonitor) and see what is going on. If you can, swap the PSU and monitor the voltage at the board while it runs at full load. Use stress or a similar stress-testing tool.
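
A rough way to follow this advice from the shell is sketched below. It assumes the stress package is installed and that the standard sysfs paths for cpufreq and the thermal zone exist on the board; the helper names are made up for illustration. It only logs clock speed and temperature, so the supply voltage still has to be measured at the board itself.

```shell
#!/bin/sh
# Sketch: log CPU clock and SoC temperature while 'stress' loads all cores.
# Assumption: these standard sysfs paths exist on the board.
FREQ=/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
TEMP=/sys/class/thermal/thermal_zone0/temp

# Some kernels report millidegrees (61000), others whole degrees (61).
to_celsius() {
    if [ "$1" -ge 1000 ]; then echo $(( $1 / 1000 )); else echo "$1"; fi
}

# Print one timestamped sample; missing files read as 0.
log_sample() {
    khz=$(cat "$FREQ" 2>/dev/null || echo 0)
    raw=$(cat "$TEMP" 2>/dev/null || echo 0)
    echo "$(date +%T) $(( khz / 1000 )) MHz $(to_celsius "$raw") C"
}

# Run the load test only when called with "run" and stress is installed.
if [ "${1:-}" = "run" ] && command -v stress >/dev/null 2>&1; then
    ( while :; do log_sample; sleep 5; done ) &
    logger_pid=$!
    stress --cpu 4 --timeout 300   # 4 workers for 5 minutes
    kill "$logger_pid"
fi
```

Started over SSH from another machine with the argument run, the last lines printed before a hang show the clock and temperature the board had reached.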


I ran into the same issue: my Zero (512MB v1.4) frequently hangs with Armbian (stable, legacy kernel). The Zero runs pi-hole and motioneye.

Solution: I built my own image, Armbian (Xenial) 16.04 with desktop enabled. Sadly, hardware transcoding (mjpeg/h264) does not seem to work out of the box due to the lack of ffmpeg 3.1.x with Cedrus.

But... this image runs stable, currently at 6 days of uptime. I have not changed the microSD card or PSU, and the archive downloaded from armbian.com was not damaged.

 


  ___                               ____  _   _____              
 / _ \ _ __ __ _ _ __   __ _  ___  |  _ \(_) |__  /___ _ __ ___  
| | | | '__/ _` | '_ \ / _` |/ _ \ | |_) | |   / // _ \ '__/ _ \
| |_| | | | (_| | | | | (_| |  __/ |  __/| |  / /|  __/ | | (_) |
 \___/|_|  \__,_|_| |_|\__, |\___| |_|   |_| /____\___|_|  \___/
                       |___/                                     

Welcome to ARMBIAN 5.32 user-built Ubuntu 16.04.3 LTS 3.4.113-sun8i   
System load:   0.28 0.29 0.68      Up time:       6 days        
Memory usage:  26 % of 494MB      IP:            192.168.1.101
CPU temp:      61°C               
Usage of /:    33% of 7.2G       storage/:      4% of 29G        

[ General system configuration: armbian-config ]

Last login: Sat Sep 23 13:11:52 2017 from 192.168.1.105

 

It is only a bit slow (without services running it should score about 6.5 s at 1200 MHz):

 


francesco@orangepizero:~$ sysbench --num-threads=4 --test=cpu --cpu-max-prime=2000 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 4

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 2000


Test execution summary:
    total time:                          8.5835s
    total number of events:              10000
    total time taken by event execution: 34.3080
    per-request statistics:
         min:                                  3.42ms
         avg:                                  3.43ms
         max:                                  8.96ms
         approx.  95 percentile:               3.46ms

Threads fairness:
    events (avg/stddev):           2500.0000/2.12
    execution time (avg/stddev):   8.5770/0.00

 


I believe I am having this exact same issue. I am using the same board with the same OS version and getting the same hangs. (If this deserves its own thread, let me know.)

 

Details of my system/problem:

 

Pi zero H2 running 16.04

Running pihole and pivpn 

Installed in case with heatsink

Powered from Samsung 2A charger

Cat6 to router

 

Running pihole and just doing dns masq, it is stable for days.  If I enable dhcp, it tanks within hours.  If I turn dhcp on and reboot the router, it tanks as soon as everything hits it simultaneously for an address.

 

Running openVPN (dhcp disabled), it will run fine if nothing is connected.  Once a client connects, it will lock up within 20 minutes.

 

Here's the interesting part...

 

I was experimenting last night.

 

With base load (no VPN clients connected, just serving DNS) the system idles just north of 50°C. Connecting a client causes a mild increase in temperature and system load prior to the hang. However... I hammered on it a bit with a benchmark, pulling loads and temperatures much higher than what it sees before it crashes, and it remained stable.

 

Here is the output of pimon during that test. There are two drops. The uptick before the first drop is where I connected an OpenVPN client. After rebooting the system, I did the benchmarking (the rise in temperature and CPU load), followed by reconnecting to the VPN before the second crash.

 

mvNTDMK.png

 


28 minutes ago, AngularSpecter said:

I believe I am having this exact same issue.   I am using the same board with the same OS version and getting the same hangs. ( If this deserves its own thread, let me know)

 

Details of my system/problem:

 

Pi zero H2 running 16.04

Running pihole and pivpn 

Installed in case with heatsink

Powered from Samsung 2A charger

Cat6 to router

 

Running pihole and just doing dns masq, it is stable for days.  If I enable dhcp, it tanks within hours.  If I turn dhcp on and reboot the router, it tanks as soon as everything hits it simultaneously for an address.

 

 

 

 

 

What are you seeing in dmesg before and after?


1 hour ago, lanefu said:

 

 

What are you seeing in dmesg before and after?

 

I'm still digging through logs. The only thing I really see is a ton of ARISC errors, but those are present even when it's stable.

 

19 minutes ago, Atgeek said:

Since pihole is running in all 3 cases, maybe this is the problem?

 

I don't think so... at least not directly. I can run pihole doing only DNS filtering for days and it's fine.

 

If anything, it seems more correlated to activity on the NIC, as the things that kill it are an influx of dhcp requests or routing VPN traffic


As another test, I limited the max core speed to 912 MHz.   I was able to successfully run the VPN for an hour with fairly heavy traffic (streaming video), without it crashing.  The performance logs also showed zero increase in temperature and less switching of clock speed and Vcore.

 

I'm wondering whether my PSU isn't really supplying a steady 5V during those surges, or whether there is a bad setting in the voltage/clock scaling (based on the ARISC errors, I think there is).


19 hours ago, Atgeek said:

I've already tried a frequency limit (912 MHz), but with no result.

 

It isn't a stress or overload problem.
My Zero (v1.4, 4 cores at 1.2 GHz and about 70°C) constantly runs at 100% with pihole + motioneye image analysis, with some peaks up to 400% when creating/saving h264 timelapses with motioneye.

With the user-built image I have no problems (but it hangs with the image downloaded from armbian.com).

 

Istantanea_2017-09-24_15-42-02.png

 

 


22 hours ago, AngularSpecter said:

if there is a bad setting in the voltage/clock scaling (based on the ARISC errors, I think there is)

 

If you get ARISC errors then you use wrong settings anyway (most probably created/chosen an image for the wrong board, one with I2C accessible voltage regulator). I don't really remember but most probably in such a situation with wrong voltage regulator settings the board does not do cpufreq scaling but remains at the clockspeed set by u-boot (that's what monitoring is for, use either RPi-Monitor or 'armbianmonitor -m' to check).

 

3 affected people and not a single log. Regardless which thread I visit here in the forums there's always a huge 'Before reporting problems with your board running Armbian, check the following:' intro at the top asking for 'armbianmonitor -u' output.


9 hours ago, tkaiser said:

3 affected people and not a single log. Regardless which thread I visit here in the forums there's always a huge 'Before reporting problems with your board running Armbian, check the following:' intro at the top asking for 'armbianmonitor -u' output.

 

What logs are useful?  I can recreate the issue pretty reliably.  So you need a dump from armbianmonitor prior to and after recovery?


3 hours ago, AngularSpecter said:

So you need a dump from armbianmonitor prior to and after recovery?


Hunting a bug is crime scene investigation :D Run armbianmonitor -m on a working image so we can see your setup, versions, possible 3rd party hardware... then provide a screen picture or console log with crash dump information, if any.


5 hours ago, AngularSpecter said:

So you need a dump from armbianmonitor prior to and after recovery?

 

As already said: you use wrong settings and don't comment on which/why? If you had provided 'armbianmonitor -u' output from your image we would already know and could check how you currently operate your board. If you get ARISC errors you most probably chose the wrong settings for a board equipped with I2C voltage regulator and these are afaik clocked at 1008 MHz by u-boot.

 

So a difference might be that you're currently running all the time at 1008 MHz while with an image using correct settings cpufreq would be switching between 240 and 1200 MHz. But without logs this all is just an absurd waste of time :)


18 hours ago, Atgeek said:

What's your maximum uptime? Anyway, if so, then it may be some sort of bug in the official release...

New installation (user-built): about 7 days and it keeps working fine. Old installation: max 1 day (hangs... no reply from ssh, the pihole web GUI or the motioneye web GUI).
 

17 hours ago, tkaiser said:

... 3 affected people and not a single log...

I'm sorry, but I've deleted the old installation.


Here's my setup info:

 

http://sprunge.us/eMED

 

I'm running the image I found listed for the Zero. When I was getting the crashes the first time, I had ARISC errors as well. I tried running the fix thermal issues script that I found on this site, and that made no difference.

 

The dvfs table pre-mod was

 

[dvfs_table]
pmuic_type = 1
pmu_gpio0 = port:PL06<1><1><2><1>
pmu_level0 = 11300
pmu_level1 = 1100
max_freq = 1200000000
min_freq = 648000000
LV_count = 2
LV1_freq = 1200000000
LV1_volt = 1300
LV2_freq = 648000000
LV2_volt = 1100

 

I can post the full fex if it helps.

 

To alter the max speed I used h3consumption

 

Active settings:

cpu       912 mhz allowed, 1200 mhz possible, 4 cores active
dram      408 mhz
hdmi/gpu  off
usb ports off
eth0      100Mb/s/Full, Link: yes
tun0      no wireless extensions.
wlan0     IEEE 802.11bgn  ESSID:off/any
          Mode:Managed  Access Point: Not-Associated   Tx-Power=20 dBm
          Retry  long limit:7   RTS thr:off   Fragment thr:off
          Encryption key:off
          Power Management:on
 


14 minutes ago, AngularSpecter said:

I can post the full fex if it helps.

 

Not necessary since contained in your link: http://sprunge.us/eMED

 

The dvfs table is wrong since it does not allow clocking below 648 MHz while /etc/default/cpufrequtils tries to idle at 240 MHz. We have not changed these settings within the last year: https://github.com/armbian/build/commits/master/config/fex/orangepizero.fex so the 'I tried running the fix thermal issues script that I found on this site' step is responsible for the ARISC errors.
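
To check for this kind of mismatch on a given image, something like the following could be used. It is only a sketch: the helper names are invented here, it assumes the fex text arrives on stdin (for example from sunxi-tools' bin2fex), and it relies on min_freq in the fex being in Hz while MIN_SPEED in /etc/default/cpufrequtils is in kHz.

```shell
#!/bin/sh
# Sketch: compare the lowest clock the dvfs table permits with the cpufreq minimum.

# Read fex text on stdin, print min_freq from [dvfs_table] converted from Hz to kHz.
dvfs_min_khz() {
    awk -F= '
        /^\[/ { in_dvfs = ($0 ~ /^\[dvfs_table\]/) }
        in_dvfs && $1 ~ /^min_freq[ \t]*$/ { gsub(/[ \t]/, "", $2); print $2 / 1000 }
    '
}

# check_min DVFS_MIN_KHZ CPUFREQ_MIN_KHZ
check_min() {
    if [ "$2" -lt "$1" ]; then
        echo "mismatch: cpufreq wants $2 kHz, dvfs table stops at $1 kHz"
    else
        echo "ok"
    fi
}

# Possible use on the board (assumes bin2fex is installed):
#   . /etc/default/cpufrequtils
#   check_min "$(bin2fex /boot/script.bin 2>/dev/null | dvfs_min_khz)" "$MIN_SPEED"
```

With the table posted above (min_freq = 648000000) and a 240 MHz idle setting, check_min reports a mismatch.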

 

I adjusted http://kaiser-edv.de/tmp/H9rWPf/fix-thermal-problems.sh right now since it's a historical piece of code suited to fix the other OS images that users had to use prior to Armbian. Please try this to recover from the situation:

ln -sf /boot/bin/orangepizero.bin /boot/script.bin

then make your h3consumption adjustments, reboot and check/report. If this does NOT work, check for /boot/script.bin.bak and replace script.bin with it or even download latest version of orangepizero.fex from Github, convert it using 'fex2bin /path/to/orangepizero.fex /boot/script.bin' and then follow the above steps.

 

 


7 minutes ago, Atgeek said:

 

You were the one running transmission? Since I've seen no connected USB storage I would assume that's the culprit since on a counterfeit/crap SD card this can't work well (name suggests SanDisk while oemid/manfid show 'noname'):

### mmc0:0001 info:

                 cid: 0000005344344742000001a46c00f137 
                 csd: 400e00325b5900001dbf7f800a400085 
                 scr: 02b5800000000000 
                date: 01/2015 
                name: SD4GB 
                type: SD 
preferred_erase_size: 4194304 
               fwrev: 0x0 
               hwrev: 0x0 
               oemid: 0x0000 
              manfid: 0x000000 
              serial: 0x0001a46c 
              uevent: DRIVER=mmcblk MMC_TYPE=SD MMC_NAME=SD4GB MODALIAS=mmc:block 
          erase_size: 512 

I would check performance as per: https://forum.armbian.com/index.php?/topic/954-sd-card-performance/ and also run 'armbianmonitor -c $HOME' (as normal user so not as root / without sudo)

 


11 hours ago, tkaiser said:

 

You were the one running transmission? Since I've seen no connected USB storage I would assume that's the culprit since on a counterfeit/crap SD card this can't work well (name suggests SanDisk while oemid/manfid show 'noname'):


### mmc0:0001 info:

                 cid: 0000005344344742000001a46c00f137 
                 csd: 400e00325b5900001dbf7f800a400085 
                 scr: 02b5800000000000 
                date: 01/2015 
                name: SD4GB 
                type: SD 
preferred_erase_size: 4194304 
               fwrev: 0x0 
               hwrev: 0x0 
               oemid: 0x0000 
              manfid: 0x000000 
              serial: 0x0001a46c 
              uevent: DRIVER=mmcblk MMC_TYPE=SD MMC_NAME=SD4GB MODALIAS=mmc:block 
          erase_size: 512 

I would check performance as per: https://forum.armbian.com/index.php?/topic/954-sd-card-performance/ and also run 'armbianmonitor -c $HOME' (as normal user so not as root / without sudo)

 

Yes, I'm running transmission. I never thought it could be the SD card; it isn't new, but it's genuine for sure. I will try another microSD asap. Thanks


16 hours ago, Moklev said:

Old installation: max 1 day (hangs... no reply from ssh or pihole web gui or motioneye web gui).

I don't know how many resources pihole needs (I have never used it). But motion, motioneye and all that stuff should run stable on the downloadable Armbian for the Pi Zero.

During the dogafu experiment, I had this kind of setup in mind to make my Pi crash... Just for testing, I let it run in an appropriate setup (good SD card & PSU), together with a Samba server in the background and mencoder to generate the time-lapse videos. This setup ran for more than 2 weeks before I decided to do more sensible things first for the dogafu experiment.

 

Setup: Armbian legacy 3.4.113 Ubuntu, Samsung Evo+ 32GB,  2A 5V PSU (microUSB), network over ETH, no case but also no cooler on SoC

 


6 hours ago, Atgeek said:

Yes I'm running transmission, I never thought it could be the SD

 

https://forum.armbian.com/index.php?/topic/838-best-budget-device-as-torrent-box/&do=findComment&comment=6427

 

TL;DR: transmission-daemon set up to write to storage that is insanely slow wrt RANDOM IO will slow the system down so much that it seems to hang. iozone will show whether that's the problem with your setup.

 

6 hours ago, Atgeek said:

the SD, it isn't new but it's genuine for sure

 

How/why? Do a web search for 'SD4GB manfid oemid' for example. Everything with only zeroes in manfid/oemid is not genuine but counterfeit/broken or cheap crap in the first place (and if a 'genuine' SD card starts to 'forget' who the manufacturer was then I would assume that it soon also starts to forget the data written to it).
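
For anyone who wants to run the same check on their own card, here is a small sketch. The helper name is hypothetical, the sysfs attributes are the usual ones for mmcblk devices, and treating a card as suspicious only when both IDs are all zeroes is an assumption made for this example.

```shell
#!/bin/sh
# Sketch: flag SD cards whose manufacturer and OEM IDs are all zeroes.

# sd_id_verdict MANFID OEMID   (values like 0x000003 and 0x5344)
sd_id_verdict() {
    # Strip the 0x prefix and every zero; anything left means a non-zero ID.
    manf=$(printf '%s' "$1" | sed 's/^0x//; s/0//g')
    oem=$(printf '%s' "$2" | sed 's/^0x//; s/0//g')
    if [ -z "$manf" ] && [ -z "$oem" ]; then
        echo "suspicious"
    else
        echo "plausible"
    fi
}

# On the board itself, read the IDs from sysfs if they are available.
dev=/sys/block/mmcblk0/device
if [ -r "$dev/manfid" ] && [ -r "$dev/oemid" ]; then
    echo "mmcblk0: $(sd_id_verdict "$(cat "$dev/manfid")" "$(cat "$dev/oemid")")"
fi
```

A "plausible" result only means the IDs are not blank; it says nothing about performance, which is what the iozone test is for.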

6 hours ago, Atgeek said:

I will try another micro sd asap

 

C'mon, I asked you to TEST your card for performance and whether it's ok (and of course provide the results). I gave you a link where it's explained how easy this is since it's really just doing this

cd $HOME ; iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2
armbianmonitor -c $HOME

We need these test results to get an idea whether we can ALWAYS assume that if your card shows horribly low random IO performance (which I assume, especially with 16K block size) then it's able to slow your system that much down that it seems to 'hang'. Since we need a way to avoid wasting our time with the same issues over and over again (SD card crappiness and power supply issues).


In the meantime I'm thinking about modifying firstrun script in the following way: Check SD card info and when metadata is questionable, capacity below 8 GB or random IO 16K write IOPS below 25 immediately enter 'self destruction' mode, preventing Armbian from being installed. If SD card has passed the check then enter 120 seconds cpuburn stage to crash the installation on systems with insufficient power supplies. But that's obviously just me, nothing will change and leaving the forum is the better alternative :) 
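
The SD card part of such a check could be sketched roughly like this. The function names are invented for illustration, and the 16K random write figure is assumed to come from an iozone run like the one above, which reports throughput in kB/s; the thresholds (8 GB, 25 IOPS) are the ones mentioned in this post.

```shell
#!/bin/sh
# Sketch of the SD card gate described above; helper names are hypothetical.

# iops_from_kbps THROUGHPUT_KBPS RECORD_SIZE_KB
# iozone reports kB/s, so IOPS = throughput / record size.
iops_from_kbps() {
    echo $(( $1 / $2 ))
}

# card_gate CAPACITY_GB RANDOM_WRITE_16K_KBPS
card_gate() {
    iops=$(iops_from_kbps "$2" 16)
    if [ "$1" -lt 8 ]; then
        echo "reject: capacity below 8 GB"
    elif [ "$iops" -lt 25 ]; then
        echo "reject: only $iops IOPS at 16K random write"
    else
        echo "accept: $iops IOPS at 16K random write"
    fi
}
```

For example, a 32 GB card that manages 320 kB/s at 16K random writes works out to 20 IOPS and would be rejected.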


This topic is now closed to further replies.