Atgeek Posted September 22, 2017
I have an OPi Zero running the latest Armbian stable release. It only handles a Pi-hole DHCP server and runs the Transmission client, but it randomly gets stuck and becomes inaccessible through SSH and the web server pages. Everything returns to normal when I restart it by manually removing the power supply, but that is very inconvenient. So what could be the cause? Is there a way to log the system errors so I can read them after the next reboot? Thanks
Igor Posted September 22, 2017
Check the most recent preview build. This will be the next stable. https://dl.armbian.com/orangepizero/Ubuntu_xenial_next.7z
In general, make sure you have a good power supply and cables. Better not to use Wi-Fi, since it has a very bad chip and driver.
zador.blood.stained Posted September 22, 2017
8 minutes ago, Igor said:
> This will be next stable. https://dl.armbian.com/orangepizero/Ubuntu_xenial_next.7z
I wouldn't make a "stable" mainline build yet. Without DVFS and thermal throttling it will be as "stable" as this: https://4pda.ru/forum/index.php?showtopic=810858&st=80#entry64772707
Igor Posted September 22, 2017
Yes, you are correct. Too soon. I removed them.
Atgeek Posted September 22, 2017
Sorry, I didn't specify that it's connected via the Ethernet port. The power supply is a 5V 2A PSP charger, feeding power directly to the pin header.
Igor Posted September 23, 2017
Install RPi-Monitor (armbianmonitor) and see what is going on. If you can, change the PSU and monitor the voltage at the board while it is running at full load. Use stress or a similar stress-testing tool.
Atgeek Posted September 23, 2017
I will replace the PSU. Anyway, is RPi-Monitor capable of showing the system error log? In that case I first need to determine whether the OPi is just stuck or whether for some reason it can't connect to the network anymore.
Moklev Posted September 23, 2017
I ran into the same issue: my Zero (512MB, v1.4) frequently hangs with Armbian (stable, legacy kernel). The Zero runs Pi-hole and motionEye.
Solution: I built my own image, Armbian (Xenial) 16.04 with desktop enabled. Sadly, hardware transcoding (MJPEG/H.264) does not seem to work out of the box due to the lack of ffmpeg 3.1.x with Cedrus. But this image runs stably, with 6 days of uptime at the moment. I have not changed the micro SD card or the PSU, and the archive downloaded from armbian.com was not damaged.
    (ASCII-art login banner: Orange Pi Zero)
    Welcome to ARMBIAN 5.32 user-built Ubuntu 16.04.3 LTS 3.4.113-sun8i
    System load:   0.28 0.29 0.68   Up time:       6 days
    Memory usage:  26 % of 494MB    IP:            192.168.1.101
    CPU temp:      61°C
    Usage of /:    33% of 7.2G      storage/:      4% of 29G
    [ General system configuration: armbian-config ]
    Last login: Sat Sep 23 13:11:52 2017 from 192.168.1.105
It is only a bit slow (without running services it should score about 6.5 s at 1200 MHz):
    francesco@orangepizero:~$ sysbench --num-threads=4 --test=cpu --cpu-max-prime=2000 run
    sysbench 0.4.12: multi-threaded system evaluation benchmark
    Running the test with following options:
    Number of threads: 4
    Doing CPU performance benchmark
    Threads started!
    Done.
    Maximum prime number checked in CPU test: 2000
    Test execution summary:
        total time:                          8.5835s
        total number of events:              10000
        total time taken by event execution: 34.3080
        per-request statistics:
            min:                             3.42ms
            avg:                             3.43ms
            max:                             8.96ms
            approx. 95 percentile:           3.46ms
    Threads fairness:
        events (avg/stddev):          2500.0000/2.12
        execution time (avg/stddev):  8.5770/0.00
AngularSpecter Posted September 23, 2017
I believe I am having this exact same issue. I am using the same board with the same OS version and getting the same hangs. (If this deserves its own thread, let me know.)
Details of my system/problem:
- Pi Zero H2 running 16.04
- Running Pi-hole and PiVPN
- Installed in a case with a heatsink
- Powered from a Samsung 2A charger
- Cat6 to the router
Running Pi-hole and just doing dnsmasq, it is stable for days. If I enable DHCP, it tanks within hours. If I turn DHCP on and reboot the router, it tanks as soon as everything hits it simultaneously for an address.
Running OpenVPN (DHCP disabled), it will run fine if nothing is connected. Once a client connects, it will lock up within 20 minutes.
Here's the interesting part... I was experimenting last night. With base load (no VPN clients connected, just serving DNS) the system idles just north of 50°C. Connecting a client causes a mild increase in temperature and system load prior to the hang. However... I hammered on it a bit with a benchmark, pulling loads and temperatures much higher than what it sees before it crashes, and it remained stable.
Here is the output of pimon during that test. There are two drops. The uptick before the first drop is where I connected an OpenVPN client. After rebooting the system, I did the benchmarking (the rise in temperature and CPU load), followed by reconnecting to the VPN before the second crash.
lanefu Posted September 23, 2017
28 minutes ago, AngularSpecter said:
> I believe I am having this exact same issue. I am using the same board with the same OS version and getting the same hangs. (If this deserves its own thread, let me know.)
> Details of my system/problem: Pi Zero H2 running 16.04. Running Pi-hole and PiVPN. Installed in a case with a heatsink. Powered from a Samsung 2A charger. Cat6 to the router.
> Running Pi-hole and just doing dnsmasq, it is stable for days. If I enable DHCP, it tanks within hours. If I turn DHCP on and reboot the router, it tanks as soon as everything hits it simultaneously for an address.
What are you seeing in dmesg before and after?
Atgeek Posted September 23, 2017
Since Pi-hole is running in all 3 cases, maybe this is the problem?
AngularSpecter Posted September 23, 2017
1 hour ago, lanefu said:
> What are you seeing in dmesg before and after?
I'm still digging through logs. The only thing I really see is a ton of ARISC errors, but those are present even when it's stable.
19 minutes ago, Atgeek said:
> Since in all 3 the cases there is pihole running, maybe this the problem?
I don't think so... at least not directly. I can run Pi-hole doing only DNS filtering for days and it's fine. If anything, it seems more correlated with activity on the NIC, as the things that kill it are an influx of DHCP requests or routing VPN traffic.
AngularSpecter Posted September 23, 2017
As another test, I limited the max core speed to 912 MHz. I was able to successfully run the VPN for an hour with fairly heavy traffic (streaming video) without it crashing. The performance logs also showed zero increase in temperature and less switching of clock speed and Vcore. I'm wondering if either my PSU isn't really supplying a steady 5V during those surges or if there is a bad setting in the voltage/clock scaling (based on the ARISC errors, I think there is).
Atgeek Posted September 23, 2017
I've already tried a frequency limit (912 MHz), but with no result.
Moklev Posted September 24, 2017
19 hours ago, Atgeek said:
> I've already tried with frequency limit (912 Mhz) but with no result.
It isn't a stress or overload problem. My Zero (v1.4, 4 cores at 1.2 GHz and about 70°C) constantly runs at 100% with Pi-hole + motionEye image analysis, with some peaks to 400% when making/saving H.264 timelapses with motionEye. With the user-built image I have no problems (but it hangs with the image downloaded from armbian.com).
Atgeek Posted September 24, 2017
What's your maximum uptime? Anyway, if so, then it may be some sort of bug in the official release...
tkaiser Posted September 24, 2017
22 hours ago, AngularSpecter said:
> if there is a bad setting in the voltage/clock scaling (based on the ARISC errors, I think there is)
If you get ARISC errors, then you are using wrong settings anyway (most probably you created or chose an image for the wrong board, one with an I2C-accessible voltage regulator). I don't really remember, but most probably in such a situation with wrong voltage regulator settings the board does not do cpufreq scaling but remains at the clock speed set by u-boot (that's what monitoring is for; use either RPi-Monitor or 'armbianmonitor -m' to check).
3 affected people and not a single log. Regardless of which thread I visit here in the forums, there's always a huge 'Before reporting problems with your board running Armbian, check the following:' intro at the top asking for 'armbianmonitor -u' output.
AngularSpecter Posted September 25, 2017
9 hours ago, tkaiser said:
> 3 affected people and not a single log. Regardless which thread I visit here in the forums there's always a huge 'Before reporting problems with your board running Armbian, check the following:' intro at the top asking for 'armbianmonitor -u' output.
What logs are useful? I can recreate the issue pretty reliably. So you need a dump from armbianmonitor prior to and after recovery?
Igor Posted September 25, 2017
3 hours ago, AngularSpecter said:
> So you need a dump from armbianmonitor prior to and after recovery?
Hunting a bug is crime scene investigation: 'armbianmonitor -m' on the working image, to see your setup, versions, possible third-party hardware... then a screen picture or console log with crash dump information, if any.
tkaiser Posted September 25, 2017
5 hours ago, AngularSpecter said:
> So you need a dump from armbianmonitor prior to and after recovery?
As already said: you are using wrong settings and haven't commented on which ones or why. If you had provided 'armbianmonitor -u' output from your image, we would already know and could check how you currently operate your board.
If you get ARISC errors, you most probably chose the settings for a board equipped with an I2C voltage regulator, and these are afaik clocked at 1008 MHz by u-boot. So a difference might be that you're currently running at 1008 MHz all the time, while with an image using correct settings cpufreq would be switching between 240 and 1200 MHz.
But without logs this is all just an absurd waste of time.
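The "stuck at one clock speed versus scaling between 240 and 1200 MHz" distinction described above can be checked by sampling the current CPU frequency a few times. The following is only a minimal sketch: it assumes the kernel exposes the standard cpufreq sysfs file for cpu0, and the helper names (`read_khz`, `sample_frequencies`, `is_scaling`) are made up for illustration. It is not what 'armbianmonitor -m' does internally.

```python
import time

# Assumption: the standard Linux cpufreq sysfs file exists on this kernel.
CPUFREQ_PATH = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq"


def read_khz(path=CPUFREQ_PATH):
    """Read the current CPU frequency in kHz from sysfs."""
    with open(path) as handle:
        return int(handle.read().strip())


def sample_frequencies(read=read_khz, count=10, interval=1.0):
    """Collect `count` frequency readings, `interval` seconds apart."""
    samples = []
    for _ in range(count):
        samples.append(read())
        time.sleep(interval)
    return samples


def is_scaling(samples):
    """True if more than one distinct frequency was observed."""
    return len(set(samples)) > 1


# Demonstration with canned readings instead of real hardware:
fake = iter([1008000, 1008000, 1008000])
stuck = sample_frequencies(read=lambda: next(fake), count=3, interval=0)
print(is_scaling(stuck))  # False: the clock never moved off 1008 MHz
```

On the board itself, `sample_frequencies()` with its defaults would read the real sysfs file. A short sample on an idle system can show a single frequency even with working scaling, so sample under varying load before drawing conclusions.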
Moklev Posted September 25, 2017
18 hours ago, Atgeek said:
> What's your maximum up time? Anyway if so than may be a sort of bug in the official release...
New installation (user-built): about 7 days, and it keeps working fine. Old installation: max 1 day (hangs... no reply from SSH, the Pi-hole web GUI, or the motionEye web GUI).
17 hours ago, tkaiser said:
> ... 3 affected people and not a single log...
I'm sorry, but I've deleted the old installation.
AngularSpecter Posted September 25, 2017
Here's my setup info: http://sprunge.us/eMED
I'm running the image I found listed for the Zero. When I was getting the crashes the first time, I had ARISC errors as well. I tried running the fix-thermal-problems script that I found on this site, and that made no difference. The dvfs table pre-mod was:
    [dvfs_table]
    pmuic_type = 1
    pmu_gpio0 = port:PL06<1><1><2><1>
    pmu_level0 = 11300
    pmu_level1 = 1100
    max_freq = 1200000000
    min_freq = 648000000
    LV_count = 2
    LV1_freq = 1200000000
    LV1_volt = 1300
    LV2_freq = 648000000
    LV2_volt = 1100
I can post the full fex if it helps. To alter the max speed I used h3consumption:
    Active settings:
    cpu       912 mhz allowed, 1200 mhz possible, 4 cores active
    dram      408 mhz
    hdmi/gpu  off
    usb ports off
    eth0      100Mb/s/Full, Link: yes
    tun0      no wireless extensions.
    wlan0     IEEE 802.11bgn  ESSID:off/any
              Mode:Managed  Access Point: Not-Associated  Tx-Power=20 dBm
              Retry long limit:7  RTS thr:off  Fragment thr:off
              Encryption key:off
              Power Management:on
tkaiser Posted September 25, 2017
14 minutes ago, AngularSpecter said:
> I can post the full fex if it helps.
Not necessary, since it is contained in your link: http://sprunge.us/eMED
The dvfs table is wrong since it does not allow clocking below 648 MHz, while /etc/default/cpufrequtils tries to idle at 240 MHz. We did not change anything in these settings within the last year: https://github.com/armbian/build/commits/master/config/fex/orangepizero.fex so the 'I tried running the fix thermal issues script that I found on this site' is responsible for the ARISC errors. I adjusted http://kaiser-edv.de/tmp/H9rWPf/fix-thermal-problems.sh right now, since it's a historical piece of code suited to fix the other OS images that users had to use prior to Armbian.
Please try this to recover from the situation:
    ln -sf /boot/bin/orangepizero.bin /boot/script.bin
Then make your h3consumption adjustments, reboot and check/report.
If this does NOT work, check for /boot/script.bin.bak and replace script.bin with it, or even download the latest version of orangepizero.fex from GitHub, convert it using 'fex2bin /path/to/orangepizero.fex /boot/script.bin' and then follow the above steps.
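The mismatch described above (a dvfs table whose lowest operating point is 648 MHz while the governor is told to idle at 240 MHz) can be spotted mechanically. Below is a minimal sketch under stated assumptions: the fex text is the table quoted earlier in the thread, `min_freq` is taken to be in Hz, the cpufrequtils minimum is taken to be in kHz (240000), and both function names are hypothetical helpers, not part of any Armbian tool.

```python
DVFS_TABLE = """
[dvfs_table]
pmuic_type = 1
pmu_gpio0 = port:PL06<1><1><2><1>
pmu_level0 = 11300
pmu_level1 = 1100
max_freq = 1200000000
min_freq = 648000000
LV_count = 2
LV1_freq = 1200000000
LV1_volt = 1300
LV2_freq = 648000000
LV2_volt = 1100
"""


def parse_fex_section(text, section):
    """Return the key/value pairs of one [section] of fex-style text."""
    values = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            continue
        if current == section and "=" in line:
            key, value = line.split("=", 1)
            values[key.strip()] = value.strip()
    return values


def idle_freq_reachable(dvfs, idle_khz):
    """True if the requested idle frequency is not below the table's min_freq."""
    # Assumption: min_freq is in Hz, the cpufrequtils value in kHz.
    return idle_khz * 1000 >= int(dvfs["min_freq"])


dvfs = parse_fex_section(DVFS_TABLE, "dvfs_table")
print(idle_freq_reachable(dvfs, 240000))  # False: 240 MHz is below 648 MHz
```

On a real system the fex text would first have to be produced from /boot/script.bin (for example with bin2fex); that step is left out here.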
Atgeek Posted September 25, 2017
Here is my log: http://sprunge.us/aKKf
I just noticed that if I leave it plugged in after it hangs, it comes back online a few hours later.
tkaiser Posted September 25, 2017
7 minutes ago, Atgeek said:
> http://sprunge.us/aKKf
You were the one running transmission? Since I've seen no connected USB storage I would assume that's the culprit, since on a counterfeit/crap SD card this can't work well (the name suggests SanDisk while oemid/manfid show 'noname'):
    ### mmc0:0001 info:
    cid: 0000005344344742000001a46c00f137
    csd: 400e00325b5900001dbf7f800a400085
    scr: 02b5800000000000
    date: 01/2015
    name: SD4GB
    type: SD
    preferred_erase_size: 4194304
    fwrev: 0x0
    hwrev: 0x0
    oemid: 0x0000
    manfid: 0x000000
    serial: 0x0001a46c
    uevent: DRIVER=mmcblk MMC_TYPE=SD MMC_NAME=SD4GB MODALIAS=mmc:block
    erase_size: 512
I would check performance as per https://forum.armbian.com/index.php?/topic/954-sd-card-performance/ and also run 'armbianmonitor -c $HOME' (as a normal user, so not as root and without sudo).
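The name, serial, date and the all-zero manufacturer/OEM IDs in the listing above are all packed into the 128-bit `cid` value. As a rough illustration, the sketch below decodes that hex string using the SD card CID register layout (manufacturer ID, OEM ID, 5-character product name, revision, serial, manufacturing date). The function names are hypothetical, and treating all-zero IDs as "unbranded" simply mirrors the reasoning in the post rather than any official test.

```python
def decode_sd_cid(cid_hex):
    """Split a 32-digit hex SD card CID into its documented fields."""
    if len(cid_hex) != 32:
        raise ValueError("an SD CID is 128 bits, i.e. 32 hex digits")
    raw = int(cid_hex, 16)

    def bits(high, low):
        # Extract bits [high:low], counting from bit 0 at the right.
        return (raw >> low) & ((1 << (high - low + 1)) - 1)

    mdt = bits(19, 8)  # manufacturing date: 8 bits year offset, 4 bits month
    return {
        "manfid": bits(127, 120),
        "oemid": bits(119, 104),
        "name": bits(103, 64).to_bytes(5, "big").decode("ascii"),
        "revision": bits(63, 56),
        "serial": bits(55, 24),
        "year": 2000 + (mdt >> 4),
        "month": mdt & 0xF,
    }


def looks_unbranded(info):
    """True if both the manufacturer ID and the OEM ID are zero."""
    return info["manfid"] == 0 and info["oemid"] == 0


info = decode_sd_cid("0000005344344742000001a46c00f137")
print(info["name"], hex(info["serial"]), info["month"], info["year"])
print(looks_unbranded(info))
```

For the CID quoted in the post this yields the same name (SD4GB), serial (0x1a46c) and date (01/2015) as the kernel's own decoding, with both ID fields zero.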
Atgeek Posted September 25, 2017
11 hours ago, tkaiser said:
> You were the one running transmission? Since I've seen no connected USB storage I would assume that's the culprit since on a counterfeit/crap SD card this can't work well (name suggests SanDisk while oemid/manfid show 'noname'):
>     ### mmc0:0001 info:
>     cid: 0000005344344742000001a46c00f137
>     csd: 400e00325b5900001dbf7f800a400085
>     scr: 02b5800000000000
>     date: 01/2015
>     name: SD4GB
>     type: SD
>     preferred_erase_size: 4194304
>     fwrev: 0x0
>     hwrev: 0x0
>     oemid: 0x0000
>     manfid: 0x000000
>     serial: 0x0001a46c
>     uevent: DRIVER=mmcblk MMC_TYPE=SD MMC_NAME=SD4GB MODALIAS=mmc:block
>     erase_size: 512
> I would check performance as per: https://forum.armbian.com/index.php?/topic/954-sd-card-performance/ and also run 'armbianmonitor -c $HOME' (as normal user so not as root / without sudo)
Yes, I'm running transmission. I never thought it could be the SD card; it isn't new, but it's genuine for sure. I will try another micro SD card asap. Thanks
chwe Posted September 26, 2017
16 hours ago, Moklev said:
> Old installation: max 1 day (hangs... no reply from ssh or pihole web gui or motioneye web gui).
I don't know how many resources Pi-hole needs (never used it). But motion, motionEye and all this stuff should run stably on the downloadable Armbian for the Pi Zero. During the dogafu experiment, I had this kind of stuff in mind to make my Pi crash... Just for some testing I let it run on an appropriate setup (good SD card and PSU), together with a Samba server in the background and mencoder to generate the time-lapse videos. This setup lived for more than 2 weeks before I decided that I would do more sensible stuff first for the dogafu experiment.
Setup: Armbian legacy 3.4.113 Ubuntu, Samsung Evo+ 32GB, 2A 5V PSU (microUSB), network over ETH, no case but also no cooler on the SoC.
tkaiser Posted September 26, 2017
6 hours ago, Atgeek said:
> Yes I'm running transmission, I never though it could be the SD
https://forum.armbian.com/index.php?/topic/838-best-budget-device-as-torrent-box/&do=findComment&comment=6427
TL;DR: transmission-daemon set up to write to storage that is insanely slow wrt RANDOM IO will slow the system down so much that it seems to hang. Whether that's the problem with your setup, iozone will show.
6 hours ago, Atgeek said:
> the SD, it isn't new but it's genuine for sure
How/why? Do a web search for 'SD4GB manfid oemid', for example. Everything with only zeroes in manfid/oemid is not genuine but counterfeit/broken or cheap crap in the first place (and if a 'genuine' SD card starts to 'forget' who the manufacturer was, then I would assume that it soon also starts to forget the data written to it).
6 hours ago, Atgeek said:
> I will try another micro sd asap
C'mon, I asked you to TEST your card for performance and whether it's ok (and of course provide the results). I gave you a link where it's explained how easy this is, since it's really just doing this:
    cd $HOME ; iozone -e -I -a -s 100M -r 4k -r 16k -r 512k -r 1024k -r 16384k -i 0 -i 1 -i 2
    armbianmonitor -c $HOME
We need these test results to get an idea whether we can ALWAYS assume that if your card shows horribly low random IO performance (which I assume, especially with 16K block size), then it's able to slow your system down so much that it seems to 'hang'. We need a way to avoid wasting our time with the same issues over and over again (SD card crappiness and power supply issues).
Atgeek Posted September 26, 2017
The Transmission daemon isn't running constantly, only when needed, and it downloads files to an external hard drive. The hang occurs even if Transmission isn't enabled.
tkaiser Posted September 26, 2017
In the meantime I'm thinking about modifying the firstrun script in the following way: check the SD card info and, when the metadata is questionable, the capacity is below 8 GB, or random IO 16K write IOPS is below 25, immediately enter 'self destruction' mode, preventing Armbian from being installed. If the SD card has passed the check, then enter a 120-second cpuburn stage to crash the installation on systems with insufficient power supplies.
But that's obviously just me; nothing will change, and leaving the forum is the better alternative.
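To make the proposed acceptance criteria concrete, here is a small sketch of the SD card part of that check. It is purely illustrative and not an actual firstrun script: the function names are hypothetical, the thresholds (8 GB, 25 IOPS at 16K) are the ones named in the post, and it assumes the random-write figure comes from iozone's 16k record row, which iozone reports in KB/s by default.

```python
def kbps_to_iops(throughput_kb_s, record_kb):
    """Convert a throughput in KB/s at a given record size into IOPS."""
    return throughput_kb_s / record_kb


def card_passes(capacity_gb, manfid, oemid, randwrite_16k_kb_s,
                min_capacity_gb=8, min_iops=25):
    """Apply the three rejection criteria named in the post."""
    if manfid == 0 and oemid == 0:
        return False  # questionable metadata
    if capacity_gb < min_capacity_gb:
        return False  # card too small
    if kbps_to_iops(randwrite_16k_kb_s, 16) < min_iops:
        return False  # random 16K writes too slow
    return True


# 25 IOPS at 16K records corresponds to 400 KB/s of random writes.
print(kbps_to_iops(400, 16))  # 25.0
```

With these thresholds, the 'SD4GB' card discussed earlier in the thread would be rejected on metadata alone, before any performance measurement is taken into account.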