fabbronet Posted April 14, 2018

Hi everyone,

I have a headless Armbian 5.38 system on an Orange Pi Zero Plus 2 board. Attached to this board are the USB expansion board with a wifi dongle, and another board connected to the TWI interface of the GPIO header. I power the board from the same header with a 5 V 5 A power supply; I verified that there are no dips on the 5 V and 3.3 V rails. The external USB dongle is configured as an AP with nmcli, and the internal wifi is configured as a station connected to my home network. The system runs a python-flask-uwsgi "daemon" that I'm developing.

Well, my system hangs randomly: I can't reach my flask web page, I can't connect via ssh from PuTTY, and if I plug in a USB cable my Windows machine says the device is not recognized, so I can't get in over the serial port either. The only way out is to cycle the power and start again. I've tried without the USB dongle and with other PSUs, and I checked the SD card integrity... everything seems fine. The hang can happen one hour or 30 hours after boot; it's random.
If I look at journalctl I can't find anything, because the journal is cleared at every startup, so I looked at my syslog. Here is an extract of what happens:

Quote
Apr 14 05:34:58 grace dnsmasq-dhcp[1201]: DHCPREQUEST(wlx00304f8b17e8) 10.42.0.240 5c:cf:7f:c6:da:52
Apr 14 05:34:58 grace dnsmasq-dhcp[1201]: DHCPACK(wlx00304f8b17e8) 10.42.0.240 5c:cf:7f:c6:da:52 ESP_C6DA52
Apr 14 05:35:01 grace CRON[3549]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Apr 14 05:45:01 grace CRON[3613]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Apr 14 05:55:01 grace CRON[3670]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Apr 14 06:02:11 grace dnsmasq-dhcp[1201]: DHCPREQUEST(wlx00304f8b17e8) 10.42.0.240 5c:cf:7f:c6:da:52
Apr 14 06:02:11 grace dnsmasq-dhcp[1201]: DHCPACK(wlx00304f8b17e8) 10.42.0.240 5c:cf:7f:c6:da:52 ESP_C6DA52
Apr 14 06:05:01 grace CRON[3719]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Apr 14 06:15:01 grace CRON[3778]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Apr 14 06:17:01 grace CRON[3788]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Apr 14 06:17:37 grace systemd[1]: Starting Daily apt upgrade and clean activities...
Apr 14 06:18:16 grace systemd[1]: Started Daily apt upgrade and clean activities.
Apr 14 06:25:01 grace CRON[3902]: (root) CMD (test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ))
Apr 14 06:25:01 grace CRON[3901]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
<<<< normal operation up to here >>>>
#### here something happened: the log goes silent and only resumes when I reboot the system ####
Apr 14 07:17:05 grace rsyslogd: [origin software="rsyslogd" swVersion="8.16.0" x-pid="613" x-info="http://www.rsyslog.com"] start
<<<< this timestamp is fake: the real time was about 1 pm when I power cycled; the boot sequence starts here
Apr 14 07:17:05 grace systemd-modules-load[290]: Inserted module 'brcmfmac'
Apr 14 07:17:05 grace systemd-modules-load[290]: Inserted module 'g_serial'
Apr 14 07:17:05 grace systemd-modules-load[290]: Module 'i2c_dev' is builtin
Apr 14 07:17:05 grace systemd-sysctl[316]: Couldn't write '1' to 'kernel/yama/ptrace_scope', ignoring: No such file or directory
Apr 14 07:17:05 grace fake-hwclock[275]: Sat Apr 14 05:17:01 UTC 2018
Apr 14 07:17:05 grace systemd-udevd[349]: Process '/sbin/crda' failed with exit code 249.
<<<< this is a long story, see the P.S. at the end of this post
Apr 14 07:17:05 grace systemd[1]: Starting Flush Journal to Persistent Storage...
Apr 14 07:17:05 grace rsyslogd: rsyslogd's groupid changed to 108
Apr 14 07:17:05 grace rsyslogd: rsyslogd's userid changed to 104
Apr 14 07:17:05 grace loadkeys[278]: Loading /etc/console-setup/cached.kmap.gz
Apr 14 07:17:05 grace haveged[367]: haveged: ver: 1.9.1; arch: generic; vend: ; build: (gcc 5.2.1 CTV); collect: 128K
Apr 14 07:17:05 grace haveged[367]: haveged: cpu: (VC); data: 16K (D); inst: 16K (D); idx: 8/40; sz: 15748/76756
Apr 14 07:17:05 grace haveged[367]: haveged: tot tests(BA8): A:1/1 B:1/1 continuous tests(B): last entropy estimate 7.9947
Apr 14 07:17:05 grace haveged[367]: haveged: fills: 0, generated: 0
Apr 14 07:17:05 grace log2ram[372]: sending incremental file list

If I look at my Python script's log, something curious shows up: the script keeps running until 7:35 am (70 minutes after the syslog goes silent), then stops without errors. It simply goes silent, like the syslog.

Here is my system info:

Quote
Last login: Sat Apr 14 02:00:08 CEST 2018 from 192.168.123.150 on pts/0
[ASCII-art banner: OPi Zero Plus 2]
Welcome to ARMBIAN 5.38 stable Ubuntu 16.04.3 LTS 4.14.14-sunxi64
System load: 0.13 0.15 0.15
Up time: 57 min
Memory usage: 21 % of 482MB
IP: 192.168.123.150 10.42.0.1
CPU temp: 34°C
Usage of /: 56% of 7.0G
[ 0 security updates available, 53 updates total: apt upgrade ]
Last check: 2018-04-14 12:17

Here is my armbianmonitor -c "$HOME" output:

Quote
Starting to fill /dev/mmcblk0p1 with test patterns, please be patient this might take a very long time
Free space: 3.11 GB
Creating file 1.h2w ... OK!
Creating file 2.h2w ... OK!
Creating file 3.h2w ... OK!
Creating file 4.h2w ... OK!
Free space: 102.91 MB
Average writing speed: 2.22 MB/s
Now verifying the written data:
SECTORS ok/corrupted/changed/overwritten
Validating file 1.h2w ... 2097152/ 0/ 0/ 0
Validating file 2.h2w ... 2097152/ 0/ 0/ 0
Validating file 3.h2w ... 2097152/ 0/ 0/ 0
Validating file 4.h2w ... 16400/ 0/ 0/ 0
Data OK: 3.01 GB (6307856 sectors)
Data LOST: 0.00 Byte (0 sectors)
Corrupted: 0.00 Byte (0 sectors)
Slightly changed: 0.00 Byte (0 sectors)
Overwritten: 0.00 Byte (0 sectors)
Average reading speed: 20.39 MB/s
Starting iozone tests. Be patient, this can take a very long time to complete:
Iozone: Performance Test of File I/O
Version $Revision: 3.429 $
Compiled for 64 bit mode.
Build: linux
Contributors: William Norcott, Don Capps, Isom Crawford, Kirby Collins, Al Slater, Scott Rhine, Mike Wisner, Ken Goss, Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR, Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner, Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone, Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root, Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer, Vangel Bojaxhi, Ben England, Vikentsi Lapa.
Run began: Sat Apr 14 14:32:19 2018
Include fsync in write timing
O_DIRECT feature enabled
Auto Mode
File size set to 102400 kB
Record Size 4 kB
Record Size 512 kB
Record Size 16384 kB
Command line used: iozone -e -I -a -s 100M -r 4k -r 512k -r 16M -i 0 -i 1 -i 2
Output is in kBytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 kBytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.

kB      reclen  write  rewrite  read   reread  random read  random write
102400  4       995    975      6489   5785    4779         160
102400  512     4021   3632     22353  22398   15920        530
102400  16384   4319   4312     22859  22859   22844        4531

iozone test complete.
The results from testing /dev/mmcblk0p1 (ext4):
Data OK: 3.01 GB (6307856 sectors)
Data LOST: 0.00 Byte (0 sectors)
Average writing speed: 2.22 MB/s
Average reading speed: 20.39 MB/s

reclen  write  rewrite  read   reread  random read  random write
4       995    975      6489   5785    4779         160
512     4021   3632     22353  22398   15920        530
16384   4319   4312     22859  22859   22844        4531

Health summary: OK
Performance summary:
Sequential reading speed: 20.39 MB/s
4K random reading speed: 4779 KB/s
Sequential writing speed: 2.22 MB/s (way too low)
4K random writing speed: 160 KB/s (way too low)
The device you tested seems to perform too slow to be used with Armbian. This applies especially to desktop images where slow storage is responsible for sluggish behaviour. If you want to have fun with your device do NOT use this media to put the OS image or the user homedirs on.
To interpret the results above correctly or search for better storage alternatives please refer to http://oss.digirati.com.br/f3/ and also http://www.jeffgeerling.com/blogs/jeff-geerling/raspberry-pi-microsd-card and http://thewirecutter.com/reviews/best-microsd-card/

PS: about the long story. I'm in Italy: it's great, we have sea, sun and good meals, but we also have wifi routers that use channels 12 and 13. My router has "IT" as its wifi regulatory setting. So when my router, which has the channel set to "auto", switches to a channel that the Orange Pi is not allowed to use, I lose the connection and the AP is no longer discovered. On the Orange Pi the country is not defined, as I found with:

Quote
iw reg get
country 00

So I tried with:

Quote
iw reg set IT

It works... until reboot, then it goes back to 00. The workaround I found was to modify /etc/default/crda with REGULATION=IT; from that moment on I got that crda error. I know that if I set the router to a fixed channel it works anyway, but I can't go that way.

Thanks a lot for all of your work and efforts!
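For anyone chasing the same kind of hang, the "log goes silent" moment in the syslog extract earlier in this post can be located automatically. Below is a minimal Python sketch, not something used in this thread: the function name `find_log_gaps` and the 20-minute threshold are assumptions for illustration (the extract shows a cron entry every 10 minutes during normal operation). Classic syslog stamps carry no year, so the year is passed in. Keep in mind that after a power cycle without an RTC the first timestamps come from fake-hwclock, so the measured gap is only a lower bound.

```python
from datetime import datetime, timedelta

# Two lines taken from the syslog extract above: the last entry before
# the hang and the first entry after the power cycle.
SAMPLE = [
    "Apr 14 06:25:01 grace CRON[3901]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)",
    "Apr 14 07:17:05 grace rsyslogd: [origin software=\"rsyslogd\"] start",
]


def find_log_gaps(lines, year=2018, min_gap=timedelta(minutes=20)):
    """Return (previous, current, gap) for consecutive entries further apart than min_gap."""
    gaps = []
    previous = None
    for line in lines:
        try:
            # Traditional syslog lines start with a 15-character
            # "Mon DD HH:MM:SS" stamp that has no year.
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if previous is not None and stamp - previous > min_gap:
            gaps.append((previous, stamp, stamp - previous))
        previous = stamp
    return gaps


if __name__ == "__main__":
    for before, after, gap in find_log_gaps(SAMPLE):
        print(f"silent from {before} to {after} ({gap})")
```

To run it on a real file, pass an open file object instead of `SAMPLE`, for example `find_log_gaps(open("/var/log/syslog"))`.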
fabbronet (Author) Posted April 16, 2018

Just a daily update: I checked the voltages on the board test points with a digital oscilloscope and they're OK. My first suspect is now the external wifi dongle mounted on the breakout board. The breakout board gets hotter than the CPU (which is quite cool anyway), so maybe the breakout board design can't withstand the current drawn by the USB dongle, causing a kind of "brown-out" on the main 5 V rail or other weird behaviour. My second suspect is the micro-SD card that the OS runs from; I'll give nand-sata-install a try to see if something changes.

Today I also verified the connections between my custom board and the Orange Pi board, and everything is OK. Now I'm running my script with the USB wifi dongle unplugged: if everything works fine for a couple of days (which has never happened so far), then the dongle is my problem; if it resets randomly again, I'll try the eMMC install.

By the way, today I also mounted a 200 µF 35 V electrolytic capacitor on the 5 V line; if my problems come from power glitches, this could fix them.

Any help is greatly appreciated!
fabbronet (Author) Posted April 18, 2018

OK, considering the large number of followers and repliers to my post above, I think it's time to give you an update.

In the last two days I spent about 30 hours on log digging, trial and error and so on... I'm quite tired of this board. The system keeps resetting randomly:

a) without the USB breakout board
b) without the external wifi dongle
c) powered from USB
d) powered from GPIO
e) with another brand new SD card holding a copy of the previous one
f) with different power supplies (including a 0-30 V 0-10 A bench PSU)
g) with a complete Armbian upgrade from 5.38 to 5.41

The only thing I found is that the resets happen most frequently when I'm using an external mongodb client to watch my DB data, but I found nothing in the logs that points to this. So I removed the original mongodb installed from the repositories (it's old and not maintained, as written on the mongodb site) and installed a brand new 3.6.3 version. Now my OPi is on and I'm at the 9th hour of uptime, but without the external wifi dongle.

My last three options are:

1) lower the CPU frequency, which sits at 1152 MHz almost all the time, to maybe 816 MHz, as I read on this forum that this increases kernel stability... is this correct?
2) try another spare board; maybe this one is somehow defective
3) try to put the system on the eMMC. I already tried and the process failed, probably because I'm trying to put an 8 GB SD image on an 8 GB eMMC; I'll try to shrink my SD partition with gparted and repeat the nand-sata-install.

After these three options, if nothing changes I'll switch to a Raspberry Pi platform, which costs more but seems more stable. I've been a Raspberry Pi user from the very beginning and I never had problems like these; under my desk there's a Raspberry Pi that has been up since 2014, so I think that if I'm looking for a reliable solution it's better to switch to the RPi.
Looking at the whole experience as an ordinary user, I can't understand why sunxi put so much effort into building complex SoC hardware and didn't support it with an adequate system distro, like Raspberry did with Raspbian. In my view this is a big handicap that will surely cut into Orange's market share. I appreciate the Armbian team's work a lot, but I know that when the whole thing is done for free, in spare time, within a small community, firmware and software support will always be difficult.

Stay tuned! Cheers
fabbronet (Author) Posted April 18, 2018

It crashed while I was writing! The system is still up but the network interface is unreachable; I tried the serial console and it's very slow. I managed to log in and htop says that everything is OK, but if I look at my journalctl there's something new:

Quote
Apr 18 15:11:46 grace.local kernel: [<ffff00000896aea0>] schedule_timeout+0x160/
Apr 18 15:11:46 grace.local kernel: [<ffff000008106f6c>] rcu_gp_kthread+0x4ac/0x
Apr 18 15:11:46 grace.local kernel: [<ffff0000080cbbd4>] kthread+0x12c/0x130
Apr 18 15:11:46 grace.local kernel: [<ffff000008084290>] ret_from_fork+0x10/0x18
Apr 18 15:12:07 grace.local kernel: INFO: rcu_sched detected stalls on CPUs/task
Apr 18 15:12:07 grace.local kernel: 3-...: (0 ticks this GP) idle=a3c/0/
Apr 18 15:12:07 grace.local kernel: (detected by 0, t=5252 jiffies, g=22
Apr 18 15:12:07 grace.local kernel: Task dump for CPU 3:
Apr 18 15:12:07 grace.local kernel: swapper/3 R running task 0
Apr 18 15:12:07 grace.local kernel: Call trace:
Apr 18 15:12:07 grace.local kernel: [<ffff000008084e48>] __switch_to+0x98/0xb0
Apr 18 15:12:07 grace.local kernel: [<ffff000008c38000>] 0xffff000008c38000
Apr 18 15:12:07 grace.local kernel: rcu_sched kthread starved for 5252 jiffies!
Apr 18 15:12:07 grace.local kernel: rcu_sched I 0 8 2 0x000000
Apr 18 15:12:07 grace.local kernel: Call trace:
Apr 18 15:12:07 grace.local kernel: [<ffff000008084e48>] __switch_to+0x98/0xb0
Apr 18 15:12:07 grace.local kernel: [<ffff0000089673d4>] __schedule+0x19c/0x5a0
Apr 18 15:12:07 grace.local kernel: [<ffff0000089677fc>] schedule+0x24/0x80
Apr 18 15:12:07 grace.local kernel: [<ffff00000896aea0>] schedule_timeout+0x160/
Apr 18 15:12:07 grace.local kernel: [<ffff000008106f6c>] rcu_gp_kthread+0x4ac/0x
Apr 18 15:12:07 grace.local kernel: [<ffff0000080cbbd4>] kthread+0x12c/0x130
Apr 18 15:12:07 grace.local kernel: [<ffff000008084290>] ret_from_fork+0x10/0x18

ff0000089677fc>] schedule+0x24/0x80
ff00000896aea0>] schedule_timeout+0x160/0x280
ff000008106f6c>] rcu_gp_kthread+0x4ac/0x740
ff0000080cbbd4>] kthread+0x12c/0x130
ff000008084290>] ret_from_fork+0x10/0x18
: rcu_sched detected stalls on CPUs/tasks:
3-...: (0 ticks this GP) idle=a3c/0/0 softirq=218752/218752 fqs=0
(detected by 0, t=5252 jiffies, g=223025, c=223024, q=8686)
dump for CPU 3:
per/3 R running task 0 0 1 0x00000000
trace:
ff000008084e48>] __switch_to+0x98/0xb0
ff000008c38000>] 0xffff000008c38000
sched kthread starved for 5252 jiffies! g223025 c223024 f0x0 RCU_GP_WAIT_FQS(3)
sched I 0 8 2 0x00000000

Can someone tell me whether these lines are somehow related to my problem? For now I have lowered the maximum CPU frequency to 816 MHz and started up again... yeee
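To see how often these stalls occur over a longer period, the kernel messages can be scanned for the two patterns visible in the journal output above. This is a rough Python sketch under assumptions: the function name `summarize_rcu_stalls` is made up for illustration, and the regular expressions match only the message wording shown in this post, which may differ on other kernel versions.

```python
import re

# Patterns copied from the wording of the kernel messages quoted above.
STALL = re.compile(r"rcu_sched detected stalls on CPUs/task")
STARVED = re.compile(r"rcu_sched kthread starved for (\d+) jiffies")

# Sample lines taken from the journal output in this post.
SAMPLE = [
    "Apr 18 15:12:07 grace.local kernel: INFO: rcu_sched detected stalls on CPUs/task",
    "Apr 18 15:12:07 grace.local kernel: Task dump for CPU 3:",
    "Apr 18 15:12:07 grace.local kernel: rcu_sched kthread starved for 5252 jiffies!",
]


def summarize_rcu_stalls(lines):
    """Count RCU stall reports and record the longest kthread starvation seen."""
    stall_reports = 0
    starved = []
    for line in lines:
        if STALL.search(line):
            stall_reports += 1
        match = STARVED.search(line)
        if match:
            starved.append(int(match.group(1)))
    return {
        "stall_reports": stall_reports,
        "max_starved_jiffies": max(starved, default=0),
    }


if __name__ == "__main__":
    print(summarize_rcu_stalls(SAMPLE))
```

Feeding it the saved output of `journalctl -k`, one line per list element, gives a quick count to compare before and after a change such as lowering the CPU frequency.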
teenytinycactus Posted May 4, 2018

Hi @fabbronet, I think you will find your issue is related to this: Have you tried setting the CPU frequency to 800MHz and then trying to crash it?
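One way to try a lower frequency ceiling without rebooting is the standard Linux cpufreq sysfs interface, where each `cpuN/cpufreq/scaling_max_freq` file takes a value in kHz. The Python sketch below is only an illustration: the helper name `cap_max_freq` and the `root` parameter (there so it can be pointed at a test directory) are assumptions. It must run as root, the setting is lost at reboot, and the kernel picks the nearest operating point the board actually supports. As far as I know, Armbian images of this period keep the boot-time limits in /etc/default/cpufrequtils, so a permanent change would go there instead.

```python
from pathlib import Path

# Standard location of the per-CPU cpufreq controls on Linux.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu")


def cap_max_freq(max_khz, root=CPUFREQ_ROOT):
    """Write max_khz (in kHz) to every cpuN/cpufreq/scaling_max_freq under root.

    Returns the list of files that were written.
    """
    if max_khz <= 0:
        raise ValueError("max_khz must be a positive frequency in kHz")
    written = []
    # cpu[0-9]* matches cpu0, cpu1, ... but not cpufreq or cpuidle.
    for node in sorted(root.glob("cpu[0-9]*/cpufreq/scaling_max_freq")):
        node.write_text(f"{max_khz}\n")
        written.append(node)
    return written


# Example (needs root on the board): cap_max_freq(816_000) for 816 MHz.
```

Reading the same files back afterwards shows which value the kernel actually accepted.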
fabbronet (Author) Posted May 7, 2018

Hi @teenytinycactus, I lowered the maximum frequency to 816 MHz and I haven't experienced any random crash since. I think that lowering the frequency did the trick, but around the same time I also found some crashes of the mongodb server installed on my system. I think mongo was not the problem, but that's how it went.

At the moment I'm fighting with log2ram, which fills up to 100% and messes everything up on reboot, because most services see a full log partition as write protection on their own log files. I don't know whether to tweak logrotate or the maximum size of the log2ram storage... any advice?
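Before choosing between logrotate and a bigger log2ram area, it can help to see what is actually filling the log directory. The following Python sketch is an assumption-laden illustration, not a log2ram tool: the function name `log_usage` is made up, and it simply reports how full the filesystem holding the directory is, plus the largest files under it.

```python
import os
import shutil


def log_usage(log_dir="/var/log", top=5):
    """Return (percent_used, largest_files) for the filesystem holding log_dir.

    largest_files is a list of (size_in_bytes, path), biggest first.
    """
    usage = shutil.disk_usage(log_dir)
    percent_used = 100.0 * usage.used / usage.total
    sizes = []
    for folder, _dirs, files in os.walk(log_dir):
        for name in files:
            path = os.path.join(folder, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # file rotated away or unreadable while scanning
    sizes.sort(reverse=True)
    return percent_used, sizes[:top]


# Example on the board: percent, biggest = log_usage("/var/log")
```

If one or two files dominate the list, a tighter logrotate rule for them is probably the smaller change; if the space is spread over many services, enlarging the RAM log area may be the simpler route.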
teenytinycactus Posted May 14, 2018

On 5/7/2018 at 11:58 PM, fabbronet said:
Quote
Hi @teenytinycactus, I lowered the maximum frequency to 816 MHz and I haven't experienced any random crash since. I think that lowering the frequency did the trick, but around the same time I also found some crashes of the mongodb server installed on my system. I think mongo was not the problem, but that's how it went. At the moment I'm fighting with log2ram, which fills up to 100% and messes everything up on reboot, because most services see a full log partition as write protection on their own log files. I don't know whether to tweak logrotate or the maximum size of the log2ram storage... any advice?

Sorry, I don't really know much at all about those issues.