
Orange Pi Zero Plus 2 random hangs


fabbronet


Hi everyone, 

 

I have a headless Armbian 5.38 system on an Orange Pi Zero Plus 2 board. Attached to the board are the USB expansion board with a Wi-Fi dongle, and another board connected to the TWI (I2C) interface on the GPIO header. I power the board through the same header from a 5 V, 5 A power supply, and I verified that there are no dips on the 5 V and 3.3 V rails.

The external USB dongle is configured as an access point with nmcli, and the internal Wi-Fi is configured as a station connected to my home network. The system runs a Python Flask/uWSGI "daemon" that I'm developing.

Randomly, my system hangs: I can't reach my Flask web page, I can't connect via SSH from PuTTY, and if I plug in a USB cable, Windows says the device is not recognized, so I can't get in through the USB serial port either. The only way out is to cycle the power and start again.

I've tried without the USB dongle and with other PSUs, and I checked the SD card integrity: everything seems fine. The hang can happen one hour or 30 hours after boot; it's random.

If I look at journalctl I can't find anything, because the journal is cleared at every startup, so I looked at my syslog. Here is an extract of what happens:

 

Quote

Apr 14 05:34:58 grace dnsmasq-dhcp[1201]: DHCPREQUEST(wlx00304f8b17e8) 10.42.0.240 5c:cf:7f:c6:da:52
Apr 14 05:34:58 grace dnsmasq-dhcp[1201]: DHCPACK(wlx00304f8b17e8) 10.42.0.240 5c:cf:7f:c6:da:52 ESP_C6DA52
Apr 14 05:35:01 grace CRON[3549]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Apr 14 05:45:01 grace CRON[3613]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Apr 14 05:55:01 grace CRON[3670]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Apr 14 06:02:11 grace dnsmasq-dhcp[1201]: DHCPREQUEST(wlx00304f8b17e8) 10.42.0.240 5c:cf:7f:c6:da:52
Apr 14 06:02:11 grace dnsmasq-dhcp[1201]: DHCPACK(wlx00304f8b17e8) 10.42.0.240 5c:cf:7f:c6:da:52 ESP_C6DA52
Apr 14 06:05:01 grace CRON[3719]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Apr 14 06:15:01 grace CRON[3778]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Apr 14 06:17:01 grace CRON[3788]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Apr 14 06:17:37 grace systemd[1]: Starting Daily apt upgrade and clean activities...
Apr 14 06:18:16 grace systemd[1]: Started Daily apt upgrade and clean activities.
Apr 14 06:25:01 grace CRON[3902]: (root) CMD (test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ))
Apr 14 06:25:01 grace CRON[3901]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)   <<<<----- above here: normal operation

---> here something happened: the log goes silent and only restarts when I reboot the system
Apr 14 07:17:05 grace rsyslogd: [origin software="rsyslogd" swVersion="8.16.0" x-pid="613" x-info="http://www.rsyslog.com"] start   <<<<----- this is a fake timestamp; the real time of the power cycle was about 1 pm. The boot sequence starts here
Apr 14 07:17:05 grace systemd-modules-load[290]: Inserted module 'brcmfmac'
Apr 14 07:17:05 grace systemd-modules-load[290]: Inserted module 'g_serial'
Apr 14 07:17:05 grace systemd-modules-load[290]: Module 'i2c_dev' is builtin
Apr 14 07:17:05 grace systemd-sysctl[316]: Couldn't write '1' to 'kernel/yama/ptrace_scope', ignoring: No such file or directory
Apr 14 07:17:05 grace fake-hwclock[275]: Sat Apr 14 05:17:01 UTC 2018
Apr 14 07:17:05 grace systemd-udevd[349]: Process '/sbin/crda' failed with exit code 249.   <<<<----- this is a long story, see the P.S. at the end of this post
Apr 14 07:17:05 grace systemd[1]: Starting Flush Journal to Persistent Storage...
Apr 14 07:17:05 grace rsyslogd: rsyslogd's groupid changed to 108
Apr 14 07:17:05 grace rsyslogd: rsyslogd's userid changed to 104
Apr 14 07:17:05 grace loadkeys[278]: Loading /etc/console-setup/cached.kmap.gz
Apr 14 07:17:05 grace haveged[367]: haveged: ver: 1.9.1; arch: generic; vend: ; build: (gcc 5.2.1 CTV); collect: 128K
Apr 14 07:17:05 grace haveged[367]: haveged: cpu: (VC); data: 16K (D); inst: 16K (D); idx: 8/40; sz: 15748/76756
Apr 14 07:17:05 grace haveged[367]: haveged: tot tests(BA8): A:1/1 B:1/1 continuous tests(B):  last entropy estimate 7.9947
Apr 14 07:17:05 grace haveged[367]: haveged: fills: 0, generated: 0
Apr 14 07:17:05 grace log2ram[372]: sending incremental file list
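A minimal Python sketch (a hypothetical helper, not an Armbian tool) for locating where syslog goes silent: it parses the classic `Mon DD HH:MM:SS` prefix of each line and reports the longest gap between consecutive entries. It assumes every line comes from one year, since classic syslog lines omit it, and it skips lines without a timestamp.

```python
from datetime import datetime


def parse_ts(line, year=2018):
    """Parse the 15-character syslog prefix, e.g. 'Apr 14 06:25:01'.

    Classic syslog lines carry no year, so one is supplied
    (assumption: all lines come from the same year).
    """
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def largest_gap(lines, year=2018):
    """Return (gap, line_before, line_after) for the longest silence, or None."""
    stamped = []
    for line in lines:
        try:
            stamped.append((parse_ts(line, year), line))
        except ValueError:
            continue  # blank line, annotation or other non-syslog text
    best = None
    for (t0, before), (t1, after) in zip(stamped, stamped[1:]):
        gap = t1 - t0
        if best is None or gap > best[0]:
            best = (gap, before, after)
    return best
```

Run on the extract above (for example `largest_gap(open("/var/log/syslog"))`), it reports the 06:25:01 to 07:17:05 gap. Because the first post-reboot timestamp comes from fake-hwclock rather than a real clock, that gap understates the real outage. The fake-hwclock line reads 05:17:01 UTC, which is 07:17:01 CEST; assuming it shows the time fake-hwclock last saved from its hourly cron job, the system was apparently still running cron around 07:17, which would fit with the Python script logging until 7:35.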

 

If I look at my Python script's log, something curious happens in the meantime: my script keeps running until 7:35 am (about 70 minutes after syslog goes silent), then stops without any error, going silent just like syslog.
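To pin down when the board really stops, independently of syslog, one option is a heartbeat file. The Python sketch below appends a UTC timestamp at a fixed interval and calls `os.fsync()` so each line reaches the card immediately; after a power cycle, the last line bounds the hang time to within one interval. The path and the 30 s interval are hypothetical choices. The file should live outside /var/log: with log2ram, /var/log is kept in RAM and only copied to the card periodically, so lines written shortly before a hard hang may never reach the card. That could also make syslog look silent earlier than the real hang.

```python
import os
import time
from datetime import datetime, timezone


def write_heartbeat(path):
    """Append one UTC timestamp line and force it out to the storage device."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with open(path, "a") as f:
        f.write(stamp + "\n")
        f.flush()              # move Python's buffer into the kernel
        os.fsync(f.fileno())   # ask the kernel to write it to the card now
    return stamp


def run(path="/root/heartbeat.log", interval_s=30):
    """Write a heartbeat forever (hypothetical path and interval)."""
    while True:
        write_heartbeat(path)
        time.sleep(interval_s)
```

Running `run()` as its own process (for example from a separate systemd service, which is an assumption about your setup) rather than inside the Flask app keeps it independent of the application it is meant to observe.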

 

Here are my system infos:

 

Quote

Last login: Sat Apr 14 02:00:08 CEST 2018 from 192.168.123.150 on pts/0
  ___  ____  _   _____                ____  _             ____
 / _ \|  _ \(_) |__  /___ _ __ ___   |  _ \| |_   _ ___  |___ \
| | | | |_) | |   / // _ \ '__/ _ \  | |_) | | | | / __|   __) |
| |_| |  __/| |  / /|  __/ | | (_) | |  __/| | |_| \__ \  / __/
 \___/|_|   |_| /____\___|_|  \___/  |_|   |_|\__,_|___/ |_____|


Welcome to ARMBIAN 5.38 stable Ubuntu 16.04.3 LTS 4.14.14-sunxi64
System load:   0.13 0.15 0.15   Up time:       57 min
Memory usage:  21 % of 482MB    IP:            192.168.123.150 10.42.0.1
CPU temp:      34°C
Usage of /:    56% of 7.0G

[ 0 security updates available, 53 updates total: apt upgrade ]
Last check: 2018-04-14 12:17

 

 

Here is my armbianmonitor -c "$HOME" output:

 

Quote


Starting to fill /dev/mmcblk0p1 with test patterns, please be patient this might take a very long time
Free space: 3.11 GB
Creating file 1.h2w ... OK!
Creating file 2.h2w ... OK!
Creating file 3.h2w ... OK!
Creating file 4.h2w ... OK!
Free space: 102.91 MB
Average writing speed: 2.22 MB/s

Now verifying the written data:
                  SECTORS      ok/corrupted/changed/overwritten
Validating file 1.h2w ... 2097152/        0/      0/      0
Validating file 2.h2w ... 2097152/        0/      0/      0
Validating file 3.h2w ... 2097152/        0/      0/      0
Validating file 4.h2w ...   16400/        0/      0/      0

  Data OK: 3.01 GB (6307856 sectors)
Data LOST: 0.00 Byte (0 sectors)
               Corrupted: 0.00 Byte (0 sectors)
        Slightly changed: 0.00 Byte (0 sectors)
             Overwritten: 0.00 Byte (0 sectors)
Average reading speed: 20.39 MB/s

Starting iozone tests. Be patient, this can take a very long time to complete:
        Iozone: Performance Test of File I/O
                Version $Revision: 3.429 $
                Compiled for 64 bit mode.
                Build: linux

        Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
                     Al Slater, Scott Rhine, Mike Wisner, Ken Goss
                     Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
                     Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
                     Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
                     Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
                     Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
                     Vangel Bojaxhi, Ben England, Vikentsi Lapa.

        Run began: Sat Apr 14 14:32:19 2018

        Include fsync in write timing
        O_DIRECT feature enabled
        Auto Mode
        File size set to 102400 kB
        Record Size 4 kB
        Record Size 512 kB
        Record Size 16384 kB
        Command line used: iozone -e -I -a -s 100M -r 4k -r 512k -r 16M -i 0 -i 1 -i 2
        Output is in kBytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 kBytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
                                                              random    random     bkwd    record    stride
              kB  reclen    write  rewrite    read    reread    read     write     read   rewrite      read   fwrite frewrite    fread  freread
          102400       4      995      975     6489     5785     4779      160  
          102400     512     4021     3632    22353    22398    15920      530  
          102400   16384     4319     4312    22859    22859    22844     4531  

iozone test complete.

The results from testing /dev/mmcblk0p1 (ext4):
  Data OK: 3.01 GB (6307856 sectors)
Data LOST: 0.00 Byte (0 sectors)
Average writing speed: 2.22 MB/s
Average reading speed: 20.39 MB/s
                                            random    random
reclen    write  rewrite    read    reread    read     write
     4      995      975     6489     5785     4779      160                    
   512     4021     3632    22353    22398    15920      530                    
 16384     4319     4312    22859    22859    22844     4531                    

Health summary: OK

Performance summary:
Sequential reading speed: 20.39 MB/s
 4K random reading speed:  4779 KB/s
Sequential writing speed:  2.22 MB/s (way too low)
 4K random writing speed:   160 KB/s (way too low)

The device you tested seems to perform too slow to be used with Armbian.
This applies especially to desktop images where slow storage is responsible
for sluggish behaviour. If you want to have fun with your device do NOT use
this media to put the OS image or the user homedirs on.

To interpret the results above correctly or search for better storage
alternatives please refer to http://oss.digirati.com.br/f3/ and also
http://www.jeffgeerling.com/blogs/jeff-geerling/raspberry-pi-microsd-card
and http://thewirecutter.com/reviews/best-microsd-card/
 

 

 

 

 

PS: about the long story

I'm in Italy: it's great, we have sea, sun and good food, but we also have Wi-Fi routers that use channels 12 and 13. My router uses "IT" as its Wi-Fi regulatory domain. So when my router, which has its channel set to "auto", switches to a channel the Orange Pi isn't allowed to use, I lose the connection and AP discovery. On the Orange Pi the country is not defined; I found this with:

 

Quote

iw reg get

country 00 
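For reference, 2.4 GHz channels 1 to 13 are spaced 5 MHz apart starting at 2412 MHz, so channels 12 and 13 are centred on 2467 and 2472 MHz, at the top of the band. The small Python helper below does that mapping (channel 14, at 2484 MHz, is the Japan-only exception). Comparing these values with the frequency ranges that `iw reg get` lists below the `country 00` line shows which channels the world domain restricts.

```python
def channel_to_mhz(channel):
    """Centre frequency in MHz of a 2.4 GHz Wi-Fi channel."""
    if channel == 14:
        return 2484  # Japan-only channel, off the regular 5 MHz grid
    if 1 <= channel <= 13:
        return 2407 + 5 * channel
    raise ValueError(f"not a 2.4 GHz channel: {channel}")


for ch in (1, 6, 11, 12, 13):
    print(ch, channel_to_mhz(ch), "MHz")
```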

 

So I tried with:

Quote

iw reg set IT

 

It works... until reboot, then it goes back to 00.

The workaround I found was to set REGULATION=IT in /etc/default/crda,

 

and from that moment on I got the crda error shown above.

I know that if I set the router to a fixed channel everything works, but I can't go that way.
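Instead of, or alongside, the /etc/default/crda change, a boot-time check is another option. The Python sketch below reads `iw reg get` and, if the global country code isn't IT, runs `iw reg set IT`. It needs root. Where to call it from (a systemd unit, or the start of the Flask/uWSGI daemon) is an assumption about your setup. The kernel applies the request asynchronously and may still change the domain later, for example on disconnect from the AP, so check again with `iw reg get` afterwards.

```python
import re
import subprocess


def current_country(iw_output):
    """Return the first (global) country code in `iw reg get` output, or None."""
    match = re.search(r"^country (\w\w)", iw_output, re.MULTILINE)
    return match.group(1) if match else None


def read_country():
    """Run `iw reg get` and parse its country code."""
    result = subprocess.run(["iw", "reg", "get"],
                            capture_output=True, text=True, check=True)
    return current_country(result.stdout)


def ensure_regdomain(wanted="IT"):
    """Request `wanted` if the current domain differs (needs root).

    Returns True if a change was requested. The kernel applies it
    asynchronously, so re-check with `iw reg get` a moment later.
    """
    if read_country() == wanted:
        return False
    subprocess.run(["iw", "reg", "set", wanted], check=True)
    return True
```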

 

Thanks a lot for all of your work and efforts!

 

 

 


Just a daily update:

 

I checked the voltages on the board's test points with a digital oscilloscope and they're OK. Now I suspect the external Wi-Fi dongle mounted on the breakout board: the board gets hotter than the CPU (which runs quite cool anyway), so maybe the breakout board design can't withstand the current drawn by the USB dongle, causing a kind of brown-out on the main 5 V rail or other weird behaviour. My second suspect is the microSD card the OS runs from; I'll give nand-sata-install a try to see if anything changes.

Today I also verified the connections between my custom board and the Orange Pi board, and everything is OK. Now I'm running my script with the USB Wi-Fi dongle unplugged: if everything works fine for a couple of days (which has never happened so far), the dongle is my problem; if it resets randomly again, I'll try the eMMC install.

 

By the way, today I also mounted a 200 µF 35 V electrolytic capacitor on the 5 V line; if my problems come from power glitches, this could fix them.
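A quick estimate of what the capacitor can do on its own: a capacitor supplying a constant current I while its voltage falls by ΔV lasts t = C·ΔV / I. Assuming, hypothetically, about 1 A drawn by the board plus dongle and an allowed droop from 5.00 V to 4.75 V, 200 µF gives roughly 50 µs. So it can smooth short spikes and ripple, but it cannot bridge a real supply dropout. The Python sketch below just evaluates that formula; the current and droop figures are assumptions, not measurements from this board.

```python
def holdup_time_s(capacitance_f, allowed_droop_v, load_current_a):
    """Seconds a capacitor alone can carry a constant load: t = C * dV / I."""
    if load_current_a <= 0:
        raise ValueError("load current must be positive")
    return capacitance_f * allowed_droop_v / load_current_a


# Hypothetical figures: 200 uF, 5.00 V -> 4.75 V, 1 A load.
t = holdup_time_s(200e-6, 5.00 - 4.75, 1.0)
print(f"hold-up time: {t * 1e6:.0f} us")
```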

 

Any help is greatly appreciated! :)


OK, considering the large number of followers and repliers to my post above, I think it's time to give you an update :)

 

In the last two days I spent about 30 hours on log digging, trial and error and so on... I'm quite tired of this board. The system keeps resetting randomly:

a) without the USB breakout board

b) without the external Wi-Fi dongle

c) powered from USB

d) powered from GPIO

e) with another brand-new SD card holding a copy of the previous one

f) with different power supplies (including a 0-30 V, 0-10 A bench PSU)

g) after a complete Armbian upgrade from 5.38 to 5.41

 

The only thing I found is that resets happen most frequently when I'm using an external MongoDB client to look at my DB data, but I found nothing in the logs that points to this.

So... I removed the original MongoDB installed from the repositories (it's old and no longer maintained, as stated on the MongoDB site) and installed a brand-new 3.6.3 version.

Now my OPi is on and I'm at the 9th hour of uptime, but without the external Wi-Fi dongle.

 

Now my last three options are:

1) lower the CPU frequency, which is almost nailed at 1152 MHz, to maybe 816 MHz, as I read on this forum, to increase kernel stability... is this correct?

2) try another spare board... maybe this one is somehow defective

3) try to put the system on the eMMC. I already tried and the process failed, probably because I'm trying to put an 8 GB SD card onto an 8 GB eMMC; I'll try to shrink my SD card partition with GParted and repeat nand-sata-install...
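For option 1, a minimal Python sketch of capping the clock through the standard cpufreq sysfs interface: for each policy it picks the highest listed operating point at or below the target (816 MHz here) and writes it to `scaling_max_freq`, whose values are in kHz. It needs root, it does not survive a reboot, and it assumes the driver exposes `scaling_available_frequencies`, which not every cpufreq driver does. As far as I know, the persistent Armbian setting is `MAX_SPEED` in /etc/default/cpufrequtils, but check your image.

```python
import glob


def pick_max_khz(available_khz, target_mhz):
    """Highest operating point (kHz) at or below target_mhz."""
    target_khz = target_mhz * 1000
    candidates = [f for f in available_khz if f <= target_khz]
    if not candidates:
        raise ValueError(f"no operating point at or below {target_mhz} MHz")
    return max(candidates)


def cap_cpu_frequency(target_mhz=816):
    """Lower scaling_max_freq on every cpufreq policy (root only, not persistent)."""
    for policy in sorted(glob.glob("/sys/devices/system/cpu/cpufreq/policy*")):
        with open(f"{policy}/scaling_available_frequencies") as f:
            available = [int(x) for x in f.read().split()]
        chosen = pick_max_khz(available, target_mhz)
        with open(f"{policy}/scaling_max_freq", "w") as f:
            f.write(str(chosen))
        print(f"{policy}: scaling_max_freq = {chosen} kHz")
```

Picking from the listed operating points rather than writing 816000 blindly avoids relying on how the driver rounds a value that isn't in its table. The frequency lists in the checks are hypothetical.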

 

If nothing changes after these three options, I'll switch to a Raspberry Pi platform, which costs more but seems more stable. I've been a Raspberry Pi user from day one and I never had problems like these; under my desk there's a Raspberry Pi that has been up since 2014, so I think that if I'm looking for a reliable solution, it's better to switch to the RPi.

 

Looking at the whole experience as an ordinary user, I can't understand why Allwinner puts so much effort into building complex sunxi SoC hardware and doesn't support it with an adequate distro, as the Raspberry Pi people did with Raspbian... in my view this is a big handicap that will surely cut into Orange Pi's market share.

I really appreciate the Armbian team's work, but I know that when the whole thing is done for free, in spare time, within a small community, firmware and software support will always be difficult.

 

Stay tuned! Cheers

 

  


It crashed while I was writing! 

The system is still up, but the network interface is unreachable. I tried the serial console and it's very slow. I managed to log in and htop says that everything is OK, but if I look at journalctl there's something new:

Quote

Apr 18 15:11:46 grace.local kernel: [<ffff00000896aea0>] schedule_timeout+0x160/0x280
Apr 18 15:11:46 grace.local kernel: [<ffff000008106f6c>] rcu_gp_kthread+0x4ac/0x740
Apr 18 15:11:46 grace.local kernel: [<ffff0000080cbbd4>] kthread+0x12c/0x130
Apr 18 15:11:46 grace.local kernel: [<ffff000008084290>] ret_from_fork+0x10/0x18
Apr 18 15:12:07 grace.local kernel: INFO: rcu_sched detected stalls on CPUs/tasks:
Apr 18 15:12:07 grace.local kernel:         3-...: (0 ticks this GP) idle=a3c/0/0 softirq=218752/218752 fqs=0
Apr 18 15:12:07 grace.local kernel:         (detected by 0, t=5252 jiffies, g=223025, c=223024, q=8686)
Apr 18 15:12:07 grace.local kernel: Task dump for CPU 3:
Apr 18 15:12:07 grace.local kernel: swapper/3       R  running task        0     0      1 0x00000000
Apr 18 15:12:07 grace.local kernel: Call trace:
Apr 18 15:12:07 grace.local kernel: [<ffff000008084e48>] __switch_to+0x98/0xb0
Apr 18 15:12:07 grace.local kernel: [<ffff000008c38000>] 0xffff000008c38000
Apr 18 15:12:07 grace.local kernel: rcu_sched kthread starved for 5252 jiffies! g223025 c223024 f0x0 RCU_GP_WAIT_FQS(3)
Apr 18 15:12:07 grace.local kernel: rcu_sched       I    0     8      2 0x00000000
Apr 18 15:12:07 grace.local kernel: Call trace:
Apr 18 15:12:07 grace.local kernel: [<ffff000008084e48>] __switch_to+0x98/0xb0
Apr 18 15:12:07 grace.local kernel: [<ffff0000089673d4>] __schedule+0x19c/0x5a0
Apr 18 15:12:07 grace.local kernel: [<ffff0000089677fc>] schedule+0x24/0x80
Apr 18 15:12:07 grace.local kernel: [<ffff00000896aea0>] schedule_timeout+0x160/0x280
Apr 18 15:12:07 grace.local kernel: [<ffff000008106f6c>] rcu_gp_kthread+0x4ac/0x740
Apr 18 15:12:07 grace.local kernel: [<ffff0000080cbbd4>] kthread+0x12c/0x130
Apr 18 15:12:07 grace.local kernel: [<ffff000008084290>] ret_from_fork+0x10/0x18

 

Can someone tell me whether these lines are somehow related to my problem?

Now I've lowered the maximum CPU frequency to 816 MHz and started it up again... yeee
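The key numbers in the stall lines quoted above are in jiffies. The small Python sketch below pulls the count out of an RCU stall line and converts it to seconds. The tick rate HZ=250 is an assumption; check `CONFIG_HZ` in the kernel config under /boot. At 250 Hz, 5252 jiffies is about 21 s, which matches the kernel's default RCU stall timeout of 21 s: the report names CPU 3 (`0 ticks this GP`, `fqs=0`), and the rcu_sched kthread was starved for the same period. That alone doesn't identify a cause. Stalls like this are often reported with marginal clock or voltage settings, which would fit with lowering the maximum frequency, but it is not proof.

```python
import re

# Matches both "t=5252 jiffies" and "starved for 5252 jiffies".
STALL_RE = re.compile(r"(?:t=|starved for )(\d+) jiffies")


def stall_seconds(line, hz=250):
    """Convert the jiffies count in an RCU stall line to seconds, or None."""
    match = STALL_RE.search(line)
    return int(match.group(1)) / hz if match else None


def stalls_in(lines, hz=250):
    """Yield (seconds, line) for every line that carries a stall duration."""
    for line in lines:
        seconds = stall_seconds(line, hz)
        if seconds is not None:
            yield seconds, line.rstrip()
```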

 

 

 

 


Hi @teenytinycactus, I lowered the maximum frequency to 816 MHz and I haven't had a single random crash since then. I think lowering the frequency did the trick, but at the same time I found some crashes of the MongoDB server installed on my system. I don't think Mongo was the problem, but there it is. Right now I'm fighting with log2ram, which fills up to 100% and messes everything up on reboot, because most services treat it as write protection on their own log files. I don't know whether to tweak logrotate or the log2ram maximum storage size... any advice?
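Before choosing between tightening logrotate and enlarging log2ram, it helps to see what is filling the RAM disk. The Python sketch below uses only the standard library: `shutil.disk_usage()` reports how full the /var/log mount is, and a walk over the tree lists the largest files. The 80% threshold is an arbitrary example. As far as I know, the log2ram size on Armbian is set by `SIZE` in /etc/default/log2ram, but that is worth checking on your version.

```python
import os
import shutil


def usage_percent(path="/var/log"):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def largest_files(path="/var/log", count=5):
    """Return the `count` biggest files under `path` as (bytes, path) pairs."""
    sizes = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                sizes.append((os.path.getsize(full), full))
            except OSError:
                continue  # rotated away or unreadable in the meantime
    return sorted(sizes, reverse=True)[:count]


def report(path="/var/log", threshold=80.0):
    """Print mount usage and the biggest files (threshold is an example)."""
    percent = usage_percent(path)
    status = "WARNING" if percent >= threshold else "ok"
    print(f"{path}: {percent:.1f}% used ({status})")
    for size, name in largest_files(path):
        print(f"{size / 1024:10.1f} KiB  {name}")
```

If one or two files dominate, a logrotate rule for them is the targeted fix; if growth is spread across everything, a larger log2ram size is the simpler lever.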


On 5/7/2018 at 11:58 PM, fabbronet said:

Hi @teenytinycactus, I lowered maximum frequency down to 816MHz and I didn't experienced any random crash from that moment. I think that lowering frequency did the trick but in the same "moment" I found some crashes of mongodb server that's installed on my system. I think that mongo was not the problem but that's it. At this time I'm fighting with log2ram that fills up to 100% messing up everything on reboot because most services see that as a write protection for their own log files, actually I don't know if tweak the logrotate or the log2ram maximum storage dimension....any advice?  

Sorry, I don't really know much at all about those issues.


This topic is now closed to further replies.