Help solving repeated crashes


slambert

I'm running ARMBIAN 5.38 stable on a Banana Pro, and the board goes unresponsive regularly: I can't log in via SSH (I get a "port 22: Host is down" error) and applications stop running.

I've used the Banana Pro board as a media/backup server for several years, with an external 4TB hard drive connected via eSATA and a similar software setup (running regular updates). Occasionally it would go unresponsive, but I could generally run it for weeks without a restart. Restarts became necessary more and more often, and now it won't run for more than 12 hours before I try to log in via SSH, get "port 22: Host is down", and find that programs like Plex and Syncthing aren't running.

Here's what I've done so far:

I searched through various logs and wasn't able to find anything that stood out. That may mean nothing, though: I have some experience, but I'm self-taught and haven't had to track down this kind of issue before.

Next, I tried to rule out a corrupted SD card or an inadequate power supply. I tested the power supply while the board was under load, and it seemed OK. Below is the monitoring output from armbian-config. The CPU load went up to 100% while the DC-IN voltage stayed fairly consistent (between 4.92 V and 4.97 V), so I figured the supply was fine.

Time        CPU    load %cpu %sys %usr %nice %io %irq   CPU   PMIC   DC-IN  C.St.
12:18:00:  960MHz  0.65  27%   7%   4%  13%   0%   0% 41.6°C 50.8°C   4.94V  0/6
12:18:06:  528MHz  0.60  46%  10%   1%  32%   0%   0% 41.5°C 50.4°C   4.96V  0/6
12:18:11:  528MHz  0.55  12%   8%   0%   2%   0%   0% 41.0°C 50.3°C   4.97V  0/6
12:18:17:  960MHz  0.50  45%  10%   0%  33%   0%   0% 41.8°C 50.6°C   4.94V  0/6
12:18:22:  528MHz  0.54  27%   9%   0%  18%   0%   0% 41.0°C 50.4°C   4.94V  0/6
12:18:28:  960MHz  0.90  88%  17%  17%  27%  23%   1% 42.0°C 50.7°C   4.94V  0/6
12:18:34:  960MHz  1.59  99%  12%  63%  15%   6%   0% 43.4°C 51.8°C   4.92V  0/6
12:18:40:  960MHz  1.54  89%  16%  48%  16%   6%   0% 43.9°C 51.9°C   4.93V  0/6
12:18:45:  960MHz  1.74  96%   8%  70%  17%   0%   0% 44.4°C 52.2°C   4.93V  0/6
12:18:51:  960MHz  1.92  96%   7%  67%  21%   0%   0% 44.6°C 52.3°C   4.93V  0/6
12:18:57:  960MHz  2.01  95%   6%  67%  21%   0%   0% 45.1°C 52.7°C   4.93V  0/6
12:19:03:  960MHz  2.01  94%   8%  65%  20%   0%   0% 45.1°C 52.9°C   4.93V  0/6
12:19:08:  960MHz  2.33  93%   7%  67%  18%   0%   0% 45.2°C 53.7°C   4.94V  0/6
12:19:14:  960MHz  2.59  96%   9%  71%  14%   0%   0% 45.7°C 53.9°C   4.93V  0/6
Time        CPU    load %cpu %sys %usr %nice %io %irq   CPU   PMIC   DC-IN  C.St.
12:19:20:  960MHz  2.78 100%   7%  75%  16%   0%   0% 45.7°C 53.7°C   4.93V  0/6
12:19:26:  960MHz  3.04 100%   7%  76%  16%   0%   0% 45.9°C 54.1°C   4.93V  0/6
12:19:33:  960MHz  3.27 100%   7%  78%  13%   0%   0% 46.1°C 54.4°C   4.94V  0/6
12:19:39:  960MHz  3.91 100%   8%  69%  22%   0%   0% 46.1°C 54.5°C   4.93V  0/6
12:19:45:  960MHz  4.32 100%   8%  62%  28%   0%   0% 46.2°C 54.6°C   4.93V  0/6
12:19:50:  960MHz  4.69 100%   7%  60%  30%   0%   0% 46.6°C 54.5°C   4.93V  0/6
12:19:56:  960MHz  5.04 100%   6%  63%  29%   0%   0% 46.4°C 54.8°C   4.93V  0/6
12:20:02:  960MHz  5.11 100%   7%  58%  32%   0%   0% 46.6°C 55.1°C   4.93V  0/6
12:20:08:  960MHz  5.35 100%   6%  59%  33%   0%   0% 46.8°C 55.2°C   4.93V  0/6
12:20:14:  960MHz  6.06 100%   6%  63%  29%   0%   0% 46.8°C 55.2°C   4.93V  0/6
12:20:21:  960MHz  5.98  99%  15%  52%  31%   0%   0% 46.5°C 55.2°C   4.94V  0/6
12:20:26:  960MHz  5.66  99%  11%   0%  87%   0%   0% 46.8°C 55.3°C   4.93V  0/6
12:20:32:  960MHz  5.44  99%  14%   0%  83%   0%   0% 47.0°C 55.5°C   4.93V  0/6
12:20:38:  960MHz  5.33  99%  13%   0%  85%   0%   0% 47.1°C 55.6°C   4.93V  0/6
12:20:43:  960MHz  5.30  99%  13%   0%  84%   0%   0% 46.8°C 55.7°C   4.93V  0/6
Time        CPU    load %cpu %sys %usr %nice %io %irq   CPU   PMIC   DC-IN  C.St.
12:20:49:  960MHz  5.02  98%  10%   0%  86%   0%   0% 47.0°C 55.7°C   4.94V  0/6
12:20:55:  960MHz  5.25  85%  14%   0%  70%   0%   0% 46.7°C 55.6°C   4.93V  0/6
12:21:00:  960MHz  4.83  44%   9%  11%  22%   1%   0% 44.8°C 54.4°C   4.96V  0/6
12:21:06:  864MHz  4.45  12%   8%   0%   3%   0%   0% 43.6°C 54.0°C   4.97V  0/6
12:21:12:  960MHz  4.17  12%   7%   0%   3%   0%   0% 43.0°C 53.7°C   4.97V  0/6
12:21:18:  720MHz  3.84  16%   8%   0%   6%   0%   0% 42.8°C 52.9°C   4.97V  0/6
12:21:23:  864MHz  3.77  17%   8%   0%   7%   0%   0% 42.5°C 52.8°C   4.97V  0/6
12:21:29:  960MHz  3.41  16%   9%   2%   3%   1%   0% 42.6°C 52.9°C   4.94V  0/6
12:21:35:  528MHz  3.14  36%  11%   0%  24%   0%   0% 42.2°C 52.5°C   4.97V  0/6
12:21:41:  720MHz  2.97  12%   8%   0%   3%   0%   0% 41.8°C 52.2°C   4.96V  0/6
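The check described above (CPU load rising while DC-IN stays steady) can be made a bit more systematic. Below is a minimal Python sketch, not an Armbian tool: it assumes the whitespace-separated column layout shown in the monitoring header above, and the names `parse_rows`, `summarize` and the 90% "busy" threshold are hypothetical choices for illustration. It compares the lowest DC-IN reading overall with the lowest reading while the CPU was busy.

```python
from dataclasses import dataclass

# Column layout assumed from the monitoring header above:
# Time CPU load %cpu %sys %usr %nice %io %irq CPU PMIC DC-IN C.St.


@dataclass
class Sample:
    time: str
    freq_mhz: int
    load: float
    cpu_pct: int
    cpu_temp_c: float
    pmic_temp_c: float
    dc_in_v: float


def parse_rows(text: str) -> list[Sample]:
    """Parse data rows; header lines and blank lines are skipped."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 13 columns and start with a timestamp like "12:18:00:".
        if len(fields) != 13 or fields[0].count(":") != 3:
            continue
        samples.append(Sample(
            time=fields[0].rstrip(":"),
            freq_mhz=int(fields[1].removesuffix("MHz")),
            load=float(fields[2]),
            cpu_pct=int(fields[3].rstrip("%")),
            cpu_temp_c=float(fields[9].removesuffix("°C")),
            pmic_temp_c=float(fields[10].removesuffix("°C")),
            dc_in_v=float(fields[11].removesuffix("V")),
        ))
    return samples


def summarize(samples: list[Sample], busy_pct: int = 90) -> dict:
    """Compare DC-IN overall vs. while busy (busy_pct is an arbitrary, hypothetical threshold)."""
    busy = [s for s in samples if s.cpu_pct >= busy_pct]
    return {
        "samples": len(samples),
        "dc_in_min": min(s.dc_in_v for s in samples),
        "dc_in_max": max(s.dc_in_v for s in samples),
        # None when no sample reached the busy threshold.
        "dc_in_min_busy": min((s.dc_in_v for s in busy), default=None),
        "cpu_temp_max": max(s.cpu_temp_c for s in samples),
        "pmic_temp_max": max(s.pmic_temp_c for s in samples),
    }


# A few rows copied from the output above, plus one header line.
SAMPLE_TEXT = """\
Time        CPU    load %cpu %sys %usr %nice %io %irq   CPU   PMIC   DC-IN  C.St.
12:18:00:  960MHz  0.65  27%   7%   4%  13%   0%   0% 41.6°C 50.8°C   4.94V  0/6
12:18:34:  960MHz  1.59  99%  12%  63%  15%   6%   0% 43.4°C 51.8°C   4.92V  0/6
12:19:20:  960MHz  2.78 100%   7%  75%  16%   0%   0% 45.7°C 53.7°C   4.93V  0/6
12:21:06:  864MHz  4.45  12%   8%   0%   3%   0%   0% 43.6°C 54.0°C   4.97V  0/6
"""

if __name__ == "__main__":
    print(summarize(parse_rows(SAMPLE_TEXT)))
```

Saving the full monitor output to a file and passing its contents to `parse_rows` works the same way. Keep in mind that the readings above are roughly 6 seconds apart, so a steady DC-IN column cannot show brief dips that happen between samples.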

To rule out the SD card, I created a partition on my external hard drive and moved the OS to it. It has run from the HDD since then. I hoped this would solve the issue, but the board still goes unresponsive periodically.

I ran diagnostics in armbian-config. This is the most recent:

http://ix.io/DKd

And these are from previous days:

http://ix.io/15rL
http://ix.io/15rW

What could be causing these failures?

Are there any logs I could post that would be helpful? If so, which ones? Is there anything in particular I can search for?

I moved the OS to the HDD, but I know the SD card is still used. Could that still cause these issues? Should I replace the card entirely?

Thanks in advance for your help.

26 minutes ago, slambert said:

Could that still cause these issues? Should I replace it entirely?


Try this: go to armbian-config, switch to beta, reboot, and move to the alternative kernel "dev". Then report whether that makes any difference.

Thank you.

OK, I assumed that by "switching to beta" you meant the same thing as switching to nightly builds, so I did that.

Then I rebooted.

I had to reinstall armbian-config.

I switched to dev. I noticed some error messages that flashed on the screen quickly. I also installed sudo; could that cause any issues?

Upon rebooting the kernel isn't starting. I think it's not finding the SATA drive. Here is a photo of the screen at startup:

IMG_0916.jpg

7 hours ago, slambert said:

I switched to dev. I noticed some error messages that flashed on the screen quickly.


If you are referring to "file not found /boot/dtb/overlay/-fixup.scr", this is safe to ignore.

7 hours ago, slambert said:

Upon rebooting the kernel isn't starting.


This worries me more. I booted this kernel on a Cubietruck a few days ago, and it seems to work fine. The next step would be to build an image from scratch and see if that helps. We have noticed some trouble with a certain A20 board on the 4.14.y kernel, and this might be the issue. I hope it is solved in 4.16.y, although there is still a lot of work to get there. Within a few hours, there will be a beta image here: https://dl.armbian.com/bananapipro/nightly/ Try it.

Ok, I will give this a try.

I'm assuming I'm doing this with a different SD card, just to see if the kernel starts, and thinking about salvaging my install/configuration later? If I'm wrong, let me know. I'll get going on this in the meantime.

S

1 hour ago, slambert said:

I'm assuming I'm doing this with a different SD card, just to see if the kernel starts, and thinking about salvaging my install/configuration later?


Yes. This image is not intended for deployment; it is for testing only, so install it on another SD card and try to crash the board. If it crashes, describe what you did and try to capture some logs if possible.

4 hours ago, slambert said:

What’s next?


- adjust the rest of the patches/drivers https://github.com/armbian/build/tree/development/patch/kernel/sunxi-dev/unresolved

- test many other boards

- find and fix problems

- test them again

- find and fix problems

- release an update 


That is month(s) of work for a small team to put out a stable release. I have already found that some boards do not boot at all with this kernel.

If this kernel works for you, freeze it with armbian-config so that upgrades don't overwrite it. Once a better, stable build is released, unfreeze, move to the "next" branch, and update again. If it still crashes, please report.

This topic is now closed to further replies.