
How to provide and interpret Debug output


tkaiser


Armbian implements some basic 'system logging' at every startup and shutdown and contains a little utility that provides this collected information, combined with some more useful debug info, from user installations. All that's needed is executing 'sudo armbianmonitor -u' (or armbian-config --> Software --> Diagnostics), and all this support related information is then uploaded to an online pasteboard service automagically (see the example output below).

 

e2f9f922661bb41d0d8d14a7c5d3ec151fc8a2fb.png

 

How to interpret this wall of text? The output is best read from bottom to top since the most important information is collected during upload:

 

  • At the bottom, the /proc/interrupts contents let you check for IRQ affinity problems (interrupt collisions on some CPU cores negatively affecting performance)
  • Then the last 250 lines of dmesg output are included. Here you might find important information wrt the last (kernel) events that happened on the machine
  • uptime output including average load statistics (1, 5 and 15 min)
  • free output telling how much physical memory is available and how much swap and/or zram is used (you need to look directly above whether zram is active or not to interpret the 'Swap:' line)
  • vmstat output contains virtual memory usage information since last reboot
  • iostat output contains the same but allows for a 'per device' view since all devices are listed with individual statistics (so it's easy to spot IO bottlenecks by looking at these numbers and also looking at %iowait value)
  • 'Current system health' displays what the system is actually doing while uploading the debug log (on systems where DC-IN monitoring is available this also allows for underpowering diagnosis -- if you read numbers below 5.0V here, stop reading the log and tell the user to fix their underpowering issues first)
  • In case the installation has been moved from SD card to other storage nand-sata-install.log will be included in the output
  • 'Loaded modules' allows you to look for module related problems
  • If it's an Allwinner board running legacy kernel the whole script.bin contents are included
  • 'Installed packages' shows version numbers of relevant Armbian packages
  • 'Group membership of' should list all groups the user is a member of. If this line is missing, ignore the whole contents and ask the user to re-submit debug info, this time doing it correctly: not as root but using 'sudo armbianmonitor -u' (group memberships are important to understand certain problems, eg. users not being a member of the audio group won't have success getting noise out of their devices)
  • If the board is PCIe capable, a list of attached PCIe devices is included
  • The lsusb output lists all connected USB devices and also information about speed (12M, 480M, 5000M) and protocol/connection details (mass-storage vs uas for example)
  • If the user installed the lshw utility and verbosity is set to 4 or above in /boot/armbianEnv.txt some more disk related information will be included
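
As a rough aid for the /proc/interrupts item above, here is a minimal Python sketch that sums interrupt counts per CPU core so that a lopsided distribution stands out. The function name and the sample text are made up for illustration; the sketch only assumes the usual /proc/interrupts layout (a header row of CPU names, then one row per interrupt source) and is not part of armbianmonitor.

```python
# Hypothetical sample in the usual /proc/interrupts layout (values invented).
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 29:     812345          0          0          0     GICv2  29 Level     arch_timer
 41:      90210          0          0          0     GICv2  41 Level     eth0
 55:       1200        300          0          0     GICv2  55 Level     mmc0
IPI0:       100        200        300        400       Rescheduling interrupts
Err:          0
"""


def per_cpu_interrupt_totals(text):
    """Sum interrupt counts per CPU from /proc/interrupts-style text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return {}
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    totals = dict.fromkeys(cpus, 0)
    for line in lines[1:]:
        _label, _, rest = line.partition(":")
        counts = rest.split()[:len(cpus)]
        # Rows such as 'Err' carry fewer columns than there are CPUs; skip them.
        if len(counts) < len(cpus) or not all(c.isdigit() for c in counts):
            continue
        for cpu, count in zip(cpus, counts):
            totals[cpu] += int(count)
    return totals


if __name__ == "__main__":
    for cpu, total in per_cpu_interrupt_totals(SAMPLE).items():
        print(cpu, total)
```

If nearly everything lands on CPU0, as in the invented sample, that is the kind of pattern the IRQ affinity check is meant to reveal. Whether it is actually a problem depends on the board and the workload.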

 

Important: The debug output also contains all collected support files that follow this naming scheme: /tmp/armbianmonitor_checks_* -- so if a user complains about 'transmission so slow' or 'latest files are always missing', ask them to run 'armbianmonitor -c /path/to/torrent-storage' and afterwards 'sudo armbianmonitor -u' without a reboot in between, since then the check results will also be included.

 

Everything above this information at the output's bottom is the result of regular logging at startup and shutdown (the contents of /var/log/armhwinfo.log).

 

At startup the following items are logged: dmesg output, /etc/armbian-release and /boot/armbianEnv.txt contents, lsusb and lscpu output, /proc/cpuinfo and /proc/meminfo contents, network interface information, available partitions and filesystems, on Allwinner boards where /boot/script.bin points to, some metadata information for all MMC media connected to the host (eg. SD card and/or eMMC) and some system health information.

 

At shutdown iostat, vmstat and free output are added to /var/log/armhwinfo.log as well as the last 100 lines from dmesg output. If these '### shutdown' entries are missing after reboots, the system crashed while shutting down.
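
To check for the shutdown entries mentioned above without scrolling through the whole log, a small Python sketch like the following counts them. It only assumes that each clean shutdown leaves a line containing the text '### shutdown'; the exact format of that line is not specified here, and the function name is hypothetical.

```python
from pathlib import Path

LOG_PATH = Path("/var/log/armhwinfo.log")  # log file named in the post


def count_shutdown_markers(text):
    """Count lines containing the '### shutdown' marker (substring match only)."""
    return sum(1 for line in text.splitlines() if "### shutdown" in line)


if __name__ == "__main__":
    if LOG_PATH.exists():
        n = count_shutdown_markers(LOG_PATH.read_text(errors="replace"))
        print(f"{n} shutdown entries found in {LOG_PATH}")
    else:
        print(f"{LOG_PATH} not found")
```

If the user reports several reboots but the count is lower, some of those reboots were probably not clean shutdowns.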

Edited by Werner
highlight command, add prefix

1 hour ago, tkaiser said:

The debug output also contains all collected support files that follow this naming scheme: /tmp/armbianmonitor_checks_* -- so if a user complains about 'transmission so slow' or 'latest files are always missing' ask him to run 'armbianmonitor -c /path/to/torrent-storage' and afterwards 'sudo armbianmonitor -u' without a reboot in between since then the checking results will also be contained.

 

To elaborate on that, I just let this check run on two SD cards. Please search the new debug output for the two occurrences of '/tmp/armbianmonitor_checks'

  • armbianmonitor_checks_mmcblk1p1_ext4 is a 16 GB SanDisk Extreme Plus the OS is running from. Full capacity is usable and performance pretty nice (also due to ODROIDs having less SDIO bus limitations compared to most other boards we support)
  • armbianmonitor_checks_sda2_btrfs is an 8 GB SanDisk Extreme Pro which is usually also very fast but limited here by the external USB card reader.

So if people complain about strange storage related stuff simply always encourage them to run 'armbianmonitor -c' (over night since on slow SD cards this can take ages) followed by 'sudo armbianmonitor -u' to provide results to the forum.
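
To see which check results are waiting in /tmp before uploading, something like this Python sketch lists them. It assumes the names follow the pattern seen in the two examples above (armbianmonitor_checks_ followed by device and filesystem type, separated by an underscore); that pattern is inferred from those two names only, and the helper name is hypothetical.

```python
import glob
import os

PREFIX = "armbianmonitor_checks_"


def split_check_name(path):
    """Split e.g. '/tmp/armbianmonitor_checks_sda2_btrfs' into ('sda2', 'btrfs').

    Returns None when the name does not match the assumed pattern.
    """
    name = os.path.basename(path)
    if not name.startswith(PREFIX):
        return None
    device, sep, fstype = name[len(PREFIX):].rpartition("_")
    if not sep or not device or not fstype:
        return None
    return device, fstype


if __name__ == "__main__":
    for entry in sorted(glob.glob("/tmp/" + PREFIX + "*")):
        print(entry, split_check_name(entry))
```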


After latest commit on those crappy Bananas and other A20 boards a little bit more underpowering monitoring should be possible when our next major release is out soon. Example output (from a Lime2 with a dying PSU and only protected by a huge battery and AXP209 PMIC any more):

 

### Current system health:

Time        CPU    load %cpu %sys %usr %nice %io %irq   CPU   PMIC   DC-IN  C.St.
21:57:46:  720MHz  0.33   4%   1%   1%   0%   0%   0% 39.6°C 40.3°C   4.44V  0/6
21:57:47:  528MHz  0.33  17%  13%   3%   0%   0%   0% 39.6°C 40.0°C   4.44V  0/6
21:57:47:  528MHz  0.33  18%  13%   5%   0%   0%   0% 39.6°C 40.1°C   4.44V  0/6
21:57:48:  528MHz  0.33  17%  14%   2%   0%   0%   0% 39.3°C 40.1°C   4.34V  0/6
21:57:49:  528MHz  0.33  29%  23%   2%   2%   0%   0% 39.3°C 40.1°C   4.44V  0/6
21:57:50:  960MHz  0.54   4%   1%   1%   0%   0%   0% 39.4°C 41.1°C   4.19V  0/6
21:57:51:  960MHz  0.54 100%   0% 100%   0%   0%   0% 39.4°C 41.9°C   4.19V  0/6
21:57:52:  960MHz  0.54 100%   1%  98%   0%   0%   0% 40.3°C 42.1°C   4.19V  0/6
21:57:53:  960MHz  0.54  94%   2%  92%   0%   0%   0% 40.3°C 41.6°C   4.42V  0/6
21:57:53:  528MHz  0.54  17%  14%   2%   0%   0%   0% 40.3°C 40.8°C   4.41V  0/6

The first 5 lines should show normal/idle behaviour. Then a 'stress' task is fired up (as many threads in parallel as CPU cores are available), which should reveal voltage drops on boards with insufficient power cables and PSUs. This applies especially to Micro USB equipped boards but can of course also expose failing PSUs as above (without the battery I would assume this Lime2 would have been dead for a long time already).
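
For anyone who wants to scan the 'Current system health' section mechanically, here is a minimal Python sketch that pulls the DC-IN readings out of lines like the ones above and flags anything below 5.0V. The function names are hypothetical, and the regular expression simply grabs every number followed by 'V', so it should be fed only the health section and not the whole debug log.

```python
import re

# Three lines copied from the example output above.
SAMPLE = """\
21:57:46:  720MHz  0.33   4%   1%   1%   0%   0%   0% 39.6°C 40.3°C   4.44V  0/6
21:57:50:  960MHz  0.54   4%   1%   1%   0%   0%   0% 39.4°C 41.1°C   4.19V  0/6
21:57:53:  960MHz  0.54  94%   2%  92%   0%   0%   0% 40.3°C 41.6°C   4.42V  0/6
"""

VOLTAGE = re.compile(r"(\d+\.\d+)V")


def dc_in_voltages(text):
    """Return every voltage reading (as float) found in the given text."""
    return [float(m.group(1)) for m in VOLTAGE.finditer(text)]


def is_underpowered(text, threshold=5.0):
    """True if any reading is below the threshold (5.0V, as stated in the first post)."""
    volts = dc_in_voltages(text)
    return bool(volts) and min(volts) < threshold


if __name__ == "__main__":
    print(dc_in_voltages(SAMPLE), is_underpowered(SAMPLE))
```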
