Jump to content

Dennboy

Members
  • Posts

    63
  • Joined

  • Last visited

Posts posted by Dennboy

  1. Hi Amir,

     

    It has been a while since your post, but anyway here are some pointers:

     

    The SPI master bus is enabled by default on the nanopi neo3, see e.g.:

    ls /sys/firmware/devicetree/base/spi@ff190000

    You can use the bus by attaching an SPI device to the bus and use it either via userland using the general purpose spidev kernel driver or load/write the appropriate driver kernel module and/or trigger driver loading via a device tree overlay.

     

    We use our own driver and device tree overlay for SPI with an ADC on the neo3, we see sporadic loss of SPI interrupts at 4KHz sampling and more at higher sampling rates, especially when there is high CPU load on the neo3.

     

    Kind regards,

    Dennis

  2. Since volatile journald logging is done in-memory in /run/log/journal, and made persistent in /var/log.hdd/journal, the rsync of /var/log/journal in nand-sata-install no longer works because it is a symlink to /var/log.hdd/journal in the live system, and will create a circular symlink when booting from NAND. On the target image /var/log is the folder that is directory-mounted to /var/log.hdd during boot.

     

    I propose to exclude /var/log from the rsync, and rsync /var/log.hdd to /var/log on the target image.

    Another solution could be to exclude only /var/log/journal from the rsync, and create an empty /var/log/journal in the target image.

     

    Kind regards,

    Dennis

  3. For the record, another correction. The statement for Storage=volatile journald works after changing to

     

    2. change to Storage=volatile in /etc/systemd/journald.conf, and optionally remove journal directories to save some space

    • sudo sh -c "echo Storage=volatile » /etc/systemd/journald.conf"
    • sudo systemctl stop systemd-journald.service
    • sudo rm -rf /var/log/journal /var/log.hdd/journal
    • sudo systemctl start systemd-journald.service
  4. After some experimenting I found a way to flush the journal and maintain logs in tmpfs using /run/log/journal, and only have a softlink from /var/log/journal to /var/log.hdd/journal. Additionally the cached journals are to be cleaned up when /var/log becomes too full.

     

     I created the attached patches for armbian-truncate-logs and armbian-ramlog: armbian-truncate-logs.patcharmbian-ramlog.patch 

     

    Some logic was needed in cases where the user decides to do Storage=volatile journal logging in /etc/systemd/journald.conf, or when the user or journald with Storage=persistent creates a /var/log/journal folder which will be moved to /var/log.hdd/journal and replaced by the softlink.

     

    I tested stop, start, write for armbian-ramlog, and truncate-logs. They seem to work fine.

  5. I've done some reading and experimenting with systemd journald, and there appear to be at least 2 ways to easily turn off persistent logging to help with your SBC issues :

    1. keep default Storage=auto in /etc/systemd/journald.conf, and remove journal directories
      • sudo systemctl stop systemd-journald.service
      • sudo rm -rf /var/log/journal /var.hdd/log.journal
      • sudo systemctl start systemd-journald.service
    2. change to Storage=volatile in /etc/systemd/journald.conf, and optionally remove journal directories to save some space
      • sudo su -c "echo Storage=volatile > /etc/systemd/journal"
      • sudo systemctl restart systemd-journald.service
      • sudo rm -rf /var/log/journal /var.hdd/log.journal

    In both cases the journald logs will still be available at runtime in /run/log/journal, but not written to the sdcard/emmc/disk. I think this would be a safe default for most users. You can always turn on persistent logging later when you for some reason want to diagnose boot problems.

     

    For re-enabling persistent journald logging, the following steps can be taken:

    1. with the default Storage=auto in /etc/systemd/journald.conf:
      • mkdir -p /var/log/journal /var.hdd/log.journal
    2. with Storage=volatile in /etc/systemd/journald.conf:
      • mkdir -p /var/log/journal /var.hdd/log.journal
      • sudo sed -i "/^Storage=/d" /etc/systemd/journal"
      • sudo systemctl restart systemd-journald.service
  6. It looks like the non-persistent journal on /run/log/journal/* is sometimes also active on my opi0 with Armbian 21.08.2.

     

    A new approach that lets journald control volatile and persistent storage, but lets armbian decide when and where to flush to persistent:

    • configure journald to write volatile logs to /run/log/journal
    • move journal from tmpfs space: 
      • mv /var/log.hdd/journal /var/log.hdd.journal
    • create softlink in both persistent and volatile memory
      • ln -s /var/log.hdd.journal /var/log/journal
      • ln -s /var/log.hdd.journal /var/log.hdd/journal
    • in armbian-ramlog
      • flush the volatile journal to persistent /var/log/journal (i.e. this points to /var/log.hdd.journal)
        •  [ -L /var/log/journal ] && journalctl --flush
      • use rsync flag --one-file-system armbian-truncate-logs, to prevent copying the journal files anyway

    Currently testing this approach, instead of the softlinks I am also trying mount --bind, but that seems tricky.

  7. looks like my corrupted system.journal is due to influxdb still busy during shutdown while it can no longer write to /var/lib/influxdb

    looks like the corrupted user journals are due to user processes still running (e.g. the login shell) while /var/log is no longer mounted

     

    bottom line: to avoid journal corruption we may have to:

    • flush or stop journald before umounting memory-mapped /var/log,
    • or use the in-memory and persistent storage of journald itself in /var/run/journal and /var/log/journal respectively.
  8. I noticed the following in dmesg, that may be the source of the cached/corrupted journal files. Looks like system journal got corrupted during shutdown and user journals are temporarilly not accessible due to a busy device.

    npin3bullVT:~:% dmesg|grep -i journal
    [   21.666206] systemd[1]: Listening on Journal Audit Socket.
    [   21.668883] systemd[1]: Listening on Journal Socket (/dev/log).
    [   21.682035] systemd[1]: Listening on Journal Socket.
    [   26.931365] EXT4-fs (zram1): mounted filesystem without journal. Opts: discard
    [   31.043188] systemd[1]: Starting Journal Service...
    [   31.273498] systemd[1]: Started Journal Service.
    [   31.327326] systemd-journald[588]: Received client request to flush runtime journal.
    [   31.389135] systemd-journald[588]: File /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system.journal corrupted or uncleanly shut down, renaming and replacing.
    [   47.628019] systemd-journald[588]: Failed to read journal file /var/log/journal/50bea5a2341c40588d32c8103dea6e71/user-1002.journal for rotation, trying to move it out of the way: Device or resource busy
    [   47.630134] systemd-journald[588]: Failed to read journal file /var/log/journal/50bea5a2341c40588d32c8103dea6e71/user-1000.journal for rotation, trying to move it out of the way: Device or resource busy
  9. I have tested my patch using :
     

    rsync -r --exclude 'system@*' --exclude 'user-*@*' --exclude='*~' /var/log/journal test

    and it appears to work from the commandline. However, I still get corrupt journal files at boot. They may appear because journald writes to the journal files during the rsync.

    npin3bullVT:backup:% uptime; ls -lsa /var/log/journal/50bea5a2341c40588d32c8103dea6e71
     09:57:09 up 21 min,  1 user,  load average: 2.15, 2.36, 1.83
    total 10280
       8 drwxr-sr-x+ 2 root systemd-journal    4096 Jun 29 09:36 .
       8 drwxr-sr-x+ 3 root systemd-journal    4096 Aug 26  2021 ..
    2564 -rw-r-----  1 root systemd-journal 2621440 Jun 29 09:36 system@0005e292e1d3c280-560a1a0a69b9966a.journal~
    2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 09:36 system@9cec3cbef79c442ab7f055042ddcbb75-0000000000000001-0005e292e1d0956a.journal
    2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 09:57 system.journal
    2564 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 09:41 user-1002.journal


    The timestamp of the cached/corrupt files "*~" and "*@*" is during boot, 09:57:09 - 21minutes = 09:36

     

    In /var/log/syslog I see that only the right journal files are copied:
     

    Jun 29 09:36:10 npin3bullVT armbian-ramlog[484]: journal/
    Jun 29 09:36:10 npin3bullVT armbian-ramlog[484]: journal/50bea5a2341c40588d32c8103dea6e71/
    Jun 29 09:36:10 npin3bullVT armbian-ramlog[484]: journal/50bea5a2341c40588d32c8103dea6e71/system.journal
    Jun 29 09:36:10 npin3bullVT armbian-ramlog[484]: journal/50bea5a2341c40588d32c8103dea6e71/user-1000.journal
    Jun 29 09:36:10 npin3bullVT armbian-ramlog[484]: journal/50bea5a2341c40588d32c8103dea6e71/user-1002.journal

     

  10. For excluding the corrupted journal files, I propose the following patch to syncFromDisk in armbian-ramlog:

     

    --- /usr/lib/armbian/armbian-ramlog     2021-09-26 20:15:41.000000000 +0000
    +++ armbian-ramlog      2022-06-29 09:22:18.927488185 +0000
    @@ -69,6 +69,7 @@
                    ${NoCache} rsync -aXWv --delete \
                            --exclude "lost+found" --exclude armbian-ramlog.log \
                            --exclude *.gz --exclude *.xz --exclude='*.[0-9]' \
    +                       --exclude 'system@*' --exclude 'user-*@*' --exclude='*~' \
                            --links \
                            ${XTRA_RSYNC_FROM[@]+"${XTRA_RSYNC_FROM[@]}"} \
                            $HDD_LOG $RAM_LOG 2>&1 | $LOG_OUTPUT


     

  11. This /var/log issue is indeed very annoying. I have some work-arounds in the mentioned ticket.

    In my opinion we need:

    • to find and solve the root cause of the corrupted journal files, to reduce the likelihood that this happens
    • make sure corrupted journal files are not put back in /var/log during boot,
      • i.e. exclude journal/*/system@* and journal/*/user-*@* and journal/*/*~ in /usr/lib/armbian/armbian-ramlog : syncFromDisk
      • see my patch for armbian-ramlog in the mentioned ticket above

    Btw. the main problem is the memory mapped /var/log that fills up (which approx 50MB), the /var/log.hdd is filling up on emmc or sdcard, but this is small compared to the volume size

  12. I see the following in the syslog at night:
     

    Jun 29 00:00:07 npin3bullVT armbian-ramlog[10481]: Wed 29 Jun 2022 12:00:02 AM UTC: Syncing logs to storage
    Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: sending incremental file list
    Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: ./
    Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: alternatives.log
    Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: armbian-hardware-monitor.log
    Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: auth.log
    Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: bootstrap.log
    Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: btmp
    Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: daemon.log
    Jun 29 00:00:10 npin3bullVT armbian-ramlog[10493]: debug
    Jun 29 00:00:10 npin3bullVT armbian-ramlog[10493]: dpkg.log
    Jun 29 00:00:10 npin3bullVT armbian-ramlog[10493]: faillog
    Jun 29 00:00:10 npin3bullVT armbian-ramlog[10493]: kern.log
    Jun 29 00:00:10 npin3bullVT armbian-ramlog[10493]: lastlog
    Jun 29 00:00:10 npin3bullVT armbian-ramlog[10493]: messages
    Jun 29 00:01:07 npin3bullVT armbian-ramlog[10677]: cat: write error: No space left on device
    Jun 29 00:01:07 npin3bullVT armbian-ramlog[10703]: cat: write error: No space left on device
    Jun 29 00:01:07 npin3bullVT armbian-ramlog[10710]: cat: write error: No space left on device
    Jun 29 00:01:07 npin3bullVT armbian-ramlog[10711]: cat: write error: No space left on device
    Jun 29 00:01:08 npin3bullVT armbian-ramlog[10729]: cat: write error: No space left on device
    Jun 29 00:01:08 npin3bullVT armbian-ramlog[10746]: cat: write error: No space left on device
    Jun 29 00:01:08 npin3bullVT armbian-ramlog[10782]: cat: write error: No space left on device
    Jun 29 00:01:09 npin3bullVT armbian-ramlog[10789]: cat: write error: No space left on device
    Jun 29 00:01:09 npin3bullVT armbian-ramlog[10790]: cat: write error: No space left on device
    Jun 29 00:01:10 npin3bullVT armbian-ramlog[10848]: cat: write error: No space left on device
    Jun 29 00:01:10 npin3bullVT armbian-ramlog[10861]: cat: write error: No space left on device

     

    I removed the journal backup files (in var/log/journal/*/system@*) to get fresh again:

    npin3bullVT:flowprocessing:% ls -lsa /var/log/journal/50bea5a2341c40588d32c8103dea6e71                                             <config>
    total 18016
      40 drwxr-sr-x+ 2 root systemd-journal   36864 Jun 29 07:56 .
       8 drwxr-sr-x+ 3 root systemd-journal    4096 Aug 26  2021 ..
    2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 00:16 system@28483b061ebe4b9a828c4b3066920f5d-0000000000000001-0005e28affeda3ec.journal
    2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 00:23 system@28483b061ebe4b9a828c4b3066920f5d-0000000000000c5c-0005e28b0865be92.journal
    2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 03:49 system@28483b061ebe4b9a828c4b3066920f5d-0000000000008f6f-0005e28b28f775c6.journal
    2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 07:15 system@28483b061ebe4b9a828c4b3066920f5d-0000000000009a6b-0005e28e0a446064.journal
    2564 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 00:01 system@d4f3757edfbb4d17ad2866e73d1fd9c1-000000000000334d-0005e2899513914c.journal
    2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 07:59 system.journal
    2564 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 07:59 user-1000.journal


    For those interested I uploaded my armbianmonitor data http://ix.io/41aW, (using armbianmonitor -U > ~/armbianmonitor.out; cat  ~/armbianmonitor.out| curl -F 'f:1=<-' ix.io because it didn't pass the ping check).

  13. After some time also systemd-journal stores a number of recovery files, some of them truncated, and they are not removed at boot:

     

    % ls -lsa /var/log/journal/50bea5a2341c40588d32c8103dea6e71/*@*
    2548 -rw-r--r-- 1 root systemd-journal 2609152 Jun 14 00:00 /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system@0005cf6abd179fba-93695493c7232496.journal~
    1716 -rw-r--r-- 1 root systemd-journal 1757184 Jun 14 00:00 /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system@0005dfd3a2da0b28-85157396073fd2e2.journal~
     180 -rw-r--r-- 1 root systemd-journal  184320 Jun 14 00:00 /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system@36be248beb2849f0b9dc25a1a84098ef-000000000004c701-0005cf37b3036796.journal
    2216 -rw-r--r-- 1 root systemd-journal 2269184 Jun 14 00:00 /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system@36be248beb2849f0b9dc25a1a84098ef-000000000004da0b-0005cf54d29ac2e7.journal
     524 -rw-r--r-- 1 root systemd-journal  536576 Jun 14 00:00 /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system@5c31986800b04c39a817e02f2abf0502-0000000000000001-0005dfc277cb07ae.journal
    2560 -rw-r--r-- 1 root systemd-journal 2621440 Jun 14 00:00 /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system@5c31986800b04c39a817e02f2abf0502-0000000000000baa-0005dfcaef17af28.journal

     

    I think it would be a good idea to always start with an empty journal at startup, and keep the /var/log.hdd/journal just for history of previous boots. This may also be cleaned periodically, e.g. mine now contains 295 files, most of them empty.

  14. some problems I had with network manager on Armbian on SBCs in the field (usually miles away with only a reverse ssh tunnel to reach it):

    • it creates /etc/resolv.conf as textfile instead of linking to the runtime /run/resolvconf/resolv.conf
      • this often results in an SBC that no longer use DNS when the remaining network connection is using another service like Networking
    • it has poor support for multi-homing (having multiple networks like wifi+eth simultaneously to the same or different routers), it sometimes takes away the default route of the other connection (e.g. when pulling the Ethernet cable)
    • the nmtui tool is unsuitable to switch networks over an ssh session, it just freezes since the existing network connection is first dropped.
  15. Another thing I noticed is that after a few reboots, journal fails are reported for some users:

     

    [   44.505361] systemd-journald[525]: File /var/log/journal/078479a1b0a4459f80590dd571c3d6de/user-1000.journal corrupted or uncleanly shut down, renaming and replacing.                                                                                                                                                    
    [   63.921282] systemd-journald[525]: File /var/log/journal/078479a1b0a4459f80590dd571c3d6de/user-1004.journal corrupted or uncleanly shut down, renaming and replacing.

     

    This also fills /var/log, and the renamed journal files seem to be persistent over reboots:

     

    dennis@SBC5037:~$ ls -lsa /var/log/journal/078479a1b0a4459f80590dd571c3d6de/
    total 25660
       8 drwxr-sr-x+ 2 root systemd-journal    4096 Jun  1 15:35 .                                                                    
       8 drwxr-sr-x+ 3 root systemd-journal    4096 Oct 18  2021 ..                                                                   
    2564 -rw-r-----  1 root systemd-journal 2621440 Jun  1 15:34 system@0005e062f36ff9d9-3c70b6ef958abbf0.journal~                    
    2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun  1 15:37 system.journal                                                       
    2564 -rw-r-----  1 root systemd-journal 2621440 Jun  1 15:34 user-1000@0005e062f48c3d16-92482727877728b0.journal~                 
    2564 -rw-r-----  1 root systemd-journal 2621440 Jun  1 15:16 user-1000@3fb2bae4805e4f9188406e3f40cb4ebd-0000000000000334-0005e0628
    0a3e785.journal                                                                                                                   
    2564 -rw-r-----+ 1 root systemd-journal 2621440 Jun  1 15:34 user-1000.journal                                                    
    2564 -rw-r-----  1 root systemd-journal 2621440 Jun  1 15:34 user-1003@0005e062f3bb855f-b21d739a16f87609.journal~                 
    2564 -rw-r-----+ 1 root systemd-journal 2621440 Jun  1 15:34 user-1003.journal                                                    
    2564 -rw-r-----  1 root systemd-journal 2621440 Jun  1 15:35 user-1004@0005e062f5b47c59-b74bb7b9498fbd75.journal~                 
    2564 -rw-r-----  1 root systemd-journal 2621440 Jun  1 15:16 user-1004@4fb34549fd9b4e71b64a6e556942b437-000000000000034a-0005e0628
    162ea3d.journal                                                                                                                   
    2564 -rw-r-----+ 1 root systemd-journal 2621440 Jun  1 15:36 user-1004.journal  

     

    Could this be a result of user jobs that are still running when the systemd user journal is stopped?

  16. As a work-around, I managed to reuse dtb-5.10.63-rockchip64/rockchip/rk3328-nanopi-neo3-rev02.dtb from Armbian-21.08 so that I could boot the image again.

    Just learned that the board is no longer supported, that's probably why the devicetree file got removed.

  17. Dear maintainers,

     

    Guess some device tree issues with neo3 got me into this mess, since rk3328-nanopi-neo3-rev02.dtb is missnig.

     

    Scanning mmc 1:1...                                         
    Found U-Boot script /boot/boot.scr                          
    3185 bytes read in 7 ms (444.3 KiB/s)
    ## Executing script at 00500000
    Boot script loaded from mmc 1
    428 bytes read in 6 ms (69.3 KiB/s)                                                                                               
    12514638 bytes read in 556 ms (21.5 MiB/s)                                                                                        
    30083584 bytes read in 1326 ms (21.6 MiB/s)                                                                                       
    Failed to load '/boot/dtb/rockchip/rk3328-nanopi-neo3-rev02.dtb'                                                                  
    libfdt fdt_check_header(): FDT_ERR_BADMAGIC                                                                                       
    No FDT memory address configured. Please configure                                                                                
    the FDT address via "fdt addr <address>" command.                                                                                 
    Aborting!                                        

     

    Kind regards,
    Dennis

  18. Guess I cheered a bit too early, the vacuum part of /var/log.hdd may need to be done a bit earlier than cron @reboot.

     

    Still occasionally get:

    $ systemctl --failed
      UNIT            LOAD   ACTIVE SUB    DESCRIPTION
    ● sysstat.service loaded failed failed Resets System Activity Logs
     

    And a lot of errors that syslog is not able to write:

    May 23 17:05:31 SBC5029 rsyslogd[597]: file '/var/log/syslog'[7] write error - >                                                  
    May 23 17:05:31 SBC5029 rsyslogd[597]: action 'action-1-builtin:omfile' (module>                                                  
    May 23 17:05:31 SBC5029 rsyslogd[597]: file '/var/log/syslog'[7] write error - >                                                  
    May 23 17:05:31 SBC5029 rsyslogd[597]: action 'action-1-builtin:omfile' (module>                                                  
    May 23 17:05:31 SBC5029 rsyslogd[597]: file '/var/log/syslog'[7] write error - >                                                  
     

  19. Dear all,

     

    I occasionally have a full /var/log at startup on various boards, probably because the logs are not rotated /vaccuumd at boot (we had a thread about this in the old opi1 forum). The rotating /vaccuum is currently done every 15 minutes, so I figured why not do it directly at boot as well?

     

    For this purpose I updated /etc/cron.d/armbian-truncate-logs with an extra @reboot line:

    PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin                                                               
                                                                                                                                    
    */15 * * * * root /usr/lib/armbian/armbian-truncate-logs                                                                        
    @reboot root /usr/lib/armbian/armbian-truncate-logs   

    I rebooted several times, and didn't get a full /var/log anymore.
    All subsequent problems like failing networking when filesystem full also vanished;-)

     

    Kind regards,
    Dennis

×
×
  • Create New...

Important Information

Terms of Use - Privacy Policy - Guidelines