Jump to content

Recommended Posts

Posted

Dear all,

 

I occasionally have a full /var/log at startup on various boards, probably because the logs are not rotated /vaccuumd at boot (we had a thread about this in the old opi1 forum). The rotating /vaccuum is currently done every 15 minutes, so I figured why not do it directly at boot as well?

 

For this purpose I updated /etc/cron.d/armbian-truncate-logs with an extra @reboot line:

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin                                                               
                                                                                                                                
*/15 * * * * root /usr/lib/armbian/armbian-truncate-logs                                                                        
@reboot root /usr/lib/armbian/armbian-truncate-logs   

I rebooted several times, and didn't get a full /var/log anymore.
All subsequent problems like failing networking when filesystem full also vanished;-)

 

Kind regards,
Dennis

Posted

Guess I cheered a bit too early, the vacuum part of /var/log.hdd may need to be done a bit earlier than cron @reboot.

 

Still occasionally get:

$ systemctl --failed
  UNIT            LOAD   ACTIVE SUB    DESCRIPTION
● sysstat.service loaded failed failed Resets System Activity Logs
 

And a lot of errors that syslog is not able to write:

May 23 17:05:31 SBC5029 rsyslogd[597]: file '/var/log/syslog'[7] write error - >                                                  
May 23 17:05:31 SBC5029 rsyslogd[597]: action 'action-1-builtin:omfile' (module>                                                  
May 23 17:05:31 SBC5029 rsyslogd[597]: file '/var/log/syslog'[7] write error - >                                                  
May 23 17:05:31 SBC5029 rsyslogd[597]: action 'action-1-builtin:omfile' (module>                                                  
May 23 17:05:31 SBC5029 rsyslogd[597]: file '/var/log/syslog'[7] write error - >                                                  
 

Posted

Since /usr/lib/armbian/armbian-truncate-logs does not always seem to trigger the vacuum from cron, I changed cronjob to the vacuum itself. So far it seems to work.

@reboot root journalctl --vacuum-size=10M

 

Posted

Another thing I noticed is that after a few reboots, journal fails are reported for some users:

 

[   44.505361] systemd-journald[525]: File /var/log/journal/078479a1b0a4459f80590dd571c3d6de/user-1000.journal corrupted or uncleanly shut down, renaming and replacing.                                                                                                                                                    
[   63.921282] systemd-journald[525]: File /var/log/journal/078479a1b0a4459f80590dd571c3d6de/user-1004.journal corrupted or uncleanly shut down, renaming and replacing.

 

This also fills /var/log, and the renamed journal files seem to be persistent over reboots:

 

dennis@SBC5037:~$ ls -lsa /var/log/journal/078479a1b0a4459f80590dd571c3d6de/
total 25660
   8 drwxr-sr-x+ 2 root systemd-journal    4096 Jun  1 15:35 .                                                                    
   8 drwxr-sr-x+ 3 root systemd-journal    4096 Oct 18  2021 ..                                                                   
2564 -rw-r-----  1 root systemd-journal 2621440 Jun  1 15:34 system@0005e062f36ff9d9-3c70b6ef958abbf0.journal~                    
2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun  1 15:37 system.journal                                                       
2564 -rw-r-----  1 root systemd-journal 2621440 Jun  1 15:34 user-1000@0005e062f48c3d16-92482727877728b0.journal~                 
2564 -rw-r-----  1 root systemd-journal 2621440 Jun  1 15:16 user-1000@3fb2bae4805e4f9188406e3f40cb4ebd-0000000000000334-0005e0628
0a3e785.journal                                                                                                                   
2564 -rw-r-----+ 1 root systemd-journal 2621440 Jun  1 15:34 user-1000.journal                                                    
2564 -rw-r-----  1 root systemd-journal 2621440 Jun  1 15:34 user-1003@0005e062f3bb855f-b21d739a16f87609.journal~                 
2564 -rw-r-----+ 1 root systemd-journal 2621440 Jun  1 15:34 user-1003.journal                                                    
2564 -rw-r-----  1 root systemd-journal 2621440 Jun  1 15:35 user-1004@0005e062f5b47c59-b74bb7b9498fbd75.journal~                 
2564 -rw-r-----  1 root systemd-journal 2621440 Jun  1 15:16 user-1004@4fb34549fd9b4e71b64a6e556942b437-000000000000034a-0005e0628
162ea3d.journal                                                                                                                   
2564 -rw-r-----+ 1 root systemd-journal 2621440 Jun  1 15:36 user-1004.journal  

 

Could this be a result of user jobs that are still running when the systemd user journal is stopped?

Posted

As a work-around to the failed user journals clogging /var/log, I now clear them in /etc/rc.local:
 

rm -f /var/log.hdd/journal/*/user-* /var/log/journal/*/user-*

 

Posted

After some time also systemd-journal stores a number of recovery files, some of them truncated, and they are not removed at boot:

 

% ls -lsa /var/log/journal/50bea5a2341c40588d32c8103dea6e71/*@*
2548 -rw-r--r-- 1 root systemd-journal 2609152 Jun 14 00:00 /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system@0005cf6abd179fba-93695493c7232496.journal~
1716 -rw-r--r-- 1 root systemd-journal 1757184 Jun 14 00:00 /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system@0005dfd3a2da0b28-85157396073fd2e2.journal~
 180 -rw-r--r-- 1 root systemd-journal  184320 Jun 14 00:00 /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system@36be248beb2849f0b9dc25a1a84098ef-000000000004c701-0005cf37b3036796.journal
2216 -rw-r--r-- 1 root systemd-journal 2269184 Jun 14 00:00 /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system@36be248beb2849f0b9dc25a1a84098ef-000000000004da0b-0005cf54d29ac2e7.journal
 524 -rw-r--r-- 1 root systemd-journal  536576 Jun 14 00:00 /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system@5c31986800b04c39a817e02f2abf0502-0000000000000001-0005dfc277cb07ae.journal
2560 -rw-r--r-- 1 root systemd-journal 2621440 Jun 14 00:00 /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system@5c31986800b04c39a817e02f2abf0502-0000000000000baa-0005dfcaef17af28.journal

 

I think it would be a good idea to always start with an empty journal at startup, and keep the /var/log.hdd/journal just for history of previous boots. This may also be cleaned periodically, e.g. mine now contains 295 files, most of them empty.

Posted

I see the following in the syslog at night:
 

Jun 29 00:00:07 npin3bullVT armbian-ramlog[10481]: Wed 29 Jun 2022 12:00:02 AM UTC: Syncing logs to storage
Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: sending incremental file list
Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: ./
Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: alternatives.log
Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: armbian-hardware-monitor.log
Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: auth.log
Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: bootstrap.log
Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: btmp
Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: daemon.log
Jun 29 00:00:10 npin3bullVT armbian-ramlog[10493]: debug
Jun 29 00:00:10 npin3bullVT armbian-ramlog[10493]: dpkg.log
Jun 29 00:00:10 npin3bullVT armbian-ramlog[10493]: faillog
Jun 29 00:00:10 npin3bullVT armbian-ramlog[10493]: kern.log
Jun 29 00:00:10 npin3bullVT armbian-ramlog[10493]: lastlog
Jun 29 00:00:10 npin3bullVT armbian-ramlog[10493]: messages
Jun 29 00:01:07 npin3bullVT armbian-ramlog[10677]: cat: write error: No space left on device
Jun 29 00:01:07 npin3bullVT armbian-ramlog[10703]: cat: write error: No space left on device
Jun 29 00:01:07 npin3bullVT armbian-ramlog[10710]: cat: write error: No space left on device
Jun 29 00:01:07 npin3bullVT armbian-ramlog[10711]: cat: write error: No space left on device
Jun 29 00:01:08 npin3bullVT armbian-ramlog[10729]: cat: write error: No space left on device
Jun 29 00:01:08 npin3bullVT armbian-ramlog[10746]: cat: write error: No space left on device
Jun 29 00:01:08 npin3bullVT armbian-ramlog[10782]: cat: write error: No space left on device
Jun 29 00:01:09 npin3bullVT armbian-ramlog[10789]: cat: write error: No space left on device
Jun 29 00:01:09 npin3bullVT armbian-ramlog[10790]: cat: write error: No space left on device
Jun 29 00:01:10 npin3bullVT armbian-ramlog[10848]: cat: write error: No space left on device
Jun 29 00:01:10 npin3bullVT armbian-ramlog[10861]: cat: write error: No space left on device

 

I removed the journal backup files (in var/log/journal/*/system@*) to get fresh again:

npin3bullVT:flowprocessing:% ls -lsa /var/log/journal/50bea5a2341c40588d32c8103dea6e71                                             <config>
total 18016
  40 drwxr-sr-x+ 2 root systemd-journal   36864 Jun 29 07:56 .
   8 drwxr-sr-x+ 3 root systemd-journal    4096 Aug 26  2021 ..
2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 00:16 system@28483b061ebe4b9a828c4b3066920f5d-0000000000000001-0005e28affeda3ec.journal
2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 00:23 system@28483b061ebe4b9a828c4b3066920f5d-0000000000000c5c-0005e28b0865be92.journal
2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 03:49 system@28483b061ebe4b9a828c4b3066920f5d-0000000000008f6f-0005e28b28f775c6.journal
2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 07:15 system@28483b061ebe4b9a828c4b3066920f5d-0000000000009a6b-0005e28e0a446064.journal
2564 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 00:01 system@d4f3757edfbb4d17ad2866e73d1fd9c1-000000000000334d-0005e2899513914c.journal
2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 07:59 system.journal
2564 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 07:59 user-1000.journal


For those interested I uploaded my armbianmonitor data http://ix.io/41aW, (using armbianmonitor -U > ~/armbianmonitor.out; cat  ~/armbianmonitor.out| curl -F 'f:1=<-' ix.io because it didn't pass the ping check).

Posted (edited)

For excluding the corrupted journal files, I propose the following patch to syncFromDisk in armbian-ramlog:

 

--- /usr/lib/armbian/armbian-ramlog     2021-09-26 20:15:41.000000000 +0000
+++ armbian-ramlog      2022-06-29 09:22:18.927488185 +0000
@@ -69,6 +69,7 @@
                ${NoCache} rsync -aXWv --delete \
                        --exclude "lost+found" --exclude armbian-ramlog.log \
                        --exclude *.gz --exclude *.xz --exclude='*.[0-9]' \
+                       --exclude 'system@*' --exclude 'user-*@*' --exclude='*~' \
                        --links \
                        ${XTRA_RSYNC_FROM[@]+"${XTRA_RSYNC_FROM[@]}"} \
                        $HDD_LOG $RAM_LOG 2>&1 | $LOG_OUTPUT


 

Edited by Dennboy
Posted (edited)

I have tested my patch using :
 

rsync -r --exclude 'system@*' --exclude 'user-*@*' --exclude='*~' /var/log/journal test

and it appears to work from the commandline. However, I still get corrupt journal files at boot. They may appear because journald writes to the journal files during the rsync.

npin3bullVT:backup:% uptime; ls -lsa /var/log/journal/50bea5a2341c40588d32c8103dea6e71
 09:57:09 up 21 min,  1 user,  load average: 2.15, 2.36, 1.83
total 10280
   8 drwxr-sr-x+ 2 root systemd-journal    4096 Jun 29 09:36 .
   8 drwxr-sr-x+ 3 root systemd-journal    4096 Aug 26  2021 ..
2564 -rw-r-----  1 root systemd-journal 2621440 Jun 29 09:36 system@0005e292e1d3c280-560a1a0a69b9966a.journal~
2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 09:36 system@9cec3cbef79c442ab7f055042ddcbb75-0000000000000001-0005e292e1d0956a.journal
2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 09:57 system.journal
2564 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 09:41 user-1002.journal


The timestamp of the cached/corrupt files "*~" and "*@*" is during boot, 09:57:09 - 21minutes = 09:36

 

In /var/log/syslog I see that only the right journal files are copied:
 

Jun 29 09:36:10 npin3bullVT armbian-ramlog[484]: journal/
Jun 29 09:36:10 npin3bullVT armbian-ramlog[484]: journal/50bea5a2341c40588d32c8103dea6e71/
Jun 29 09:36:10 npin3bullVT armbian-ramlog[484]: journal/50bea5a2341c40588d32c8103dea6e71/system.journal
Jun 29 09:36:10 npin3bullVT armbian-ramlog[484]: journal/50bea5a2341c40588d32c8103dea6e71/user-1000.journal
Jun 29 09:36:10 npin3bullVT armbian-ramlog[484]: journal/50bea5a2341c40588d32c8103dea6e71/user-1002.journal

 

Edited by Dennboy
Posted

I noticed the following in dmesg, that may be the source of the cached/corrupted journal files. Looks like system journal got corrupted during shutdown and user journals are temporarilly not accessible due to a busy device.

npin3bullVT:~:% dmesg|grep -i journal
[   21.666206] systemd[1]: Listening on Journal Audit Socket.
[   21.668883] systemd[1]: Listening on Journal Socket (/dev/log).
[   21.682035] systemd[1]: Listening on Journal Socket.
[   26.931365] EXT4-fs (zram1): mounted filesystem without journal. Opts: discard
[   31.043188] systemd[1]: Starting Journal Service...
[   31.273498] systemd[1]: Started Journal Service.
[   31.327326] systemd-journald[588]: Received client request to flush runtime journal.
[   31.389135] systemd-journald[588]: File /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system.journal corrupted or uncleanly shut down, renaming and replacing.
[   47.628019] systemd-journald[588]: Failed to read journal file /var/log/journal/50bea5a2341c40588d32c8103dea6e71/user-1002.journal for rotation, trying to move it out of the way: Device or resource busy
[   47.630134] systemd-journald[588]: Failed to read journal file /var/log/journal/50bea5a2341c40588d32c8103dea6e71/user-1000.journal for rotation, trying to move it out of the way: Device or resource busy
Posted

looks like my corrupted system.journal is due to influxdb still busy during shutdown while it can no longer write to /var/lib/influxdb

looks like the corrupted user journals are due to user processes still running (e.g. the login shell) while /var/log is no longer mounted

 

bottom line: to avoid journal corruption we may have to:

  • flush or stop journald before umounting memory-mapped /var/log,
  • or use the in-memory and persistent storage of journald itself in /var/run/journal and /var/log/journal respectively.
Posted

It looks like the non-persistent journal on /run/log/journal/* is sometimes also active on my opi0 with Armbian 21.08.2.

 

A new approach that lets journald control volatile and persistent storage, but lets armbian decide when and where to flush to persistent:

  • configure journald to write volatile logs to /run/log/journal
  • move journal from tmpfs space: 
    • mv /var/log.hdd/journal /var/log.hdd.journal
  • create softlink in both persistent and volatile memory
    • ln -s /var/log.hdd.journal /var/log/journal
    • ln -s /var/log.hdd.journal /var/log.hdd/journal
  • in armbian-ramlog
    • flush the volatile journal to persistent /var/log/journal (i.e. this points to /var/log.hdd.journal)
      •  [ -L /var/log/journal ] && journalctl --flush
    • use rsync flag --one-file-system armbian-truncate-logs, to prevent copying the journal files anyway

Currently testing this approach, instead of the softlinks I am also trying mount --bind, but that seems tricky.

Posted

After some experimenting I found a way to flush the journal and maintain logs in tmpfs using /run/log/journal, and only have a softlink from /var/log/journal to /var/log.hdd/journal. Additionally the cached journals are to be cleaned up when /var/log becomes too full.

 

 I created the attached patches for armbian-truncate-logs and armbian-ramlog: armbian-truncate-logs.patcharmbian-ramlog.patch 

 

Some logic was needed in cases where the user decides to do Storage=volatile journal logging in /etc/systemd/journald.conf, or when the user or journald with Storage=persistent creates a /var/log/journal folder which will be moved to /var/log.hdd/journal and replaced by the softlink.

 

I tested stop, start, write for armbian-ramlog, and truncate-logs. They seem to work fine.

Posted
8 minutes ago, Dennboy said:

After some experimenting I found a way to flush the journal and maintain logs in tmpfs using /run/log/journal, and only have a softlink from /var/log/journal to /var/log.hdd/journal. Additionally the cached journals are to be cleaned up when /var/log becomes too full.


This looks like as ready to put it here https://github.com/armbian/build/pulls 

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
×
×
  • Create New...

Important Information

Terms of Use - Privacy Policy - Guidelines