Jump to content

work-around for /var/log full at startup


Recommended Posts

Dear all,

 

I occasionally have a full /var/log at startup on various boards, probably because the logs are not rotated /vaccuumd at boot (we had a thread about this in the old opi1 forum). The rotating /vaccuum is currently done every 15 minutes, so I figured why not do it directly at boot as well?

 

For this purpose I updated /etc/cron.d/armbian-truncate-logs with an extra @reboot line:

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin                                                               
                                                                                                                                
*/15 * * * * root /usr/lib/armbian/armbian-truncate-logs                                                                        
@reboot root /usr/lib/armbian/armbian-truncate-logs   

I rebooted several times, and didn't get a full /var/log anymore.
All subsequent problems like failing networking when filesystem full also vanished;-)

 

Kind regards,
Dennis

Link to comment
Share on other sites

Guess I cheered a bit too early, the vacuum part of /var/log.hdd may need to be done a bit earlier than cron @reboot.

 

Still occasionally get:

$ systemctl --failed
  UNIT            LOAD   ACTIVE SUB    DESCRIPTION
● sysstat.service loaded failed failed Resets System Activity Logs
 

And a lot of errors that syslog is not able to write:

May 23 17:05:31 SBC5029 rsyslogd[597]: file '/var/log/syslog'[7] write error - >                                                  
May 23 17:05:31 SBC5029 rsyslogd[597]: action 'action-1-builtin:omfile' (module>                                                  
May 23 17:05:31 SBC5029 rsyslogd[597]: file '/var/log/syslog'[7] write error - >                                                  
May 23 17:05:31 SBC5029 rsyslogd[597]: action 'action-1-builtin:omfile' (module>                                                  
May 23 17:05:31 SBC5029 rsyslogd[597]: file '/var/log/syslog'[7] write error - >                                                  
 

Link to comment
Share on other sites

Another thing I noticed is that after a few reboots, journal fails are reported for some users:

 

[   44.505361] systemd-journald[525]: File /var/log/journal/078479a1b0a4459f80590dd571c3d6de/user-1000.journal corrupted or uncleanly shut down, renaming and replacing.                                                                                                                                                    
[   63.921282] systemd-journald[525]: File /var/log/journal/078479a1b0a4459f80590dd571c3d6de/user-1004.journal corrupted or uncleanly shut down, renaming and replacing.

 

This also fills /var/log, and the renamed journal files seem to be persistent over reboots:

 

dennis@SBC5037:~$ ls -lsa /var/log/journal/078479a1b0a4459f80590dd571c3d6de/
total 25660
   8 drwxr-sr-x+ 2 root systemd-journal    4096 Jun  1 15:35 .                                                                    
   8 drwxr-sr-x+ 3 root systemd-journal    4096 Oct 18  2021 ..                                                                   
2564 -rw-r-----  1 root systemd-journal 2621440 Jun  1 15:34 system@0005e062f36ff9d9-3c70b6ef958abbf0.journal~                    
2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun  1 15:37 system.journal                                                       
2564 -rw-r-----  1 root systemd-journal 2621440 Jun  1 15:34 user-1000@0005e062f48c3d16-92482727877728b0.journal~                 
2564 -rw-r-----  1 root systemd-journal 2621440 Jun  1 15:16 user-1000@3fb2bae4805e4f9188406e3f40cb4ebd-0000000000000334-0005e0628
0a3e785.journal                                                                                                                   
2564 -rw-r-----+ 1 root systemd-journal 2621440 Jun  1 15:34 user-1000.journal                                                    
2564 -rw-r-----  1 root systemd-journal 2621440 Jun  1 15:34 user-1003@0005e062f3bb855f-b21d739a16f87609.journal~                 
2564 -rw-r-----+ 1 root systemd-journal 2621440 Jun  1 15:34 user-1003.journal                                                    
2564 -rw-r-----  1 root systemd-journal 2621440 Jun  1 15:35 user-1004@0005e062f5b47c59-b74bb7b9498fbd75.journal~                 
2564 -rw-r-----  1 root systemd-journal 2621440 Jun  1 15:16 user-1004@4fb34549fd9b4e71b64a6e556942b437-000000000000034a-0005e0628
162ea3d.journal                                                                                                                   
2564 -rw-r-----+ 1 root systemd-journal 2621440 Jun  1 15:36 user-1004.journal  

 

Could this be a result of user jobs that are still running when the systemd user journal is stopped?

Link to comment
Share on other sites

After some time also systemd-journal stores a number of recovery files, some of them truncated, and they are not removed at boot:

 

% ls -lsa /var/log/journal/50bea5a2341c40588d32c8103dea6e71/*@*
2548 -rw-r--r-- 1 root systemd-journal 2609152 Jun 14 00:00 /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system@0005cf6abd179fba-93695493c7232496.journal~
1716 -rw-r--r-- 1 root systemd-journal 1757184 Jun 14 00:00 /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system@0005dfd3a2da0b28-85157396073fd2e2.journal~
 180 -rw-r--r-- 1 root systemd-journal  184320 Jun 14 00:00 /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system@36be248beb2849f0b9dc25a1a84098ef-000000000004c701-0005cf37b3036796.journal
2216 -rw-r--r-- 1 root systemd-journal 2269184 Jun 14 00:00 /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system@36be248beb2849f0b9dc25a1a84098ef-000000000004da0b-0005cf54d29ac2e7.journal
 524 -rw-r--r-- 1 root systemd-journal  536576 Jun 14 00:00 /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system@5c31986800b04c39a817e02f2abf0502-0000000000000001-0005dfc277cb07ae.journal
2560 -rw-r--r-- 1 root systemd-journal 2621440 Jun 14 00:00 /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system@5c31986800b04c39a817e02f2abf0502-0000000000000baa-0005dfcaef17af28.journal

 

I think it would be a good idea to always start with an empty journal at startup, and keep the /var/log.hdd/journal just for history of previous boots. This may also be cleaned periodically, e.g. mine now contains 295 files, most of them empty.

Link to comment
Share on other sites

I see the following in the syslog at night:
 

Jun 29 00:00:07 npin3bullVT armbian-ramlog[10481]: Wed 29 Jun 2022 12:00:02 AM UTC: Syncing logs to storage
Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: sending incremental file list
Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: ./
Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: alternatives.log
Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: armbian-hardware-monitor.log
Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: auth.log
Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: bootstrap.log
Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: btmp
Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: daemon.log
Jun 29 00:00:10 npin3bullVT armbian-ramlog[10493]: debug
Jun 29 00:00:10 npin3bullVT armbian-ramlog[10493]: dpkg.log
Jun 29 00:00:10 npin3bullVT armbian-ramlog[10493]: faillog
Jun 29 00:00:10 npin3bullVT armbian-ramlog[10493]: kern.log
Jun 29 00:00:10 npin3bullVT armbian-ramlog[10493]: lastlog
Jun 29 00:00:10 npin3bullVT armbian-ramlog[10493]: messages
Jun 29 00:01:07 npin3bullVT armbian-ramlog[10677]: cat: write error: No space left on device
Jun 29 00:01:07 npin3bullVT armbian-ramlog[10703]: cat: write error: No space left on device
Jun 29 00:01:07 npin3bullVT armbian-ramlog[10710]: cat: write error: No space left on device
Jun 29 00:01:07 npin3bullVT armbian-ramlog[10711]: cat: write error: No space left on device
Jun 29 00:01:08 npin3bullVT armbian-ramlog[10729]: cat: write error: No space left on device
Jun 29 00:01:08 npin3bullVT armbian-ramlog[10746]: cat: write error: No space left on device
Jun 29 00:01:08 npin3bullVT armbian-ramlog[10782]: cat: write error: No space left on device
Jun 29 00:01:09 npin3bullVT armbian-ramlog[10789]: cat: write error: No space left on device
Jun 29 00:01:09 npin3bullVT armbian-ramlog[10790]: cat: write error: No space left on device
Jun 29 00:01:10 npin3bullVT armbian-ramlog[10848]: cat: write error: No space left on device
Jun 29 00:01:10 npin3bullVT armbian-ramlog[10861]: cat: write error: No space left on device

 

I removed the journal backup files (in var/log/journal/*/system@*) to get fresh again:

npin3bullVT:flowprocessing:% ls -lsa /var/log/journal/50bea5a2341c40588d32c8103dea6e71                                             <config>
total 18016
  40 drwxr-sr-x+ 2 root systemd-journal   36864 Jun 29 07:56 .
   8 drwxr-sr-x+ 3 root systemd-journal    4096 Aug 26  2021 ..
2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 00:16 system@28483b061ebe4b9a828c4b3066920f5d-0000000000000001-0005e28affeda3ec.journal
2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 00:23 system@28483b061ebe4b9a828c4b3066920f5d-0000000000000c5c-0005e28b0865be92.journal
2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 03:49 system@28483b061ebe4b9a828c4b3066920f5d-0000000000008f6f-0005e28b28f775c6.journal
2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 07:15 system@28483b061ebe4b9a828c4b3066920f5d-0000000000009a6b-0005e28e0a446064.journal
2564 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 00:01 system@d4f3757edfbb4d17ad2866e73d1fd9c1-000000000000334d-0005e2899513914c.journal
2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 07:59 system.journal
2564 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 07:59 user-1000.journal


For those interested I uploaded my armbianmonitor data http://ix.io/41aW, (using armbianmonitor -U > ~/armbianmonitor.out; cat  ~/armbianmonitor.out| curl -F 'f:1=<-' ix.io because it didn't pass the ping check).

Link to comment
Share on other sites

For excluding the corrupted journal files, I propose the following patch to syncFromDisk in armbian-ramlog:

 

--- /usr/lib/armbian/armbian-ramlog     2021-09-26 20:15:41.000000000 +0000
+++ armbian-ramlog      2022-06-29 09:22:18.927488185 +0000
@@ -69,6 +69,7 @@
                ${NoCache} rsync -aXWv --delete \
                        --exclude "lost+found" --exclude armbian-ramlog.log \
                        --exclude *.gz --exclude *.xz --exclude='*.[0-9]' \
+                       --exclude 'system@*' --exclude 'user-*@*' --exclude='*~' \
                        --links \
                        ${XTRA_RSYNC_FROM[@]+"${XTRA_RSYNC_FROM[@]}"} \
                        $HDD_LOG $RAM_LOG 2>&1 | $LOG_OUTPUT


 

Edited by Dennboy
Link to comment
Share on other sites

I have tested my patch using :
 

rsync -r --exclude 'system@*' --exclude 'user-*@*' --exclude='*~' /var/log/journal test

and it appears to work from the commandline. However, I still get corrupt journal files at boot. They may appear because journald writes to the journal files during the rsync.

npin3bullVT:backup:% uptime; ls -lsa /var/log/journal/50bea5a2341c40588d32c8103dea6e71
 09:57:09 up 21 min,  1 user,  load average: 2.15, 2.36, 1.83
total 10280
   8 drwxr-sr-x+ 2 root systemd-journal    4096 Jun 29 09:36 .
   8 drwxr-sr-x+ 3 root systemd-journal    4096 Aug 26  2021 ..
2564 -rw-r-----  1 root systemd-journal 2621440 Jun 29 09:36 system@0005e292e1d3c280-560a1a0a69b9966a.journal~
2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 09:36 system@9cec3cbef79c442ab7f055042ddcbb75-0000000000000001-0005e292e1d0956a.journal
2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 09:57 system.journal
2564 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 09:41 user-1002.journal


The timestamp of the cached/corrupt files "*~" and "*@*" is during boot, 09:57:09 - 21minutes = 09:36

 

In /var/log/syslog I see that only the right journal files are copied:
 

Jun 29 09:36:10 npin3bullVT armbian-ramlog[484]: journal/
Jun 29 09:36:10 npin3bullVT armbian-ramlog[484]: journal/50bea5a2341c40588d32c8103dea6e71/
Jun 29 09:36:10 npin3bullVT armbian-ramlog[484]: journal/50bea5a2341c40588d32c8103dea6e71/system.journal
Jun 29 09:36:10 npin3bullVT armbian-ramlog[484]: journal/50bea5a2341c40588d32c8103dea6e71/user-1000.journal
Jun 29 09:36:10 npin3bullVT armbian-ramlog[484]: journal/50bea5a2341c40588d32c8103dea6e71/user-1002.journal

 

Edited by Dennboy
Link to comment
Share on other sites

I noticed the following in dmesg, that may be the source of the cached/corrupted journal files. Looks like system journal got corrupted during shutdown and user journals are temporarilly not accessible due to a busy device.

npin3bullVT:~:% dmesg|grep -i journal
[   21.666206] systemd[1]: Listening on Journal Audit Socket.
[   21.668883] systemd[1]: Listening on Journal Socket (/dev/log).
[   21.682035] systemd[1]: Listening on Journal Socket.
[   26.931365] EXT4-fs (zram1): mounted filesystem without journal. Opts: discard
[   31.043188] systemd[1]: Starting Journal Service...
[   31.273498] systemd[1]: Started Journal Service.
[   31.327326] systemd-journald[588]: Received client request to flush runtime journal.
[   31.389135] systemd-journald[588]: File /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system.journal corrupted or uncleanly shut down, renaming and replacing.
[   47.628019] systemd-journald[588]: Failed to read journal file /var/log/journal/50bea5a2341c40588d32c8103dea6e71/user-1002.journal for rotation, trying to move it out of the way: Device or resource busy
[   47.630134] systemd-journald[588]: Failed to read journal file /var/log/journal/50bea5a2341c40588d32c8103dea6e71/user-1000.journal for rotation, trying to move it out of the way: Device or resource busy
Link to comment
Share on other sites

looks like my corrupted system.journal is due to influxdb still busy during shutdown while it can no longer write to /var/lib/influxdb

looks like the corrupted user journals are due to user processes still running (e.g. the login shell) while /var/log is no longer mounted

 

bottom line: to avoid journal corruption we may have to:

  • flush or stop journald before umounting memory-mapped /var/log,
  • or use the in-memory and persistent storage of journald itself in /var/run/journal and /var/log/journal respectively.
Link to comment
Share on other sites

It looks like the non-persistent journal on /run/log/journal/* is sometimes also active on my opi0 with Armbian 21.08.2.

 

A new approach that lets journald control volatile and persistent storage, but lets armbian decide when and where to flush to persistent:

  • configure journald to write volatile logs to /run/log/journal
  • move journal from tmpfs space: 
    • mv /var/log.hdd/journal /var/log.hdd.journal
  • create softlink in both persistent and volatile memory
    • ln -s /var/log.hdd.journal /var/log/journal
    • ln -s /var/log.hdd.journal /var/log.hdd/journal
  • in armbian-ramlog
    • flush the volatile journal to persistent /var/log/journal (i.e. this points to /var/log.hdd.journal)
      •  [ -L /var/log/journal ] && journalctl --flush
    • use rsync flag --one-file-system armbian-truncate-logs, to prevent copying the journal files anyway

Currently testing this approach, instead of the softlinks I am also trying mount --bind, but that seems tricky.

Link to comment
Share on other sites

After some experimenting I found a way to flush the journal and maintain logs in tmpfs using /run/log/journal, and only have a softlink from /var/log/journal to /var/log.hdd/journal. Additionally the cached journals are to be cleaned up when /var/log becomes too full.

 

 I created the attached patches for armbian-truncate-logs and armbian-ramlog: armbian-truncate-logs.patcharmbian-ramlog.patch 

 

Some logic was needed in cases where the user decides to do Storage=volatile journal logging in /etc/systemd/journald.conf, or when the user or journald with Storage=persistent creates a /var/log/journal folder which will be moved to /var/log.hdd/journal and replaced by the softlink.

 

I tested stop, start, write for armbian-ramlog, and truncate-logs. They seem to work fine.

Link to comment
Share on other sites

8 minutes ago, Dennboy said:

After some experimenting I found a way to flush the journal and maintain logs in tmpfs using /run/log/journal, and only have a softlink from /var/log/journal to /var/log.hdd/journal. Additionally the cached journals are to be cleaned up when /var/log becomes too full.


This looks like as ready to put it here https://github.com/armbian/build/pulls 

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
×
×
  • Create New...

Important Information

Terms of Use - Privacy Policy - Guidelines