Dennboy Posted May 23, 2022 Posted May 23, 2022 Dear all, I occasionally have a full /var/log at startup on various boards, probably because the logs are not rotated /vaccuumd at boot (we had a thread about this in the old opi1 forum). The rotating /vaccuum is currently done every 15 minutes, so I figured why not do it directly at boot as well? For this purpose I updated /etc/cron.d/armbian-truncate-logs with an extra @reboot line: PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin */15 * * * * root /usr/lib/armbian/armbian-truncate-logs @reboot root /usr/lib/armbian/armbian-truncate-logs I rebooted several times, and didn't get a full /var/log anymore. All subsequent problems like failing networking when filesystem full also vanished;-) Kind regards, Dennis 1 Quote
Dennboy Posted May 23, 2022 Author Posted May 23, 2022 Guess I cheered a bit too early, the vacuum part of /var/log.hdd may need to be done a bit earlier than cron @reboot. Still occasionally get: $ systemctl --failed UNIT LOAD ACTIVE SUB DESCRIPTION ● sysstat.service loaded failed failed Resets System Activity Logs And a lot of errors that syslog is not able to write: May 23 17:05:31 SBC5029 rsyslogd[597]: file '/var/log/syslog'[7] write error - > May 23 17:05:31 SBC5029 rsyslogd[597]: action 'action-1-builtin:omfile' (module> May 23 17:05:31 SBC5029 rsyslogd[597]: file '/var/log/syslog'[7] write error - > May 23 17:05:31 SBC5029 rsyslogd[597]: action 'action-1-builtin:omfile' (module> May 23 17:05:31 SBC5029 rsyslogd[597]: file '/var/log/syslog'[7] write error - > 0 Quote
Dennboy Posted May 23, 2022 Author Posted May 23, 2022 Since /usr/lib/armbian/armbian-truncate-logs does not always seem to trigger the vacuum from cron, I changed cronjob to the vacuum itself. So far it seems to work. @reboot root journalctl --vacuum-size=10M 0 Quote
Dennboy Posted June 1, 2022 Author Posted June 1, 2022 Another thing I noticed is that after a few reboots, journal fails are reported for some users: [ 44.505361] systemd-journald[525]: File /var/log/journal/078479a1b0a4459f80590dd571c3d6de/user-1000.journal corrupted or uncleanly shut down, renaming and replacing. [ 63.921282] systemd-journald[525]: File /var/log/journal/078479a1b0a4459f80590dd571c3d6de/user-1004.journal corrupted or uncleanly shut down, renaming and replacing. This also fills /var/log, and the renamed journal files seem to be persistent over reboots: dennis@SBC5037:~$ ls -lsa /var/log/journal/078479a1b0a4459f80590dd571c3d6de/ total 25660 8 drwxr-sr-x+ 2 root systemd-journal 4096 Jun 1 15:35 . 8 drwxr-sr-x+ 3 root systemd-journal 4096 Oct 18 2021 .. 2564 -rw-r----- 1 root systemd-journal 2621440 Jun 1 15:34 system@0005e062f36ff9d9-3c70b6ef958abbf0.journal~ 2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 1 15:37 system.journal 2564 -rw-r----- 1 root systemd-journal 2621440 Jun 1 15:34 user-1000@0005e062f48c3d16-92482727877728b0.journal~ 2564 -rw-r----- 1 root systemd-journal 2621440 Jun 1 15:16 user-1000@3fb2bae4805e4f9188406e3f40cb4ebd-0000000000000334-0005e0628 0a3e785.journal 2564 -rw-r-----+ 1 root systemd-journal 2621440 Jun 1 15:34 user-1000.journal 2564 -rw-r----- 1 root systemd-journal 2621440 Jun 1 15:34 user-1003@0005e062f3bb855f-b21d739a16f87609.journal~ 2564 -rw-r-----+ 1 root systemd-journal 2621440 Jun 1 15:34 user-1003.journal 2564 -rw-r----- 1 root systemd-journal 2621440 Jun 1 15:35 user-1004@0005e062f5b47c59-b74bb7b9498fbd75.journal~ 2564 -rw-r----- 1 root systemd-journal 2621440 Jun 1 15:16 user-1004@4fb34549fd9b4e71b64a6e556942b437-000000000000034a-0005e0628 162ea3d.journal 2564 -rw-r-----+ 1 root systemd-journal 2621440 Jun 1 15:36 user-1004.journal Could this be a result of user jobs that are still running when the systemd user journal is stopped? 0 Quote
Dennboy Posted June 5, 2022 Author Posted June 5, 2022 As a work-around to the failed user journals clogging /var/log, I now clear them in /etc/rc.local: rm -f /var/log.hdd/journal/*/user-* /var/log/journal/*/user-* 0 Quote
Dennboy Posted June 14, 2022 Author Posted June 14, 2022 After some time also systemd-journal stores a number of recovery files, some of them truncated, and they are not removed at boot: % ls -lsa /var/log/journal/50bea5a2341c40588d32c8103dea6e71/*@* 2548 -rw-r--r-- 1 root systemd-journal 2609152 Jun 14 00:00 /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system@0005cf6abd179fba-93695493c7232496.journal~ 1716 -rw-r--r-- 1 root systemd-journal 1757184 Jun 14 00:00 /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system@0005dfd3a2da0b28-85157396073fd2e2.journal~ 180 -rw-r--r-- 1 root systemd-journal 184320 Jun 14 00:00 /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system@36be248beb2849f0b9dc25a1a84098ef-000000000004c701-0005cf37b3036796.journal 2216 -rw-r--r-- 1 root systemd-journal 2269184 Jun 14 00:00 /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system@36be248beb2849f0b9dc25a1a84098ef-000000000004da0b-0005cf54d29ac2e7.journal 524 -rw-r--r-- 1 root systemd-journal 536576 Jun 14 00:00 /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system@5c31986800b04c39a817e02f2abf0502-0000000000000001-0005dfc277cb07ae.journal 2560 -rw-r--r-- 1 root systemd-journal 2621440 Jun 14 00:00 /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system@5c31986800b04c39a817e02f2abf0502-0000000000000baa-0005dfcaef17af28.journal I think it would be a good idea to always start with an empty journal at startup, and keep the /var/log.hdd/journal just for history of previous boots. This may also be cleaned periodically, e.g. mine now contains 295 files, most of them empty. 0 Quote
Dennboy Posted June 29, 2022 Author Posted June 29, 2022 I see the following in the syslog at night: Jun 29 00:00:07 npin3bullVT armbian-ramlog[10481]: Wed 29 Jun 2022 12:00:02 AM UTC: Syncing logs to storage Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: sending incremental file list Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: ./ Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: alternatives.log Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: armbian-hardware-monitor.log Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: auth.log Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: bootstrap.log Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: btmp Jun 29 00:00:09 npin3bullVT armbian-ramlog[10493]: daemon.log Jun 29 00:00:10 npin3bullVT armbian-ramlog[10493]: debug Jun 29 00:00:10 npin3bullVT armbian-ramlog[10493]: dpkg.log Jun 29 00:00:10 npin3bullVT armbian-ramlog[10493]: faillog Jun 29 00:00:10 npin3bullVT armbian-ramlog[10493]: kern.log Jun 29 00:00:10 npin3bullVT armbian-ramlog[10493]: lastlog Jun 29 00:00:10 npin3bullVT armbian-ramlog[10493]: messages Jun 29 00:01:07 npin3bullVT armbian-ramlog[10677]: cat: write error: No space left on device Jun 29 00:01:07 npin3bullVT armbian-ramlog[10703]: cat: write error: No space left on device Jun 29 00:01:07 npin3bullVT armbian-ramlog[10710]: cat: write error: No space left on device Jun 29 00:01:07 npin3bullVT armbian-ramlog[10711]: cat: write error: No space left on device Jun 29 00:01:08 npin3bullVT armbian-ramlog[10729]: cat: write error: No space left on device Jun 29 00:01:08 npin3bullVT armbian-ramlog[10746]: cat: write error: No space left on device Jun 29 00:01:08 npin3bullVT armbian-ramlog[10782]: cat: write error: No space left on device Jun 29 00:01:09 npin3bullVT armbian-ramlog[10789]: cat: write error: No space left on device Jun 29 00:01:09 npin3bullVT armbian-ramlog[10790]: cat: write error: No space left on device Jun 29 00:01:10 npin3bullVT armbian-ramlog[10848]: cat: write error: No space left on device Jun 29 00:01:10 npin3bullVT armbian-ramlog[10861]: cat: write error: No space left on device I removed the journal backup files (in var/log/journal/*/system@*) to get fresh again: npin3bullVT:flowprocessing:% ls -lsa /var/log/journal/50bea5a2341c40588d32c8103dea6e71 <config ✗> total 18016 40 drwxr-sr-x+ 2 root systemd-journal 36864 Jun 29 07:56 . 8 drwxr-sr-x+ 3 root systemd-journal 4096 Aug 26 2021 .. 2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 00:16 system@28483b061ebe4b9a828c4b3066920f5d-0000000000000001-0005e28affeda3ec.journal 2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 00:23 system@28483b061ebe4b9a828c4b3066920f5d-0000000000000c5c-0005e28b0865be92.journal 2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 03:49 system@28483b061ebe4b9a828c4b3066920f5d-0000000000008f6f-0005e28b28f775c6.journal 2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 07:15 system@28483b061ebe4b9a828c4b3066920f5d-0000000000009a6b-0005e28e0a446064.journal 2564 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 00:01 system@d4f3757edfbb4d17ad2866e73d1fd9c1-000000000000334d-0005e2899513914c.journal 2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 07:59 system.journal 2564 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 07:59 user-1000.journal For those interested I uploaded my armbianmonitor data http://ix.io/41aW, (using armbianmonitor -U > ~/armbianmonitor.out; cat ~/armbianmonitor.out| curl -F 'f:1=<-' ix.io because it didn't pass the ping check). 0 Quote
Dennboy Posted June 29, 2022 Author Posted June 29, 2022 (edited) For excluding the corrupted journal files, I propose the following patch to syncFromDisk in armbian-ramlog: --- /usr/lib/armbian/armbian-ramlog 2021-09-26 20:15:41.000000000 +0000 +++ armbian-ramlog 2022-06-29 09:22:18.927488185 +0000 @@ -69,6 +69,7 @@ ${NoCache} rsync -aXWv --delete \ --exclude "lost+found" --exclude armbian-ramlog.log \ --exclude *.gz --exclude *.xz --exclude='*.[0-9]' \ + --exclude 'system@*' --exclude 'user-*@*' --exclude='*~' \ --links \ ${XTRA_RSYNC_FROM[@]+"${XTRA_RSYNC_FROM[@]}"} \ $HDD_LOG $RAM_LOG 2>&1 | $LOG_OUTPUT Edited June 29, 2022 by Dennboy 0 Quote
Dennboy Posted June 29, 2022 Author Posted June 29, 2022 (edited) I have tested my patch using : rsync -r --exclude 'system@*' --exclude 'user-*@*' --exclude='*~' /var/log/journal test and it appears to work from the commandline. However, I still get corrupt journal files at boot. They may appear because journald writes to the journal files during the rsync. npin3bullVT:backup:% uptime; ls -lsa /var/log/journal/50bea5a2341c40588d32c8103dea6e71 09:57:09 up 21 min, 1 user, load average: 2.15, 2.36, 1.83 total 10280 8 drwxr-sr-x+ 2 root systemd-journal 4096 Jun 29 09:36 . 8 drwxr-sr-x+ 3 root systemd-journal 4096 Aug 26 2021 .. 2564 -rw-r----- 1 root systemd-journal 2621440 Jun 29 09:36 system@0005e292e1d3c280-560a1a0a69b9966a.journal~ 2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 09:36 system@9cec3cbef79c442ab7f055042ddcbb75-0000000000000001-0005e292e1d0956a.journal 2568 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 09:57 system.journal 2564 -rw-r-----+ 1 root systemd-journal 2621440 Jun 29 09:41 user-1002.journal The timestamp of the cached/corrupt files "*~" and "*@*" is during boot, 09:57:09 - 21minutes = 09:36 In /var/log/syslog I see that only the right journal files are copied: Jun 29 09:36:10 npin3bullVT armbian-ramlog[484]: journal/ Jun 29 09:36:10 npin3bullVT armbian-ramlog[484]: journal/50bea5a2341c40588d32c8103dea6e71/ Jun 29 09:36:10 npin3bullVT armbian-ramlog[484]: journal/50bea5a2341c40588d32c8103dea6e71/system.journal Jun 29 09:36:10 npin3bullVT armbian-ramlog[484]: journal/50bea5a2341c40588d32c8103dea6e71/user-1000.journal Jun 29 09:36:10 npin3bullVT armbian-ramlog[484]: journal/50bea5a2341c40588d32c8103dea6e71/user-1002.journal Edited June 29, 2022 by Dennboy 0 Quote
Dennboy Posted June 29, 2022 Author Posted June 29, 2022 I noticed the following in dmesg, that may be the source of the cached/corrupted journal files. Looks like system journal got corrupted during shutdown and user journals are temporarilly not accessible due to a busy device. npin3bullVT:~:% dmesg|grep -i journal [ 21.666206] systemd[1]: Listening on Journal Audit Socket. [ 21.668883] systemd[1]: Listening on Journal Socket (/dev/log). [ 21.682035] systemd[1]: Listening on Journal Socket. [ 26.931365] EXT4-fs (zram1): mounted filesystem without journal. Opts: discard [ 31.043188] systemd[1]: Starting Journal Service... [ 31.273498] systemd[1]: Started Journal Service. [ 31.327326] systemd-journald[588]: Received client request to flush runtime journal. [ 31.389135] systemd-journald[588]: File /var/log/journal/50bea5a2341c40588d32c8103dea6e71/system.journal corrupted or uncleanly shut down, renaming and replacing. [ 47.628019] systemd-journald[588]: Failed to read journal file /var/log/journal/50bea5a2341c40588d32c8103dea6e71/user-1002.journal for rotation, trying to move it out of the way: Device or resource busy [ 47.630134] systemd-journald[588]: Failed to read journal file /var/log/journal/50bea5a2341c40588d32c8103dea6e71/user-1000.journal for rotation, trying to move it out of the way: Device or resource busy 0 Quote
Dennboy Posted June 29, 2022 Author Posted June 29, 2022 looks like my corrupted system.journal is due to influxdb still busy during shutdown while it can no longer write to /var/lib/influxdb looks like the corrupted user journals are due to user processes still running (e.g. the login shell) while /var/log is no longer mounted bottom line: to avoid journal corruption we may have to: flush or stop journald before umounting memory-mapped /var/log, or use the in-memory and persistent storage of journald itself in /var/run/journal and /var/log/journal respectively. 0 Quote
Dennboy Posted July 1, 2022 Author Posted July 1, 2022 It looks like the non-persistent journal on /run/log/journal/* is sometimes also active on my opi0 with Armbian 21.08.2. A new approach that lets journald control volatile and persistent storage, but lets armbian decide when and where to flush to persistent: configure journald to write volatile logs to /run/log/journal move journal from tmpfs space: mv /var/log.hdd/journal /var/log.hdd.journal create softlink in both persistent and volatile memory ln -s /var/log.hdd.journal /var/log/journal ln -s /var/log.hdd.journal /var/log.hdd/journal in armbian-ramlog flush the volatile journal to persistent /var/log/journal (i.e. this points to /var/log.hdd.journal) [ -L /var/log/journal ] && journalctl --flush use rsync flag --one-file-system armbian-truncate-logs, to prevent copying the journal files anyway Currently testing this approach, instead of the softlinks I am also trying mount --bind, but that seems tricky. 0 Quote
Dennboy Posted July 6, 2022 Author Posted July 6, 2022 After some experimenting I found a way to flush the journal and maintain logs in tmpfs using /run/log/journal, and only have a softlink from /var/log/journal to /var/log.hdd/journal. Additionally the cached journals are to be cleaned up when /var/log becomes too full. I created the attached patches for armbian-truncate-logs and armbian-ramlog: armbian-truncate-logs.patcharmbian-ramlog.patch Some logic was needed in cases where the user decides to do Storage=volatile journal logging in /etc/systemd/journald.conf, or when the user or journald with Storage=persistent creates a /var/log/journal folder which will be moved to /var/log.hdd/journal and replaced by the softlink. I tested stop, start, write for armbian-ramlog, and truncate-logs. They seem to work fine. 1 Quote
Igor Posted July 6, 2022 Posted July 6, 2022 8 minutes ago, Dennboy said: After some experimenting I found a way to flush the journal and maintain logs in tmpfs using /run/log/journal, and only have a softlink from /var/log/journal to /var/log.hdd/journal. Additionally the cached journals are to be cleaned up when /var/log becomes too full. This looks like as ready to put it here https://github.com/armbian/build/pulls 1 Quote
Dennboy Posted July 7, 2022 Author Posted July 7, 2022 (edited) Thanks Igor, I've created a pull request (https://github.com/armbian/build/pull/3967) after some tweaks & tests with non-rsync ramlog. Edited July 7, 2022 by Dennboy 1 Quote
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.