Z11ntal33r Posted May 16, 2019

Hi,

I have two Tinker Board servers running the latest Armbian stable (Debian GNU/Linux 9 with kernel 4.19.*-rockchip). I've connected some HDDs to them, and the log shows that every day at a specific time all HDDs spin up, as seen below. This happens on both servers at the same time every day.

# Server 1
May 13 06:25:26 Tinkerboard hd-idle[29538]: sda spinup
May 13 06:25:26 Tinkerboard hd-idle[29538]: sdc spinup
May 13 06:25:26 Tinkerboard hd-idle[29538]: sdb spinup
May 13 06:35:36 Tinkerboard hd-idle[29538]: sda spindown
May 13 06:35:45 Tinkerboard hd-idle[29538]: sdc spindown
May 13 06:35:45 Tinkerboard hd-idle[29538]: sdb spindown
May 14 06:26:07 Tinkerboard hd-idle[29538]: sda spinup
May 14 06:26:07 Tinkerboard hd-idle[29538]: sdc spinup
May 14 06:26:07 Tinkerboard hd-idle[29538]: sdb spinup
May 14 06:36:17 Tinkerboard hd-idle[29538]: sda spindown
May 14 06:36:26 Tinkerboard hd-idle[29538]: sdc spindown
May 14 06:36:26 Tinkerboard hd-idle[29538]: sdb spindown
May 15 06:25:51 Tinkerboard hd-idle[29538]: sda spinup
May 15 06:25:51 Tinkerboard hd-idle[29538]: sdc spinup
May 15 06:25:51 Tinkerboard hd-idle[29538]: sdb spinup
May 15 06:36:01 Tinkerboard hd-idle[29538]: sda spindown
May 15 06:36:10 Tinkerboard hd-idle[29538]: sdc spindown
May 15 06:36:10 Tinkerboard hd-idle[29538]: sdb spindown
May 16 06:25:33 Tinkerboard hd-idle[29538]: sda spinup
May 16 06:25:33 Tinkerboard hd-idle[29538]: sdc spinup
May 16 06:25:33 Tinkerboard hd-idle[29538]: sdb spinup
May 16 06:35:43 Tinkerboard hd-idle[29538]: sda spindown
May 16 06:35:52 Tinkerboard hd-idle[29538]: sdc spindown
May 16 06:35:52 Tinkerboard hd-idle[29538]: sdb spindown

# Server 2
May 13 06:25:37 Tinkerboard hd-idle[794]: sda spinup
May 13 06:25:37 Tinkerboard hd-idle[794]: sdb spinup
May 13 06:35:42 Tinkerboard hd-idle[794]: sda spindown
May 13 06:35:43 Tinkerboard hd-idle[794]: sdb spindown
May 14 06:25:45 Tinkerboard hd-idle[794]: sda spinup
May 14 06:25:45 Tinkerboard hd-idle[794]: sdb spinup
May 14 06:35:52 Tinkerboard hd-idle[794]: sda spindown
May 14 06:35:53 Tinkerboard hd-idle[794]: sdb spindown
May 15 06:25:48 Tinkerboard hd-idle[794]: sda spinup
May 15 06:25:48 Tinkerboard hd-idle[794]: sdb spinup
May 15 06:35:56 Tinkerboard hd-idle[794]: sda spindown
May 15 06:35:56 Tinkerboard hd-idle[794]: sdb spindown
May 16 06:25:44 Tinkerboard hd-idle[794]: sda spinup
May 16 06:25:44 Tinkerboard hd-idle[794]: sdb spinup
May 16 06:35:54 Tinkerboard hd-idle[794]: sda spindown
May 16 06:35:54 Tinkerboard hd-idle[794]: sdb spindown

I was thinking of using fatrace to trace file access events around that time to find which service spins up my HDDs. However, when I run fatrace as shown below, I get the message "Cannot initialize fanotify: Function not implemented". I've come across two bug reports that seem to be related to this issue: "some archs/kernels define O_LARGEFILE" and "Hardcoded KERNEL_O_LARGEFILE does not work on ARM". So I guess fatrace is not an option, given that fanotify is not enabled in the kernel configuration.

$ sudo fatrace -o /tmp/trace -s 60
Cannot initialize fanotify: Function not implemented

btrace doesn't seem to work either:

$ sudo btrace /dev/sdb
BLKTRACESETUP(2) /dev/sdb failed: 5/Input/output error

Neither does iotop:

$ sudo iotop
Could not run iotop as some of the requirements are not met:
- Linux >= 2.6.20 with
  - I/O accounting support (CONFIG_TASKSTATS, CONFIG_TASK_DELAY_ACCT, CONFIG_TASK_IO_ACCOUNTING)

Can anyone recommend tools I can use to find out which process is spinning up my HDDs every day?

Thanks in advance!
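Even without fanotify, blktrace or I/O accounting, /proc/diskstats is still available (hd-idle itself reads it). It cannot tell which process did the I/O, but sampling it every few seconds timestamps exactly when each disk sees activity, which helps match the wake-ups to a cron job or timer. A minimal sketch, assuming the standard /proc/diskstats field layout (field 3 device name, field 4 reads completed, field 8 writes completed) and whole disks named sdX; `diskstats_changed` and `watch_disks` are hypothetical helper names.

```shell
#!/bin/sh
# diskstats_changed OLD NEW: print the names of whole sdX disks whose
# completed-read or completed-write counters differ between two
# /proc/diskstats snapshots.
# Assumed layout: $3 device name, $4 reads completed, $8 writes completed.
diskstats_changed() {
    awk 'NR == FNR { if ($3 ~ /^sd[a-z]+$/) old[$3] = $4 " " $8; next }
         ($3 in old) && old[$3] != ($4 " " $8) { print $3 }' "$1" "$2"
}

# watch_disks [INTERVAL]: every INTERVAL seconds (default 5), print a
# timestamp and the disks that saw I/O since the previous sample.
# Runs until interrupted with Ctrl-C.
watch_disks() {
    interval=${1:-5}
    prev=$(mktemp) cur=$(mktemp)
    cat /proc/diskstats > "$prev"
    while sleep "$interval"; do
        cat /proc/diskstats > "$cur"
        changed=$(diskstats_changed "$prev" "$cur" | tr '\n' ' ')
        if [ -n "$changed" ]; then
            echo "$(date '+%F %T') I/O on: $changed"
        fi
        cp "$cur" "$prev"
    done
}
```

Started with `watch_disks 5` shortly before 06:25, it would log which disks were touched and at what second, which can then be compared against the cron and systemd timer schedules.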
Tido Posted May 16, 2019

Back in 2016, it wouldn't run because of an option in the kernel configuration: https://forum.armbian.com/topic/896-fatrace-file-access-trace/
Z11ntal33r Posted May 16, 2019

I suspected that "Function not implemented" was related to the kernel configuration. However, I'm looking for an alternative way to find the processes that access all three HDDs every day at the same time, without disabling processes one by one...

I've found blktrace, which seems to give some relevant information, as seen below. I'm going to run the following command a minute before the time my HDDs spin up:

$ sudo blktrace -d /dev/sdb -o - | blkparse -i -
  8,16   1        1     0.000000000    17  C   N [0]
  8,16   2        1 1266874889.708500152   888  G   N [smartd]
  8,16   2        2 1266874889.708505985   888  I   N 0 [smartd]
  8,16   2        3 1266874889.708510360   888  D   N 0 [smartd]
  8,16   1        2     5.001279610    17  C   N [0]
  8,16   1        3     5.002521825    17  C   N [65531]
  8,16   2        4     5.000177978   888  G   N [smartd]
  8,16   2        5     5.000184978   888  I   N 0 [smartd]
  8,16   2        6     5.000188770   888  D   N 0 [smartd]
  8,16   2        7     5.001649445 30968  G   N [kworker/2:2]
  8,16   2        8     5.001652070 30968  I   N 0 [kworker/2:2]
  8,16   2        9     5.001653528 30968  D   N 0 [kworker/2:2]

I'll also run strace to see if I can get more information:

sudo strace -f -e open -t ls 2>&1

Tracking syscalls with auditctl (the Linux Audit framework) is another possibility I've found, which might work if the kernel's CONFIG_AUDIT option is enabled. I'll look into auditctl tomorrow if I don't find relevant information with blktrace or strace.
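The blkparse output above names the process in the last bracketed field of most events, so a capture around the spin-up time can be tallied per process. A sketch, assuming blkparse's default text format: it skips completion (C) events, whose bracket holds an error code rather than a process name, and it counts events rather than requests. `blk_who` is a hypothetical helper name.

```shell
#!/bin/sh
# blk_who: read blkparse default text output on stdin and print
# "count process" for every process named in non-completion events,
# busiest first. Field 6 is the action (Q, G, I, D, C, ...); the last
# field is "[process]". Names containing spaces are skipped.
blk_who() {
    awk 'NF >= 7 && $6 != "C" && $NF ~ /^\[/ {
             name = substr($NF, 2, length($NF) - 2)
             count[name]++
         }
         END { for (n in count) print count[n], n }' | sort -rn
}
```

For example, `sudo blktrace -d /dev/sdb -w 120 -o - | blkparse -i - | blk_who` would trace for 120 seconds (blktrace's `-w` stopwatch option) and then print the summary.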
Z11ntal33r Posted June 16, 2019

As I was in the middle of my exam period when I started this thread, I stopped all daily cron jobs as a shortcut, which fixed the issue. I've since found out that the daily cron job /etc/cron.daily/armbian-ram-logging is the problem.

armbian-ram-logging contains

#!/bin/sh
/usr/lib/armbian/armbian-ramlog write >/dev/null 2>&1

/usr/lib/armbian/armbian-ramlog contains

#!/bin/bash
#
# Copyright (c) Authors: http://www.armbian.com/authors
#
# This file is licensed under the terms of the GNU General Public
# License version 2. This program is licensed "as is" without any
# warranty of any kind, whether express or implied.

SIZE=50M
USE_RSYNC=true
ENABLED=false

[ -f /etc/default/armbian-ramlog ] && . /etc/default/armbian-ramlog

[ "$ENABLED" != true ] && exit 0

# Never touch anything below here. Only edit /etc/default/armbian-ramlog

HDD_LOG=/var/log.hdd/
RAM_LOG=/var/log/
LOG2RAM_LOG="${HDD_LOG}armbian-ramlog.log"
LOG_OUTPUT="tee -a $LOG2RAM_LOG"

isSafe () {
	[ -d $HDD_LOG ] || (echo "ERROR: $HDD_LOG doesn't exist! Can't sync." >&2 ; exit 1)
	NoCache=$(which nocache 2>/dev/null)
}

RecreateLogs (){
	# in case of crash those services don't start if there are no dirs & logs
	check_if_installed apache2 && [ ! -d /var/log/apache2 ] && mkdir -p /var/log/apache2
	check_if_installed cron-apt && [ ! -d /var/log/cron-apt ] && \
		(mkdir -p /var/log/cron-apt ; touch /var/log/cron-apt/log)
	check_if_installed proftpd-basic && [ ! -d /var/log/proftpd ] && \
		(mkdir -p /var/log/proftpd ; touch /var/log/proftpd/controls.log)
	check_if_installed nginx && [ ! -d /var/log/nginx ] && \
		(mkdir -p /var/log/nginx ; touch /var/log/nginx/access.log ; touch /var/log/nginx/error.log)
	check_if_installed samba && [ ! -d /var/log/samba ] && mkdir -p /var/log/samba
	check_if_installed unattended-upgrades && [ ! -d /var/log/unattended-upgrades ] && mkdir -p /var/log/unattended-upgrades
	return 0
}

syncToDisk () {
	isSafe

	echo -e "\n\n$(date): Syncing logs from $LOG_TYPE to storage\n" | $LOG_OUTPUT

	if [ "$USE_RSYNC" = true ]; then
		${NoCache} rsync -aXWv --delete --exclude armbian-ramlog.log --links $RAM_LOG $HDD_LOG 2>&1 | $LOG_OUTPUT
	else
		${NoCache} cp -rfup $RAM_LOG -T $HDD_LOG 2>&1 | $LOG_OUTPUT
	fi

	sync
}

syncFromDisk () {
	isSafe

	echo -e "\n\n$(date): Loading logs from storage to $LOG_TYPE\n" | $LOG_OUTPUT

	if [ "$USE_RSYNC" = true ]; then
		${NoCache} rsync -aXWv --delete --exclude armbian-ramlog.log --exclude *.gz --exclude='*.[0-9]' --links $HDD_LOG $RAM_LOG 2>&1 | $LOG_OUTPUT
	else
		${NoCache} find $HDD_LOG* -maxdepth 1 -type f -not \( -name '*.[0-9]' -or -name '*.xz*' -or -name '*.gz' \) | xargs cp -ut $RAM_LOG
	fi

	sync
}

check_if_installed () {
	local DPKG_Status="$(dpkg -s "$1" 2>/dev/null | awk -F": " '/^Status/ {print $2}')"
	if [[ "X${DPKG_Status}" = "X" || "${DPKG_Status}" = *deinstall* ]]; then
		return 1
	else
		return 0
	fi
}

# Check whether zram device is available or we need to use tmpfs
if [ "$(blkid -s TYPE /dev/zram0 | awk ' { print $2 } ' | grep ext4)" ]; then
	LOG_TYPE="zram"
else
	LOG_TYPE="tmpfs"
fi

case "$1" in
	start)
		[ -d $HDD_LOG ] || mkdir -p $HDD_LOG
		mount --bind $RAM_LOG $HDD_LOG
		mount --make-private $HDD_LOG

		case $LOG_TYPE in
			zram)
				echo -e "Mounting /dev/zram0 as $RAM_LOG \c" | $LOG_OUTPUT
				mount -o discard /dev/zram0 $RAM_LOG 2>&1 | $LOG_OUTPUT
				;;
			tmpfs)
				echo -e "Setting up $RAM_LOG as tmpfs \c" | $LOG_OUTPUT
				mount -t tmpfs -o nosuid,noexec,nodev,mode=0755,size=$SIZE armbian-ramlog $RAM_LOG 2>&1 | $LOG_OUTPUT
				;;
		esac

		syncFromDisk
		RecreateLogs
		;;
	stop)
		syncToDisk
		umount -l $RAM_LOG
		umount -l $HDD_LOG
		;;
	write)
		syncToDisk
		;;
	*)
		echo "Usage: ${0##*/} {start|stop|write}" >&2
		exit 1
		;;
esac

/etc/default/armbian-ramlog contains

# configuration values for the armbian-ram-logging service
#
# enable the armbian-ram-logging service?
ENABLED=true
#
# size of the tmpfs mount -- please keep in mind to adjust /etc/default/armbian-zram-config too when increasing
SIZE=50M
#
# use rsync instead of cp -r
# requires rsync installed, may provide better performance
# due to copying only new and changed files
USE_RSYNC=true

So I'm going to modify /usr/lib/armbian/armbian-ramlog to prevent it from waking up my external hard drives every day. I'd appreciate it if anyone could point out which lines in the script might wake up the hard drives. I'm thinking about line 81, with "... blkid -s TYPE /dev/zram0 ...", which I think wakes up all drives. What do you think? Also line 65, with the find command "... find $HDD_LOG ...", although it should only search inside $HDD_LOG (/var/log.hdd/).

@Igor, others will likely find this issue interesting, as armbian-ramlog currently spins up all connected drives, which shouldn't be necessary.
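Outside of `start`, the script only needs to know what is already mounted on /var/log, and that can be read from /proc/mounts, which the kernel serves from memory without sending commands to any disk. A minimal sketch of that alternative to probing /dev/zram0 with blkid, assuming `armbian-ramlog start` has already mounted zram or tmpfs on /var/log; `detect_log_type` and its optional mount-table argument (added only for testing) are hypothetical, not part of armbian-ramlog.

```shell
#!/bin/sh
# detect_log_type [LOG_DIR] [MOUNTS_FILE]: print "zram" if LOG_DIR is
# currently mounted from a /dev/zram* device, otherwise "tmpfs".
# Only meaningful after "armbian-ramlog start" has mounted LOG_DIR.
# /proc/mounts lists mount points without a trailing slash, hence /var/log.
detect_log_type() {
    log_dir=${1:-/var/log}
    mounts=${2:-/proc/mounts}
    if awk -v dir="$log_dir" '$2 == dir && $1 ~ /^\/dev\/zram/ { found = 1 }
                              END { exit !found }' "$mounts"; then
        echo zram
    else
        echo tmpfs
    fi
}
```

In the `write` and `stop` paths, `LOG_TYPE=$(detect_log_type)` could then stand in for the blkid test, leaving blkid to the boot-time `start` path only.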
Igor Posted June 16, 2019

3 hours ago, Z11ntal33r said:
which I think wake up all drives. What do you think?

Yes, it looks like that is the problem, so this part needs to be done differently.
Igor Posted June 16, 2019

Something like this would probably be just fine:

--- a/packages/bsp/common/usr/lib/armbian/armbian-ramlog
+++ b/packages/bsp/common/usr/lib/armbian/armbian-ramlog
@@ -43,7 +43,7 @@ RecreateLogs (){
 syncToDisk () {
 	isSafe
 
-	echo -e "\n\n$(date): Syncing logs from $LOG_TYPE to storage\n" | $LOG_OUTPUT
+	echo -e "\n\n$(date): Syncing logs to storage\n" | $LOG_OUTPUT
 
 	if [ "$USE_RSYNC" = true ]; then
 		${NoCache} rsync -aXWv --delete --exclude "lost+found" --exclude armbian-ramlog.log --links $RAM_LOG $HDD_LOG 2>&1 | $LOG_OUTPUT
@@ -57,7 +57,7 @@ syncToDisk () {
 syncFromDisk () {
 	isSafe
 
-	echo -e "\n\n$(date): Loading logs from storage to $LOG_TYPE\n" | $LOG_OUTPUT
+	echo -e "\n\n$(date): Loading logs from storage\n" | $LOG_OUTPUT
 
 	if [ "$USE_RSYNC" = true ]; then
 		${NoCache} rsync -aXWv --delete --exclude "lost+found" --exclude armbian-ramlog.log --exclude *.gz --exclude='*.[0-9]' --links $HDD_LOG $RAM_LOG 2>&1 | $LOG_OUTPUT
@@ -77,19 +77,19 @@ check_if_installed () {
 	fi
 }
 
-# Check whether zram device is available or we need to use tmpfs
-if [ "$(blkid -s TYPE /dev/zram0 | awk ' { print $2 } ' | grep ext4)" ]; then
-	LOG_TYPE="zram"
-else
-	LOG_TYPE="tmpfs"
-fi
-
 case "$1" in
 	start)
 		[ -d $HDD_LOG ] || mkdir -p $HDD_LOG
 		mount --bind $RAM_LOG $HDD_LOG
 		mount --make-private $HDD_LOG
 
+		# Check whether zram device is available or we need to use tmpfs
+		if [ "$(blkid -s TYPE /dev/zram0 | awk ' { print $2 } ' | grep ext4)" ]; then
+			LOG_TYPE="zram"
+		else
+			LOG_TYPE="tmpfs"
+		fi
+
 		case $LOG_TYPE in
 			zram)
 				echo -e "Mounting /dev/zram0 as $RAM_LOG \c" | $LOG_OUTPUT
Z11ntal33r Posted June 16, 2019

I applied your changes (Bugfix: check LOG_TYPE only at script start #1417) directly to /usr/lib/armbian/armbian-ramlog, made it executable and rebooted the server. However, the hard drives still spun up when I changed the daily cron time to trigger /etc/cron.daily/armbian-ram-logging. So we are not there yet... Please let me know if you want me to do more testing.
Igor Posted June 16, 2019

49 minutes ago, Z11ntal33r said:
I changed the time for daily cron jobs to trigger etc/cron.daily/armbian-ram-logging. So, we are still not there yet... Please let me know if you want more testing etc

But now it should not touch the hard drive at all... unless your /var/log is on the hard drive.

Edit: it works for me. The HDD remains asleep after executing the command found in the cron job:
/usr/lib/armbian/armbian-ramlog write
Z11ntal33r Posted June 16, 2019

I know, yet the hard drives spin up... /var/log is on my microSD card. My hard drives are mounted via fstab to folders in "/mnt/usb*". Everything else is on the microSD card, so I have no clue why armbian-ramlog is still spinning up my hard drives...
Igor Posted June 16, 2019

22 minutes ago, Z11ntal33r said:
so I've no clue why armbian-ramlog is still spinning up my hard drives...

Perhaps restart crond (service cron restart), or reboot?
Z11ntal33r Posted June 16, 2019

I rebooted after making the changes last time, and I've now tried again after restarting cron, but the result is the same: all hard drives spin up during the daily cron run and spin down after 10 minutes. So we seem to be missing something.
Igor Posted June 17, 2019

9 hours ago, Z11ntal33r said:
All hard drives spin up during daily cron and spin down after 10 minutes. So there seems to be something we are missing.

The armbian-ramlog part only executes "armbian-ramlog write", and if that command doesn't spin up the HDD...? Do you have anything else in the daily cron?
Z11ntal33r Posted June 17, 2019

I have several other jobs in the daily cron, but none of them trigger HDD spin-ups. I started looking into this by removing all jobs and adding them back one by one until the HDDs spun up during the daily cron run. Commenting out /etc/cron.daily/armbian-ram-logging (without removing the file) stops my HDDs from spinning up. I’ll look into this later today when I get home. You tested this by running “/usr/lib/armbian/armbian-ramlog write”, correct?
Igor Posted June 17, 2019

8 minutes ago, Z11ntal33r said:
The way you have tested this is by running “/usr/lib/armbian/armbian-ramlog write”, correct?

Exactly. There was a problem in our script: the command blkid -s TYPE /dev/zram0 spins up all hard drives, but now it only gets executed at boot time, when the service is enabled.
Tido Posted June 17, 2019

2 hours ago, Z11ntal33r said:
I’ve several other jobs in daily cron,

Could they be related? After updating the code, you could try again by removing all of your jobs and adding them back one by one.
Z11ntal33r Posted June 17, 2019

I've found the issue. I only checked the hd-idle log to see whether the HDDs were spinning, and it seems to report that the HDDs spin up when they actually don't. This is likely because hd-idle uses /proc/diskstats to read disk statistics and then writes the log to systemd, as seen below. The hard drives are not spinning up and everything seems fine, apart from the wrong logging. Thanks for the great support!

Server 1

| => sudo /usr/lib/armbian/armbian-ramlog write
Mon Jun 17 17:36:44 CEST 2019: Syncing logs to storage

sending incremental file list
auth.log
daemon.log
lastlog
messages
syslog
user.log
wtmp

sent 10,126,622 bytes  received 159 bytes  20,253,562.00 bytes/sec
total size is 27,876,771  speedup is 2.75

=> journalctl -u hd-idle
Jun 17 17:36:59 Tinkerboard hd-idle[774]: sda spinup
Jun 17 17:36:59 Tinkerboard hd-idle[774]: sdb spinup

Running the same command again gives

| => sudo /usr/lib/armbian/armbian-ramlog write
Mon Jun 17 18:06:37 CEST 2019: Syncing logs to storage

sending incremental file list
aptitude
auth.log
daemon.log
dpkg.log
fail2ban.log
syslog
apt/
apt/eipp.log.xz
apt/history.log
apt/term.log
nginx/access.log

=> journalctl -u hd-idle
Jun 17 17:36:59 Tinkerboard hd-idle[774]: sda spinup
Jun 17 17:36:59 Tinkerboard hd-idle[774]: sdb spinup
Jun 17 17:47:09 Tinkerboard hd-idle[774]: sda spindown
Jun 17 17:47:09 Tinkerboard hd-idle[774]: sdb spindown

Running the command a third time gives

| => sudo /usr/lib/armbian/armbian-ramlog write
Mon Jun 17 18:11:57 CEST 2019: Syncing logs to storage

sending incremental file list
auth.log
daemon.log
syslog

| => journalctl -u hd-idle
Jun 17 17:36:59 Tinkerboard hd-idle[774]: sda spinup
Jun 17 17:36:59 Tinkerboard hd-idle[774]: sdb spinup
Jun 17 17:47:09 Tinkerboard hd-idle[774]: sda spindown
Jun 17 17:47:09 Tinkerboard hd-idle[774]: sdb spindown
Jun 17 18:07:29 Tinkerboard hd-idle[774]: sda spinup
Jun 17 18:07:29 Tinkerboard hd-idle[774]: sdb spinup

However, checking both drives with smartctl confirms that the hard drives are not spinning up, even when the log says they are.

| => sudo smartctl -i -d sat -n standby /dev/sda
smartctl 6.6 2017-11-05 r4594 [armv7l-linux-4.19.41-rockchip] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

Device is in STANDBY mode, exit(2)
________________________________________________________________________________
| => sudo smartctl -i -d sat -n standby /dev/sdb
smartctl 6.6 2017-11-05 r4594 [armv7l-linux-4.19.41-rockchip] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

Device is in STANDBY mode, exit(2)

| => journalctl -u hd-idle
Jun 17 17:36:59 Tinkerboard hd-idle[774]: sda spinup
Jun 17 17:36:59 Tinkerboard hd-idle[774]: sdb spinup
Jun 17 17:47:09 Tinkerboard hd-idle[774]: sda spindown
Jun 17 17:47:09 Tinkerboard hd-idle[774]: sdb spindown
Jun 17 18:07:29 Tinkerboard hd-idle[774]: sda spinup
Jun 17 18:07:29 Tinkerboard hd-idle[774]: sdb spinup
Jun 17 18:22:44 Tinkerboard hd-idle[774]: sda spindown
Jun 17 18:22:44 Tinkerboard hd-idle[774]: sdb spindown

Server 2

| => sudo /usr/lib/armbian/armbian-ramlog write
Mon 17 Jun 2019 18:37:24 CEST: Syncing logs to storage

sending incremental file list
deleting user.log.1
deleting syslog.1
deleting php7.0-fpm.log.1
deleting messages.1
deleting kern.log.1
deleting fail2ban.log.1
deleting debug.1
deleting daemon.log.1
deleting auth.log.1
deleting armbian-hardware-monitor.log.1.gz
deleting nginx/access.log.1
deleting samba/log.smbd.1
deleting samba/log.nmbd.1
./
aptitude
armbian-hardware-monitor.log
auth.log
daemon.log
debug
dpkg.log
fail2ban.log
kern.log
lastlog
messages
openvpn.log
php7.0-fpm.log
syslog
user.log
wtmp
apt/
apt/eipp.log.xz
apt/history.log
apt/term.log
nginx/
nginx/access.log
samba/
samba/log.nmbd
samba/log.smbd

sent 18,116,509 bytes  received 681 bytes  36,234,380.00 bytes/sec
total size is 18,110,220  speedup is 1.00

| => journalctl -u hd-idle
Jun 17 18:37:32 Tinkerboard hd-idle[862]: sda spinup
Jun 17 18:37:32 Tinkerboard hd-idle[862]: sdb spinup
Jun 17 18:37:32 Tinkerboard hd-idle[862]: sdc spinup

| => sudo smartctl -i -d sat -n standby /dev/sda
smartctl 6.6 2016-05-31 r4324 [armv7l-linux-4.19.41-rockchip] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

Device is in STANDBY mode, exit(2)
________________________________________________________________________________
| => sudo smartctl -i -d sat -n standby /dev/sdb
smartctl 6.6 2016-05-31 r4324 [armv7l-linux-4.19.41-rockchip] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

Device is in STANDBY mode, exit(2)
________________________________________________________________________________
| => sudo smartctl -i -d sat -n standby /dev/sdc
smartctl 6.6 2016-05-31 r4324 [armv7l-linux-4.19.41-rockchip] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

Device is in STANDBY mode, exit(2)
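hd-idle infers activity from /proc/diskstats counters, whereas `smartctl -n standby` asks the drive for its actual power state and, as the output above shows, skips the drive without waking it when it is in standby. A small wrapper makes that check easy to repeat for several drives. This is a sketch: `drive_state` and `check_drives` are hypothetical helper names, and the `-d sat` device type is taken from the commands above.

```shell
#!/bin/sh
# drive_state: read `smartctl -n standby` output on stdin and print
# "spun-down" if smartctl skipped the drive because it is in STANDBY or
# SLEEP mode, otherwise "active".
drive_state() {
    if grep -Eq 'Device is in (STANDBY|SLEEP) mode'; then
        echo spun-down
    else
        echo active
    fi
}

# check_drives DEV...: print "DEV state" for each drive.
# Assumes root privileges, smartmontools, and USB-SATA bridges that accept
# ATA pass-through (-d sat), as in the commands above.
check_drives() {
    for dev in "$@"; do
        state=$(smartctl -i -d sat -n standby "$dev" 2>/dev/null | drive_state)
        printf '%s %s\n' "$dev" "$state"
    done
}
```

Running `check_drives /dev/sda /dev/sdb /dev/sdc` as root right after `armbian-ramlog write` should print `spun-down` for every drive that stayed asleep, regardless of what hd-idle logs.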
bxm Posted September 24, 2019

I've found that SATA HDDs still spin up during the daily cron job. A short investigation showed that the cause is the "sync" command in the syncToDisk() function. Replacing it with "sync /var" (or "sync /", to be safe) completely fixes this problem.
Z11ntal33r Posted November 10, 2019

@bxm, so in both functions, syncToDisk () and syncFromDisk (), it should be "sync /" instead of plain "sync" (without the quotes), and that fixes the issue? If so, could you open a pull request on GitHub for the change?

Update: I can confirm that changing "sync" to either "sync /" or "sync /var" did not wake up my HDDs or trigger any HDD spin-up logging according to "systemctl status hd-idle". Thanks!
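A note on why the change helps: a bare `sync` flushes every mounted filesystem, USB drives included. With GNU coreutils 8.24 or newer, `sync /` or `sync /var` only fsyncs that one named directory, while `sync -f DIR` flushes the whole filesystem containing DIR (syncfs), which here would be the microSD card holding /var/log.hdd. The sketch below is an untested suggestion along those lines, not the change Armbian adopted; `sync_log_fs` is a hypothetical helper name, and the fallback to a global sync for older coreutils would bring the spin-ups back on such systems.

```shell
#!/bin/sh
# sync_log_fs [DIR]: flush only the filesystem that holds DIR
# (default /var/log.hdd) instead of every mounted filesystem.
# `sync -f` needs GNU coreutils 8.24+; otherwise fall back to a plain sync.
sync_log_fs() {
    dir=${1:-/var/log.hdd}
    sync -f "$dir" 2>/dev/null || sync
}
```

In armbian-ramlog, the final `sync` in syncToDisk () would become `sync_log_fs "$HDD_LOG"`, and the one in syncFromDisk () `sync_log_fs "$RAM_LOG"`.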