sibianul Posted April 18, 2021

Hello everyone,

I have had a home dashboard (apache2 + PHP website) running on this Banana Pi since 2018, when I installed a new SSD on it. Now I noticed I can't open my dashboard anymore, and when I checked the logs I saw that the drive had been set to read-only mode. I started copying my data off it, but I hope I won't need to replace the SSD. Can anyone give me some advice on how I could check, and if possible fix, the existing drive? Below are some logs. Thank you.

root@bananapi:/var/log# dmesg | grep sda
[ 2.876748] sd 0:0:0:0: [sda] 234441648 512-byte logical blocks: (120 GB/112 GiB)
[ 2.876842] sd 0:0:0:0: [sda] Write Protect is off
[ 2.876852] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 2.876992] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 2.879218] sda: sda1
[ 2.881067] sd 0:0:0:0: [sda] Attached SCSI disk
[ 5.535028] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[ 6.807928] EXT4-fs (sda1): re-mounted. Opts: commit=600,errors=remount-ro
[2962391.488423] EXT4-fs error (device sda1): ext4_validate_block_bitmap:376: comm apache2: bg 578: bad block bitmap checksum
[2962391.500592] Aborting journal on device sda1-8.
[2962391.501346] EXT4-fs (sda1): Remounting filesystem read-only
[2962391.507343] EXT4-fs error (device sda1) in ext4_writepages:2884: IO failure
[2969323.575865] EXT4-fs error (device sda1): ext4_remount:5257: Abort forced by user

root@bananapi:/var/log# cat /etc/fstab
# <file system> <mount point> <type> <options> <dump> <pass>
tmpfs /tmp tmpfs defaults,nosuid 0 0
UUID=62fc7248-9a57-4024-90d9-b4767bd2c697 /media/mmcboot ext4 defaults,noatime,nodiratime,commit=600,errors=remount-ro,x-gvfs-hide 0 1
/media/mmcboot/boot /boot none bind 0 0
UUID=9fb21562-a6fa-4b60-8453-bcf5bdda898a / ext4 defaults,noatime,nodiratime,commit=600,errors=remount-ro,x-gvfs-hide 0 1

root@bananapi:/var/log# mount -o remount, rw /
mount: cannot remount rw read-write, is write-protected

root@bananapi:/var/log# fdisk -l
Disk /dev/ram0: 4 MiB, 4194304 bytes, 8192 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Disk /dev/ram1: 4 MiB, 4194304 bytes, 8192 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Disk /dev/ram2: 4 MiB, 4194304 bytes, 8192 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Disk /dev/ram3: 4 MiB, 4194304 bytes, 8192 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Disk /dev/mmcblk0: 14.4 GiB, 15476981760 bytes, 30228480 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xf7477067

Device Boot Start End Sectors Size Id Type
/dev/mmcblk0p1 8192 29926175 29917984 14.3G 83 Linux

Disk /dev/sda: 111.8 GiB, 120034123776 bytes, 234441648 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xce16e6cf

Device Boot Start End Sectors Size Id Type
/dev/sda1 2048 234441647 234439600 111.8G 83 Linux

Disk /dev/zram0: 50 MiB, 52428800 bytes, 12800 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Disk /dev/zram1: 498.8 MiB, 523026432 bytes, 127692 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
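For anyone who wants to detect this state before the dashboard stops loading, here is a minimal sketch of a check that reports whether a mount point is currently read-only. It assumes the usual /proc/mounts layout (device, mount point, type, options); the function name `mount_mode` is made up for this example. It matches the `ro` option exactly, so an option such as `errors=remount-ro` is not mistaken for a read-only mount.

```shell
#!/bin/sh
# mount_mode (hypothetical helper): print "ro" or "rw" for a mount point.
# Reads /proc/mounts-style lines on stdin:
#   device mountpoint fstype options dump pass
# If the mount point is listed more than once, the last entry wins.
mount_mode() {
    awk -v mp="$1" '
        $2 == mp {
            mode = "rw"
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) {
                if (opts[i] == "ro") mode = "ro"
            }
        }
        END {
            if (mode == "") exit 1   # mount point not found
            print mode
        }
    '
}

# Usage on a live system (not executed here):
#   mount_mode / < /proc/mounts
```

Run from cron, a result of `ro` for `/` could trigger whatever alert channel is already in place.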
arox Posted April 18, 2021

When that sort of thing happens to me, I have to plug the disk into another computer and run fsck with the force (-f) option. It is not fun, and be aware that fsck may have to remove half your files if the cause is a disk failure ... I don't know any other way to force an fsck if you cannot switch to a shell during the boot sequence.

You could try:

mount -o remount,rw /

But it will surely fail.

Since you use a BPI M1, I guess the boot loader is on the SD card. And if you still have a valid root fs on the SD card, you could modify the loader to mount the root fs from the SD card.
sibianul (Author) Posted April 19, 2021

root@bananapi:~# mount -o remount, rw /
mount: cannot remount rw read-write, is write-protected

I tried it but it doesn't work. I have copied almost all my files; only the big MySQL database is not backed up yet, as phpMyAdmin doesn't load. I will try to copy it somehow through PuTTY before I shut the board down and check the disk.

What do you recommend regarding a "live image"? I could burn a distro you recommend onto another SD card and reboot with the SSD connected, and hopefully I could run the disk check commands from the same unit. All my other PCs run Windows. Thank you.
arox Posted April 19, 2021

Of course, it is the simplest and so the best solution. Flashing a new SD card takes some time, but at least you should be able to repair your disk in 2 minutes next time. For the distro, anything that can boot and is up to date with ext4 should do the trick. (This just means not a five-year-old distro.)
arox Posted April 19, 2021 (Solution)

If you are not familiar with unix/linux, do:

- fsck /dev/sda1

If it seems to think nothing needs to be done, don't believe it, and do:

- fsck -f /dev/sda1

When it asks you to confirm clearing inodes, say "y". You don't have much choice anyway. If it starts asking to remove blocks, you may become worried, but you don't have much choice other than to agree. If the number of removals becomes large, you may become very upset. If it finishes and leaves a number of files without names in lost+found, you have a problem. But generally you just have a couple of inodes and some pointers/counters that need to be fixed.

In theory you should run a second fsck to be sure the fs is repaired. If this one needs to take action, your disk or controller has a problem. (The card or the disk?)

Before swapping SD cards and trying to boot from the disk, do a test mount:

- mount /dev/sda1 /mnt
- cat /mnt/etc/os-release (to reassure yourself)
- umount /mnt (or you risk damaging your fs again)
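When running these steps from a script or over SSH, the exit status of fsck says as much as the on-screen messages. The sketch below decodes it using the bit values documented in the fsck(8) man page (1 = errors corrected, 2 = system should be rebooted, 4 = errors left uncorrected, and so on). The function name `describe_fsck_status` is invented for this example, and nothing here runs fsck itself.

```shell
#!/bin/sh
# describe_fsck_status (hypothetical helper): turn an fsck exit status into text.
# The status is a bitmask, so several conditions can be reported at once.
describe_fsck_status() {
    code="$1"
    if [ "$code" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    out=""
    for pair in \
        "1:errors corrected" \
        "2:system should be rebooted" \
        "4:errors left uncorrected" \
        "8:operational error" \
        "16:usage or syntax error" \
        "32:cancelled by user" \
        "128:shared library error"
    do
        bit=${pair%%:*}     # number before the colon
        text=${pair#*:}     # description after the colon
        if [ $((code & bit)) -ne 0 ]; then
            out="${out:+$out, }$text"
        fi
    done
    echo "$out"
}

# Usage (on an unmounted filesystem, not executed here):
#   fsck -f /dev/sda1; describe_fsck_status $?
```

A status with bit 4 set after the second pass is the case described above where the disk or controller deserves suspicion.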
tparys Posted April 20, 2021

Yes, run fsck on the drive, but you probably want to use "fsck -y", which will just attempt to fix things without asking permission at each step. Also do this with the filesystem unmounted, otherwise you may damage the filesystem.

Also, you may want to add "fsck.repair=yes" to your kernel arguments. That will direct systemd to automatically attempt repairs like these. It may make things worse with failing hardware, but it will generally do the right thing for users who want things to just work.

https://www.linux.org/docs/man8/systemd-fsck.html
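To see whether a running system was booted with that kernel argument, the command line in /proc/cmdline can be inspected. This is a small sketch; the function name `fsck_repair_setting` is made up for the example, and it only reports what is on the command line, not any default that applies when the argument is absent.

```shell
#!/bin/sh
# fsck_repair_setting (hypothetical helper): print the value of fsck.repair=
# found in a kernel command line string, or "unset" if it is not present.
# If the argument appears more than once, the last occurrence is printed.
fsck_repair_setting() {
    value=$(printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^fsck\.repair=//p' | tail -n 1)
    echo "${value:-unset}"
}

# Usage on a live system (not executed here):
#   fsck_repair_setting "$(cat /proc/cmdline)"
```

On Armbian boards the arguments are typically assembled by the boot script, so a result of "unset" points at that script rather than at a bootloader menu.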
arox Posted April 20, 2021

8 hours ago, tparys said:

Yes, run "fsck" the drive, but you probably want to use "fsck -y" which will just attempt to fix things and not ask permission at each step. Also do this with the filesystem unmounted, otherwise possible filesystem damage. Also, may want to add the " fsck.repair=yes " to your kernel arguments. That will direct SystemD to automatically attempt repairs like these. It may make things worse with failing hardware, but generally will do the right thing for users who want things to just work. https://www.linux.org/docs/man8/systemd-fsck.html

fsck.repair=yes is normally set in distros. Nevertheless, it seems to me that fs corruption sometimes goes unnoticed with ext4. So a force option is needed, but "force" is a bad idea in an automatic procedure because it may remove half the files. A manual procedure is necessary to let the administrator back up what can be saved.

I also noticed that the boot procedure may perform a full boot with a corrupted fs. Then the problem is likely to become worse. Also, if the root fs is write-locked, there is no point starting a full set of services, which will only hinder system maintenance: the boot sequence should drop to a maintenance shell if the remount rw fails.

(I say "seems to me" or "noticed" because when it happens to me on a system with an SSD, it is my desktop or my file server that is unavailable, and I am more in a hurry to solve the problem than to investigate it. With a headless server, or without my main desktop working, it is also not easy.)
sibianul (Author) Posted April 20, 2021

Thank you guys. I'm currently away from home; I can connect to the board remotely but I can't physically change anything (such as swapping SD cards). I will try to burn the same Armbian distro onto a new SD card when I get home, but until then, I just tried fsck and it seems it fixed a few things. Do you think it is safe to reboot now?

root@bananapi:~# fsck -f /dev/sda1
fsck from util-linux 2.29.2
e2fsck 1.43.4 (31-Jan-2017)
/dev/sda1: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Entry 'alarm_20210418T162929_0313.jpg' in /var/www/html/security_camera_storage/cam2 (268645) has deleted/unused inode 311335. Clear<y>? yes
Entry 'alarm_20210418T162929_0690.jpg' in /var/www/html/security_camera_storage/cam2 (268645) has deleted/unused inode 311336. Clear<y>? yes
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Unattached inode 365658
Connect to /lost+found<y>? yes
Inode 365658 ref count is 2, should be 1. Fix<y>? yes
Pass 5: Checking group summary information
Block bitmap differences: -(18199362--18199491) -(18222129--18222259) -(18258312--18258464) -(18264714--18264858) -(18271789--18271933) -(18274837--18274981) -(18276125--18276269) -(18276878--18277014) -(18278829--18278969) -(18280586--18280732) -(18284169--18284311) -(18322306--18322440) -(18371739--18371873) -(18388110--18388244) -(18405315--18405454) -(18414482--18414603) -(18908826--18908905)
Fix<y>? yes
Free blocks count wrong for group #577 (6958, counted=7038).
Fix<y>? yes
Free blocks count wrong (6116773, counted=3943820).
Fix<y>? yes
Free inodes count wrong (7078589, counted=7056170).
Fix<y>? yes

/dev/sda1: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sda1: ***** REBOOT SYSTEM *****
/dev/sda1: 275670/7331840 files (5.3% non-contiguous), 25361130/29304950 blocks

And as you recommended, I ran fsck another time:

root@bananapi:~# fsck -f /dev/sda1
fsck from util-linux 2.29.2
e2fsck 1.43.4 (31-Jan-2017)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (3936310, counted=3943820).
Fix<y>? yes
Free inodes count wrong (7056168, counted=7056170).
Fix<y>? yes

/dev/sda1: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sda1: ***** REBOOT SYSTEM *****
/dev/sda1: 275670/7331840 files (5.3% non-contiguous), 25361130/29304950 blocks

root@bananapi:~# mount dev/sda1 /mnt
mount: special device dev/sda1 does not exist
root@bananapi:~# mount /dev/sda1 /mnt
mount: /dev/sda1 is already mounted or /mnt busy
/dev/sda1 is already mounted on /
/dev/sda1 is already mounted on /var/log.hdd
root@bananapi:~# cat /mnt/etc/os-release
cat: /mnt/etc/os-release: No such file or directory
root@bananapi:~# umount /mnt
umount: /mnt: not mounted

I rebooted the system and will post back later if it comes back up. If it doesn't, I will have to check it physically tonight when I get home. Thank you very much for your help.
sibianul (Author) Posted April 20, 2021

Guys, THANK YOU again. It booted up successfully and I have my dashboard up and running again, without even touching the Banana Pi board. Everything was fixed remotely with your help.

This board has been running flawlessly 24/7 for about 5 years. In 2018 I mounted a new SSD; before that it ran for a year or two with the OS installed on an old HDD, which then failed. Banana Pi rocks! Very stable.

Still, I think I will try to use a USB flash drive to store the temporary security camera files. I already have an NVR that saves the camera footage 24/7, but I set up my cameras to upload a photo to the Banana Pi over FTP on every motion detection, just to be able to view WITH JUST A SCROLL from left to right ALL the daily movements around my house. It's much easier than logging in to the NVR and fast-forwarding the video. It's also what I want most: to have all the information centralized in one dashboard. I have the cameras, the photovoltaic panel production, house energy consumption, heat pipe solar water heater tank monitoring, control of my HVAC unit, office lights, LAN devices ... and for all of those I have alerts. If some values are off, or some devices are offline, I get a Pushbullet alert on my phone.

What I haven't done yet (I have the device, I just don't have the time) is a backup for the alerting system in case there is no internet connection to my house. I purchased a 3G SIM module, and I will attach it to an Arduino board that will constantly query the Banana Pi for a status. If the Banana Pi responds and doesn't have any unsent alerts, it will do nothing, but if the Banana Pi doesn't respond then I will get an SMS alert that the dashboard is offline. The same Arduino can also ask all my other Arduino sensors for a status; it just needs the router to work, not an actual internet connection.
arox Posted April 20, 2021

Hum! It is a bit worrying that your second fsck complained. I do not understand why you couldn't unmount the fs, or why /dev/sda1 was not present and then appeared the second time. Did you run fsck with the fs mounted ro?

The fs is probably repaired now, but you should check your logs and keep an up-to-date backup for some time. There are commands to check disk errors recorded by the firmware (but not controller or cable errors). Anyway, I have experienced your problems many times and still use the board and cards without problems.

Yes, BPI M1 with Armbian was and still is a good solution. That is why I still use it although it is completely out of date. But I got the same sort of fs corruption (triggered by USB bugs?) with Raspbian on an RPi 4.
arox Posted April 20, 2021

BTW, a possible cause of repeated fs corruption is the power supply (or the power cables).
tparys Posted April 22, 2021

On 4/20/2021 at 6:23 AM, arox said:

fsck.repair=yes is normally set in distros.

Oftentimes yes, but not always. For example, it is not set on my NanoPi M4V2 (rockchip64) by default ...

https://github.com/armbian/build/blob/f2e9da1af5ef752bda42b8c9344a46fdf74d060d/config/bootscripts/boot-rockchip64.cmd#L34
sibianul (Author) Posted April 22, 2021

On 4/20/2021 at 7:48 PM, arox said:

Hum ! A bit worrying that your second fsck complained. I do not understand why you couldn't unmount the fs or /dev/sda1 was not present and appeared the second time ? Did you run fsck with the fs mounted ro ? The fs is probably repaired now but you should check your logs and have an up-to-date backup during some times. There are commands to check disk errors recorded by firmware (but not controller or cables errors). Anyway I experienced your problems a lot of times and still use the board and cards without problem. Yes, BPI M1 with armbian was and always is a good solution. That is why I still use it although it is completely out-of-date. But I got the same sort of fs corruption case (triggered by usb bugs ?) with raspbian on RPI4.

The fs was still mounted; I hadn't used umount before running fsck, but it was in read-only mode. To check the disk I installed smartmontools and ran the short test on it; the result is below. I haven't googled yet, but do you guys have a script that can send an alert when it detects a SMART error on the drive?

root@bananapi:~# sudo smartctl -t short -a /dev/sda
smartctl 6.6 2016-05-31 r4324 [armv7l-linux-4.19.62-sunxi] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: KINGSTON SA400S37120G
Serial Number: 50026B7682034AC1
LU WWN Device Id: 5 0026b7 682034ac1
Firmware Version: SBFK71E0
User Capacity: 120,034,123,776 bytes [120 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: Unknown(0x0ff8) (minor revision not indicated)
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Thu Apr 22 11:09:10 2021 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: (65535) seconds. Offline data collection capabilities: (0x79) SMART execute Offline immediate. No Auto Offline data collection support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 2) minutes. Extended self-test routine recommended polling time: ( 30) minutes. Conveyance self-test routine recommended polling time: ( 6) minutes. 
SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x0032 000 100 000 Old_age Always - 0 9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 27101 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 147 148 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 0 149 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 0 167 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 0 168 Unknown_Attribute 0x0012 100 100 000 Old_age Always - 0 169 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 6 170 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 7 172 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0 173 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 46138104 181 Program_Fail_Cnt_Total 0x0032 100 100 000 Old_age Always - 0 182 Erase_Fail_Count_Total 0x0000 100 100 000 Old_age Offline - 0 187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0 192 Power-Off_Retract_Count 0x0012 100 100 000 Old_age Always - 143 194 Temperature_Celsius 0x0022 075 069 000 Old_age Always - 25 (Min/Max 23/31) 196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0 199 UDMA_CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0 218 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0 231 Temperature_Celsius 0x0000 071 071 000 Old_age Offline - 29 233 Media_Wearout_Indicator 0x0032 100 100 000 Old_age Always - 74821 241 Total_LBAs_Written 0x0032 100 100 000 Old_age Always - 47642 242 Total_LBAs_Read 0x0032 100 100 000 Old_age Always - 1600 244 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 704 245 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 760 246 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 4153344 SMART Error Log Version: 1 No Errors Logged SMART Self-test log structure revision number 1 No self-tests have been logged. 
[To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Thu Apr 22 11:11:10 2021

On 4/20/2021 at 8:19 PM, arox said:

BTW A possible cause of repeated fs corruption is the power supply. (Or power cables).

I power the Banana Pi board from a PC power supply, so it should have plenty of power to run the board and the drive. But I will check that too, as it has been running 24/7 for 5+ years, exactly like the Banana Pi M1 board.
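On the question of a script that alerts on SMART problems: smartmontools ships a daemon, smartd, that is meant for exactly this kind of monitoring and is worth looking at first. As a hand-rolled alternative, here is a minimal sketch that extracts the overall health verdict from `smartctl -H` output, assuming the ATA-style "SMART overall-health self-assessment test result:" line seen in the output above. The function names `smart_health` and `check_disk` are invented for this example, and the alert is just a message on stderr, since the notification channel is left open.

```shell
#!/bin/sh
# smart_health (hypothetical helper): read smartctl output on stdin and print
# the overall health verdict (e.g. PASSED), or UNKNOWN if the line is missing.
smart_health() {
    awk -F': *' '
        /SMART overall-health self-assessment test result/ {
            verdict = $2
            sub(/[ \t\r]+$/, "", verdict)   # strip trailing whitespace
            print verdict
            found = 1
        }
        END { if (!found) print "UNKNOWN" }
    '
}

# check_disk (hypothetical helper): return non-zero and complain on stderr
# when the verdict for the given device is anything other than PASSED.
check_disk() {
    status=$(smartctl -H "$1" | smart_health)
    if [ "$status" != "PASSED" ]; then
        echo "SMART health for $1 is $status" >&2
        return 1
    fi
    return 0
}

# Usage from cron, as root (not executed here):
#   check_disk /dev/sda || echo "send an alert here"
```

Note that the overall verdict is a coarse signal: in this thread the drive reported PASSED even though the filesystem had been corrupted, so it complements the log checks rather than replacing them.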
arox Posted April 22, 2021

4 hours ago, sibianul said:

I power the Banana board from a PC power source, it should have plenty of power to run the board and hdd. But I will check that too, as it's been running for 5+ years 24/7 .. exactly like the Banana M1 board

"it should have plenty of power to run the board and hdd"

This is a recurring question on this forum. The voltage available to the board always depends on the source voltage and also on the cable(s), the connector contacts, the resistance of internal circuits, protection diodes and fuses (and on the PSU's mileage, which is difficult to evaluate) ... When the board draws too much current, the total resistance can make the input voltage drop below the required minimum for a time too short to see on a multimeter, but long enough to cause any sort of hardware failure.

If you never changed anything and never moved the cables, forget about it. But if you replugged the power connector of the BPI M1, be sure to use the one next to the SATA connector, and check that half the strands of your cable are not cut (as has happened to me many times).

"PC power source": a PC power supply has plenty of power, but a USB port does not, since it has current limiting and protection circuitry. I do not think your board could boot if the SBC and the disk were powered through a single USB port.

As tparys said, you should also check that fsck is requested at boot (kernel parameter "fsck.repair=yes" in "modern" linux - but yours - as mine - is perhaps a little less modern), because sooner or later you will face a power outage that the fs journal cannot handle.
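The voltage-drop argument above can be put into numbers with Ohm's law: the voltage at the board is the source voltage minus the current multiplied by the total series resistance of cable and contacts. The sketch below uses purely illustrative values (5.0 V source, 2.0 A peak draw, 0.2 ohm total resistance); these are assumptions for the example, not measurements from this setup, and the function name `board_voltage` is made up.

```shell
#!/bin/sh
# board_voltage (hypothetical helper): V_board = V_source - I * R
#   $1 = source voltage in volts
#   $2 = current drawn in amperes
#   $3 = total series resistance (cable + contacts) in ohms
board_voltage() {
    LC_ALL=C awk -v v="$1" -v i="$2" -v r="$3" \
        'BEGIN { printf "%.2f\n", v - i * r }'
}

# Illustrative values only: a short current peak over a slightly resistive cable.
board_voltage 5.0 2.0 0.2
```

With these assumed values the board would see 4.60 V during the peak, while at a 0.5 A idle draw it would see 4.90 V, which is why a static multimeter reading can look fine even when short dips are causing trouble.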
sibianul (Author) Posted April 22, 2021

I soldered the 5 V wires from the PSU to the Banana Pi board; I think there were some pads dedicated to soldering the DC input. I didn't use the USB connector of the board. The PSU also powers some LED strips and a router in my office. You know, if you join the black and green wires in the PSU connector, it stays on forever.

This is the first time the Banana Pi has had this "read-only" problem with this SSD, after 3 years of running 24/7, so the issue is not something repetitive. It had never happened before. We'll see if it happens more often from now on; I hope not.

"If you never changed anything and never moves cables, forget that" ... No, I haven't touched the cables or the Pi, and the PSU is also connected to a UPS. There wasn't any mains power failure, nor a reboot.

I'm happy it works OK now. I don't know what the cause was, but I'll worry only if it happens more often. Anyway, I will change a few things as I mentioned, because the cameras save many photos on the SSD, but they are temporary and are automatically deleted after 7 days. That information is not important and could be saved on a flash drive; if it fails, I'll throw it straight in the garbage bin and mount another one. On the SSD I also have the MySQL database with the sensor data, which is more important.

If it happens again, I'll post back for sure. Have a nice evening. Thank you again.