devAtronia Posted Wednesday at 04:05 PM

Hi,

First of all, thank you for your work.

On a project we are using NanoPi NEO Air boards with an Allwinner H3 processor and an 8 GB eMMC flashed with an Armbian OS:

$ uname -a
Linux <DEVICE_HOSTNAME> 6.6.75-legacy-sunxi #1 SMP Sat Feb 1 17:37:57 UTC 2025 armv7l GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 11 (bullseye)
Release:        11
Codename:       bullseye

Recently we have been getting an increasing number of reports of boards getting stuck during boot at the message 'Starting Kernel ...' (checked from the logs available on the COM port). Although this behavior is highly undesirable, the eMMC is still visible in the device tree when booting from an SD card, so the board can be recovered by running:

$ fsck /dev/mmcblk2p1

To reproduce this behavior we set up a continuous power-cycle test in which, booting from eMMC:

1. The device is powered on.
2. After 70 seconds (enough time for the system to boot), the power supply is interrupted.
3. The device remains powered off for 10 seconds.

This amounts to around 1000 power cycles per day, and the device ended up bricked after anywhere between 68 and 5500 power cycles. We then repeated the test booting from an SD card and were not able to reproduce the issue, even after reaching 34381 power cycles.

Does anyone have a clue why the eMMC displays this behavior, or can someone provide guidance on what we could do or check to prevent this from happening (or to make the eMMC more resilient to power cycles)?

IBV Posted yesterday at 05:45 AM

Hi,

you can try to set up a read-only root filesystem: https://wiki.debian.org/ReadonlyRoot

Cheers

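As a rough illustration of the read-only root suggestion, the sketch below checks whether a mount option list actually contains `ro`. The helper name `has_ro_option` is hypothetical and not from the thread or the linked wiki page. The commented commands assume `findmnt` from util-linux is installed on the device, and the runtime remount is only an experiment: it normally fails while any process holds a file open for writing.

```shell
# Hypothetical helper: succeed if a comma-separated mount option
# list contains the exact option "ro".
has_ro_option() {
    case ",$1," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Possible use on the device (assumes findmnt from util-linux):
#   has_ro_option "$(findmnt -no OPTIONS /)" && echo "root is read-only"
#
# Trying a remount at runtime (expect "mount point is busy" if
# anything still has files open for writing on /):
#   mount -o remount,ro /
```

Matching the exact option matters because ext4 roots are often mounted with `errors=remount-ro`, which a plain substring search for `ro` would wrongly accept.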
laibsch Posted yesterday at 06:40 AM

Is your rootfs formatted as ext4? By bricked, do you mean that boot is interrupted, but the situation is easily fixable by a skilled technician running an fsck?

eselarm Posted yesterday at 10:58 AM

Which U-Boot version is used when booting from eMMC, and which when booting from the SD card?

devAtronia Posted 7 hours ago (Author)

@IBV Thanks! I will look into that and check whether we can do this conversion at runtime and, if so, how it can be automated.

devAtronia Posted 7 hours ago (Author)

@laibsch Thanks for the reply. Yes, the filesystem is formatted as ext4.

# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            186M     0  186M   0% /dev
tmpfs            49M  5.7M   43M  12% /run
/dev/mmcblk2p1  7.0G  5.2G  1.5G  79% /
tmpfs           244M     0  244M   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           244M  8.0K  244M   1% /tmp
/dev/zram1       49M  5.5M   40M  13% /var/log
tmpfs            49M     0   49M   0% /run/user/0
tmpfs            49M     0   49M   0% /run/user/1000

# file -sL /dev/mmcblk2p1
/dev/mmcblk2p1: Linux rev 1.0 ext4 filesystem data, UUID=d3fe76cc-f19f-40a6-b8db-44a87f92714f (needs journal recovery) (extents) (64bit) (large files) (huge files)

By bricked, I mean exactly that: the device boot is interrupted at the 'Starting Kernel ...' message. In these cases, the device can be recovered by booting from an SD card and running fsck.

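A note on the `file -sL` output above: ext4 sets the `needs_recovery` feature flag for as long as a journaled filesystem is mounted, so seeing "needs journal recovery" on the live root is expected and does not indicate damage by itself. The flag is more telling when the partition is inspected unmounted from the SD-card system. The sketch below is one way to check it. The function name `needs_journal_recovery` is made up for illustration, and the device path is the one quoted in the thread.

```shell
# Hypothetical helper: reads `dumpe2fs -h` output on stdin and
# succeeds if the feature list still contains needs_recovery.
needs_journal_recovery() {
    grep -q '^Filesystem features:.*needs_recovery'
}

# Possible use from the SD-card system, with the eMMC partition
# NOT mounted (dumpe2fs and e2fsck are part of e2fsprogs):
#   dumpe2fs -h /dev/mmcblk2p1 2>/dev/null | needs_journal_recovery \
#       && echo "journal was not replayed"
#   e2fsck -f -n /dev/mmcblk2p1   # forced read-only check, changes nothing
#   e2fsck -f -p /dev/mmcblk2p1   # forced check, fixes only "safe" problems
```

Running the `-n` pass first and saving its output before any repair would preserve evidence of what was actually damaged, which helps answer whether only metadata was affected.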
devAtronia Posted 7 hours ago (Author)

@eselarm Thanks for the reply. We are using U-Boot 2024.01-armbian-2024.01-S866c-P00ff-Ha5c2-V4cad-Bb703-R448a (Jan 21 2025 - 02:21:57 +0000) Allwinner Technology:

U-Boot 2024.01-armbian-2024.01-S866c-P00ff-Ha5c2-V4cad-Bb703-R448a (Jan 21 2025 - 02:21:57 +0000) Allwinner Technology

CPU:   Allwinner H3 (SUN8I 1680)
Model: FriendlyARM NanoPi NEO Air
DRAM:  512 MiB
Core:  64 devices, 16 uclasses, devicetree: separate
WDT:   Not starting watchdog@1c20ca0
MMC:   mmc@1c0f000: 0, mmc@1c10000: 2, mmc@1c11000: 1
Loading Environment from FAT... Unable to use mmc 0:1...
In:    serial,usbkbd
Out:   serial
Err:   serial
Net:   No ethernet found.
starting USB...
No working controllers found
Autoboot in 1 seconds, press <Space> to stop

laibsch Posted 3 hours ago

OK, personally I like btrfs and have been using it exclusively for about a decade now, I guess. The reasons are manifold, but the one applicable to your case is that it will automatically detect and, if possible, fix corruption even while mounted. This can help prevent corruption from mushrooming into a bricked situation. You can also run an online "btrfs scrub" while the filesystem is mounted, which is akin to the fsck that requires taking the ext4 FS offline.

Look into an A/B option where you have a failsafe boot system (read-only if you like) somewhere accessible to uboot in addition to your main OS. If uboot detects a boot failure of the main OS, have it switch over to the failsafe, then SSH into it and fix the FS corruption. How to do that with uboot, I am not sure. @eselarm might know, and quite possibly that is the reason he was asking which uboot version you have, as it might depend on that.

There are also commercial solutions available like qubee, rauc and mender that you might want to look into.

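For readers who want to try the scrub approach, here is a minimal sketch, assuming the root filesystem has already been converted to btrfs and btrfs-progs is installed. The helper `btrfs_stats_clean` is a hypothetical name. It only parses the two-column output of `btrfs device stats`. Note that btrfs can repair a bad block only when a second good copy exists (for example with the DUP or RAID1 profile); with a single copy it can detect the corruption but not fix it.

```shell
# Hypothetical helper: reads `btrfs device stats` output on stdin
# (lines such as "[/dev/mmcblk2p1].corruption_errs  0") and
# succeeds only if every error counter is zero.
btrfs_stats_clean() {
    awk 'NF == 2 && $2 + 0 != 0 { bad = 1 } END { exit bad }'
}

# Possible use on a mounted btrfs root:
#   btrfs scrub start -B /     # -B: stay in the foreground until done
#   btrfs scrub status /       # summary of the last scrub
#   btrfs device stats / | btrfs_stats_clean || echo "errors recorded"
```

The device statistics are persistent across reboots, so in a power-cycle test they could be checked once per boot to see when the first error appears.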
eselarm Posted 1 hour ago

I was just wondering whether a different U-Boot and kernel combination in the two tests could be causing the issue, but that does not seem to be the case. I think there might be some issue with the controller in the eMMC module that starts to show as the wear level gets higher.

Which fsck commands are run? How is it fixed, and is it really fixed, or is only the metadata corrected while corrupted data blocks may still remain unnoticed?

I don't use ext4; I use Btrfs for the rootfs and all other storage devices. When in doubt, I use the DUP profile for metadata, although that is more for HDDs. As said, Btrfs allows ad hoc or regular scrubs, so you will be able to detect where the corrupt blocks are, if that is the issue.

It might also be that there is an issue in the 6.6 kernel that only shows up with higher write delays or the like, so maybe update the OS. Maybe mmc-utils can show some issue (I have no experience with it).

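On the wear-level idea: eMMC devices that implement revision 5.0 or later of the JEDEC standard report two life-time estimates and a pre-EOL indicator in their EXT_CSD register, and the Linux MMC driver exposes them in sysfs as `life_time` and `pre_eol_info`. The sketch below decodes one estimate into its percentage band. The function name `decode_life_time` is hypothetical, the block device name is taken from the thread, and whether this particular eMMC reports meaningful values is an assumption; older parts return 0x00.

```shell
# Hypothetical helper: decode one JEDEC life-time estimate
# (0x01..0x0A = 10% bands, 0x0B = exceeded) into readable text.
decode_life_time() {
    v=$(( ${1:-0} ))
    if [ "$v" -ge 1 ] && [ "$v" -le 10 ]; then
        echo "$(( (v - 1) * 10 ))-$(( v * 10 ))% of rated life used"
    elif [ "$v" -eq 11 ]; then
        echo "rated life exceeded"
    else
        echo "not reported"
    fi
}

# Possible use on the device (life_time holds two values, type A and B):
#   for v in $(cat /sys/block/mmcblk2/device/life_time); do
#       decode_life_time "$v"
#   done
#   cat /sys/block/mmcblk2/device/pre_eol_info  # 0x01 normal, 0x02 warning, 0x03 urgent
#
# The same fields via mmc-utils:
#   mmc extcsd read /dev/mmcblk2 | grep -i -e 'life time' -e 'eol'
```

Comparing these values between a fresh board and one that has failed in the power-cycle test would show whether wear is a plausible factor at all.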