Posted

Hi,

 

First of all, thank you for your work.
On a project we are using NanoPi NEO Air boards with an Allwinner H3 processor and an 8 GB eMMC flashed with Armbian:

 

$ uname -a
Linux <DEVICE_HOSTNAME> 6.6.75-legacy-sunxi #1 SMP Sat Feb  1 17:37:57 UTC 2025 armv7l GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 11 (bullseye)
Release:        11
Codename:       bullseye


Recently we have been getting a growing number of reports of boards getting stuck during boot at the message 'Starting kernel ...' (confirmed from the logs available on the serial COM port).


Although this behavior is highly undesirable, the board can be recovered: the eMMC is still visible when booting from an SD card, so the filesystem can be repaired by running

 

$ fsck /dev/mmcblk2p1

 

In order to reproduce this behavior we have set up a continuous power cycle test where, booting from eMMC:

  • Device is powered on
  • After 70 seconds (enough time for the system to boot), power supply is interrupted
  • Device remains powered off for 10 seconds

 

This amounts to roughly 1000 power cycles per day (80 seconds per cycle), and the devices became bricked after anywhere between 68 and 5500 power cycles.
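For reference, the arithmetic behind these figures can be checked with a few lines of shell. This is only a sketch: the 70 s and 10 s timings and the cycle counts come from the test description above, and the variable names are made up for illustration.

```shell
#!/bin/sh
# Timings taken from the power cycle test described above.
on_seconds=70    # powered on (boot plus idle time)
off_seconds=10   # powered off

cycle_seconds=$(( on_seconds + off_seconds ))
cycles_per_day=$(( 86400 / cycle_seconds ))

# Wall-clock time spent before the earliest and latest observed failures
# (integer division, so the results are rounded down).
earliest_failure_hours=$(( 68 * cycle_seconds / 3600 ))
latest_failure_hours=$(( 5500 * cycle_seconds / 3600 ))

echo "cycles per day: $cycles_per_day"
echo "failures seen after about $earliest_failure_hours to $latest_failure_hours hours of cycling"
```

With an 80-second cycle this gives 1080 cycles per day, which matches the "around 1000" estimate.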

We then repeated the test while booting from an SD card and were not able to reproduce the issue, even after 34381 power cycles.

Does anyone have an idea why the eMMC behaves this way, or can someone provide guidance on what we could do or check to prevent this from happening (or to make the eMMC more resilient to power cycles)?

 

 

Posted

Is your rootfs formatted as ext4?  By bricked, do you mean that the boot is interrupted, but the situation is easily fixable by a skilled technician running an fsck?

Posted

@laibsch

Thanks for the reply.

 

Yes. The filesystem is formatted as ext4.

 

# df -h

Filesystem      Size  Used Avail Use% Mounted on
udev            186M     0  186M   0% /dev
tmpfs            49M  5.7M   43M  12% /run
/dev/mmcblk2p1  7.0G  5.2G  1.5G  79% /
tmpfs           244M     0  244M   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           244M  8.0K  244M   1% /tmp
/dev/zram1       49M  5.5M   40M  13% /var/log
tmpfs            49M     0   49M   0% /run/user/0
tmpfs            49M     0   49M   0% /run/user/1000

# file -sL /dev/mmcblk2p1
/dev/mmcblk2p1: Linux rev 1.0 ext4 filesystem data, UUID=d3fe76cc-f19f-40a6-b8db-44a87f92714f (needs journal recovery) (extents) (64bit) (large files) (huge files)

 

By bricked, I mean exactly that. Device boot is interrupted at the "Starting kernel ..." message. In these cases, the device can be recovered by booting from an SD card and running fsck.

Posted

@eselarm

 

Thanks for the reply.

 

We are using U-Boot: 2024.01-armbian-2024.01-S866c-P00ff-Ha5c2-V4cad-Bb703-R448a (Jan 21 2025 - 02:21:57 +0000) Allwinner Technology

 

U-Boot 2024.01-armbian-2024.01-S866c-P00ff-Ha5c2-V4cad-Bb703-R448a (Jan 21 2025 - 02:21:57 +0000) Allwinner Technology

CPU:   Allwinner H3 (SUN8I 1680)
Model: FriendlyARM NanoPi NEO Air
DRAM:  512 MiB
Core:  64 devices, 16 uclasses, devicetree: separate
WDT:   Not starting watchdog@1c20ca0
MMC:   mmc@1c0f000: 0, mmc@1c10000: 2, mmc@1c11000: 1
Loading Environment from FAT... Unable to use mmc 0:1...
In:    serial,usbkbd
Out:   serial
Err:   serial
Net:   No ethernet found.
starting USB...
No working controllers found
Autoboot in 1 seconds, press <Space> to stop

 

Posted

OK, personally I like btrfs and have been using it exclusively for about a decade now, I guess.  The reasons are manifold, but the one applicable to your case is that it will automatically detect corruption and, if possible, fix it even while mounted.  This can help prevent it from mushrooming into a bricked situation.  You can also do an online "btrfs scrub" while the filesystem is mounted, which plays a role similar to the fsck that requires taking the ext4 FS offline.

 

Look into an A/B setup where, in addition to your main OS, you have a (read-only if you like) failsafe boot system somewhere accessible to U-Boot.  If U-Boot detects that the main OS failed to boot, have it switch over to the failsafe system, then SSH into it and fix the FS corruption.  How to do that with U-Boot, I am not sure.  @eselarm might know, and quite possibly that is why he asked which U-Boot version you have, as it might depend on that.
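As a rough illustration of the failover idea: U-Boot has a boot count limit feature (enabled at build time with CONFIG_BOOTCOUNT_LIMIT) in which the `bootcount` variable is incremented on every boot and `altbootcmd` is run instead of `bootcmd` once `bootcount` exceeds `bootlimit`. Whether the Armbian U-Boot build used here has it enabled is not known, so treat the following as a sketch. The helper name `choose_boot_target` and the labels `main` and `failsafe` are hypothetical; the function only mirrors the decision in plain shell.

```shell
#!/bin/sh
# Hypothetical helper mirroring the boot count limit decision:
# boot the failsafe system once bootcount exceeds bootlimit.
choose_boot_target() {
    bootcount=$1
    bootlimit=$2
    if [ "$bootcount" -gt "$bootlimit" ]; then
        echo "failsafe"   # U-Boot would run altbootcmd here
    else
        echo "main"       # U-Boot would run the normal bootcmd
    fi
}

# Example: with a limit of 3, the fourth consecutive unconfirmed boot
# falls back to the failsafe system.
choose_boot_target 4 3
```

For this to work, the main OS has to reset the counter after each successful boot. When the counter lives in the U-Boot environment, that is typically done from Linux with `fw_setenv bootcount 0`, assuming the environment is writable from userspace. Note that the boot log in this thread shows the environment failing to load from FAT, so that would need to be sorted out first.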

 

There are also commercial solutions available, like qubee, RAUC and Mender, that you might want to look into.

Posted

I was just wondering whether the two tests used different U-Boot and kernel combinations, which could have been an issue, but that does not seem to be the case.

 

I think there might be some issue with the controller in the eMMC module that starts to show once the wear level is high or getting higher.

 

Which fsck commands are run?

So how is it fixed, and is it really fixed? Or is only the metadata corrected, and might there still be corrupted data blocks without you knowing?
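One way to answer this on the next bricked board is to record what fsck reports before and after the repair. The exit status of fsck and e2fsck is a sum of bit flags documented in their man pages, and the hypothetical helper below (`describe_fsck_status` is a made-up name) turns it into readable text. It is a minimal sketch, meant to be run from the SD-card system with the eMMC partition unmounted.

```shell
#!/bin/sh
# Decode an fsck/e2fsck exit status (a sum of bit flags, see e2fsck(8)).
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    result=""
    if [ $(( status & 1 )) -ne 0 ]; then result="$result, errors corrected"; fi
    if [ $(( status & 2 )) -ne 0 ]; then result="$result, reboot recommended"; fi
    if [ $(( status & 4 )) -ne 0 ]; then result="$result, errors left uncorrected"; fi
    if [ $(( status & 8 )) -ne 0 ]; then result="$result, operational error"; fi
    if [ "$status" -ge 16 ]; then result="$result, usage, cancel or library error"; fi
    echo "${result#, }"
}

# Intended use (commented out so nothing touches a real device by accident):
#   e2fsck -f -n /dev/mmcblk2p1    # read-only check, answers "no" to everything
#   describe_fsck_status $?
#   e2fsck -f -y /dev/mmcblk2p1    # actual repair, answers "yes" to everything
#   describe_fsck_status $?
```

Keeping the full e2fsck output from the read-only pass shows what was damaged, but it does not prove that file contents are intact, since ext4 does not checksum data blocks.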

 

I don't use ext4; I use Btrfs for the rootfs and all other storage devices. When in doubt, I use the DUP profile for metadata, although that is more for HDDs.

As said, Btrfs allows ad hoc or regular scrubs, so you will be able to detect where the corrupt blocks are, if that is the issue.
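A scrub-based check could look roughly like the sketch below. `btrfs scrub start -B` and `btrfs device stats` are standard btrfs-progs commands; the helper `count_btrfs_errors` is a made-up name that simply sums the per-device error counters that `btrfs device stats` prints, one counter per line with the value in the last column.

```shell
#!/bin/sh
# Sum the error counters from "btrfs device stats" output read on stdin.
# Each input line looks like: [/dev/mmcblk2p1].corruption_errs  0
count_btrfs_errors() {
    awk '{ total += $NF } END { print total + 0 }'
}

# Intended use on a mounted Btrfs rootfs (commented out, needs root):
#   btrfs scrub start -B /                    # -B: run in the foreground
#   btrfs device stats / | count_btrfs_errors
```

A non-zero total after a scrub would point at real read, write or checksum errors on the device rather than just an unclean journal.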

 

It might also be that there is an issue in the 6.6 kernel that reveals itself with higher write delays or the like, so maybe update the OS. Maybe mmc-utils can reveal some issue (I have no experience with it).
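On the wear-level question: eMMC 5.0 and later devices report a lifetime estimate, which mmc-utils shows via `mmc extcsd read /dev/mmcblk2` and which the kernel also exposes in sysfs as `life_time` and `pre_eol_info`. Whether this particular eMMC module supports it is an assumption. The values are hex codes where each step stands for 10% of the estimated life; the helper below (`emmc_life_used` is a made-up name) decodes one such code.

```shell
#!/bin/sh
# Decode one eMMC lifetime estimate code (0x01..0x0B) into a usage range.
emmc_life_used() {
    code=$(printf '%d' "$1")
    if [ "$code" -eq 0 ]; then
        echo "not defined"
    elif [ "$code" -ge 11 ]; then
        echo "exceeded estimated lifetime"
    else
        echo "$(( (code - 1) * 10 ))-$(( code * 10 ))% used"
    fi
}

# Intended use (path assumes the eMMC is mmcblk2, as in this thread):
#   read -r type_a type_b < /sys/block/mmcblk2/device/life_time
#   emmc_life_used "$type_a"
#   emmc_life_used "$type_b"
```

Comparing these values between a board that bricked early and a fresh one would show whether wear plays any role.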

 
