DoubleHP

Where to report a bug? /etc/init.d/armhwinfo corrupts /boot/armbianEnv.txt when followed by a power failure


Took me over a year to isolate this bug.

 

At random times, I have found /boot/armbianEnv.txt with inconsistent content.

 

Examples:

 

.tty1tty1LOGIN.8
¹[fU7ttyS0tyS0LOGINusbstoragequirks=0x2537:0x1066:u,0x2537:0x1068:u
usbstoragequirks=0x2537:0x1066:u,0x2537:0x1068:u
network={
        ssid="Demaine_40_Grd_Rue_Chambre-BP"
        #psk="1234567890"
        psk=693d195a3bf2ab6defbcc0b54604ced4286a067996138db6dc793d8648f67d79
}
usbstoragequirks=0x2537:0x1066:u,0x2537:0x1068:u
usbstoragequirks=0x2537:0x1066:u,0x2537:0x1068:u
network={
        ssid="fabste"
        #psk="1234567890"
        psk=95342031a4b28398045e1761f9c64f8454902e80d37102e5476fa79abd59e144
}
2
}
usbstoragequirks=0x2537:0x1066:u,0x2537:0x1068:u
usbstoragequirks=0x2537:0x1066:u,0x2537:0x1068:u

 

Eventually I found out that "fabste" is the SSID of my neighbour's network. How did this get here?

 

The bug occurs only on Orange Pi Zero boards that need to perform a wifi scan. It does not happen on boards that operate as a wifi AP or use ethernet. The wifi scan is required to connect to the AP. The scan seems to produce a temp file somewhere (maybe in /tmp, not sure).

 

Then, later, /etc/init.d/armhwinfo runs this, which is very suspicious to me:

        read USBQUIRKS <${TMPFILE}
        sed -i '/^usbstoragequirks/d' /boot/armbianEnv.txt
        echo "usbstoragequirks=${USBQUIRKS}" >>/boot/armbianEnv.txt

 

and if this is followed by a power failure, you end up with a corrupted file:

 

https://lists.denx.de/pipermail/u-boot/2018-June/332375.html

https://github.com/ConnectBox/connectbox-pi/issues/220

 

Judging by Google results it's a rare issue, but it's known, and it makes most of my opi0 unusable. Sooner or later, it affects 100% of my opi0 that work as wifi clients.

 

It renders my pis unusable because I need very specific features which are set in /boot/armbianEnv.txt, usually w1-gpio. When the file is corrupted, after a reboot I lose control over my 1W devices and the whole project breaks.

 

Here is my fstab, in case it matters:

UUID=ebe9dacf-124f-486c-b6c1-08749e209374 / ext4 defaults,noatime,nodiratime,commit=600,errors=remount-ro 0 1
tmpfs /tmp tmpfs defaults,nosuid 0 0

 

I am not sure how to fix this at the distribution level:

- use something different than sed in armhwinfo to alter /boot/armbianEnv.txt

- perform a sync

- change ext4 features

- change mount options
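
The first two options could be combined roughly as follows. This is only a sketch, not what armhwinfo actually does: `set_usbquirks` is a hypothetical helper name, and it assumes the temporary file sits on the same filesystem as the target so that `mv` is a plain rename. The new content is built in a temp file, synced, and only then renamed over the original, so a power cut should leave either the old file or the new one rather than a mix of both.

```shell
#!/bin/bash
# Hypothetical helper (not part of armhwinfo): replace the usbstoragequirks
# line(s) of an env file through a temp file and a rename.
# Usage: set_usbquirks FILE VALUE
set_usbquirks() {
    local file="$1" value="$2" tmp
    [ -f "$file" ] || return 1

    # Create the temp file next to the target so both are on the same filesystem.
    tmp="$(mktemp "${file}.XXXXXX")" || return 1

    # Start from a copy so the temp file inherits the original's mode and owner.
    cp -p "$file" "$tmp" || { rm -f "$tmp"; return 1; }

    # Keep every line except existing usbstoragequirks entries, then append one.
    {
        grep -v '^usbstoragequirks' "$file"
        printf 'usbstoragequirks=%s\n' "$value"
    } > "$tmp" || { rm -f "$tmp"; return 1; }

    sync                                   # flush the new content first
    mv -f "$tmp" "$file" || { rm -f "$tmp"; return 1; }
    sync                                   # then flush the rename
}
```

Usage would be something like `set_usbquirks /boot/armbianEnv.txt "$USBQUIRKS"` in place of the `sed -i` and `echo >>` pair.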

 

For now, I am trying to write a personal fix: keep a backup copy of /boot/armbianEnv.txt and restore it if critical strings like "overlay_prefix=sun8i-h3" disappear.

 

Note that even when the file is heavily broken, as in all 3 examples above, the pi can still boot, establish a network connection, and mostly remains reachable. I only lose the 1W or GPIO feature. There seem to be very good default settings defined as a fallback somewhere else.

 

Here are unaffected opis:


 

verbosity=1
logo=disabled
console=both
disp_mode=1920x1080p60
overlay_prefix=sun8i-h3
overlays=cir usbhost2 usbhost3 w1-gpio
param_w1_pin=PA15
rootdev=UUID=ebe9dacf-124f-486c-b6c1-08749e209374
rootfstype=ext4

usbstoragequirks=0x2537:0x1066:u,0x2537:0x1068:u
usbstoragequirks=0x2537:0x1066:u,0x2537:0x1068:u
verbosity=1
logo=disabled
console=both
disp_mode=1920x1080p60
overlay_prefix=sun8i-h3
overlays=usbhost2 usbhost3 w1-gpio
param_w1_pin=PA15
rootdev=UUID=ebe9dacf-124f-486c-b6c1-08749e209374
rootfstype=ext4
usbstoragequirks=0x2537:0x1066:u,0x2537:0x1068:u
usbstoragequirks=0x2537:0x1066:u,0x2537:0x1068:u

I am not sure why one has an empty line. But in both cases, the usbstoragequirks line is doubled, and the two copies look identical to me. This means /etc/init.d/armhwinfo tries to add data that already exists. This script lacks checks!
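
A check of the kind that seems to be missing could look like this. It is a sketch under assumptions: `needs_quirks_update` is a made-up name, and the idea is simply to leave the file alone when it already holds exactly one usbstoragequirks line with the wanted value, so that most boots would not write to /boot/armbianEnv.txt at all.

```shell
#!/bin/bash
# Hypothetical check (not part of armhwinfo).
# Usage: needs_quirks_update FILE VALUE
# Returns 0 if FILE must be rewritten, non-zero if it is already correct.
needs_quirks_update() {
    local file="$1" want="usbstoragequirks=$2" total exact

    # A missing file always needs an update.
    [ -f "$file" ] || return 0

    # Number of usbstoragequirks lines of any kind.
    total="$(grep -c '^usbstoragequirks' "$file")"
    # Number of lines that match the wanted line exactly (-x whole line, -F literal).
    exact="$(grep -c -x -F -- "$want" "$file")"

    # Rewrite unless there is exactly one such line and it is the wanted one.
    [ "$total" -ne 1 ] || [ "$exact" -ne 1 ]
}
```

The caller would then only touch the file inside `if needs_quirks_update /boot/armbianEnv.txt "$USBQUIRKS"; then ...; fi`.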

 

 


I was wrong ... I just found a corruption on my wifi master AP opi, an opi that never does scanning, since it's always in master mode:


 

# cat /boot/armbianEnv.txt.backup.2018-11-17_22-14-09
verbosity=1
logo=disabled
console=both
disp_mode=1920x1080p60
overlay_prefix=sun8i-h3
overlays=usbhost2 usbhost3 w1-gpio
param_w1_pin=PA15
rootdev=UUID=ebe9dacf-124f-486c-b6c1-08749e209374
rootfstype=ext4
usbstoragequirks=0x2537:0x1066:u,0x2537:0x1068:u
usbstoragequirks=0x2537:0x1066:u,0x2537:0x1068:u

# cat /boot/armbianEnv.txt | hexdump
0000000 6576 6272 736f 7469 3d79 0a31 6f6c 6f67
0000010 643d 7369 6261 656c 0a64 6f63 736e 6c6f
0000020 3d65 6f62 6874 640a 7369 5f70 6f6d 6564
0000030 313d 3239 7830 3031 3038 3670 0a30 766f
0000040 7265 616c 5f79 7270 6665 7869 733d 6e75
0000050 6938 682d 0a33 766f 7265 616c 7379 753d
0000060 6273 6f68 7473 2032 7375 6862 736f 3374
0000070 7720 2d31 7067 6f69 700a 7261 6d61 775f
0000080 5f31 6970 3d6e 4150 3531 720a 6f6f 6474
0000090 7665 553d 4955 3d44 6265 3965 6164 6663
00000a0 312d 3432 2d66 3834 6336 622d 6336 2d31
00000b0 3830 3437 6539 3032 3339 3437 720a 6f6f
00000c0 6674 7473 7079 3d65 7865 3474 000a 0000
00000d0 0000 0000 0000 0000 0000 0000 0000 0000
*
0000120 0000 0000 0000 0000 0000 0000 0000 7500
0000130 6273 7473 726f 6761 7165 6975 6b72 3d73
0000140 7830 3532 3733 303a 3178 3630 3a36 2c75
0000150 7830 3532 3733 303a 3178 3630 3a38 0a75
0000160 7375 7362 6f74 6172 6567 7571 7269 736b
0000170 303d 3278 3335 3a37 7830 3031 3636 753a
0000180 302c 3278 3335 3a37 7830 3031 3836 753a
0000190 000a
0000191

 

Tricky one ... was not visible in cat, only in vim ...
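
Since the NUL bytes do not show up in plain `cat` output, a scripted check is handy. The sketch below uses a hypothetical function name, `has_nul_bytes`, and compares the file size with and without NUL bytes; `cat -v` would also make them visible as `^@`.

```shell
#!/bin/bash
# Hypothetical helper: succeed if FILE contains at least one NUL byte.
# Usage: has_nul_bytes FILE
has_nul_bytes() {
    local file="$1" raw stripped
    # Size of the file as it is on disk.
    raw="$(wc -c < "$file")"
    # Size after deleting every NUL byte.
    stripped="$(tr -d '\000' < "$file" | wc -c)"
    # Any difference means NUL bytes were present.
    [ "$raw" -ne "$stripped" ]
}
```

For example: `has_nul_bytes /boot/armbianEnv.txt && echo "file contains NUL bytes"`.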


I have just noticed the same issue when trying to use armbian-config on pi pc2, as have others here:

https://github.com/armbian/config/issues/33

 

My armbianEnv.txt looks like this


/var/log.hdd/aptitude {
  rotate 6
  monthly
  compress
  missingok
  notifempty
}
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
usbstoragequirks=0x2537:0x1066:u,0x2537:0x1068:u

 

How can I regenerate armbianEnv.txt without doing a reinstall?

 

edit: pulled it from the .img


Filesystem corruption it can be... Power supply / SD cards are all the same? Have you tried one node with another SD  card / power supply?

On 3/18/2019 at 5:11 PM, Magnets said:

 

How can I regenerate armbianEnv.txt without doing a reinstall?

 

edit: pulled it from the .img

 

Backup the whole system after install.

 

For this specific file, I keep local backups with dates, and diff them after each reboot. When it looks corrupted, the script restores the last backup.

On 3/22/2019 at 2:31 AM, pzw said:

Filesystem corruption it can be... Power supply / SD cards are all the same? Have you tried one node with another SD  card / power supply?

It could be a bad SD, or broken ext4 ... but the fact is, I had this issue on 100% of my opis, and only on this specific file. So yes, there may be an issue with ext4, but there is also an issue with the way this file is modified. It is altered savagely by boot or init scripts, without any check, to the point that they introduce the same line twice. The issue may even be in how temp files are handled.


Here are core parts of my service:

 

Makefile:

 

TARGET = /etc/cron.d/armbian_env_txt_checker

all: install

install: $(TARGET)

$(TARGET): Makefile
        bash -c "echo -e \"# file generated by $$(pwd)/$<\" >$@"
        bash -c "echo -e \"@reboot\troot\t/srv/doublehp/share/armbianEnv_txt/armbianEnv_txt_checker.sh\" >>$@"
        touch $@

 

/srv/doublehp/share/armbianEnv_txt/armbianEnv_txt_checker.sh

 

#!/bin/bash

[ ! -f /usr/sbin/armbian-config ] && exit 0

sleep 90

ref=/srv/doublehp/share/armbianEnv_txt/armbianEnv.txt.ref
file=/boot/armbianEnv.txt
loc="${file}.local"
backup="${file}.backup.$(/bin/date +%Y-%m-%d_%H-%M-%S)"
broken="${file}.broken.$(/bin/date +%Y-%m-%d_%H-%M-%S)"
# most recent dated backup, empty if none exists yet
previous="$(ls "${file}.backup."* 2>/dev/null | sort | tail -n1)"

# is current file valid ?
#cat "$file" | grep -e "overlay_prefix=sun8i-h3" >/dev/null && {
cat "$file" | grep -e "rootdev=UUID=" >/dev/null && {
        #file is good.
        # create local ref if none
        [ ! -e "$loc" ] && {
                cp -a "$file" "$loc"
        }

        # create a backup if none exists, or if the file differs from the last backup
        if [ -z "$previous" ] || ! diff -q "$file" "$previous" >/dev/null 2>&1; then
                cp -a "$file" "$backup"
        fi
        true
} || {
        # File is bad
        [ -e "$previous" ] && {
                cp -a "$file" "$broken"
                cp -a "$previous" "$file"
                reboot
                exit 0
        }
        [ -e "$loc" ] && {
                cp -a "$file" "$broken"
                cp -a "$loc" "$file"
                reboot
                exit 0
        }
        # Fallback on ref ...
        cp -a "$file" "$broken"
        cp -a "$ref" "$file"
        reboot
}

 

29 minutes ago, DoubleHP said:

 

Backup the whole system after install.

 

For this specific file, I keep local backups with dates, and diff them after each reboot. When it looks corrupted, the script restores the last backup.

What program did you use to write the image to the SD card? I hope Etcher or another program which actually verifies the data?

 

If there were an issue with the OS or scripts provided by Armbian, you would surely not be the only one experiencing it... Better to look at what your boards have in common: the parts you use and how you do things, like writing the image to the SD card, etc.

I have a number of Opi0 and other Opis, and now a few NanoPi Duo2 boards, which all run without any issue. For SD cards I only use Samsung EVO cards (at the moment I just buy Samsung EVO Select cards from Amazon), and I only use the power supply from Orange Pi or a supply which can deliver at least 2A to the board (tested on an oscilloscope to make sure it is stable, without funny spikes etc). I have not experienced a problem with this setup so far.


dd

 

I am not the only one. Many other people have issues with armbianEnv.txt, and with other files, due to sync issues.

 

After fixing the issue for this file using my scripts, I don't encounter any other issue on oPi0.


I can confirm the bug.

Using 20+ OrangePi Zero Plus boards with the USB ports of the extension board, I noticed the same strange content in /boot/armbianEnv.txt as Magnets described.

The file system seems to be fine, except this file. 

Since my systems use the USB from the extension board, once the port is not present, my device doesn't work.

My workaround is based on DoubleHP's. I added a simple service that checks the content of the armbianEnv.txt file and reacts according to the result. Here are the three files.

It works well on my systems, but please feel free to criticise it and point out any errors I may have made.

 

/root/bootenv_check/bootenv_check.service

[Unit]
Description=Checking the validity of /boot/armbianEnv.txt file.
After=sysinit.target

[Service]
Type=oneshot
User=root
ExecStart=/bin/bash /root/bootenv_check/bootenv_check.sh
RemainAfterExit=yes

[Install]
WantedBy=basic.target

 

/root/bootenv_check/bootenv_check.sh

#!/bin/bash

file=/boot/armbianEnv.txt

overlays="overlays=usbhost0 usbhost1 usbhost2 usbhost3"

content="verbosity=1
console=serial
overlay_prefix=sun50i-h5
${overlays}
rootdev=UUID=d59ef23a-f2e0-418c-85e9-e21611283218
rootfstype=ext4
usbstoragequirks=0x2537:0x1066:u,0x2537:0x1068:u"

cat "$file" | grep -x -F "$overlays" >/dev/null && {
	#file is good.
	echo "$file seems to be valid (includes '$overlays')"
} || {
	# File is bad
	echo -e "$file content is invalid: \n----"
	cat "$file"
	echo -e "----"
	# Fallback on the predefined content ...
	echo -e "$content" > "$file"
	echo "$file overwritten with default content"
	echo "REBOOT NOW"
	reboot
}

 

 

/root/bootenv_check/install_service.sh

 

#!/bin/bash
#echo "The script you are running has basename `basename "$0"`, dirname `dirname "$0"`"

echo "Installing bootenv_check.service in the system..."

file="bootenv_check.service"
target="$(dirname "$(readlink -f "$0")")/$file"

echo "Service file: $target"
ln -s -T -v "$target"  /etc/systemd/system/"$file"
systemctl enable $file

echo "... done" 

 

