fromport

November 21, 2020

Armbianmonitor:

I have an array of HC2's running lizardfs.

The other units are running 5.4.72-odroidxu4 kernels, but I dared to upgrade one to 5.8.16-odroidxu4.

Only this one unit threw the following error:

Quote

swapper/0: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0

Since the other units didn't show anything in their logs, I have to assume it isn't caused by a runaway process or a memory leak.
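One way to compare the units is to count allocation-failure lines in each kernel log. This is only a minimal sketch: it assumes the kernel log is readable via dmesg on every unit, and the function name is made up for illustration.

```shell
# Count kernel log lines that report a page allocation failure.
# Reads the log from stdin, so it works with dmesg, journalctl -k, or a saved file.
count_alloc_failures() {
    # grep -c prints 0 but exits non-zero when nothing matches; normalise the status
    grep -c 'page allocation failure' || true
}

# Example, run on each unit:
#   dmesg | count_alloc_failures
```

A count of 0 on the 5.4.72 units and a non-zero count on the 5.8.16 unit would match what is described above.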

The fstab line that mounts the spinning-rust drive:
UUID=be95e207-1768-4fd9-bda9-ddc126a47d50 /chunks/be95e207-1768-4fd9-bda9-ddc126a47d50 xfs rw,noexec,nodev,noatime,nodiratime,largeio,inode64 0 2

Can't think of any additional info that would be useful.

If someone has suggestions, I would be interested.

October 31, 2020

@gprovost my Armbian is currently "stuck" on legacy since I went back to the 4.4 kernel. In armbian-config there is no option to switch from legacy to current or dev.

Any suggestions on how I could switch, so that I can install the new kernel when it becomes available?

October 31, 2020

16 hours ago, usual user said:
Out of curiosity, with mainline kernel and ondemand governor in place.
Apply this:
echo 40000 > /sys/devices/system/cpu/cpufreq/policy0/ondemand/sampling_rate
echo 465000 > /sys/devices/system/cpu/cpufreq/policy4/ondemand/sampling_rate
Does it still crash in your use case?
If it does not crash any longer I will explain what is going on.

I just switched to "ondemand" and applied your settings:

Quote

sh -x /usr/local/bin/switchcpuondemand
+ cpufreq-set -c 0 -g ondemand
+ cpufreq-set -c 4 -g ondemand
+ echo 40000
+ echo 465000
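For reference, a script producing a trace like this might look roughly as follows. It is a guess at what /usr/local/bin/switchcpuondemand contains, not the actual file: the redirect targets are taken from the quoted suggestion (sh -x does not print redirections), and CPUFREQ_ROOT, the function names, and the "apply" argument are assumptions added so the script can be dry-run against a scratch directory.

```shell
#!/bin/sh
# Hypothetical reconstruction of /usr/local/bin/switchcpuondemand.
# CPUFREQ_ROOT is an assumed override; by default it points at the real sysfs tree.
CPUFREQ_ROOT="${CPUFREQ_ROOT:-/sys/devices/system/cpu/cpufreq}"

# Write one ondemand sampling rate (in microseconds) for a cpufreq policy.
set_sampling_rate() {
    policy="$1"
    rate_us="$2"
    echo "$rate_us" > "$CPUFREQ_ROOT/policy$policy/ondemand/sampling_rate"
}

# Switch both clusters to ondemand, then apply the suggested sampling rates.
switch_to_ondemand() {
    cpufreq-set -c 0 -g ondemand   # policy0: the four little cores on the rk3399
    cpufreq-set -c 4 -g ondemand   # policy4: the two big cores
    set_sampling_rate 0 40000
    set_sampling_rate 4 465000
}

# Only touch the real system when explicitly asked to.
if [ "${1:-}" = "apply" ]; then
    switch_to_ondemand
fi
```

Run as root with the argument apply. The ondemand directory under each policy only exists once that policy is using the ondemand governor, which is why the governor is set first.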

Now wait

*UPDATE*

Crashed within _minutes_

On the serial console

Quote

root@filer1:~# [70583.943650] BUG: spinlock lockup suspected on CPU#0, kswapd0/70
[70583.947652] lock: 0xffffffc0f7f3f200, .magic: dead4ead, .owner: swapper/5/0, .owner_cpu: 5
[70583.974491] BUG: spinlock lockup suspected on CPU#4, kworker/4:0/4392
[70583.977415] lock: 0xffffffc0f7f3f200, .magic: dead4ead, .owner: swapper/5/0, .owner_cpu: 5
[70583.978539] BUG: spinlock lockup suspected on CPU#2, kworker/2:1/12393
[70583.978545] lock: 0xffffffc0f7f3f200, .magic: dead4ead, .owner: swapper/5/0, .owner_cpu: 5

October 30, 2020

6 hours ago, usual user said:
Out of curiosity, with mainline kernel and ondemand governor in place.
Apply this:
echo 40000 > /sys/devices/system/cpu/cpufreq/policy0/ondemand/sampling_rate
echo 465000 > /sys/devices/system/cpu/cpufreq/policy4/ondemand/sampling_rate
Does it still crash in your use case?
If it does not crash any longer I will explain what is going on.

I am so glad my machine survived the night without crashing.

It's now part of my lizardfs distributed storage pool and is still synchronizing

Quote

Memory:        Total        Used        Free     Buffers
RAM:         3901088     3843136       57952        1328
Swap:        1950540       57468     1893072

Bootup: Thu Oct 29 22:20:18 2020   Load average: 1.26 1.61 1.68 1/275 7993

user :      01:00:25.07   1.8% page in :         81241566
nice :      00:00:01.07   0.0% page out:        410572768
system:      02:27:57.04   4.3% page act:          4335568
IOwait:      01:55:27.21   3.4% page dea:           226869
hw irq:      00:00:00.00   0.0% page flt:          2458237
sw irq:      00:37:46.36   1.1% swap in :             2239
idle :   2d 02:53:08.08 89.4% swap out:            15848
uptime:      09:49:06.91         context :        660800969

mtdblock0              70r               sdb            14593r          354626w
mmcblk0            8184r             5   sdc            24794r          344540w
mmcblk0p1            8036r               sdd            24769r          342203w
mmcblk1             193r                 sde            23975r          342134w
mmcblk1rpmb               4r             zram0            1541r            2040
mmcblk1boot1             116r            zram1            2802r           15849
mmcblk1boot0             116r            sdf          1061502r            2503w
sda            46103r          350811w

eth0        TX 194.79GiB     RX 302.53GiB     lo          TX 0.00B         RX 0.00B
eth1        TX 0.00B         RX 0.00B

Once it has run for 24 hours, I will reboot, change those parameters, and report back.

October 30, 2020

@gprovost I was able to copy all the data in parallel to the SSD drive.

First time this machine felt stable under load!

I put those 2 commands in /etc/rc.local ;-)

Thank you for restoring my faith in the Helios64; I had almost lost it.

October 30, 2020

28 minutes ago, gprovost said:

@fromport This is the first time we have seen this kind of crash message, but looking online it seems it isn't an unknown event on other rk3399 boards.

Could you run the same test that triggers this crash, but first set the governor to performance?

cpufreq-set -c 0 -g performance
cpufreq-set -c 4 -g performance

Then run cpufreq-info to check that the governor has been changed.

Trying to figure out if it's still a DVFS config issue.

21:58:29 up 24 min, 7 users, load average: 9.06, 6.67, 3.94

[knock wood, but so far so good]

5 rsync in parallel at the moment
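Besides cpufreq-info, the active governor can be read straight from sysfs. A minimal sketch, assuming the standard cpufreq policy directories; CPUFREQ_ROOT and the function name are assumptions introduced here, not something from the thread.

```shell
# CPUFREQ_ROOT is an assumed override so the check can be tried on a scratch tree.
CPUFREQ_ROOT="${CPUFREQ_ROOT:-/sys/devices/system/cpu/cpufreq}"

# Print "<policy>: <governor>" for every cpufreq policy directory found.
show_governors() {
    for dir in "$CPUFREQ_ROOT"/policy*; do
        # Skip the unexpanded glob or policies without a readable governor file
        [ -r "$dir/scaling_governor" ] || continue
        printf '%s: %s\n' "${dir##*/}" "$(cat "$dir/scaling_governor")"
    done
}

# Example:
#   show_governors
```

On this board both policy0 and policy4 should report performance after the two cpufreq-set calls.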

October 30, 2020

8 hours ago, fromport said:

I was on IRC on #armbian and they suggested using armbian-config to downgrade. Downgraded successfully to 5.8.14.

Looked good in the beginning, but as soon as it got some load on it, it crashed.

Then I used armbian-config to downgrade to 4.4 and, like you predicted, it didn't go well.

Serial console now shows during bootup

And that is from booting from the internal emmc

Will have to find which jumper to install and boot from sdcard I guess

My machine is really making me sweat.

I installed armbian-buster-legacy-4.4 on an SD card and managed to boot it (much slower performance than the internal eMMC).

I tried to install lizardfs-chunkserver again. To do that I am mounting the spinning-rust partitions, which I previously formatted with XFS.

When trying to mount the partitions RW I get this error:

Quote

XFS Superblock has unknown read-only compatible features (0x4) enabled

It says you can mount in RO mode.
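The 0x4 in that message is a bitmask of read-only-compatible features. Below is a small sketch for decoding it, assuming the bit assignments in the kernel's xfs_format.h as I understand them (finobt = 0x1, rmapbt = 0x2, reflink = 0x4); the function name is invented for this example.

```shell
# Decode an XFS read-only-compatible feature mask into feature names.
# The bit values are an assumption taken from the kernel's xfs_format.h.
decode_xfs_ro_compat() {
    mask=$(( $1 ))
    names=""
    if [ $(( mask & 0x1 )) -ne 0 ]; then names="$names finobt"; fi
    if [ $(( mask & 0x2 )) -ne 0 ]; then names="$names rmapbt"; fi
    if [ $(( mask & 0x4 )) -ne 0 ]; then names="$names reflink"; fi
    # Anything left over is a bit this sketch does not know about
    rest=$(( mask & ~0x7 ))
    if [ "$rest" -ne 0 ]; then names="$names unknown($rest)"; fi
    echo "${names# }"
}

decode_xfs_ro_compat 0x4   # prints: reflink
```

If those bit assignments are right, the drives were formatted with reflink enabled, which the 4.4 kernel predates. That would explain why it only offers a read-only mount.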

I hooked up an external 256GB SSD to the front USB port and started copying the contents of the first HD to the SSD.

I was using

Quote

rsync -av --info=progress2 [source] [dest]

It started copying, and after a few minutes the machine suddenly stopped responding.

This is what I saw on the serial console

Quote

Armbian 20.08.17 Buster ttyFIQ0

filer1 login: [ 1690.059453] BUG: spinlock lockup suspected on CPU#0, swapper/0/0
[ 1690.063991] lock: 0xffffffc0f7f2b200, .magic: dead4ead, .owner: swapper/4/0, .owner_cpu: 4
[ 1690.070046] BUG: spinlock lockup suspected on CPU#5, rsync/4022
[ 1690.072860] lock: 0xffffffc0f7f2b200, .magic: dead4ead, .owner: swapper/4/0, .owner_cpu: 4
[ 1690.104336] BUG: spinlock lockup suspected on CPU#3, kworker/3:1/3325
[ 1690.109230] lock: 0xffffffc0f7f2b200, .magic: dead4ead, .owner: swapper/4/0, .owner_cpu: 4
[ 1691.144138] BUG: spinlock lockup suspected on CPU#2, kworker/2:2/656
[ 1691.144143] BUG: spinlock lockup suspected on CPU#1, kworker/1:2/164
[ 1691.144152] lock: 0xffffffc0f7f2b200, .magic: dead4ead, .owner: swapper/4/0, .owner_cpu: 4
[ 1691.159662] lock: 0xffffffc0f7f2b200, .magic: dead4ead, .owner: swapper/4/0, .owner_cpu: 4

I have now tried both the legacy and the most up-to-date images.

The only thing that is utterly consistent: it crashes on me all the time, no matter what I do.

Could this be a bad hardware version?

October 29, 2020

3 hours ago, flower said:
Downgrading and pinning is possible through armbian-config. It is described in the latest Kobol blog (that page seems offline for me atm).

I used this to downgrade:
apt install \
  linux-dtb-current-rockchip64=20.08.10 \
  linux-headers-current-rockchip64=20.08.10 \
  linux-image-current-rockchip64=20.08.10 \
  armbian-firmware=20.08.10 \
  linux-buster-root-current-helios64=20.08.10 \
  linux-u-boot-helios64-current=20.08.10
(but be careful... the next update would upgrade them again)

Afaik there is no way to go back to a 4.4 kernel. Going from 4.4 to 5.8 is possible through armbian-config, but it didn't work for me the last time I tried.
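To keep the next update from replacing the downgraded packages, they can be put on hold with apt-mark. The sketch below only prints the command for review instead of running it; the SPECS variable and function names are made up here, and the package list is copied from the apt install command above.

```shell
# Package=version specs copied from the apt install command above.
SPECS="linux-dtb-current-rockchip64=20.08.10
linux-headers-current-rockchip64=20.08.10
linux-image-current-rockchip64=20.08.10
armbian-firmware=20.08.10
linux-buster-root-current-helios64=20.08.10
linux-u-boot-helios64-current=20.08.10"

# Strip the "=version" suffix from each spec, one package name per line.
spec_names() {
    for spec in $SPECS; do
        echo "${spec%%=*}"
    done
}

# Print the hold command rather than executing it, so it can be checked first.
print_hold_command() {
    echo apt-mark hold $(spec_names)
}

print_hold_command
```

Running the printed command as root pins the packages; apt-mark unhold with the same names releases them again.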

I was on IRC on #armbian and they suggested using armbian-config to downgrade. Downgraded successfully to 5.8.14.

Looked good in the beginning, but as soon as it got some load on it, it crashed.

Then I used armbian-config to downgrade to 4.4 and, like you predicted, it didn't go well.

Serial console now shows during bootup

Quote

U-Boot 2020.07-armbian (Oct 18 2020 - 23:38:26 +0200)

SoC: Rockchip rk3399
Reset cause: POR
DRAM: 3.9 GiB
PMIC: RK808
SF: Detected w25q128 with page size 256 Bytes, erase size 4 KiB, total 16 MiB
MMC:   mmc@fe320000: 1, sdhci@fe330000: 0
Loading Environment from MMC... *** Warning - bad CRC, using default environment

In:    serial
Out:   serial
Err:   serial
Model: Helios64
Revision: 1.2 - 4GB non ECC
Net:   eth0: ethernet@fe300000
scanning bus for devices...
Hit any key to stop autoboot: 0
Card did not respond to voltage select!
switch to partitions #0, OK
mmc0(part 0) is current device
Scanning mmc 0:1...
Found U-Boot script /boot/boot.scr
3185 bytes read in 19 ms (163.1 KiB/s)
## Executing script at 00500000
Boot script loaded from mmc 0
117 bytes read in 15 ms (6.8 KiB/s)
7302624 bytes read in 716 ms (9.7 MiB/s)
22114312 bytes read in 2118 ms (10 MiB/s)
libfdt fdt_check_header(): FDT_ERR_BADMAGIC
No FDT memory address configured. Please configure
the FDT address via "fdt addr <address>" command.
Aborting!
2698 bytes read in 36 ms (72.3 KiB/s)
Applying kernel provided DT fixup script (rockchip-fixup.scr)
## Executing script at 09000000
## Loading init Ramdisk from Legacy Image at 06000000 ...
   Image Name:   uInitrd
   Image Type:   AArch64 Linux RAMDisk Image (gzip compressed)
   Data Size:    7302560 Bytes = 7 MiB
   Load Address: 00000000
   Entry Point: 00000000
   Verifying Checksum ... OK
ERROR: Did not find a cmdline Flattened Device Tree
   Loading Ramdisk to f57ef000, end f5ee5da0 ... OK
FDT and ATAGS support not compiled in - hanging
### ERROR ### Please RESET the board ###

And that is from booting from the internal emmc

Will have to find which jumper to install and boot from sdcard I guess

October 29, 2020

I have been testing my Helios64 with 5x12TB drives in different setups.

First OMV & SnapRAID, but continuing a sync command uses so much RAM that it becomes so slow that it might as well be described as unusable.

Next I tried OMV & ZFS.

I could get the ZFS module compiled, but then it wouldn't load the rest of the utilities because of missing dependencies (Buster is really old).

So I finally switched to trying lizardfs. It crashed on me after a few hours, so I hooked up the serial console.

Caught this error during the night: https://termbin.com/lsow

Is it easy to downgrade to 5.8.14 or even a 4.x kernel?


Posts posted by fromport

kernel swap error with 5.8.16-odroidxu4 running on odroid HC2

Helios64 Support
