Meier

Members
  • Content Count

    17
  • Joined

  • Last visited


Reputation Activity

  1. Like
    Meier got a reaction from lanefu in Last effort regarding RockPro64 kernel panics   
    Thanks for all the hints, will try this out.
     
    To be honest, I can't imagine to be the only one having these issues. Maybe it's just that our usage of the RockPro64 board with a PCIe NVMe M.2 SSD is not very common? We suspected the PCIe / SSD combination to be the cause for a long time, for but cannot really confirm this. The issues are consistent over dozens of RockPro64 boards and many SSD brands. And I've heard from users that the PinebookPro (which uses similar hardware) also freezes under heavy load when run with an SSD.
  2. Like
    Meier got a reaction from Myy in Infrequent RockPro64 freeze (kernel NULL pointer)   
    The boot failure is no longer an issue currently, after updating to Armbian Ubuntu 18.04 and using overlayroot, but the nvme spinlock keeps happening frequently. I noticed that it is mostly during intense writing to the SSD, which heats up, the system freezes and never comes back up. Strangely enough, the SSD is then still under constant load and does not cool down.
     
    I ran a dedicted stresstest with the tool stressdisk with the following script (it includes our own fancontrol tool):
    #!/bin/bash apt update -y apt install -y tmux unzip smartmontools mkdir src && cd $_ wget https://github.com/ncw/stressdisk/releases/download/v1.0.12/stressdisk_1.0.12_linux_arm64.zip unzip stressdisk_1.0.12_linux_arm64.zip chmod +x stressdisk mv stressdisk /usr/sbin wget https://github.com/digitalbitbox/bitbox-base/releases/download/wip/bbbfancontrol.tar.gz tar xvf bbbfancontrol.tar.gz chmod +x bbbfancontrol mv bbbfancontrol /usr/sbin/ echo "/dev/nvme0n1p1 /mnt/ssd ext4 rw,nosuid,dev,noexec,noatime,nodiratime,auto,nouser,async,nofail 0 2" >> /etc/fstab mount -a mkdir -p /mnt/ssd/stressdisk tmux new-session -d 'watch smartctl -a /dev/nvme0n1' tmux split-window -h 'stressdisk cycle /mnt/ssd/stressdisk' tmux split-window -v 'htop' tmux split-window -v 'bbbfancontrol -v' tmux -2 attach-session -d Using the RockPro64 with Armbian and writing heavily on a Samsung SSD (connected with an PCIe M.2 adapter), I am able to consistently freeze the system within minutes. See video here:
     
    Jul 18 08:35:24 rockpro64 kernel: Unhandled fault: synchronous external abort (0x96000210) at 0xffffff8009cc801c Jul 18 08:35:24 rockpro64 kernel: Internal error: : 96000210 [#1] SMP Jul 18 08:35:24 rockpro64 kernel: Modules linked in: af_packet lz4hc lz4hc_compress zlib snd_soc_rockchip_hdmi_dp lzo rk_vcodec zram ip_tables x_tables autofs4 phy_rockchip_pcie Jul 18 08:35:24 rockpro64 kernel: CPU: 3 PID: 261 Comm: nvme Not tainted 4.4.182-rockchip64 #1 Jul 18 08:35:24 rockpro64 kernel: Hardware name: Pine64 RockPro64 (DT) Jul 18 08:35:24 rockpro64 kernel: task: ffffffc0e4e13800 task.stack: ffffffc0dfb9c000 Jul 18 08:35:24 rockpro64 kernel: PC is at nvme_kthread+0xac/0x1d8 Jul 18 08:35:24 rockpro64 kernel: LR is at nvme_kthread+0x78/0x1d8 Jul 18 08:35:24 rockpro64 kernel: pc : [<ffffff80087564cc>] lr : [<ffffff8008756498>] pstate: 20000145 Jul 18 08:35:24 rockpro64 kernel: sp : ffffffc0dfb9fd60 Jul 18 08:35:24 rockpro64 kernel: x29: ffffffc0dfb9fd60 x28: ffffffc0df92ed00 Jul 18 08:35:24 rockpro64 kernel: x27: 0000000002080020 x26: ffffffc0ebaa0228 Jul 18 08:35:24 rockpro64 kernel: x25: ffffff80091ff0f0 x24: ffffff80091ff0f0 Jul 18 08:35:24 rockpro64 kernel: x23: ffffff8008755310 x22: ffffff80091ff0d8 Jul 18 08:35:24 rockpro64 kernel: x21: 0000000000000007 x20: ffffff8009cc801c Jul 18 08:35:24 rockpro64 kernel: x19: ffffffc0ebd1d400 x18: 0000000000000000 ... ... Jul 18 08:35:24 rockpro64 kernel: Unhandled fault: synchronous external abort (0x96000210) at 0xffffff8009cc8000 Jul 18 08:35:24 rockpro64 kernel: Bad mode in Error handler detected, code 0xbf000002 -- SError Jul 18 08:35:24 rockpro64 kernel: BUG: spinlock lockup suspected on CPU#3, nvme/261 Jul 18 08:35:24 rockpro64 kernel: lock: 0xffffff8009141870, .magic: dead4ead, .owner: nvme/261, .owner_cpu: 3 Jul 18 08:35:24 rockpro64 kernel: CPU: 3 PID: 261 Comm: nvme Not tainted 4.4.182-rockchip64 #1 Jul 18 08:35:24 rockpro64 kernel: Hardware name: Pine64 RockPro64 (DT) Jul 18 08:35:24 rockpro64 kernel: Call trace: Jul 18 08:35:24 rockpro64 kernel: [<ffffff80080882b0>] dump_backtrace+0x0/0x1bc Jul 18 08:35:24 rockpro64 kernel: [<ffffff8008088490>] show_stack+0x24/0x30 Jul 18 08:35:24 rockpro64 kernel: [<ffffff8008587fec>] dump_stack+0x98/0xc0 Jul 18 08:35:24 rockpro64 kernel: [<ffffff8008106164>] spin_dump+0x84/0xa4 Jul 18 08:35:24 rockpro64 kernel: [<ffffff8008106300>] do_raw_spin_lock+0xdc/0x164 ... ... https://pastebin.com/1UeCiHDW
     
    Using the smartmontools and physical temperature measurements, we can observe that the first chip on the Samsung 970 EVO (1TB) SSD gets up to 95 degrees celsius hot before crashing. With the exact same image, board and adapter I tried other M.2 SSDs as well. Interestingly, the chip on a Samsung 970 EVO (500 GB) got up to 107 degrees celsius, but did not crash.
     
    Other SSD got not as hot and had no issues with stress tests:
    * Intel 660p (512 GB and 1 TB): ~80 degrees celsius, physical measurement only
    * Crucial P1 (512 GB and 1 TB): ~75 degrees celsius, physical measurement only
    * Western Digital Black (500 GB): ~70 degrees celsius, physical measurement only
     
    So this issue might be related to that specific SSD. I'm still surprised that the Samsung SSD do not throttle at all, IMHO they never should get that hot in the first place.
  3. Like
    Meier got a reaction from Igor in Infrequent RockPro64 freeze (kernel NULL pointer)   
    Unfortunately, I still keep getting the freezes from time to time. Two thinks I noticed:
    When running `stress -i 4 -d 4` I can crash the board in ~3 minutes very reliably. But not any board, just this one, but even without any additional peripherals like the SSD plugged in. As it's running on eMMC, it might be this particular eMMC that causes the crash. This let me to build a latest Armbian Ubuntu 18.04 image with `overlayroot` to eliminate all I/O to the eMMC. This board that has been crashing has now been running for ~2 days. I'll try to gather more data in case the boards crash with the build from the current master branch.
     
    Just FYI in case you're curious: this is the project I'm working on https://github.com/digitalbitbox/bitbox-base.
  4. Like
    Meier got a reaction from Igor in Integration of atomic update functionality (eg. with swupdate or rauc)   
    Just wanted to let you know that we collaborated with Mender and Armbian for RockPro64 is now fully supported by the Mender.io open source client-server manager for over-the-air software updates. It should be relatively straight-forward to exend this approach to other boards as well.
     
    https://mender.io/
    https://github.com/mendersoftware/mender-convert/pull/103
  5. Like
    Meier reacted to martinayotte in Build for RockPro64 works for BRANCH "dev", but fails for "default"   
    The error is rather this one :
    net/wireguard/ratelimiter.c:60:2: error: implicit declaration of function ‘call_rcu’; did you mean ‘call_srcu’? [-Werror=implicit-function-declaration] call_rcu(&entry->rcu, entry_free); ^~~~~~~~ call_srcu cc1: some warnings being treated as errors You can try to add at command line "WIREGUARD=no" , it will skip this part of the build ...
  6. Like
    Meier reacted to lanefu in Build for RockPro64 works for BRANCH "dev", but fails for "default"   
    See clean_level option

    https://docs.armbian.com/Developer-Guide_Build-Options/

    Sent from my SM-G950U using Tapatalk

  7. Like
    Meier got a reaction from lanefu in Build for RockPro64 works for BRANCH "dev", but fails for "default"   
    Thanks a ton!