
Helios64 - Armbian 23.08 Bookworm issues (solved)


ebin-dev


 

7 hours ago, mrjpaxton said:

Yes, I have also experienced a crash without the patch. It crashed when I tried to attach an HDD with a SATA-to-USB3 device for backup (I made sure the AC adapter was used for it), so it is more "stable" and responsive than before, but not completely yet. Would it be helpful to disable Armbian's ramlog and get journald to write any dumps to storage so that I can check the logs from the last boot, or are you guys already aware of *all* of the issues that cause these random crashes/freezes?

 

With the voltage changes proposed by @prahal (see 'opp-table 1' earlier in this thread), my system is completely stable and reliable (using Linux 6.6.x or 6.10.y). Others have reported the same.

 

Attaching drives is handled by the kernel. If that is not working, it may be that the relevant (hotplug) options are not configured.

Edited by ebin-dev


On 8/4/2024 at 4:03 AM, mrjpaxton said:

Yes, I have also experienced a crash without the patch. It crashed when I tried to attach an HDD with a SATA-to-USB3 device for backup (I made sure the AC adapter was used for it), so it is more "stable" and responsive than before, but not completely yet. Would it be helpful to disable Armbian's ramlog and get journald to write any dumps to storage so that I can check the logs from the last boot, or are you guys already aware of *all* of the issues that cause these random crashes/freezes?

 

Persistent logs might be of interest in case this is not the same random corruption; then we might be able to fix it in the kernel. With the random corruption (which I believe happens at the CPU stage), most of the time we get a weird unrecognized instruction, but the issue still looks random (even if it is far more likely during a btrfs scrub or a zfs check).

I really need to talk to the hardware guys from Armbian to sort out what to take note of (USB devices, power bank, PCI stress, ...).

 

Either way, you are much better off connecting a USB-C to serial cable to a computer to get the logs. You can even save the output of your serial terminal application with the `script` command, maybe inside a tmux/screen session.
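As a rough sketch of what that could look like: the helper below only prints a `script` command line so it can be checked before running it. The terminal program (`picocom`), the device path `/dev/ttyUSB0`, and the log file name are assumptions for illustration, and 1500000 baud is the Helios64 console speed as I recall it from the Kobol documentation, so verify it for your setup.

```shell
#!/bin/sh
# Print (do not run) a command that logs a serial session with script(1).
# Assumptions: picocom as terminal program, /dev/ttyUSB0 as adapter device,
# 1500000 baud as the Helios64 console speed. Adjust all three as needed.
serial_log_cmd() {
    dev="${1:-/dev/ttyUSB0}"
    log="${2:-helios64-serial.log}"
    # -f flushes after every write, so the log survives a sudden hang
    printf 'script -f -c "picocom -b 1500000 %s" %s\n' "$dev" "$log"
}

serial_log_cmd
```

Running the printed command inside a tmux or screen session keeps the capture alive even if the terminal window is closed.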

 


Hello Everyone,

I have installed 6.6.39, and everything is running fine, except for one issue that I simply can't work out. 
My setup may be a bit unusual: instead of using the internal M.2 slot, I have a USB-SATA adapter plugged into the front USB header on the board. This somehow prevents booting from the SD card, and I simply can't work out what goes wrong. If I unplug the drive, all is well.

Did anyone encounter a similar issue?
 

Attachments: disk.log, no disk.log


@Balog Dániel How did you install it?

 

Since `armbian-config` is undergoing maintenance and an overhaul, the best way to do it at the moment may be `sudo nand-sata-install`, which actually seems to be part of what `armbian-config` uses anyway.

 

Follow either "System on USB or SATA (Including M.2 SATA)" or "Transfer rootfs from eMMC to SATA or USB", depending on whether you want to install the OS from scratch or transfer it, and skip to Step 3.

 

Otherwise, if that's what you did, and that doesn't help... after reading your logs at the end, seems like the I2C driver gets stuck. I've read that it might be possible to solve the problem by disabling I2C. How to do that specifically for the Kobol, I'm not exactly sure, but I don't even have I2C drivers on my install when I check `lsmod | grep rk3x-i2c`. Not sure what it's needed for, so it should be possible to disable it with a blacklist config in `/etc/modprobe.d/` I'm guessing.

Edited by mrjpaxton

@mrjpaxton I flashed the SD card using Etcher (Armbian_24.5.3_Helios64_bookworm_current_6.6.36_minimal.img), then ran `apt-get update && apt-get upgrade`.

 

Based on the schematics, my best guess is I2C is somehow related to the on-board battery, that's why I was hesitant to disable it (I'll take a look without the SSD drive if it's loaded).

 

I had a misadventure with flashing a corrupted disk image to MMC, which was fixed by the transfer process you noted (the MMC currently has Armbian_21.08.2_Helios64_buster_current_5.10.63). But my main concern is to keep everything on the SD card, so I can make regular backups easily and simply replace the card if it breaks (or revert to a backup if an upgrade somehow breaks the system, as that's what got me here in the first place). Conceptually, that should be doable, right?

 

 

 

 


@Balog Dániel Okay, just so I am 100% clear, you say you want to boot from SD card, but why is the USB-to-SATA adapter plugged in when you are booting? Are you also trying to boot from an HDD/SSD or something? Which one would you actually prefer to boot from? Unless the SD card is above Class 10 (whatever the SDXC specifications support), it's going to be slower than your other 3 options: eMMC, M.2 or a SATA device over USB3.

 

You could also expose your eMMC by putting a special SD card inside. See Step 1. This allows offline copies, since you can mount the partitions to a folder on the computer; all you need to do is boot the board with that SD card and connect it using the USB-C port at the back.

 

But since the Kobol runs Linux, normally any Linux backup solution will work, including `rsync`. It should be good enough for online backups. If you choose to use `rsync` to back up, I recommend having a script that does `fsfreeze -f /`, followed by an rsync of the folders you want to copy over, like /boot, /etc, /home, /root, /var, and then `fsfreeze -u /`. It is possible, but not advisable, to convert Ext4 to Btrfs, and then you could copy over read-only snapshots as backups, which means you don't have to do the `fsfreeze` parts, but it's a pain once you have to upgrade, because Armbian still recommends upgrading with reflashes, and a reflash will write an Ext4 filesystem again. I really hope they start using Btrfs sometime on some of these larger-capacity devices...
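A minimal sketch of such a script, under a few assumptions: the destination `/mnt/backup` is a placeholder and must live on a different filesystem than `/` (writes to a frozen filesystem block), the listed folders are all on the root filesystem, and the `run` helper and `DRY_RUN` switch are my own additions so the commands can be reviewed before anything is frozen.

```shell
#!/bin/sh
# Sketch: freeze /, rsync selected directories, then thaw again.
# DEST is a placeholder; it must NOT be on the frozen filesystem.
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"
DEST="${DEST:-/mnt/backup}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

backup_root() {
    run fsfreeze -f / || return 1
    # -a archive mode, -A ACLs, -X extended attributes
    run rsync -aAX /boot /etc /home /root /var "$DEST"/
    status=$?
    # Thaw even when rsync failed, otherwise / stays frozen
    run fsfreeze -u /
    return "$status"
}

backup_root
```

With `DRY_RUN=0` the script needs root. If it is killed between the freeze and the thaw, `/` stays frozen until `fsfreeze -u /` is run, so it is worth trying this on a non-critical system first.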

 


@mrjpaxton

 

The setup is 5 disk RAID6 for 'cold' storage (with ~20 minute disk spin down) that I access rarely, and a 'hot' SSD to store downloads, temp files, and anything that's not OS related.
The OS should be separated from this (previously it was on MMC, but as I did not have a backup, I lost access due to something and had to reflash, thus losing all config)

Regarding the other boot option, neither the HDDs nor the SSD should have any boot-related files, just data.

As for the main question of why it's plugged in in the first place, I use the USB header for the front panel for an internal SSD, but as the board's SSD slot can't be used if you use all 5 disks, this was the best workaround I could find. I would really prefer to boot from SD, as I don't know the expected life of the eMMC, and how badly I may have abused it, so it's safer to expect SD card to fail and plan accordingly.

 

SD Card is UHS Class 3, and I am not doing anything that should need high IO for the OS. 


On 8/14/2024 at 3:52 PM, Balog Dániel said:

The setup is 5 disk RAID6 for 'cold' storage (with ~20 minute disk spin down) that I access rarely, and a 'hot' SSD to store downloads, temp files, and anything that's not OS related.
The OS should be separated from this (previously it was on MMC, but as I did not have a backup, I lost access due to something and had to reflash, thus losing all config)

Regarding the other boot option, neither the HDDs nor the SSD should have any boot-related files, just data.

As for the main question of why it's plugged in in the first place, I use the USB header for the front panel for an internal SSD, but as the board's SSD slot can't be used if you use all 5 disks, this was the best workaround I could find. I would really prefer to boot from SD, as I don't know the expected life of the eMMC, and how badly I may have abused it, so it's safer to expect SD card to fail and plan accordingly.

 

SD Card is UHS Class 3, and I am not doing anything that should need high IO for the OS. 

 

Thanks for the logs.

I don't know what is wrong with your boot.

 

You said that your eMMC setup broke (what you call MMC, I guess)? It was probably the eMMC breakage that affected most rk3399 boards and requires a property to be added for eMMC hs400 to boot. Since then hs400 has been disabled. I will re-enable it after adding the property to the Helios64 Armbian dts.

 

If I understood correctly you also would like to boot from SD (and that you are currently booting from eMMC?). But when you plug a USB external SSD (that you use to store downloads, temp files, and anything not OS related) into the front USB socket boot fails (you said "I use the USB header for the front panel for an internal SSD", I guess you mean an external USB SSD, not an internal SSD).

And you have 5 disks in the internal SATA slots as a RAID6.

 

Mind that the bootloader will stay on eMMC, but you can move the OS to SD. This likely won't solve your boot problem with a USB external SSD plugged into the front USB socket...

 

Note that I have multiple USB external HDDs plugged into the back USB socket and the boot is working.

 

What is the amperage required by the external USB drive you plug into the front USB socket? (this socket can output max 900mA).

 

ff3d0000.i2c is related to usb-c:

[    6.412784] OF: graph: no port node found in /i2c@ff3d0000/typec-portc@22

 


Currently I have Helios64_bullseye_current_5.10.63 on the eMMC, and if I remove the SD card, everything works, and I can boot with or without the SSD plugged in. The SD card has bookworm_6.6.36 (and upgrades), the logs are from OS on SD card with boot with and without SSD. 

Regarding external vs. internal SSD: under normal operation, the SSD is plugged in inside the closed case (thus 'internal' SSD), which is why it's not trivial to simply unplug it when I need to restart.

 

I have also tested plugging in the SSD after boot (works flawlessly) and plugging it into the back USB (fails, but it just threw an error that COM is disconnected, so I'm not really sure what the actual error is; I will keep digging).
The SSD label states 1.5 A & 3.3 V, but I can't find any markings on the adapter itself (and the details on Amazon, where I got it from, do not specify anything).


On 8/20/2024 at 6:32 PM, Balog Dániel said:

I have also tested plugging in the SSD after boot (works flawlessly) and plugging it into the back USB (fails, but it just threw an error that COM is disconnected, so I'm not really sure what the actual error is; I will keep digging).
The SSD label states 1.5 A & 3.3 V, but I can't find any markings on the adapter itself (and the details on Amazon, where I got it from, do not specify anything).

 

The USB board is USB 3.0, so it should supply 0.9 A max at 5 V.

Your SSD is rated 1.5 A × 3.3 V, i.e. 4.95 W, versus the USB 3.0 maximum of 4.5 W, though I believe the SSD might not always consume the maximum. It could be that later kernels drive the SSD to its maximum and thus make it draw too much. It is not a given that there is a bug in the newer kernel.

It could also be that this extra consumption only destabilizes the board at boot, because other components on the board also draw more current at boot.
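The arithmetic behind that comparison, as a small sketch. It uses only the figures quoted in this thread (label rating of 1.5 A at 3.3 V, USB 3.0 port limit of 0.9 A at 5 V) and works in milli-units so plain shell integer arithmetic is enough. The `power_mw` helper is just for illustration, and conversion losses in the USB-SATA adapter are not modelled.

```shell
#!/bin/sh
# power_mw MILLIAMPS MILLIVOLTS -> power in milliwatts (integer arithmetic)
power_mw() {
    echo $(( $1 * $2 / 1000 ))
}

ssd_mw=$(power_mw 1500 3300)   # SSD label: 1.5 A at 3.3 V
port_mw=$(power_mw 900 5000)   # USB 3.0 port: 0.9 A at 5 V

echo "SSD label maximum:   ${ssd_mw} mW"
echo "USB 3.0 port budget: ${port_mw} mW"
echo "Shortfall at label maximum: $(( ssd_mw - port_mw )) mW"
```

This gives 4950 mW against a 4500 mW budget, i.e. 450 mW over at the label maximum. The real draw can be lower than the label rating, which is why measuring with a USB multimeter would be more conclusive.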

 

There might be tests that can help sort this out.

I think there might be ways to lower the current consumption of the M.2 SSD (maybe by lowering the libata link speed via a kernel boot param).

 

We could also try to find another USB 3 device that also stretches the limit, and check whether the behavior under 5.10 and 6.6 is reproducible.

 

Can you give a link to the SSD you put on the USB M.2 board? If it is not too expensive I could try to reproduce the setup. I have a USB multimeter and could check whether the USB 3 limit is really exceeded (and whether the current drawn differs between 5.10 and 6.6). But I won't be able to tell soon.

 

The COM error likely comes from your serial console program. It might be related to the Helios64 board crashing on the other side. Mind that a hardware hang just freezes the board; no messages are output. In the cases I encountered, it meant the voltage was too low for the load.

 

When you say you plugged the SSD into the back USB, do you mean you plugged it in after boot, or before boot when it failed?


On 8/4/2024 at 3:03 AM, mrjpaxton said:

Yes, I have also experienced a crash without the patch. It crashed when I tried to attach an HDD with a SATA-to-USB3 device for backup (I made sure the AC adapter was used for it), so it is more "stable" and responsive than before, but not completely yet.

Did you mean you also experienced a crash "with" the patch when you plugged a SATA to USB3 device? (You wrote "without the patch").

Do you know the current draw of your device? (Is the device an enclosure with a non-bundled drive? Is it a 2.5 inch HDD?)


$ grep I2C_RK3X /boot/config-6.9.3-edge-rockchip64
CONFIG_I2C_RK3X=y
On 8/14/2024 at 1:31 AM, mrjpaxton said:

I don't even have I2C drivers on my install when I check `lsmod | grep rk3x-i2c`. Not sure what it's needed for

 

I2C is a bus that lets a microprocessor communicate with other circuits.

It is enabled and cannot be disabled on Armbian without rebuilding your kernel.

If I remember correctly, it is even required on Helios64 for the PMIC (the circuitry that controls power, reboot, etc.).

Likely you get I2C errors because the board became unstable hardware-wise.
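To spell out why `lsmod` shows nothing here: the `=y` in the `grep` output at the top of this post means the driver is compiled into the kernel image rather than built as a loadable module, so it never appears in `lsmod` and a blacklist file in `/etc/modprobe.d/` has no effect on it. A small sketch that classifies an option this way follows; the `kconfig_state` helper name is made up for illustration, and the config path assumes the usual `/boot/config-$(uname -r)` layout.

```shell
#!/bin/sh
# kconfig_state OPTION CONFIG_FILE
# Prints "built-in", "module", or "disabled" for a kernel config option.
kconfig_state() {
    line=$(grep -E "^CONFIG_$1=" "$2" 2>/dev/null || true)
    case "$line" in
        *=y) echo "built-in" ;;   # part of the kernel image, not in lsmod
        *=m) echo "module" ;;     # loadable, can be blacklisted
        *)   echo "disabled" ;;   # not set in this config
    esac
}

# Example: inspect the running kernel's config, if it is readable.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    kconfig_state I2C_RK3X "$cfg"
fi
```

Only the "module" case can be handled with a modprobe blacklist; "built-in" requires a kernel rebuild, as noted above.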

 


On 8/20/2024 at 10:52 PM, Balog Dániel said:

For the serial console, I am using PuTTY. I would assume that it's more of a board issue, but when I reconnect, I get some additional communication.

For the back USB, I plugged it in prior to power on.

 

Regarding the boot parameter, is that difficult to set up?

 

What kind of additional communication do you get from the serial terminal?

 

The boot parameter I am thinking of might not be applicable to a USB SATA adapter.

It is libata.force=3.0Gbps or libata.force=1.5Gbps. But I have only used it for PCIe SATA.

 

You might want to try a dual-power USB 3 cable to have enough current for the USB M.2 data drive.

 

Have you tried ebin-dev's patched DTS with your USB M.2 SATA device attached (it might require the edge kernel though)?

 

Edit: can you try with your USB adapter plugged into a powered USB 3 hub that is itself plugged into the Helios64?


10 hours ago, magostinelli said:

Following this thread, I disabled Armbian updates and manually installed the 6.6.29 kernel with a custom frequency policy.

 

I read that Armbian now officially supports Helios64. What do you recommend now?

 

Don't expect much yet. If you were already on 6.6 you should not see changes.

Official support means that I took over maintainership.

I have a fix for eMMC hs400 pending but not yet in Armbian, and a fix for the network interface MAC that fixes a regression, but only in 6.9 and up (Armbian edge).

As for the frequency policy, there is no fix yet. Tentative hacks in the dtb voltage for the big CPUs seem to help, but nothing is in Armbian at this stage.


I hope all fixes and hacks will be included soon, because the result is amazing on my side 😉

 

Welcome to Armbian-unofficial 24.5.3 Bookworm with Linux 6.6.32-current-rockchip64

System load:   16%               Up time:       91 days 5:31

Edited by BipBip1981

On 9/6/2024 at 11:42 AM, magostinelli said:

Your hacks will be included or not?

Seeing how well the voltage hacks work on your boards, I will include them in Armbian (even though I still get crashes on my own board with only this 75mV hack, albeit far fewer).

But not upstream, at least until I sort out why they work. (I am close to sending the eMMC fix upstream; I only need to reread the backlog there to avoid too much back and forth, so that the patch is up to standard.) I was told to try the voltage hacks by a board designer who told me there was a design issue with the voltage regulator, which I am not able to sort out myself. But I checked the schematics of other rk3399 Armbian boards, and as far as I understand they have the same design. So either all of these boards are broken and are somewhat stable for an unknown reason (maybe less stress on the big CPUs), or I misunderstood what was wrong with the Helios64 hardware. I need to talk to a hardware engineer.

I am also trying to sort out a few other software and hardware issues. But I expect to have those in by mid-October, maybe earlier.

