fri.K Posted September 14, 2020 Hi, regarding my problem with the Helios4 rebooting under heavy NFS load: I couldn't make any progress, and worse, it eventually stopped booting altogether. Disabling the watchdog did not help, as the serial console also froze and I was unable to do anything; as far as I remember, there was not even a blinking LED. That forced me to use another SD card with a fresh official Debian 10 installation image and configure it again from scratch. It has only a basic configuration, just SSSD for LDAP and an NFS server, with no Docker and no HTTP services, but the NAS has now been running rock solid for 6 days, even though it's on some cheap 4GB uSD card. So I still don't know whether the cause was the previous Samsung Evo 32GB SD card, the specific configuration, or some issue related to OS upgrades, but as there are no more spontaneous reboots I'm happy that it's not ailing electronics. I hope my new Helios64 will be even more stable. If I manage to debug my old problem further I'll write, but I'm treating it as very low priority from now on. My advice for people with spontaneous reboots on Helios4: try another SD card with a fresh OS, do a minimal configuration, and stress it for a few days.
Mangix Posted September 14, 2020 On 9/5/2020 at 9:38 PM, Werner said: If they have the exact same ratings in voltage and current I'd say (assuming you are a bit familiar with the basics) pick up a couple of fitting connectors and solder them to the wires. Ah never mind. 19.5 volts vs 12 volts.
gprovost (Author) Posted September 16, 2020 @PEW Honestly, you should do a fresh install. We don't recommend upgrading from OMV4 to OMV5; the transition between the two versions was too big to result in a proper upgrade, even without taking the OS upgrade (Stretch to Buster) into consideration.
PEW Posted September 16, 2020 @gprovost I agree with you, but I just had the system freeze and had to do a hard reboot (pull the power plug and reinsert it). I turned the watchdog service back on so it can recover automatically.
PEW Posted September 16, 2020 @gprovost I will limp along until the second batch of Helios64 is out.
pekkal Posted September 17, 2020 On 9/1/2020 at 7:22 AM, gprovost said: @fri.K That's very interesting info, and not the first time NFS has been pointed to as a possible root cause. Have you tried different NFS block size settings (wsize and rsize) on your client side, or reducing the number of NFS daemons? Most probably, under NFS load with default settings the system reaches an unresponsive state and the hardware watchdog kicks in and resets the system. Check whether the watchdog service is running (systemctl status watchdog.service); if it is, you could also disable it to remove it from the equation during your troubleshooting. @gprovost FYI: I've had zero issues for the past 6 months since I started unmounting the macOS NFS mounts when I'm no longer using them. When I forget to do this (which is often) and the Mac goes to sleep a few times, there will be tens of NFS daemons running on the Helios/Debian server. I do believe there is an issue related to NFS between macOS and Debian. Running 'netstat | grep nfs' every now and then might help to confirm this.
soydemadrid Posted October 4, 2020 Hi, I wondered if anyone could please help me out. I have two Helios4 NAS units, and one seems to have blown and won't work. The other is fine. I've emailed Kobol too, just to see if they can help, but I hoped that someone here might have a spare main board they could sell me, or could offer advice on how I can have the unit repaired. I'll be happy to buy your old unit if you're upgrading to the new Helios NAS or similar. Or is there a way I can test where it is blown and replace the components with someone's help? Thanks again
gprovost (Author) Posted October 5, 2020 @soydemadrid Do you by any chance have a voltmeter? Could you measure the DC voltage on the molex power connector shown in the photo below? Expected measured values: on the 5V rail, 4.90 V to 5.20 V; on the 12V rail, 11.90 V to 12.5 V. If the 12V rail is outside that range, the power supply is faulty.
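For anyone logging multimeter readings, the ranges quoted above can be checked with a minimal Python sketch. The rail names and the `rail_ok` and `check_readings` helpers are hypothetical, introduced only for illustration; the only values taken from the post are the two voltage ranges.

```python
# Acceptable ranges in volts, as quoted in the post above.
RAILS = {
    "5V": (4.90, 5.20),
    "12V": (11.90, 12.5),
}


def rail_ok(rail: str, measured: float) -> bool:
    """Return True if the measured voltage is inside the expected range."""
    low, high = RAILS[rail]
    return low <= measured <= high


def check_readings(readings: dict) -> dict:
    """Map each rail name to True (in range) or False (out of range)."""
    return {rail: rail_ok(rail, volts) for rail, volts in readings.items()}


if __name__ == "__main__":
    # Hypothetical readings: a healthy 5V rail and a dead 12V rail.
    print(check_readings({"5V": 5.0, "12V": 0.3}))
```

A reading such as 0.3 V on the 12V rail falls far outside the range, which is the case where the PSU (or the board's input path) should be suspected.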
soydemadrid Posted October 5, 2020 7 hours ago, gprovost said: @soydemadrid Do you by any chance have a voltmeter? Could you measure the DC voltage on the molex power connector shown in the photo below? Expected measured values: on the 5V rail, 4.90 V to 5.20 V; on the 12V rail, 11.90 V to 12.5 V. If the 12V rail is outside that range, the power supply is faulty. Hi, thanks for helping. I've checked with a multimeter and I'm only getting 0.3 V or something very low on those pins. Basically no voltage at all. The PSU is OK though, as it works with my other Helios4 with no problem. So it does seem to be a dead Helios4 board. Is there any way I can isolate the issue and fix it? Thanks again for any help
PEW Posted October 5, 2020 @gprovost I rebuilt my Helios4 with a clean Buster build on another SD card and configured the Docker containers, services, etc. It seems to be running better now. I would advise against upgrading from OMV4 to OMV5, even though it is doable. The watchdog service will trigger a reboot through the Orion hardware watchdog because the system will crash somehow. My suggestion for the upgrade process is to get another SD card and rebuild, using the old configs from the original SD card. Thanks
gprovost (Author) Posted October 6, 2020 21 hours ago, soydemadrid said: Hi, thanks for helping. I've checked with a multimeter and I'm only getting 0.3 V or something very low on those pins. Basically no voltage at all. The PSU is OK though, as it works with my other Helios4 with no problem. So it does seem to be a dead Helios4 board. Is there any way I can isolate the issue and fix it? Thanks again for any help Hmm, if you still get 0 V on the 12V molex but you are sure that the PSU is OK, then it could be that the 0-ohm resistors R106 or R107 are dead, maybe due to a power surge. Can you do a continuity test with your multimeter on those 2 resistors (highlighted in yellow below)? If you confirm that R106 and R107 don't let current through, then you can bypass them with 2 dirty solder bridges, as shown by the 2 red lines below.
jimbolaya Posted October 15, 2020 (edited) The SD card in my Helios4 got corrupted during a recent upgrade, and since I was a few versions behind I decided to get the latest (Armbian 20.08), write it to the card, and move stuff over as needed. Unfortunately, I found that, due to a combination of factors, networking is totally bunged up because of a choice that doesn't make a lot of sense for something that is generally supposed to be a server in a static place. I was trying to use NetworkManager to set a static IP, and followed a guide to do that since it's not something I do every day. I got to "nmcli con reload" to see if it would take the new settings, but no go. Fine, I'll reboot. I found to my great surprise that the MAC address had changed. The boot messages on the serial console included: Warning: ethernet@70000 (eth1) using random MAC address - 0e:95:58:4b:71:2b eth1: ethernet@70000 What!!!! So, for now, I've disabled NetworkManager and decided to go with /etc/network/interfaces. However, this will impact network monitoring. I feel that randomizing the MAC address is inappropriate for this application (though admittedly not a bad idea for most boards that run Armbian). Is there a setting in a file in /boot I can change to prevent randomization of the MAC address? Thank you for your help. Edited October 15, 2020 by jimbolaya: Figured out what was going on.
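As a side note on recognizing such an address: in the IEEE 802 MAC format, bit 1 (value 0x02) of the first octet marks an address as locally administered rather than vendor-assigned, and randomly generated addresses normally have it set. The following minimal Python sketch checks that bit; the function name `is_locally_administered` is hypothetical, and the second address used in the example is made up for contrast.

```python
def is_locally_administered(mac: str) -> bool:
    """Return True if the locally-administered bit (0x02) of the
    first octet is set in a colon-separated MAC address."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x02)


if __name__ == "__main__":
    # Address from the boot message quoted above: 0x0e has bit 0x02 set.
    print(is_locally_administered("0e:95:58:4b:71:2b"))
    # A made-up address whose first octet is 0x00, for contrast.
    print(is_locally_administered("00:11:22:33:44:55"))
```

This only identifies the kind of address; it does not say where the address is configured or how to pin it.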
IcerJo Posted October 21, 2020 On 10/6/2020 at 4:08 AM, gprovost said: Hmm, if you still get 0 V on the 12V molex but you are sure that the PSU is OK, then it could be that the 0-ohm resistors R106 or R107 are dead, maybe due to a power surge. Can you do a continuity test with your multimeter on those 2 resistors (highlighted in yellow below)? If you confirm that R106 and R107 don't let current through, then you can bypass them with 2 dirty solder bridges, as shown by the 2 red lines below. It seems my Helios4 may have the same issue. I've at least diagnosed it to the point that I know it's a power issue. If I mess with the cables, I can get 2 drives to spin up, but the other 2 either won't show up or, if I do get them going, the system itself won't boot. I am currently looking for my multimeter to test this. I don't own a soldering iron at the moment but can quickly get one. Is there any particular solder that you recommend for the dirty solder bridges?
gprovost (Author) Posted October 22, 2020 8 hours ago, IcerJo said: It seems my Helios4 may have the same issue. I've at least diagnosed it to the point that I know it's a power issue. If I mess with the cables, I can get 2 drives to spin up, but the other 2 either won't show up or, if I do get them going, the system itself won't boot. I am currently looking for my multimeter to test this. I don't own a soldering iron at the moment but can quickly get one. Is there any particular solder that you recommend for the dirty solder bridges? Before going any further: yes, you need a multimeter to narrow down the issue, which is probably just a faulty PSU that needs to be replaced.
IcerJo Posted October 23, 2020 21 hours ago, gprovost said: Before going any further: yes, you need a multimeter to narrow down the issue, which is probably just a faulty PSU that needs to be replaced. I have ordered a new power supply and will test the system with it when it comes in. It was kind of hard to find a 4-pin one rated at 12 V, 8 A. I'm still searching for my multimeter but can't seem to find it. I do appreciate your guidance and support! If the issue persists with the new power supply (which I doubt), I will order a new multimeter and do some testing when I get back home on Sunday. Thanks again!
gprovost (Author) Posted October 23, 2020 Uh, wait, can you share the link to the PSU you bought? It's not always guaranteed to have the same pinout (unless it's a PSU for Synology). Here is a good replacement: https://www.amazon.com/TAIFU-4-Pin-12V-8-33A-Replacement/dp/B07NCG1P8X
helios4noob Posted October 23, 2020 I just preemptively changed out my PSU. It had been running for over a year (I set up the Helios kind of late) and I didn't want to take any risks. I suggest that anyone who hasn't already changed out their PSU at this point do so. I got the Taifu one as well. It works great and comes with a decent warranty too.
FredK Posted October 24, 2020 LEDs no longer working. On my Helios4 I'm using Armbian Buster 20.08.13 (5.8.16-mvebu) and OMV 5.5.12 Usul. Shortly after the update to 20.08.13 I was notified about updates to 20.08.14, which installed smoothly, but "uname -a" still shows 20.08.13. Since then the LEDs have not been working: normally you can see them blinking periodically (LED1) and on SATA activity (LED3 - LED6). Because http://dl.armbian.com/helios4/ doesn't mention 20.08.14, I fear that I installed updates that were possibly withdrawn later (?).
Igor Posted October 24, 2020 3 hours ago, FredK said: doesn't mention 20.08.14, I fear that I installed updates that were possibly withdrawn later (?). That means some other kernel was updated, or just the board support package of this or some other board. Armbian supports lots of different hardware (https://www.armbian.com/download/) and each board can have, and does have, its own problems.
pekkal Posted October 25, 2020 Re: potential NFS-related macOS issues, just for info: I've now had my Mac on all weekend, going to sleep maybe ten times, and there are now 425 NFS connections to my Helios4 / Debian Buster server, all allocated in the 'well known ports' (0-1023) range. The impact on memory is minimal (20-30MB), but I wonder what happens when the below-1024 port range is exhausted? Instead of finding out, I'll unmount the three mounts...
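The kind of count mentioned above could be automated with a rough Python sketch like the one below. It assumes numeric output in the format of `netstat -tn` captured on the server (an assumption; the post only mentions 'netstat | grep nfs'), that the NFS server listens on the standard port 2049, and that client source ports below 1024 are the ones of interest. The function names and the sample addresses in the usage example are hypothetical.

```python
NFS_PORT = 2049          # standard NFS server port
PRIVILEGED_LIMIT = 1024  # ports below this are the "well known" range


def port_of(endpoint: str) -> int:
    """Extract the port from an 'address:port' endpoint string."""
    return int(endpoint.rsplit(":", 1)[1])


def count_nfs_connections(netstat_output: str) -> tuple:
    """Count ESTABLISHED TCP connections to the local NFS port.

    Returns (total, from_privileged_ports), where the second value
    counts peers whose source port is below 1024.
    """
    total = 0
    privileged = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, peer, state = fields[3], fields[4], fields[5]
        if state != "ESTABLISHED" or port_of(local) != NFS_PORT:
            continue
        total += 1
        if port_of(peer) < PRIVILEGED_LIMIT:
            privileged += 1
    return total, privileged


if __name__ == "__main__":
    # Made-up sample in `netstat -tn` layout, for illustration only.
    sample = (
        "Proto Recv-Q Send-Q Local Address Foreign Address State\n"
        "tcp 0 0 192.168.1.10:2049 192.168.1.20:1017 ESTABLISHED\n"
        "tcp 0 0 192.168.1.10:22 192.168.1.20:51234 ESTABLISHED\n"
    )
    print(count_nfs_connections(sample))
```

Running this periodically and watching the first number grow after each sleep cycle of the client would be one way to confirm the behaviour described in the post.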
IcerJo Posted October 25, 2020 On 10/22/2020 at 9:21 PM, gprovost said: Uh, wait, can you share the link to the PSU you bought? It's not always guaranteed to have the same pinout (unless it's a PSU for Synology). Here is a good replacement: https://www.amazon.com/TAIFU-4-Pin-12V-8-33A-Replacement/dp/B07NCG1P8X Thank you! I canceled delivery of the PSU I had ordered and then ordered the one you linked. It arrived today and the issue is now resolved! Thank you for saving me from making a grave mistake!
Heisath Posted October 26, 2020 On 10/24/2020 at 12:07 PM, FredK said: LEDs no longer working. On my Helios4 I'm using Armbian Buster 20.08.13 (5.8.16-mvebu) and OMV 5.5.12 Usul. Shortly after the update to 20.08.13 I was notified about updates to 20.08.14, which installed smoothly, but "uname -a" still shows 20.08.13. Since then the LEDs have not been working: normally you can see them blinking periodically (LED1) and on SATA activity (LED3 - LED6). Because http://dl.armbian.com/helios4/ doesn't mention 20.08.14, I fear that I installed updates that were possibly withdrawn later (?). Hi FredK, I can confirm your problem. There seems to be some issue with the pin assignment of the LEDs. Can you tell us which kernel version you are using (output of uname -a and/or armbianmonitor -u)? As a quick fix you can go back to 20.08.13 or try to just downgrade the kernel via armbian-config. Regards, Heisath
FredK Posted October 26, 2020 2 hours ago, Heisath said: Hi FredK, I can confirm your problem. There seems to be some issue with the pin assignment of the LEDs. Can you tell us which kernel version you are using (output of uname -a and/or armbianmonitor -u)? As a quick fix you can go back to 20.08.13 or try to just downgrade the kernel via armbian-config. Regards, Heisath Hi Heisath, thank you for your post. The information is slightly contradictory. The welcome message says "Welcome to Armbian 20.08.17 Buster with Linux 5.8.16-mvebu". "uname -a" says "Linux helios4 5.8.16-mvebu #20.08.13 SMP Mon Oct 19 08:14:25 CEST 2020 armv7l GNU/Linux", i.e. 20.08.17 vs. 20.08.13. Using armbian-config to re-install "linux-image-dev-mvebu=20.08.13 5.8.16-mvebu" resulted in an unbootable system. I had to restore the last backup (SD card). Now I'm back in business, but as before without LED support. Regards FredK
Heisath Posted October 27, 2020 Hi FredK, The contradicting Armbian version numbers are a matter open to discussion. Basically, Armbian consists of many packages and not all of them get updated every time, so different version numbers exist. This is synced at every major release. The very fact that Linux kernel 5.8 was released with 20.08.xx was a mistake on Armbian's side. Sorry for the problems we caused. If you can live with the LEDs not working, a fix will be available at the latest in the 20.11 release (end of November). If you need the LEDs working now, you can switch to kernel version 5.4.xx (via armbian-config; pick it by the kernel version on the right, not the Armbian version), where everything should work. (I tested and can confirm the bug is not yet present in LK5.4.69.) As said, the release of 5.8 on 20.08.xx was a mistake. Sorry. Heisath
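To compare the two numbers discussed above without reading them by eye, a short Python sketch can split a `uname -a` line into the kernel release and the build tag after the `#`. It assumes, based only on the output quoted in this thread, that the `#...` field carries the Armbian version the kernel package was built with; the pattern and the `parse_uname` helper are hypothetical.

```python
import re

# Matches e.g. "Linux helios4 5.8.16-mvebu #20.08.13 SMP ..."
UNAME_PATTERN = re.compile(
    r"^Linux (?P<host>\S+) (?P<kernel>\S+) #(?P<build>\S+) "
)


def parse_uname(line: str):
    """Return (kernel_release, build_tag) or None if the line does not match."""
    match = UNAME_PATTERN.match(line)
    if match is None:
        return None
    return match.group("kernel"), match.group("build")


if __name__ == "__main__":
    # Line taken from the post quoted earlier in the thread.
    line = ("Linux helios4 5.8.16-mvebu #20.08.13 SMP "
            "Mon Oct 19 08:14:25 CEST 2020 armv7l GNU/Linux")
    print(parse_uname(line))
```

The kernel release (5.8.16-mvebu here) is the value to look at when picking a kernel in armbian-config, as suggested above.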
FredK Posted October 27, 2020 5 minutes ago, Heisath said: The very fact that Linux kernel 5.8 was released with 20.08.xx was a mistake on Armbian's side. Sorry for the problems we caused. If you can live with the LEDs not working, a fix will be available at the latest in the 20.11 release (end of November). Hi Heisath, thank you for the clarifications. Sure, I can live with the LED problem. My original post was meant as an indication that something must have gone wrong in the last few updates. Let's wait for 20.11. FredK
Igor Posted October 27, 2020 31 minutes ago, FredK said: that something must have gone wrong in the last few updates The underlying problem is sadly not "from yesterday" but systemic, and is briefly described in https://armbian.atlassian.net/browse/AR-492 This upgrade should have waited, but that would have been an exception in a bugfix upgrade, which has to be done manually, and when we have to rely on people here (my fault), things easily go south. Thank you for your patience, and sorry that you had trouble with this.
FredK Posted October 27, 2020 49 minutes ago, Igor said: The underlying problem is sadly not "from yesterday" but systemic, and is briefly described in https://armbian.atlassian.net/browse/AR-492 This upgrade should have waited, but that would have been an exception in a bugfix upgrade, which has to be done manually, and when we have to rely on people here (my fault), things easily go south. Thank you for your patience, and sorry that you had trouble with this. I appreciate the work and support around Armbian. I'm not an advocate of "Never touch a running system!", rather the opposite. I want to learn and gain experience, so it can happen that I get into minor trouble sometimes. But I learn the most in such situations. FredK
multi.flexi Posted October 27, 2020 Hi, after update, I cannot copy files to SMB server running on Helios4. I get these kernel errors.
Oct 27 11:52:50 helios4 kernel: [ 333.770181] ------------[ cut here ]------------
Oct 27 11:52:50 helios4 kernel: [ 333.770198] WARNING: CPU: 0 PID: 2101 at fs/btrfs/disk-io.c:516 btree_csum_one_bio+0x167/0x190
Oct 27 11:52:50 helios4 kernel: [ 333.770199] Modules linked in: softdog rfkill zstd zram zsmalloc orion_wdt at24 cpufreq_dt pwm_fan lm75 marvell_cesa libdes ip_tables x_tables autofs4 raid10 raid1 raid0 multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx md_mod
Oct 27 11:52:50 helios4 kernel: [ 333.770229] CPU: 0 PID: 2101 Comm: btrfs-transacti Tainted: G W 5.8.16-mvebu #20.08.13
Oct 27 11:52:50 helios4 kernel: [ 333.770231] Hardware name: Marvell Armada 380/385 (Device Tree)
Oct 27 11:52:50 helios4 kernel: [ 333.770244] [<c010c88d>] (unwind_backtrace) from [<c0108caf>] (show_stack+0xb/0xc)
Oct 27 11:52:50 helios4 kernel: [ 333.770255] [<c0108caf>] (show_stack) from [<c04f11e9>] (dump_stack+0x75/0x84)
Oct 27 11:52:50 helios4 kernel: [ 333.770264] [<c04f11e9>] (dump_stack) from [<c011b91b>] (__warn+0x8b/0x9c)
Oct 27 11:52:50 helios4 kernel: [ 333.770271] [<c011b91b>] (__warn) from [<c011b96d>] (warn_slowpath_fmt+0x41/0x7c)
Oct 27 11:52:50 helios4 kernel: [ 333.770277] [<c011b96d>] (warn_slowpath_fmt) from [<c03563cf>] (btree_csum_one_bio+0x167/0x190)
Oct 27 11:52:50 helios4 kernel: [ 333.770283] [<c03563cf>] (btree_csum_one_bio) from [<c03569a9>] (btree_submit_bio_hook+0x81/0x90)
Oct 27 11:52:50 helios4 kernel: [ 333.770293] [<c03569a9>] (btree_submit_bio_hook) from [<c03783b1>] (submit_one_bio+0x19/0x30)
Oct 27 11:52:50 helios4 kernel: [ 333.770299] [<c03783b1>] (submit_one_bio) from [<c037bb4b>] (submit_extent_page+0xcb/0x16c)
Oct 27 11:52:50 helios4 kernel: [ 333.770304] [<c037bb4b>] (submit_extent_page) from [<c037f9c3>] (write_one_eb+0x12f/0x204)
Oct 27 11:52:50 helios4 kernel: [ 333.770309] [<c037f9c3>] (write_one_eb) from [<c037fbd3>] (btree_write_cache_pages+0x13b/0x29c)
Oct 27 11:52:50 helios4 kernel: [ 333.770319] [<c037fbd3>] (btree_write_cache_pages) from [<c01d8daf>] (do_writepages+0x2f/0x94)
Oct 27 11:52:50 helios4 kernel: [ 333.770326] [<c01d8daf>] (do_writepages) from [<c01d1dfd>] (__filemap_fdatawrite_range+0xa1/0xc8)
Oct 27 11:52:50 helios4 kernel: [ 333.770331] [<c01d1dfd>] (__filemap_fdatawrite_range) from [<c01d1e79>] (filemap_fdatawrite_range+0x15/0x18)
Oct 27 11:52:50 helios4 kernel: [ 333.770338] [<c01d1e79>] (filemap_fdatawrite_range) from [<c035bb8f>] (btrfs_write_marked_extents+0x77/0x144)
Oct 27 11:52:50 helios4 kernel: [ 333.770344] [<c035bb8f>] (btrfs_write_marked_extents) from [<c035bc89>] (btrfs_write_and_wait_transaction+0x2d/0x88)
Oct 27 11:52:50 helios4 kernel: [ 333.770349] [<c035bc89>] (btrfs_write_and_wait_transaction) from [<c035d14f>] (btrfs_commit_transaction+0x5fb/0x934)
Oct 27 11:52:50 helios4 kernel: [ 333.770355] [<c035d14f>] (btrfs_commit_transaction) from [<c03593f5>] (transaction_kthread+0x125/0x160)
Oct 27 11:52:50 helios4 kernel: [ 333.770362] [<c03593f5>] (transaction_kthread) from [<c0132a27>] (kthread+0xe7/0x108)
Oct 27 11:52:50 helios4 kernel: [ 333.770368] [<c0132a27>] (kthread) from [<c0100159>] (ret_from_fork+0x11/0x38)
Oct 27 11:52:50 helios4 kernel: [ 333.770370] Exception stack(0xdab5ffb0 to 0xdab5fff8)
Oct 27 11:52:50 helios4 kernel: [ 333.770374] ffa0: 00000000 00000000 00000000 00000000
Oct 27 11:52:50 helios4 kernel: [ 333.770378] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Oct 27 11:52:50 helios4 kernel: [ 333.770382] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
Oct 27 11:52:50 helios4 kernel: [ 333.770384] ---[ end trace 1623043a52fda1e7 ]---
Oct 27 11:52:50 helios4 kernel: [ 333.778923] BTRFS: error (device md0) in btrfs_commit_transaction:2327: errno=-5 IO failure (Error while writing out transaction)
Oct 27 11:52:50 helios4 kernel: [ 333.778931] BTRFS info (device md0): forced readonly
Oct 27 11:52:50 helios4 kernel: [ 333.778935] BTRFS warning (device md0): Skipping commit of aborted transaction.
Oct 27 11:52:50 helios4 kernel: [ 333.778939] BTRFS: error (device md0) in cleanup_transaction:1898: errno=-5 IO failure
I also tried to do armbianmonitor -v:
Starting package integrity check. This might take some time. Be patient please...
It appears you may have corrupt packages. This is usually a symptom of filesystem corruption caused by SD cards or eMMC dying or burning the OS image to the installation media went wrong.
The following changes from packaged state files were detected:
/usr/share/initramfs-tools/hooks/mdadm
IDK if this is related. Output of armbianmonitor -u http://ix.io/2CbK
Igor Posted October 27, 2020 41 minutes ago, multi.flexi said: IDK if this is related. https://blog.kobol.io/2020/10/27/helios64-software-issue/
gprovost (Author) Posted October 28, 2020 15 hours ago, Igor said: https://blog.kobol.io/2020/10/27/helios64-software-issue/ This blog post / issue report only applies to Helios64, not Helios4. 16 hours ago, multi.flexi said: after update, I cannot copy files to SMB server running on Helios4. I get these kernel errors. Maybe it's corruption, as you point out, or a BTRFS bug in the latest kernel... hard to say. You can revert to the previous kernel using armbian-config > System > Other