Helios4 Support


gprovost

Hi,
I couldn't make any progress on my problem with the Helios4 rebooting under heavy NFS load; worse, it eventually stopped booting altogether. Disabling the watchdog did not help: the serial console froze as well and I was unable to do anything, and as far as I remember there was not even a blinking LED. That forced me to take another SD card, write the official Debian 10 installation image to it, and configure everything again.

There is only a basic configuration: just SSSD for LDAP and an NFS server, no Docker and no HTTP services. The NAS has now been running rock solid for 6 days, even though it's on some cheap 4GB microSD card.

So I still don't know whether the cause was the previous Samsung Evo 32GB SD card, a specific configuration, or some issue related to OS upgrades, but since there are no spontaneous reboots any more I'm happy that it's not failing electronics. I hope my new Helios64 will be even more stable :)

If I manage to debug my former problem any further I'll write, but I'm treating it as very low priority from now on.

My advice for people with spontaneous reboots on Helios4: try another SD card with a fresh OS, do a minimal configuration, and stress it for a few days.

 


On 9/5/2020 at 9:38 PM, Werner said:

If they have the exact same voltage and current ratings, I'd say (assuming you are a bit familiar with the basics) pick up a couple of fitting connectors and solder them to the wires.

Ah never mind. 19.5 volts vs 12 volts.


On 9/1/2020 at 7:22 AM, gprovost said:

@fri.K That's very interesting info, and not the first time NFS has been pointed to as a possible root cause:

Have you tried different NFS block size settings (wsize and rsize) on your client side? Or maybe reducing the number of NFS daemons?

Most probably, under NFS load with default settings, the system reaches an unresponsive state and the hardware watchdog kicks in and resets the system.

Check whether the watchdog service is running (systemctl status watchdog.service); if it is, you could also disable it to remove it from the equation during your troubleshooting.

 

 

 

@gprovost FYI: I've had zero issues for the past 6 months, since I started unmounting macOS NFS mounts when I'm no longer using them. When I forget to do this (which is often) and the Mac goes to sleep a few times, there will be tens of NFS daemons running on the Helios/Debian server. I do believe there is an issue related to NFS between macOS and Debian. Issuing 'netstat | grep nfs' every now and then might help to confirm this.
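For anyone who wants to watch this over time rather than eyeball the output, here is a minimal sketch that counts established connections to the NFS port per client. It assumes the text comes from `netstat -tn` (numeric output, so the port shows up as 2049 rather than the name nfs); the function name, the sample addresses and the sample output are made up for illustration.

```python
from collections import Counter

NFS_PORT = 2049  # standard NFS port; adjust if your server uses another one


def nfs_connections_per_client(netstat_output: str, port: int = NFS_PORT) -> Counter:
    """Count established TCP connections to the local NFS port, per client IP.

    Expects text in the column layout printed by `netstat -tn`:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP connection line.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state != "ESTABLISHED":
            continue
        # rpartition copes with IPv6 addresses, which contain ':' themselves.
        local_port = local.rpartition(":")[2]
        client_ip = remote.rpartition(":")[0]
        if local_port == str(port):
            counts[client_ip] += 1
    return counts


# Hypothetical sample output, not taken from a real Helios4.
sample = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.168.1.10:2049       192.168.1.20:1021       ESTABLISHED
tcp        0      0 192.168.1.10:2049       192.168.1.20:1019       ESTABLISHED
tcp        0      0 192.168.1.10:22         192.168.1.30:51514      ESTABLISHED
tcp        0      0 192.168.1.10:2049       192.168.1.20:1017       TIME_WAIT
"""
print(nfs_connections_per_client(sample))
```

Running something like this from cron and logging the totals would show whether the count really climbs each time the Mac goes to sleep.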


Hi, I wondered if anyone could please help me out. I have two Helios4 NAS units, and one seems to have blown and won't work. The other is fine. I've emailed Kobol too, just to see if they can help, but I hoped that maybe someone here has a spare main board they could sell me, or could offer advice on how I can have the unit repaired?

Thanks if anyone can help. I'll be happy to buy your old unit if you're upgrading to the new Helios NAS or similar anyway... Or is there a way I can test where it has blown and replace the components with someone's help? Thanks again


@soydemadrid Do you by any chance have a voltmeter? Could you measure the DC voltage on the Molex power connector shown in the photo below?

[Photo: Molex power connector]

Expected measured values:

on the 5V rail: 4.90 V - 5.20 V

on the 12V rail: 11.90 V - 12.50 V

If 12V is outside that range, the power supply is faulty.
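If you end up logging several readings, the check against those ranges can be written down as a tiny sketch. The limits are the ones given above; the function name and the two sample readings are made up for illustration.

```python
# Acceptable ranges in volts, taken from the post above.
RAIL_LIMITS = {
    "5V": (4.90, 5.20),
    "12V": (11.90, 12.50),
}


def rail_in_range(rail: str, measured_volts: float) -> bool:
    """Return True if the multimeter reading is within the expected range."""
    low, high = RAIL_LIMITS[rail]
    return low <= measured_volts <= high


# Hypothetical readings: a healthy 5V rail and a dead 12V rail.
for rail, reading in [("5V", 5.05), ("12V", 0.3)]:
    status = "OK" if rail_in_range(rail, reading) else "out of range"
    print(f"{rail} rail: {reading:.2f} V -> {status}")
```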


7 hours ago, gprovost said:

@soydemadrid Do you by any chance have a voltmeter? Could you measure the DC voltage on the Molex power connector shown in the photo below?

[Photo: Molex power connector]

Expected measured values:

on the 5V rail: 4.90 V - 5.20 V

on the 12V rail: 11.90 V - 12.50 V

If 12V is outside that range, the power supply is faulty.

 

 

Hi, thanks for helping. I've checked with a multimeter and I'm only getting 0.3 V or something very low on those pins, basically no voltage at all. The PSU is OK though, as it works with my other Helios4 without problems. So it does seem to be a dead Helios4 board. Is there any way I can isolate the issue and fix it? Thanks again for any help


@gprovost I rebuilt my Helios4 with a clean Buster build on another SD card and reconfigured the Docker containers, services, etc.

It seems to be running better now. I would advise against doing an in-place upgrade from OMV 4 to 5, even though it is doable.

After such an upgrade the system somehow crashes, and the hardware watchdog (orion) then reboots it.

My suggestion for the upgrade process is to get another SD card and rebuild on it, using the old configs from the original SD card.

 

Thanks


21 hours ago, soydemadrid said:

Hi, thanks for helping. I've checked with a multimeter and I'm only getting 0.3 V or something very low on those pins, basically no voltage at all. The PSU is OK though, as it works with my other Helios4 without problems. So it does seem to be a dead Helios4 board. Is there any way I can isolate the issue and fix it? Thanks again for any help

Hmm, if you still get 0 V on the 12V Molex but you are sure that the PSU is OK, then it could be that the 0-ohm resistors R106 or R107 are dead, maybe due to a power surge. Can you do a continuity test with your multimeter on those 2 resistors (highlighted in yellow below)? If you confirm that R106 and R107 don't let current through, you can bypass them with 2 dirty solder bridges, as shown by the 2 red lines below.

[Image: board close-up with R106 and R107 highlighted in yellow and the 2 bridges drawn in red]

 


The SD card in my Helios4 got corrupted during a recent upgrade, and since I was a few versions behind I decided to get the latest release (Armbian 20.08), write it to the card and move stuff over as needed.

Unfortunately, I found that, due to a combination of factors, networking is totally bunged up because of a choice that doesn't make a lot of sense for something that is generally supposed to be a server in a static place.

I was trying to use NetworkManager to set a static IP, and went through a guide to do that since it's not something I do every day. I got to "nmcli con reload" to see if it would take the new settings, but no go. Fine, I'll reboot.

 

I found to my great surprise that the MAC address had changed. In the boot messages on the serial console was:

 

Warning: ethernet@70000 (eth1) using random MAC address - 0e:95:58:4b:71:2b
eth1: ethernet@70000

 

What!!!! 

 

So, for now, I've disabled NetworkManager and decided to go with /etc/network/interfaces. However, a changing MAC address will still impact network monitoring. I feel that randomizing the MAC address is inappropriate for this application (though admittedly not a bad idea for most boards that run Armbian). Is there a setting in a file in /boot that I can change to prevent randomization of the MAC address?

 

Thank you for your help.
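A quick way to tell a randomly generated MAC from a manufacturer-assigned one is the "locally administered" bit (0x02 in the first octet), which random address generators typically set. The sketch below only inspects the address string from the boot message; the function names are made up for illustration, and it does not read or change any setting on the board.

```python
def is_locally_administered(mac: str) -> bool:
    """True if the 'locally administered' bit (0x02 of the first octet) is set.

    Manufacturer-assigned (burned-in) addresses leave this bit clear.
    """
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x02)


def is_multicast(mac: str) -> bool:
    """True if the lowest bit of the first octet (0x01) is set."""
    return bool(int(mac.split(":")[0], 16) & 0x01)


mac = "0e:95:58:4b:71:2b"  # address from the boot message above
print(mac, "locally administered:", is_locally_administered(mac))
print(mac, "multicast:", is_multicast(mac))
```

For the address in the warning, the first octet is 0x0e, so the locally administered bit is set and the multicast bit is clear, which is consistent with the "random MAC address" message.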

Edited by jimbolaya
Figured out what was going on.

On 10/6/2020 at 4:08 AM, gprovost said:

 

Hmm, if you still get 0 V on the 12V Molex but you are sure that the PSU is OK, then it could be that the 0-ohm resistors R106 or R107 are dead, maybe due to a power surge. Can you do a continuity test with your multimeter on those 2 resistors (highlighted in yellow below)? If you confirm that R106 and R107 don't let current through, you can bypass them with 2 dirty solder bridges, as shown by the 2 red lines below.

[Image: board close-up with R106 and R107 highlighted in yellow and the 2 bridges drawn in red]

Seems like my Helios4 may have the same issue. I've at least diagnosed it to the point that I know it's a power issue: if I mess with the cables, I can get 2 drives to boot, but the other 2 either won't show up or, if I get them going, the system itself won't boot. I am currently looking for my multimeter to test this. I don't currently own a soldering iron but can quickly get one. Is there any particular solder you recommend for the dirty solder bridges?


8 hours ago, IcerJo said:

Seems like my Helios4 may have the same issue. I've at least diagnosed it to the point that I know it's a power issue: if I mess with the cables, I can get 2 drives to boot, but the other 2 either won't show up or, if I get them going, the system itself won't boot. I am currently looking for my multimeter to test this. I don't currently own a soldering iron but can quickly get one. Is there any particular solder you recommend for the dirty solder bridges?

Before going any further: yes, you need a multimeter to narrow down the issue, which is probably just a faulty PSU that needs to be replaced.


21 hours ago, gprovost said:

 

Before going any further: yes, you need a multimeter to narrow down the issue, which is probably just a faulty PSU that needs to be replaced.

I have ordered a new power supply and will test the system with it when it comes in. It was kind of hard to find a 4-pin one rated 12 V / 8 A. I'm still searching for my multimeter but can't seem to find it. I do appreciate your guidance and support! If the new power supply has the same issue (which I doubt), I will order a new multimeter and do some testing when I get back home on Sunday.

Thanks again!


I just pre-emptively changed out my PSU. It's been running for over a year (I set up the Helios kind of late) and I didn't want to take any risks. I suggest that anyone who hasn't already changed out their PSU do so at this point.

Got the Taifu one as well. It works great and comes with a decent warranty.


LEDs no longer working

On my Helios4 I'm using Armbian Buster 20.08.13 (5.8.16-mvebu) and OMV 5.5.12 Usul. Shortly after the update to 20.08.13 I was notified about updates to 20.08.14, which installed smoothly, but "uname -a" still shows 20.08.13. Since then the LEDs have not been working; normally you can see them blinking periodically (LED1) and on SATA activity (LED3 - LED6).

Because http://dl.armbian.com/helios4/ doesn't mention 20.08.14, I fear that I installed updates that were possibly withdrawn later (?).

 

 


3 hours ago, FredK said:

doesn't mention 20.08.14 I fear that I installed updates, possibly later withdrawn (?).


That means some other kernel was updated, or just the board support package of this or some other board. Armbian supports lots of different hardware (https://www.armbian.com/download/), and each board can have, and does have, its own problems.


Re: potential NFS-related macOS issues, just for info:

I've now had my Mac on all weekend, going to sleep maybe ten times: there are now 425 NFS connections to my Helios4 / Debian Buster server, all allocated in the 'well-known ports' (0-1023) range.

The impact on memory is minimal (20-30 MB), but I wonder what happens when the port range below 1024 is exhausted? Instead of finding out, I'll unmount the three mounts...
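To see how much of that low port range is actually taken, a rough sketch like the following could collect the distinct client source ports below 1024 that are connected to the NFS port. It assumes `netstat -tn`-style text and a single client (port numbers used by different clients would be merged); the function name and the sample lines are hypothetical, and the range a client really picks from may be narrower than 0-1023.

```python
PRIVILEGED_PORT_LIMIT = 1024  # ports 0-1023 are the "well-known" range


def privileged_ports_in_use(netstat_output: str, server_port: int = 2049) -> set:
    """Return the distinct client source ports below 1024 that are connected
    to `server_port`, read from `netstat -tn`-style text."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP connection line.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_port = fields[3].rpartition(":")[2]
        remote_port = fields[4].rpartition(":")[2]
        if local_port != str(server_port) or not remote_port.isdigit():
            continue
        if int(remote_port) < PRIVILEGED_PORT_LIMIT:
            ports.add(int(remote_port))
    return ports


# Hypothetical sample lines, not taken from a real server.
sample = """\
tcp        0      0 192.168.1.10:2049       192.168.1.20:1021       ESTABLISHED
tcp        0      0 192.168.1.10:2049       192.168.1.20:1019       ESTABLISHED
tcp        0      0 192.168.1.10:2049       192.168.1.20:50212      ESTABLISHED
"""
used = privileged_ports_in_use(sample)
print(f"{len(used)} of {PRIVILEGED_PORT_LIMIT} low ports in use: {sorted(used)}")
```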


On 10/22/2020 at 9:21 PM, gprovost said:

Euhhh wait, can you share the link to the PSU you bought? It's not always guaranteed to have the same pinout (unless it's a PSU for Synology).

Here is a good replacement: https://www.amazon.com/TAIFU-4-Pin-12V-8-33A-Replacement/dp/B07NCG1P8X

Thank you! I canceled delivery of the PSU I had ordered and then ordered the one you linked. It arrived today and the issue is now resolved! Thank you for saving me from making a grave mistake!


On 10/24/2020 at 12:07 PM, FredK said:

LEDs no longer working

On my Helios4 I'm using Armbian Buster 20.08.13 (5.8.16-mvebu) and OMV 5.5.12 Usul. Shortly after the update to 20.08.13 I was notified about updates to 20.08.14, which installed smoothly, but "uname -a" still shows 20.08.13. Since then the LEDs have not been working; normally you can see them blinking periodically (LED1) and on SATA activity (LED3 - LED6).

Because http://dl.armbian.com/helios4/ doesn't mention 20.08.14, I fear that I installed updates that were possibly withdrawn later (?).

 

 

 

Hi FredK,

 

I can confirm your problem. There seems to be some issue with the pin assignment of the LEDs. Can you tell us which kernel version you are using (output of uname -a and/or armbianmonitor -u)?
As a quick fix you can go back to 20.08.13, or try just downgrading the kernel via armbian-config.

 

Regards,

Heisath


2 hours ago, Heisath said:

Hi FredK,

I can confirm your problem. There seems to be some issue with the pin assignment of the LEDs. Can you tell us which kernel version you are using (output of uname -a and/or armbianmonitor -u)?
As a quick fix you can go back to 20.08.13, or try just downgrading the kernel via armbian-config.

 

Regards,

Heisath

Hi Heisath,

 

thank you for your post.

 

The information is slightly contradictory. The welcome message says "Welcome to Armbian 20.08.17 Buster with Linux 5.8.16-mvebu".

"uname -a" says "Linux helios4 5.8.16-mvebu #20.08.13 SMP Mon Oct 19 08:14:25 CEST 2020 armv7l GNU/Linux".

i.e. 20.08.17 <-> 20.08.13
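To make the comparison less error-prone, the two numbers in the "uname -a" line can be pulled apart with a small sketch like this one. The pattern and function name are illustrative only; reading the value after "#" as an Armbian package version is an assumption based on the output above.

```python
import re

# hostname, kernel release, and the build tag that follows '#'
UNAME_PATTERN = re.compile(
    r"^Linux\s+(?P<host>\S+)\s+(?P<release>\S+)\s+#(?P<build>\S+)\s"
)


def parse_uname(line: str) -> dict:
    """Split `uname -a` output into hostname, kernel release and build tag."""
    match = UNAME_PATTERN.match(line)
    if match is None:
        raise ValueError(f"unexpected uname output: {line!r}")
    return match.groupdict()


uname_line = (
    "Linux helios4 5.8.16-mvebu #20.08.13 SMP Mon Oct 19 08:14:25 CEST 2020 "
    "armv7l GNU/Linux"
)
info = parse_uname(uname_line)
print(info)
```

Here the kernel release is 5.8.16-mvebu and the build tag is 20.08.13, while the welcome message reports 20.08.17.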

 

Using armbian-config to re-install "linux-image-dev-mvebu=20.08.13 5.8.16-mvebu" resulted in an unbootable system. I had to restore the last backup (SD card). Now I'm back in business but, as before, without LED support.

 

Regards

FredK

 

 


Hi FredK,

 

The contradicting Armbian version numbers are a thing open to discussion. Basically, Armbian uses many packages and not all of them get updated every time, so different version numbers exist. They are synced at every major release.

The very fact that Linux kernel 5.8 was released with 20.08.xx was a mistake on Armbian's side. Sorry for the problems we caused. If you can live with the LEDs not working, a fix will be available at the latest with the 20.11 release (end of November).

If you need the LEDs working now, you can switch to kernel version 5.4.xx (via armbian-config; pick it by the kernel version on the right, not by the Armbian version), where everything should work. (I tested and can confirm the bug is not present in LK 5.4.69.)

As I said, the release of 5.8 on 20.08.xx was a mistake. Sorry.

 

Heisath


5 minutes ago, Heisath said:

The very fact that Linux kernel 5.8 was released with 20.08.xx was a mistake on Armbian's side. Sorry for the problems we caused. If you can live with the LEDs not working, a fix will be available at the latest with the 20.11 release (end of November).

Hi Heisath,

 

thank you for the clarifications.

Sure, I can live with the LED problem. My original post was meant as an indication that something must have gone wrong in the last few updates. Let's wait for 20.11.

 

FredK


31 minutes ago, FredK said:

that something must have gone wrong in the last few updates


The underlying problem is sadly not "from yesterday" but systemic; it is briefly described in https://armbian.atlassian.net/browse/AR-492. This upgrade should have waited, but holding it back would have been an exception within a bugfix upgrade. Such exceptions have to be handled manually, and when we have to rely on people here (my fault), things easily go south.

Thank you for your patience, and sorry that you had trouble with this.


49 minutes ago, Igor said:

The underlying problem is sadly not "from yesterday" but systemic; it is briefly described in https://armbian.atlassian.net/browse/AR-492. This upgrade should have waited, but holding it back would have been an exception within a bugfix upgrade. Such exceptions have to be handled manually, and when we have to rely on people here (my fault), things easily go south.

Thank you for your patience, and sorry that you had trouble with this.

I appreciate the work and support regarding armbian.

I'm not an advocate of "Never touch a running system!", rather the opposite. I want to learn and to gain experience. Then it might happen that I get into minor trouble sometimes. But I learn most in such situations.

 

FredK


Hi,

after an update, I cannot copy files to the SMB server running on my Helios4. I get these kernel errors:

 

Oct 27 11:52:50 helios4 kernel: [  333.770181] ------------[ cut here ]------------
Oct 27 11:52:50 helios4 kernel: [  333.770198] WARNING: CPU: 0 PID: 2101 at fs/btrfs/disk-io.c:516 btree_csum_one_bio+0x167/0x190
Oct 27 11:52:50 helios4 kernel: [  333.770199] Modules linked in: softdog rfkill zstd zram zsmalloc orion_wdt at24 cpufreq_dt pwm_fan lm75 marvell_cesa libdes ip_tables x_tables autofs4 raid10 raid1 raid0 multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx md_mod
Oct 27 11:52:50 helios4 kernel: [  333.770229] CPU: 0 PID: 2101 Comm: btrfs-transacti Tainted: G        W         5.8.16-mvebu #20.08.13
Oct 27 11:52:50 helios4 kernel: [  333.770231] Hardware name: Marvell Armada 380/385 (Device Tree)
Oct 27 11:52:50 helios4 kernel: [  333.770244] [<c010c88d>] (unwind_backtrace) from [<c0108caf>] (show_stack+0xb/0xc)
Oct 27 11:52:50 helios4 kernel: [  333.770255] [<c0108caf>] (show_stack) from [<c04f11e9>] (dump_stack+0x75/0x84)
Oct 27 11:52:50 helios4 kernel: [  333.770264] [<c04f11e9>] (dump_stack) from [<c011b91b>] (__warn+0x8b/0x9c)
Oct 27 11:52:50 helios4 kernel: [  333.770271] [<c011b91b>] (__warn) from [<c011b96d>] (warn_slowpath_fmt+0x41/0x7c)
Oct 27 11:52:50 helios4 kernel: [  333.770277] [<c011b96d>] (warn_slowpath_fmt) from [<c03563cf>] (btree_csum_one_bio+0x167/0x190)
Oct 27 11:52:50 helios4 kernel: [  333.770283] [<c03563cf>] (btree_csum_one_bio) from [<c03569a9>] (btree_submit_bio_hook+0x81/0x90)
Oct 27 11:52:50 helios4 kernel: [  333.770293] [<c03569a9>] (btree_submit_bio_hook) from [<c03783b1>] (submit_one_bio+0x19/0x30)
Oct 27 11:52:50 helios4 kernel: [  333.770299] [<c03783b1>] (submit_one_bio) from [<c037bb4b>] (submit_extent_page+0xcb/0x16c)
Oct 27 11:52:50 helios4 kernel: [  333.770304] [<c037bb4b>] (submit_extent_page) from [<c037f9c3>] (write_one_eb+0x12f/0x204)
Oct 27 11:52:50 helios4 kernel: [  333.770309] [<c037f9c3>] (write_one_eb) from [<c037fbd3>] (btree_write_cache_pages+0x13b/0x29c)
Oct 27 11:52:50 helios4 kernel: [  333.770319] [<c037fbd3>] (btree_write_cache_pages) from [<c01d8daf>] (do_writepages+0x2f/0x94)
Oct 27 11:52:50 helios4 kernel: [  333.770326] [<c01d8daf>] (do_writepages) from [<c01d1dfd>] (__filemap_fdatawrite_range+0xa1/0xc8)
Oct 27 11:52:50 helios4 kernel: [  333.770331] [<c01d1dfd>] (__filemap_fdatawrite_range) from [<c01d1e79>] (filemap_fdatawrite_range+0x15/0x18)
Oct 27 11:52:50 helios4 kernel: [  333.770338] [<c01d1e79>] (filemap_fdatawrite_range) from [<c035bb8f>] (btrfs_write_marked_extents+0x77/0x144)
Oct 27 11:52:50 helios4 kernel: [  333.770344] [<c035bb8f>] (btrfs_write_marked_extents) from [<c035bc89>] (btrfs_write_and_wait_transaction+0x2d/0x88)
Oct 27 11:52:50 helios4 kernel: [  333.770349] [<c035bc89>] (btrfs_write_and_wait_transaction) from [<c035d14f>] (btrfs_commit_transaction+0x5fb/0x934)
Oct 27 11:52:50 helios4 kernel: [  333.770355] [<c035d14f>] (btrfs_commit_transaction) from [<c03593f5>] (transaction_kthread+0x125/0x160)
Oct 27 11:52:50 helios4 kernel: [  333.770362] [<c03593f5>] (transaction_kthread) from [<c0132a27>] (kthread+0xe7/0x108)
Oct 27 11:52:50 helios4 kernel: [  333.770368] [<c0132a27>] (kthread) from [<c0100159>] (ret_from_fork+0x11/0x38)
Oct 27 11:52:50 helios4 kernel: [  333.770370] Exception stack(0xdab5ffb0 to 0xdab5fff8)
Oct 27 11:52:50 helios4 kernel: [  333.770374] ffa0:                                     00000000 00000000 00000000 00000000
Oct 27 11:52:50 helios4 kernel: [  333.770378] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Oct 27 11:52:50 helios4 kernel: [  333.770382] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
Oct 27 11:52:50 helios4 kernel: [  333.770384] ---[ end trace 1623043a52fda1e7 ]---
Oct 27 11:52:50 helios4 kernel: [  333.778923] BTRFS: error (device md0) in btrfs_commit_transaction:2327: errno=-5 IO failure (Error while writing out transaction)
Oct 27 11:52:50 helios4 kernel: [  333.778931] BTRFS info (device md0): forced readonly
Oct 27 11:52:50 helios4 kernel: [  333.778935] BTRFS warning (device md0): Skipping commit of aborted transaction.
Oct 27 11:52:50 helios4 kernel: [  333.778939] BTRFS: error (device md0) in cleanup_transaction:1898: errno=-5 IO failure

I also tried running armbianmonitor -v:

Starting package integrity check. This might take some time. Be patient please...

It appears you may have corrupt packages.

This is usually a symptom of filesystem corruption caused by SD cards or eMMC
dying or burning the OS image to the installation media went wrong.

The following changes from packaged state files were detected:

/usr/share/initramfs-tools/hooks/mdadm

I don't know if this is related.

Output of armbianmonitor -u: http://ix.io/2CbK
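When a log is this long, it can help to pull out only the BTRFS messages that follow the stack trace. The sketch below does that with a regular expression written against the four BTRFS lines quoted above; the pattern and function name are illustrative and may not match BTRFS messages in other formats. In those lines, errno=-5 is the kernel's EIO (I/O error).

```python
import re

# Matches e.g. "BTRFS: error (device md0) in func:123: message"
# and "BTRFS info (device md0): message".
BTRFS_LINE = re.compile(
    r"BTRFS:? (?P<level>error|warning|info) \(device (?P<device>[^)]+)\)"
    r"(?: in (?P<where>[\w.]+:\d+))?: (?P<message>.*)"
)


def btrfs_events(log_text: str) -> list:
    """Return (level, device, location, message) for each BTRFS log line.

    `location` is None when the line carries no "in function:line" part.
    """
    events = []
    for line in log_text.splitlines():
        m = BTRFS_LINE.search(line)
        if m:
            events.append((m["level"], m["device"], m["where"], m["message"]))
    return events


# Lines copied from the kernel log above.
sample = """\
Oct 27 11:52:50 helios4 kernel: [  333.770384] ---[ end trace 1623043a52fda1e7 ]---
Oct 27 11:52:50 helios4 kernel: [  333.778923] BTRFS: error (device md0) in btrfs_commit_transaction:2327: errno=-5 IO failure (Error while writing out transaction)
Oct 27 11:52:50 helios4 kernel: [  333.778931] BTRFS info (device md0): forced readonly
Oct 27 11:52:50 helios4 kernel: [  333.778935] BTRFS warning (device md0): Skipping commit of aborted transaction.
Oct 27 11:52:50 helios4 kernel: [  333.778939] BTRFS: error (device md0) in cleanup_transaction:1898: errno=-5 IO failure
"""
for level, device, where, message in btrfs_events(sample):
    print(f"{level:7} {device} {where or '-'}: {message}")
```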


15 hours ago, Igor said:

 

This blog post / issue report only applies to Helios64, not Helios4.

 

16 hours ago, multi.flexi said:

after update, I cannot copy files to SMB server running on Helios4. I get these kernel errors.

 

Maybe corruption as you point out, or a BTRFS bug in the latest kernel... hard to say.

You can revert to the previous kernel using armbian-config > System > Other.

This topic is now closed to further replies.