gprovost

Helios4 Support


18 minutes ago, Victor_Williamson said:

Completed, http://ix.io/VFR

 

I have removed the fan as a temporary measure and have been monitoring the temperature, which currently sits around 44°C.


You are using one of the first test builds made by the Helios team. The fan driver, which is what bothers you, was changed, and you might be missing some scripts that are not part of the update process. Perhaps @gprovost could give you detailed hints on how to solve this, but it looks like something also went wrong with a kernel image update, or did you make some manual changes to the system?

ii  linux-dtb-mvebu                5.41                             armhf        Linux DTB, version 4.4.115-mvebu
ii  linux-image-mvebu              5.37                             armhf        Linux kernel, version 4.4.107-mvebu

If you can, I would suggest starting with the latest official armbian.
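For readers following along, here is a minimal Python sketch of how the two package lines above can be compared. It assumes the lines come from `dpkg -l` style output and that the DTB and kernel image packages are normally expected to carry the same version; the function names `parse_dpkg_lines` and `versions_consistent` are made up for illustration.

```python
import re

# The two lines quoted in the post above (dpkg -l style output).
DPKG_LINES = """\
ii  linux-dtb-mvebu                5.41                             armhf        Linux DTB, version 4.4.115-mvebu
ii  linux-image-mvebu              5.37                             armhf        Linux kernel, version 4.4.107-mvebu
"""


def parse_dpkg_lines(text):
    """Return {package: (package_version, kernel_version)} for installed ('ii') lines."""
    packages = {}
    for line in text.splitlines():
        fields = line.split(None, 4)
        if len(fields) < 5 or fields[0] != "ii":
            continue
        _status, name, pkg_version, _arch, description = fields
        match = re.search(r"version (\S+)", description)
        packages[name] = (pkg_version, match.group(1) if match else None)
    return packages


def versions_consistent(packages):
    """True when every package reports the same (package, kernel) version pair.

    Assumption: matching versions are what a clean update would leave behind.
    """
    return len(set(packages.values())) <= 1


if __name__ == "__main__":
    pkgs = parse_dpkg_lines(DPKG_LINES)
    for name, (pkg_version, kernel_version) in pkgs.items():
        print(name, pkg_version, kernel_version)
    print("consistent:", versions_consistent(pkgs))
```

With the lines from this post the check reports an inconsistency, which fits the suspicion that the kernel image update did not complete.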

44 minutes ago, Igor said:


You are using one of the first test builds made by the Helios team. The fan driver, which is what bothers you, was changed, and you might be missing some scripts that are not part of the update process. Perhaps @gprovost could give you detailed hints on how to solve this, but it looks like something also went wrong with a kernel image update, or did you make some manual changes to the system?


ii  linux-dtb-mvebu                5.41                             armhf        Linux DTB, version 4.4.115-mvebu
ii  linux-image-mvebu              5.37                             armhf        Linux kernel, version 4.4.107-mvebu

If you can, I would suggest starting with the latest official armbian.

I could do a rebuild as a last resort, as I have nearly 8TB of data to move off it beforehand. Is there an easy backup of the RAID config I could perform, or will the fresh rebuild pick up the RAID from the 4 physical disks?

 

Many thanks

23 hours ago, Victor_Williamson said:

I could do a rebuild as a last resort, as I have nearly 8TB of data to move off it beforehand. Is there an easy backup of the RAID config I could perform, or will the fresh rebuild pick up the RAID from the 4 physical disks?

 

Many thanks

Cheers for the help... I poked around and eventually just blew it away and downloaded the latest build, and all is well again. I backed up my fstab, exports and interfaces files and copied them to the new SD card.

After a reboot everything is back to full health.


I have an odd situation which I cannot explain. I installed the Debian 9 image (7/02/2018). I am now running Linux 4.14.20 as expected. After running apt-get update I see an update for the linux-image-next-mvebu package with the same package version (5.41), which would install Linux 4.14.18. Can somebody please explain what is going on?

 

apt-cache show linux-image-next-mvebu

Package: linux-image-next-mvebu
Priority: optional
Section: kernel
Installed-Size: 34496
Maintainer: Igor Pecovnik <igor.pecovnik@****l.com>
Architecture: armhf
Source: linux-4.14.18-mvebu
Version: 5.41
Filename: pool/main/l/linux-4.14.18-mvebu/linux-image-next-mvebu_5.41_armhf.deb
Size: 13659400
MD5sum: 1df84709f1c0fd9a7ca1c49233de5732
SHA1: baeca37e804415833fcbad31c94f8aa3111f87b5
SHA256: abe795285fe66e6d1aec7cdc4cf7f37e9ab8f13f7bc5a3b560f2ca0f401b5f41
SHA512: f6cd83c98f5ed5da26766f1d613873b50b709772a0184e34f76a181bbe9fc6870b704c1a88d745ad25f265af16d1fcd9fedff780a95c1231ec7e3f708c3dfab4
Description: Linux kernel, version 4.14.18-mvebu
 This package contains the Linux kernel, modules and corresponding other
 files, version: 4.14.18-mvebu.
Description-md5: 881a2bf41fdf3457001594d1ab3a7d0c
Homepage: http://www.kernel.org/

Package: linux-image-next-mvebu
Status: install ok installed
Priority: optional
Section: kernel
Installed-Size: 34499
Maintainer: Igor Pecovnik <igor.pecovnik@****l.com>
Architecture: armhf
Source: linux-4.14.20-mvebu
Version: 5.41
Description: Linux kernel, version 4.14.20-mvebu
 This package contains the Linux kernel, modules and corresponding other
 files, version: 4.14.20-mvebu.
Description-md5: 1e04fc0ef8ae54ea7cd36a95440d0417
Homepage: http://www.kernel.org/
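The two stanzas above can be compared mechanically. The following is only a sketch in Python: it parses `apt-cache show` style text and lists packages whose stanzas share a `Version` but differ in `Source`. The helper names `parse_stanzas` and `same_version_different_source` are hypothetical, the sample text is trimmed to the relevant fields, and the sketch makes no claim about how apt itself decides what to install.

```python
# Trimmed copy of the two stanzas from the output above.
APT_CACHE_OUTPUT = """\
Package: linux-image-next-mvebu
Source: linux-4.14.18-mvebu
Version: 5.41
Description: Linux kernel, version 4.14.18-mvebu
 This package contains the Linux kernel, modules and corresponding other
 files, version: 4.14.18-mvebu.

Package: linux-image-next-mvebu
Status: install ok installed
Source: linux-4.14.20-mvebu
Version: 5.41
Description: Linux kernel, version 4.14.20-mvebu
"""


def parse_stanzas(text):
    """Split control-file style text into a list of {field: value} dicts."""
    stanzas = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if line.startswith(" ") or ":" not in line:
                continue  # continuation line of a multi-line field
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if fields:
            stanzas.append(fields)
    return stanzas


def same_version_different_source(stanzas):
    """Return (package, version, first_source, other_source) for each clash."""
    first_seen = {}
    clashes = []
    for stanza in stanzas:
        key = (stanza.get("Package"), stanza.get("Version"))
        source = stanza.get("Source")
        if key in first_seen and first_seen[key] != source:
            clashes.append((key[0], key[1], first_seen[key], source))
        first_seen.setdefault(key, source)
    return clashes


if __name__ == "__main__":
    for clash in same_version_different_source(parse_stanzas(APT_CACHE_OUTPUT)):
        print("same version, different source:", clash)
```

Run on the output above, it flags version 5.41 as covering both the 4.14.18 and the 4.14.20 kernel sources, which is the oddity described in this post.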

 


Looks like the release on the kobol wiki (https://cdn.kobol.io/files/Helios4_Debian_Stretch_4.14.20.img.xz) was built against the armbian development version (https://github.com/armbian/build/blob/development/config/kernel/linux-mvebu-next.config)!? The official armbian kernel for the Helios4 seems to be https://apt.armbian.com/pool/main/l/linux-4.14.18-mvebu/

5.41 seems to be the Armbian version number. Does this mean a newer kernel can only be released if the Armbian version gets bumped?

1 hour ago, kratz00 said:

5.41 seems to be the Armbian version number. Does this mean a newer kernel can only be released if the Armbian version gets bumped?


Yes. But if both are the same version it should not upgrade unless forced or if you switch kernels between default and stable. In this case, you only get what is in the repository.

 

You can also switch to the beta repository, which should have the latest 4.14 ... but you lose the "warranty" :)


In my case it would be a downgrade from linux-4.14.20-mvebu to linux-4.14.18-mvebu. I am not forcing or switching anything, I was just running 'sudo apt-get upgrade'. I do not know the inner workings of apt, but to me it is strange that it tries to install a package with the exact same version (5.41) again.

What is also a little confusing is that there are different Armbian versions for the Helios4.

28 minutes ago, kratz00 said:

What is also a little confusing is that there are different Armbian versions for the Helios4.


There is only one official version. This case should not happen, but it is also not critical. I already talked to the Helios4 folks, and an update will be rolled out as soon as I find the time (I guess not today, since I am barely typing this). But this process is purposely not automatic. Someone has to test before pushing updated packages to the stable repository ...


What file systems are you all running? Btrfs/zfs? Encryption? Does it max out a 1gbit Ethernet on file transfers?

 


Anyone have any output from a wattmeter on power usage?


I got my Helios4 a month back, but was waiting for my hard disks to arrive (only 2). I assembled the box as described in the wiki, but found that the board LEDs are not switched on and the serial console (UART) is not reachable via USB.

Is there anything that I can look for? I will try to check the hard disks separately to see if they are fine. Is there a way to know if the board is working fine, or if the power cord is working fine?

Also, is there any specific configuration I need for a 2-disk-only NAS?

 

Thanks,

   Gururaj

On 4/23/2018 at 1:09 PM, Gururaj said:

I got my Helios4 a month back, but was waiting for my hard disks to arrive (only 2). I assembled the box as described in the wiki, but found that the board LEDs are not switched on and the serial console (UART) is not reachable via USB.

Is there anything that I can look for? I will try to check the hard disks separately to see if they are fine. Is there a way to know if the board is working fine, or if the power cord is working fine?

Also, is there any specific configuration I need for a 2-disk-only NAS?

None of the LEDs are lit? Look at LED8 on the board; it indicates whether the board is powered up. But first, is the LED on the AC/DC power brick lit?


None of the LEDs on the board are lit.

The LED on the AC/DC power brick is not lit either.

 

Let me try to find a replacement for the adapter in the local market. Please share any other online marketplace where I can buy the power brick.

 

regards,

   Gururaj

   


I'm having a strange error. I have been enjoying OMV on my new Helios 4. After powering on the Helios one day I noticed I couldn't access it. I connected via the USB serial cable and noticed that eth0 was not showing a configured IP address. I had been using DHCP reservation from my router. I assigned a static IP to eth0. That static IP did show up however I'm unable to ping the gateway or anything else on my LAN. I can however ping the loopback address. I decided to wipe the image and I flashed the "Jessie_4.14.20-OMV_3.0.97" build again. No luck with the clean image.

 

Finally, in desperation (thinking maybe it was a hardware issue) I flashed the "Stretch_4.14.20" image, and DHCP worked fine and I have full connectivity. BTW, I tried different ports on my managed switch and different patch cables, just to rule these out.

 

What am I doing wrong? I'm at a loss on this issue??

 

Thanks,

 

Jake


Hello,

 

I'm having trouble with my Helios4 as well: After some random amount of time the system hangs. Sometimes after a few hours, sometimes after a day or two. I think the longest uptime I got was three days, with the unit idling most of the time.

 

What I mean by "the system hangs" is that I cannot access it in any way. I cannot access my network shares any more, I cannot SSH into it any more, the router doesn't even list it in the list of connected network devices any more. I also can't access it using the serial console, it just stays blank. The only activity I get from the box is the orange light next to the ethernet port, which blinks three or four times in rapid succession every three or four seconds or so.

 

This seems to happen more frequently with higher system load, but also when idling. I wrote a script logging the system temperature and fan speed to a file every two seconds. There are no sudden spikes or anything, the log files just end at some point, sometimes in the middle of writing out a number. There is also nothing in the kernel log, it just stops. I then have to power cycle the box to bring it back online.
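A logging loop of the kind described above might look roughly like the following Python sketch. It is not the script used in this post; `read_raw` and `log_samples` are hypothetical helpers, and the value is logged exactly as read, without assuming a unit. Each line is flushed and fsynced so that a hard hang is less likely to leave a half-written last entry, at the cost of more frequent small writes to the card.

```python
import os
import time


def read_raw(path):
    """Return the stripped text content of a sysfs-style file."""
    with open(path) as handle:
        return handle.read().strip()


def log_samples(read_sample, log_path, interval_s=2.0, count=None):
    """Append 'timestamp value' lines to log_path, syncing after every line.

    read_sample: zero-argument callable returning the value to log.
    count: number of samples to take, or None to run until interrupted.
    Returns the number of lines written.
    """
    written = 0
    with open(log_path, "a") as log:
        while count is None or written < count:
            log.write(f"{time.time():.0f} {read_sample()}\n")
            log.flush()             # push Python's buffer to the OS
            os.fsync(log.fileno())  # ask the OS to push it to the storage device
            written += 1
            if count is None or written < count:
                time.sleep(interval_s)
    return written


if __name__ == "__main__":
    # Path taken from this thread; the log file name is an arbitrary choice.
    source = "/dev/thermal-cpu/temp1_input"
    log_samples(lambda: read_raw(source), "temperature.log", interval_s=2.0)
```

If the log still ends abruptly with this approach, that points at the whole system stopping rather than at buffered output being lost.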

 

This has caused me some data corruption already and makes the Helios4 almost unusable for me.

 

I'm running the latest official image:

ARMBIAN 5.41 user-built Debian GNU/Linux 9 (stretch) 4.14.18-mvebu

Linux helios4 4.14.18-mvebu #22 SMP Fri Feb 9 10:41:38 CET 2018 armv7l GNU/Linux

 

My system drive is a Samsung UHS-3 64GB Micro SDXC Memory Card. Could this be the issue? The OP says something about "SDcard High Speed timing have compatibility issue with some brands. Temporary fix : Disable UHS option/support." How can I disable UHS? Searching the internet and forum didn't turn up anything useful.

 

The temperatures reported by doing "cat /dev/thermal-cpu/temp1_input" seem to be very high. The CPU is idling at 75 to 80C and reaches 97C under load. The armbian MOTD reports an idle temp of about 60C, since they subtract 20C from the actual reading for some reason. Do I have a bad heat sink maybe? The box is placed with plenty of breathing room, there are only two HDDs in the top most slots and both fans are working hard.

 

Please help me out. I was really excited for the Helios4 and was very happy to finally receive my unit after all the delays. Now I'm starting to regret buying one...

 

Thank you

 

1 hour ago, nemo19 said:

I'm having trouble with my Helios4 as well


Can you try beta builds for a few days? armbian-config -> system -> switch to automated nightly builds. I am running it without issues but it's true that system is mostly idle. Current uptime is two days since I do frequent kernel updates.


Thank you for the suggestion. I switched to the nightly yesterday, now I'm running

 

ARMBIAN 5.46.180604 nightly Debian GNU/Linux 9 (stretch) 4.14.47-mvebu

 

current uptime is 17 hours. The last few hours are at about 70% system load, temps:

 

Temp CPU   [C]: 94 (/dev/thermal-cpu/temp1_input)
Temp Board [C]: 37 (/dev/thermal-board/temp1_input)
Temp Ctrl  [C]: 57 (/dev/thermal-eth/temp1_input)

 

fans are spinning close to maximum. Do you see similar temperatures? To me they seem pretty high, more like the temperatures someone reported earlier when running the system without fans.

 

Do you have any suggestions regarding the SD card/UHS problem?

 

 


Hi guys, sorry for the lack of follow-up lately, was completely caught up in another venture but now it is over. So the focus will come back 100% on Helios4 support and the second campaign.

 

@JakeK I think the problem you mentioned is something that we have encountered before on rare occasions, and it seemed to be linked to the Marvell ethernet driver under kernel 4.14. It could be a bring-up sequence issue that prevents the Ethernet PHY from being probed properly by the SoC. I will investigate and get back to you.

 

@nemo19 It's true that your CPU temps seem quite high. That said, the SoC die is designed to operate without issue up to 115 degrees. Here are a couple of things you could do to help troubleshoot, along with what Igor suggested.

  1. Can you check that the thermal pad is well positioned between the SoC and the heatsink? For that you will need to unscrew the heatsink.
  2. Can you explain why the load is constantly at 70%? Is it because of a RAID resync or something else?
  3. Any chance you can keep a console open on the serial port until the crash you described occurs again?
  4. Can you share your /var/log/messages and /var/log/syslog history, unless you have wiped everything already?

 

 

 

 

 

 


Hello @gprovost, good to hear you're back!

 

1. I unscrewed the heat sink, there is some thermal paste between the sink and the base board, but no thermal pad on top of the CPU or any of the chips (see the attached images). I might have some thermal paste lying around somewhere, should I just put some on the ICs?

2. The load was at around 70% due to Syncthing hashing my media files for off-site synchronization. With the latest release kernel Syncthing would be using both cores for hashing resulting in 100% CPU utilization. With the nightly kernel it is using only one core according to htop. Those 50% plus some background tasks result in the 70% load.

3. Using the nightly kernel the system was stable for 1 day and 15 hours until I shut it down to unscrew the heatsink. About half that time idle, half that time with the described load. Once I put it back together I can stress it and try to trigger a crash while connected via serial.

4. I forgot to pull the files before shutting down, I will post them later on.

 

So, what should I do about the heatsink?

 

 

IMG_20180608_074501.jpg

IMG_20180608_074433.jpg


@nemo19 OK, no need to look any further, you found the issue. There should be a thermal pad between the CPU and the heatsink, as shown below. Without the thermal pad no proper heat transfer can happen, so the CPU might have exceeded its Maximum Junction Temperature (115C), causing it to become unstable and crash. I'm really sorry about this missing thermal pad; this should definitely not have happened. I will report / complain to the company that handled the board assembly for us.

 

FYI the thermal pad dimension we are using is 20x20x1mm.

 

Please send me your complete shipping address by private message. I will send you the missing thermal pad. In the meantime you can try using thermal paste, even though the gap between the CPU and the heatsink is a bit too big for thermal paste.

 

IMG_2471.JPG


Thank you, I'll send the PM right away.

 

I was kind of expecting something like this but was worried that removing the heatsink and investigating would void some kind of warranty ;) Since the temperatures seemed off I logged them, which never showed anything above 98C, thus still well below the 115C threshold. So I thought there might be another issue.

 

Could there be some permanent damage to the board or SoC due to running without proper cooling for several weeks?

18 minutes ago, nemo19 said:

I was kind of expecting something like this but was worried that removing the heatsink and investigating would void some kind of warranty ;) Since the temperatures seemed off I logged them, which never showed anything above 98C, thus still well below the 115C threshold. So I thought there might be another issue. 

 

It could be a quick temperature rise. Plus, since the heat needs to dissipate somewhere, if there is no heatsink to take it, it will transfer to other components, which might create other side effects. Hard to say exactly, but I'm certain the root cause here is the lack of heat dissipation.

 

31 minutes ago, nemo19 said:

Could there be some permanent damage to the board or SoC due to running without proper cooling for several weeks? 

 

I'm confident everything will be back to normal once you have proper heat transfer. In any case, keep me updated, and if something is still not OK we will find a solution ;-)


I received and mounted the thermal pad from SolidRun on Tuesday. My Helios4 now runs cooler and quieter under maximum load than when idling before. It's been running for 1 day 16 hours at 70% to 100% cpu load hashing files, staying below 80C and without crashing.

 

If it keeps going like this I'll be very happy. Thank you for the great board and the great service!

On 6/14/2018 at 9:12 PM, igor.bernstein said:

Hello, I just wanted to check in about suspend to ram and WOL. Are there any plans to fix it? 

 

Yes, it's still in the pipeline but not the highest priority. It is not so trivial because of some decisions that were made quite a while ago in the kernel for other A388 implementations.


Hey, I pulled open my kit today, excited to assemble it. To my disappointment one of the fans is broken (fins are snapped).
Does anyone have a link where I can purchase a replacement PWM fan? I've taken a look on AliExpress and eBay but can't find an exact match.
This seems a close match.

EC21 has the exact model I believe but I have never ordered from them.

Any advice would be great.

Regards
G

 

... just an update.

I had a quick email reply from Kobol. Ordered some new fans, fingers crossed :)

 

Edited by iNTiGOD

On 6/23/2018 at 12:01 PM, iNTiGOD said:

... just an update.

I had a quick email reply from Kobol. Ordered some new fans, fingers crossed :)

Let me know once you have received and replaced the damaged fan ;-)


Hi gprovost,

 

Fans have been replaced. Finally got this baby set up (although it was a little trickier for me as I don't have a Windows/Linux machine handy, just my Mac).

A couple quick questions.... I'm using OMV v3 but have noticed none of the performance graphs (under system information) are generating graphs. Also the fans seem to be running at full speed. Is there a way to fix this (I tried the steps earlier in the thread to no avail).

 

Cheers

2 hours ago, iNTiGOD said:

I'm using OMV v3 but have noticed none of the performance graphs (under system information) are generating graphs

 

This is by design on all ARM boards for the simple reason that activating monitoring results in permanent writes to the rootfs and if this is on flash media (SD card) Write Amplification is huge and the card will die way earlier.

 

So if you want nice looking (but pretty useless) graphs simply activate monitoring and be prepared for your SD card failing early. Or move the rootfs to other storage (which has other downsides, eg. a HDD not spinning down any more when idle)
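To put rough numbers on the Write Amplification point, here is a deliberately simplified Python sketch. It assumes that every separate small write forces the card to program at least one whole internal flash unit; the 512-byte write size and the 16 KiB unit size are illustrative assumptions, not measured values for any particular SD card, and real controllers behave in more complicated ways.

```python
import math


def write_amplification(logical_bytes, flash_unit_bytes):
    """Ratio of bytes physically programmed to bytes the host asked to write.

    Simplified model: writes are rounded up to whole flash units.
    """
    units = math.ceil(logical_bytes / flash_unit_bytes)
    return (units * flash_unit_bytes) / logical_bytes


if __name__ == "__main__":
    flash_unit = 16 * 1024  # assumed internal unit size in bytes (hypothetical)
    for size in (512, 4096, 16 * 1024):
        factor = write_amplification(size, flash_unit)
        print(f"{size:>6} byte write -> amplification factor {factor:.1f}")
```

Under these assumptions a monitoring daemon that issues many tiny writes wears the card far faster than the amount of data it stores would suggest, which is the concern raised above.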

 

I tried to explain the problem wrt Write Amplification already:

 

51 minutes ago, tkaiser said:

 

This is by design on all ARM boards for the simple reason that activating monitoring results in permanent writes to the rootfs and if this is on flash media (SD card) Write Amplification is huge and the card will die way earlier.

 

So if you want nice looking (but pretty useless) graphs simply activate monitoring and be prepared for your SD card failing early. Or move the rootfs to other storage (which has other downsides, eg. a HDD not spinning down any more when idle)

 

I tried to explain the problem wrt Write Amplification already:

 

Ok thanks for letting me know. This is the first time I’ve used omv with a single board setup. I did notice the flash memory plugin enabled but didn’t put the 2 together :)

