Victor_Williamson Posted March 11, 2018
Completed: http://ix.io/VFR
I have removed the fan as a temporary measure and have been monitoring the temperature, which currently sits around 44°C.
Igor Posted March 11, 2018
You are using one of the first test builds made by the Helios team. The fan driver, which is bothering you, was changed, and you might be missing some scripts that are not part of the update process. Perhaps @gprovost can give you detailed hints on how to solve this, but it looks like something also went wrong with a kernel image update. Or did you make some manual changes to the system?

    ii linux-dtb-mvebu 5.41 armhf Linux DTB, version 4.4.115-mvebu
    ii linux-image-mvebu 5.37 armhf Linux kernel, version 4.4.107-mvebu

If you can, I would suggest starting with the latest official Armbian.
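The mismatch Igor points out can also be spotted mechanically. The following is a minimal Python sketch, assuming the two package lines quoted above as input. The helper name `kernel_versions` is hypothetical and not part of any Armbian tool, and the expectation that the DTB and image packages should agree is an assumption drawn from Igor's remark.

```python
import re

# The two lines quoted in the post above (dpkg -l style output).
DPKG_LINES = """\
ii linux-dtb-mvebu 5.41 armhf Linux DTB, version 4.4.115-mvebu
ii linux-image-mvebu 5.37 armhf Linux kernel, version 4.4.107-mvebu
"""


def kernel_versions(dpkg_output):
    """Map package name -> (package version, kernel version from the description)."""
    found = {}
    for line in dpkg_output.splitlines():
        match = re.match(r"ii\s+(\S+)\s+(\S+)\s+\S+\s+.*version (\S+)", line)
        if match:
            name, pkg_version, kernel_version = match.groups()
            found[name] = (pkg_version, kernel_version)
    return found


versions = kernel_versions(DPKG_LINES)
# Assumption: DTB and image packages are expected to come from the same build.
package_versions = {pkg for pkg, _ in versions.values()}
kernels = {kernel for _, kernel in versions.values()}
mismatch = len(package_versions) > 1 or len(kernels) > 1
print("mismatch" if mismatch else "consistent", versions)
```

For the two lines above this reports a mismatch, since both the package version (5.41 vs 5.37) and the kernel version (4.4.115 vs 4.4.107) differ.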
Victor_Williamson Posted March 11, 2018
I could do a rebuild as a last resort, as I have nearly 8TB of data to move off it beforehand. Is there an easy way to back up the RAID config, or will a fresh build pick up the RAID from the 4 physical disks?
Many thanks
Victor_Williamson Posted March 12, 2018
Cheers for the help... I poked around and eventually just blew it away and downloaded the latest build, and all is well again. I backed up my fstab, exports and interfaces files and copied them to the new SD card. After a reboot everything is back to full health.
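The backup step described above could be scripted roughly as follows. This is a minimal Python sketch, not what was actually run: the file paths are the usual Debian locations of the three files named in the post (an assumption), the helper `backup_configs` is hypothetical, and it only covers those files, not any RAID metadata.

```python
import shutil
import tempfile
from pathlib import Path

# Files named in the post; these are their usual Debian locations (assumption).
CONFIG_FILES = ["/etc/fstab", "/etc/exports", "/etc/network/interfaces"]


def backup_configs(paths, dest_dir):
    """Copy each existing file into dest_dir, keeping its directory layout."""
    dest_dir = Path(dest_dir)
    copied = []
    for path in map(Path, paths):
        if not path.is_file():
            continue  # skip files this system does not have
        target = dest_dir / path.relative_to(path.anchor)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves timestamps
        copied.append(target)
    return copied


if __name__ == "__main__":
    backup_dir = tempfile.mkdtemp(prefix="helios4-config-")
    for target in backup_configs(CONFIG_FILES, backup_dir):
        print("saved", target)
```

Keeping the directory layout under the backup folder makes it obvious where each file belongs when copying it onto the new SD card.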
kratz00 Posted March 14, 2018
I have an odd situation which I cannot explain. I installed the Debian 9 image (7/02/2018) and am now running Linux 4.14.20, as expected. After running apt-get update I see an update for the linux-image-next-mvebu package with the same package version (5.41), which would install Linux 4.14.18. Can somebody please explain what is going on?

apt-cache show linux-image-next-mvebu

    Package: linux-image-next-mvebu
    Priority: optional
    Section: kernel
    Installed-Size: 34496
    Maintainer: Igor Pecovnik <igor.pecovnik@****l.com>
    Architecture: armhf
    Source: linux-4.14.18-mvebu
    Version: 5.41
    Filename: pool/main/l/linux-4.14.18-mvebu/linux-image-next-mvebu_5.41_armhf.deb
    Size: 13659400
    MD5sum: 1df84709f1c0fd9a7ca1c49233de5732
    SHA1: baeca37e804415833fcbad31c94f8aa3111f87b5
    SHA256: abe795285fe66e6d1aec7cdc4cf7f37e9ab8f13f7bc5a3b560f2ca0f401b5f41
    SHA512: f6cd83c98f5ed5da26766f1d613873b50b709772a0184e34f76a181bbe9fc6870b704c1a88d745ad25f265af16d1fcd9fedff780a95c1231ec7e3f708c3dfab4
    Description: Linux kernel, version 4.14.18-mvebu
     This package contains the Linux kernel, modules and corresponding other files, version: 4.14.18-mvebu.
    Description-md5: 881a2bf41fdf3457001594d1ab3a7d0c
    Homepage: http://www.kernel.org/

    Package: linux-image-next-mvebu
    Status: install ok installed
    Priority: optional
    Section: kernel
    Installed-Size: 34499
    Maintainer: Igor Pecovnik <igor.pecovnik@****l.com>
    Architecture: armhf
    Source: linux-4.14.20-mvebu
    Version: 5.41
    Description: Linux kernel, version 4.14.20-mvebu
     This package contains the Linux kernel, modules and corresponding other files, version: 4.14.20-mvebu.
    Description-md5: 1e04fc0ef8ae54ea7cd36a95440d0417
    Homepage: http://www.kernel.org/
kratz00 Posted March 19, 2018
It looks like the release on the Kobol wiki (https://cdn.kobol.io/files/Helios4_Debian_Stretch_4.14.20.img.xz) was built against the Armbian development version (https://github.com/armbian/build/blob/development/config/kernel/linux-mvebu-next.config)!? The official Armbian kernel for the Helios4 seems to be https://apt.armbian.com/pool/main/l/linux-4.14.18-mvebu/
5.41 seems to be the Armbian version number. Does this mean a newer kernel can only be released if the Armbian version gets bumped?
Igor Posted March 19, 2018
1 hour ago, kratz00 said: 5.41 seems to be the armbian version number, does this mean a newer kernel can only be released if the armbian version gets bumped?
Yes. But if both are the same version, it should not upgrade unless forced or unless you switch kernels between default and stable. In this case, you only get what is in the repository. You can also switch to the beta repository, which should have the latest 4.14 ... but then you lose the "warranty".
kratz00 Posted March 19, 2018
In my case it would be a downgrade from linux-4.14.20-mvebu to linux-4.14.18-mvebu. I am not forcing or switching anything; I was just running 'sudo apt-get upgrade'. I do not know the inner workings of apt, but it seems strange to me that it tries to install a package with the exact same version (5.41) again. What is also a little confusing is that there are different Armbian versions for the Helios4.
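To make the oddity concrete: the two records in the apt-cache output earlier in the thread carry the same `Version` but a different `Source`. The following is a minimal Python sketch that parses an abbreviated copy of those records. The `parse_stanzas` helper is hypothetical, ignores continuation lines, and says nothing about how apt itself decides what to install.

```python
# Abbreviated copy of the two records from the apt-cache output earlier in the thread.
APT_CACHE_OUTPUT = """\
Package: linux-image-next-mvebu
Source: linux-4.14.18-mvebu
Version: 5.41
Filename: pool/main/l/linux-4.14.18-mvebu/linux-image-next-mvebu_5.41_armhf.deb

Package: linux-image-next-mvebu
Status: install ok installed
Source: linux-4.14.20-mvebu
Version: 5.41
"""


def parse_stanzas(text):
    """Split control-file style text into one dict per blank-line separated record."""
    stanzas = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if line[:1].isspace() or ":" not in line:
                continue  # continuation lines are ignored in this sketch
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        stanzas.append(fields)
    return stanzas


repo, installed = parse_stanzas(APT_CACHE_OUTPUT)
same_version = repo["Version"] == installed["Version"]
same_source = repo["Source"] == installed["Source"]
print(f"same version: {same_version}, same source: {same_source}")
# prints: same version: True, same source: False
```

The version string alone cannot tell the two builds apart, which fits Igor's answer that a newer kernel needs an Armbian version bump.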
Igor Posted March 19, 2018
28 minutes ago, kratz00 said: What is also a little confusing is that there are different armbian version for the Helios4.
There is only one official version. This case should not happen, but it is also not critical. I have already talked to the Helios4 folks, and an update will be rolled out as soon as I find the time (I guess not today, since I am barely typing this). But this process is purposely not automatic. Someone has to test before pushing updated packages to the stable repository ...
jimandroidpc Posted April 5, 2018
What file systems are you all running? Btrfs/ZFS? Encryption? Does it max out 1 Gbit Ethernet on file transfers?
Does anyone have any output from a wattmeter on power usage?
Gururaj Posted April 23, 2018
I got my Helios4 a month back, but was waiting for my hard disks to arrive (only 2). I assembled the box as in the wiki, but found that the board LEDs are not switched on, and also the USB is not reachable via the UART. Is there anything I can look for? I will try to check the hard disks separately to see if they are fine. Is there a way to know if the board is working fine, or whether the power cord is working fine? Also, is there any specific configuration needed for a 2-disk-only NAS?
Thanks,
Gururaj
gprovost Posted April 24, 2018 Author
None of the LEDs are lit? Look at LED8 on the board; it indicates whether the board is powered up. But first, is the LED on the AC/DC power brick lit?
Gururaj Posted April 26, 2018
None of the LEDs on the board are lit. The LED on the AC/DC power brick is not lit either. Let me try to find a replacement for the adapter in the local market. Please share any online marketplace where I can buy the power brick.
Regards,
Gururaj
JakeK Posted May 1, 2018
I'm having a strange error. I have been enjoying OMV on my new Helios4. After powering on the Helios one day, I noticed I couldn't access it. I connected via the USB serial cable and noticed that eth0 was not showing a configured IP address. I had been using a DHCP reservation from my router. I assigned a static IP to eth0. That static IP did show up, but I'm unable to ping the gateway or anything else on my LAN. I can, however, ping the loopback address.
I decided to wipe the image and flashed the "Jessie_4.14.20-OMV_3.0.97" build again. No luck with the clean image. Finally, in desperation (thinking maybe it was a hardware issue), I flashed the "Stretch_4.14.20" image, and DHCP worked fine and I have full connectivity.
BTW, I tried different ports on my managed switch and different patch cables, just to eliminate these items. What am I doing wrong? I'm at a loss on this issue.
Thanks,
Jake
nemo19 Posted June 6, 2018
Hello, I'm having trouble with my Helios4 as well: after some random amount of time the system hangs. Sometimes after a few hours, sometimes after a day or two. I think the longest uptime I got was three days, with the unit idling most of the time.
What I mean by "the system hangs" is that I cannot access it in any way. I cannot access my network shares any more, I cannot SSH into it any more, and the router doesn't even list it among the connected network devices any more. I also can't access it using the serial console; it just stays blank. The only activity I get from the box is the orange light next to the Ethernet port, which blinks three or four times in rapid succession every three or four seconds or so. This seems to happen more frequently with higher system load, but also when idling.
I wrote a script logging the system temperature and fan speed to a file every two seconds. There are no sudden spikes or anything; the log files just end at some point, sometimes in the middle of writing out a number. There is also nothing in the kernel log; it just stops. I then have to power cycle the box to bring it back online. This has caused me some data corruption already and makes the Helios4 almost unusable for me.
I'm running the latest official image:

    ARMBIAN 5.41 user-built Debian GNU/Linux 9 (stretch) 4.14.18-mvebu
    Linux helios4 4.14.18-mvebu #22 SMP Fri Feb 9 10:41:38 CET 2018 armv7l GNU/Linux

My system drive is a Samsung UHS-3 64GB Micro SDXC memory card. Could this be the issue? The OP says something about "SDcard High Speed timing have compatibility issue with some brands. Temporary fix : Disable UHS option/support." How can I disable UHS? Searching the internet and the forum didn't turn up anything useful.
The temperatures reported by "cat /dev/thermal-cpu/temp1_input" seem to be very high. The CPU idles at 75 to 80C and reaches 97C under load. The Armbian MOTD reports an idle temperature of about 60C, since it subtracts 20C from the actual reading for some reason. Do I have a bad heatsink, maybe? The box is placed with plenty of breathing room, there are only two HDDs in the topmost slots, and both fans are working hard.
Please help me out. I was really excited for the Helios4 and was very happy to finally receive my unit after all the delays. Now I'm starting to regret buying one...
Thank you
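For anyone who wants to reproduce this kind of logging, a minimal Python sketch follows. It is not the script used above. It assumes the sysfs path from the post and the usual hwmon convention that `temp1_input` reports millidegrees Celsius; the log file name and function names are hypothetical. Each line is flushed and synced so that a hard hang loses as little of the log as possible.

```python
import os
import time
from datetime import datetime

SENSOR = "/dev/thermal-cpu/temp1_input"  # path taken from the post above
LOG_FILE = "temperature.log"             # hypothetical log location


def millidegrees_to_celsius(raw):
    """Convert a raw hwmon reading such as '94000' to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def log_temperature(sensor=SENSOR, log_file=LOG_FILE, interval=2.0, samples=None):
    """Append one timestamped reading per interval; samples=None runs until interrupted."""
    count = 0
    with open(log_file, "a") as log:
        while samples is None or count < samples:
            with open(sensor) as handle:
                celsius = millidegrees_to_celsius(handle.read())
            log.write(f"{datetime.now().isoformat()} {celsius:.1f}\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to disk before the next sample
            count += 1
            time.sleep(interval)
```

Calling `log_temperature()` with no arguments logs every two seconds until interrupted. Syncing that often adds writes to the SD card, so pointing `log_file` at other storage may be preferable.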
Igor Posted June 6, 2018
1 hour ago, nemo19 said: I'm having trouble with my Helios4 as well
Can you try beta builds for a few days? armbian-config -> system -> switch to automated nightly builds. I am running it without issues, but it's true that the system is mostly idle. Current uptime is two days, since I do frequent kernel updates.
nemo19 Posted June 7, 2018
Thank you for the suggestion. I switched to the nightly yesterday; now I'm running

    ARMBIAN 5.46.180604 nightly Debian GNU/Linux 9 (stretch) 4.14.47-mvebu

Current uptime is 17 hours. The last few hours were at about 70% system load. Temperatures:

    Temp CPU [C]: 94 (/dev/thermal-cpu/temp1_input)
    Temp Board [C]: 37 (/dev/thermal-board/temp1_input)
    Temp Ctrl [C]: 57 (/dev/thermal-eth/temp1_input)

The fans are spinning close to maximum. Do you see similar temperatures? To me they seem pretty high, more like the temperatures someone reported earlier when running the system without fans.
Do you have any suggestions regarding the SD card/UHS problem?
gprovost Posted June 7, 2018 Author
Hi guys, sorry for the lack of follow-up lately. I was completely caught up in another venture, but now it is over, so the focus will come back 100% to Helios4 support and the second campaign.
@JakeK I think the problem you mentioned is something we have encountered before on rare occasions, and it seemed to be linked to the Marvell Ethernet driver under kernel 4.14. It could be a bring-up sequence issue that causes the Ethernet PHY not to be probed properly by the SoC. I will investigate and get back to you.
@nemo19 It's true that your CPU temperatures seem quite high. That said, the SoC die is designed to operate without issue up to 115 degrees. There are a couple of things you could do to help troubleshoot, along with what Igor suggested:
1. Can you check that the thermal pad is well positioned between the SoC and the heatsink? For that you will need to unscrew the heatsink.
2. Can you explain why the load is constantly 70%? Is it because of a RAID resync or something else?
3. Any chance you can keep a console open on the serial port until the crash you described occurs again?
4. Can you share your /var/log/messages and /var/log/syslog history, unless you have wiped everything already?
nemo19 Posted June 8, 2018
Hello @gprovost, good to hear you're back!
1. I unscrewed the heatsink. There is some thermal paste between the heatsink and the base board, but no thermal pad on top of the CPU or any of the chips (see the attached images). I might have some thermal paste lying around somewhere; should I just put some on the ICs?
2. The load was at around 70% due to Syncthing hashing my media files for off-site synchronization. With the latest release kernel, Syncthing would use both cores for hashing, resulting in 100% CPU utilization. With the nightly kernel it uses only one core, according to htop. Those 50% plus some background tasks result in the 70% load.
3. Using the nightly kernel, the system was stable for 1 day and 15 hours until I shut it down to unscrew the heatsink. About half that time it was idle, half that time under the described load. Once I put it back together I can stress it and try to trigger a crash while connected via serial.
4. I forgot to pull the files before shutting down; I will post them later on.
So, what should I do about the heatsink?
gprovost Posted June 8, 2018 Author
@nemo19 OK, no need to look any further; you found the issue. There should be a thermal pad between the CPU and the heatsink, as shown below. Without the thermal pad no proper heat transfer can happen, so the CPU might have exceeded the Maximum Junction Temperature (115C), causing it to become unstable and crash.
I'm really sorry about this missing thermal pad. This should definitely not have happened; I will report it to the company that handled the board assembly for us.
FYI, the thermal pad we are using measures 20x20x1mm. Please send me your complete shipping address by private message, and I will send you the missing thermal pad. In the meantime you can try using thermal paste, even though the gap between the CPU and the heatsink is a bit too big for thermal paste.
nemo19 Posted June 8, 2018
Thank you, I'll send the PM right away.
I was kind of expecting something like this, but was worried that removing the heatsink and investigating would void some kind of warranty. Since the temperatures seemed off, I logged them, and the logs never showed anything above 98C, still well below the 115C threshold. So I thought there might be another issue.
Could there be some permanent damage to the board or SoC due to running without proper cooling for several weeks?
gprovost Posted June 8, 2018 Author
18 minutes ago, nemo19 said: Since the temperatures seemed off I logged them, which never showed anything above 98C, thus still well below the 115C threshold. So I thought there might be another issue.
It could be a quick temperature rise. Plus, since the heat needs to dissipate somewhere, without a working heatsink it will transfer to other components, which might create other side effects. It's hard to say exactly, but I'm certain the root cause here is the lack of heat dissipation.
31 minutes ago, nemo19 said: Could there be some permanent damage to the board or SoC due to running without proper cooling for several weeks?
I'm confident everything will be back to normal once you have proper heat transfer. In any case, keep me updated, and if something is still not OK we will find a solution ;-)
nemo19 Posted June 14, 2018
I received the thermal pad from SolidRun and mounted it on Tuesday. My Helios4 now runs cooler and quieter under maximum load than it did when idling before. It's been running for 1 day 16 hours at 70% to 100% CPU load hashing files, staying below 80C and without crashing. If it keeps going like this I'll be very happy.
Thank you for the great board and the great service!
Guest Posted June 14, 2018
Hello, I just wanted to check in about suspend to RAM and WOL. Are there any plans to fix it?
gprovost Posted June 18, 2018 Author
On 6/14/2018 at 9:12 PM, igor.bernstein said: Hello, I just wanted to check in about suspend to ram and WOL. Are there any plans to fix it?
Yes, it's still in the pipeline, but it is not the highest priority. It is not so trivial because of some decisions that were made quite a while ago in the kernel for other A388 implementations.
iNTiGOD Posted June 23, 2018
Hey, I opened up my kit today, excited to assemble it. To my disappointment, one of the fans is broken (the fins are snapped). Does anyone have a link where I can purchase a replacement PWM fan? I've taken a look on AliExpress and eBay but can't find an exact match. This seems a close match. EC21 has the exact model, I believe, but I have never ordered from them. Any advice would be great.
Regards, G
... just an update. I had a quick email reply from Kobol. Ordered some new fans, fingers crossed.
Edited June 23, 2018 by iNTiGOD
gprovost Posted June 26, 2018 Author
On 6/23/2018 at 12:01 PM, iNTiGOD said: ... just an update. I had a quick email reply from kobal. Ordered some new fans, fingers crossed
Let me know once you have received and replaced the damaged fan ;-)
iNTiGOD Posted June 30, 2018
Hi gprovost,
The fans have been replaced. I finally got this baby set up (although it was a little trickier for me, as I don't have a Windows/Linux machine handy, just my Mac).
A couple of quick questions: I'm using OMV v3 but have noticed that none of the performance graphs (under System Information) are being generated. Also, the fans seem to be running at full speed. Is there a way to fix this? (I tried the steps earlier in the thread, to no avail.)
Cheers
tkaiser Posted June 30, 2018
2 hours ago, iNTiGOD said: I'm using OMV v3 but have noticed none of the performance graphs (under system information) are generating graphs
This is by design on all ARM boards, for the simple reason that activating monitoring results in permanent writes to the rootfs. If the rootfs is on flash media (an SD card), Write Amplification is huge and the card will die much earlier. So if you want nice-looking (but pretty useless) graphs, simply activate monitoring and be prepared for your SD card to fail early. Or move the rootfs to other storage (which has other downsides, e.g. an HDD no longer spinning down when idle).
I have already tried to explain the Write Amplification problem:
https://forum.armbian.com/topic/6635-learning-from-dietpi/?tab=comments#comment-50489
https://forum.armbian.com/topic/6444-varlog-file-fills-up-to-100-using-pihole/?do=findComment&comment=50833
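A rough worked example of why small, frequent writes hurt flash. The numbers are illustrative assumptions, not measurements of any particular SD card: write amplification is taken here as the bytes the flash has to program divided by the bytes the host asked to write, and the worst case assumes every small update forces a whole flash block to be rewritten.

```python
def write_amplification(host_bytes, flash_bytes):
    """Ratio of bytes physically written to flash to bytes requested by the host."""
    return flash_bytes / host_bytes


# Illustrative assumptions only: a 4 KiB monitoring update on a card that
# internally rewrites a 4 MiB block for each such update (worst case).
UPDATE_SIZE = 4 * 1024
FLASH_BLOCK = 4 * 1024 * 1024

factor = write_amplification(UPDATE_SIZE, FLASH_BLOCK)
print(f"worst-case write amplification: {factor:.0f}x")  # 1024x

# With one such update per minute, the card absorbs this much per day.
updates_per_day = 24 * 60
flash_mib_per_day = updates_per_day * FLASH_BLOCK / (1024 * 1024)
print(f"flash writes per day: {flash_mib_per_day:.0f} MiB")  # 5760 MiB
```

Under these assumed numbers, about 5.6 MiB of daily host writes would turn into several GiB of flash wear, which is the effect the post describes.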
iNTiGOD Posted June 30, 2018
OK, thanks for letting me know. This is the first time I've used OMV on a single-board setup. I did notice the flash memory plugin was enabled, but didn't put the two together :)