
Helios4 ECC - CPU freq info - air flow


Bramani


1. Question: How do I check that ECC is really active?

 

2. Possible Kernel issue: The Debian Buster image's Kernel seems to lack CPU frequency info settings: "cat: /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq: No such file or directory". How can that be fixed?

 

3. Possible assembly issue: I think the air flow through the case should be reversed.

 

Testing scenario: two units about 0.5 m apart in the same location. One unit is assembled as shown in the Wiki, the second one with reversed fan direction. The first unit is equipped with WD40EFRX-68N32N0 drives (3 platters), the second one with the older WD40EFRX-68WT0N0 (4 platters). Because of that, the second unit was expected to run slightly warmer.

 

First test after 12 h of idling, second test 30 min after a reboot. As you can see, the unit with the reversed air flow always stays cooler, even with lower fan speeds.

 

At idle, there is a whopping 5.71°C difference in SoC temperature and a 2.56°C difference in ambient temperature. Fan speed in the second unit is reduced by nearly 36%. Disk temperatures are all pretty similar.

After reboot, there is still a 3.81°C difference between the SoCs and a 1.94°C difference in ambient temperature. The fan speed difference is nearly 29%. Disks in the second unit run warmer, but are now within their optimal temperature range between 30°C and 40°C. (Beyond that, failure rates rise steeply even if WD40EFRX are rated for up to 60°C.)
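The differences quoted above can be recomputed from the raw readings listed at the end of this post. The following is only a minimal sketch using awk for the floating-point math; the function names tempDelta and pwmReduction are made up for illustration, and the fan figures are the values labelled "RPM" in the tables.

```shell
# Difference between two temperature readings (decimal point, not comma)
tempDelta() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a - b }'
}

# Percentage reduction of the summed fan values.
# $1 $2: both fans of unit #1, $3 $4: both fans of unit #2
pwmReduction() {
    awk -v a="$1" -v b="$2" -v c="$3" -v d="$4" \
        'BEGIN { printf "%.1f\n", (1 - (c + d) / (a + b)) * 100 }'
}

# 12 h idle
tempDelta 49.408 43.695      # SoC:     5.71
tempDelta 34.687 32.125      # ambient: 2.56
pwmReduction 68 74 44 47     # fans:    35.9

# 30 min after reboot
tempDelta 49.408 45.600      # SoC:     3.81
tempDelta 35.312 33.375      # ambient: 1.94
pwmReduction 76 74 52 55     # fans:    28.7
```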

 

Of course, behaviour under load remains to be seen. I cannot test that at this early stage. But even with measurement inaccuracies taken into account, I still think the fans should be mounted the other way round. That would require either longer screws or silicone anti-vibration mounts, or you would have to make do with the fans mounted on the other side of the plate, with the ribbon cables threaded through one of the holes, as I have done for this test.

 

Thoughts/comments?

12 h idling:

Unit #1:                     Unit #2:
Fan J10 RPM:    68           Fan J10 RPM:    44
Fan J17 RPM:    74           Fan J17 RPM:    47

SoC core temp.: 49,408       SoC core temp.: 43,695
Ambient  temp.: 34,687       Ambient  temp.: 32,125

/dev/sda temp.: 23           /dev/sda temp.: 23
/dev/sdb temp.: 23           /dev/sdb temp.: 23
/dev/sdc temp.: 22           /dev/sdc temp.: 23
/dev/sdd temp.: 22           /dev/sdd temp.: 25

/dev/sda state: standby      /dev/sda state: standby
/dev/sdb state: standby      /dev/sdb state: standby
/dev/sdc state: standby      /dev/sdc state: standby
/dev/sdd state: standby      /dev/sdd state: standby


30 min after reboot:

Unit #1:                     Unit #2:
Fan J10 RPM:    76           Fan J10 RPM:    52
Fan J17 RPM:    74           Fan J17 RPM:    55

SoC core temp.: 49,408       SoC core temp.: 45,600
Ambient  temp.: 35,312       Ambient  temp.: 33,375

/dev/sda temp.: 26           /dev/sda temp.: 29
/dev/sdb temp.: 26           /dev/sdb temp.: 30
/dev/sdc temp.: 26           /dev/sdc temp.: 31
/dev/sdd temp.: 26           /dev/sdd temp.: 30

/dev/sda state: active/idle  /dev/sda state: active/idle
/dev/sdb state: active/idle  /dev/sdb state: active/idle
/dev/sdc state: active/idle  /dev/sdc state: active/idle
/dev/sdd state: active/idle  /dev/sdd state: active/idle

 

Link to comment
Share on other sites

19 hours ago, Bramani said:

1. Question: How do I check that ECC is really active?

 

During U-Boot load you will see the DRAM line, which shows whether ECC mode is enabled.

It is then transparent to Linux.

 

[Screenshot: U-Boot boot output with the DRAM line]

 

19 hours ago, Bramani said:

2. Possible Kernel issue: The Debian Buster image's Kernel seems to lack CPU frequency info settings: "cat: /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq: No such file or directory". How can that be fixed?

 

Thanks for reporting that. Something must indeed be missing somewhere for that cpufreq path not to be present... we are looking into it.

 

19 hours ago, Bramani said:

3. Possible assembly issue: I think the air flow through the case should be reversed.

 

Very interesting investigation on the air flow direction impact.

 

However, can you first take a picture of the back of the unit on which you reversed the fan orientation? If I understood correctly, you just reversed the fan plate (rather than screwing the fans on the other way around), so there is a bigger gap between the fans and the internal components (Helios4 board and HDDs), am I right? If that is the case, the increased distance between the fans and the components could indeed have an impact, since there will be less air resistance with the increased internal space...

In my experience, changing the fan orientation doesn't have much impact because at the end of the day a fan is just moving air... Of course, depending on the internal layout, the air resistance might differ from one direction to the other. One of the reasons for having the air intake at the front (something you will see on almost any appliance with a fan) is that most of the time people will stick their Helios4 in a cupboard or on a shelf, where hot air and dust will most likely build up at the back... so it's better to suck in fresh air from the front.

 

You can also play with the /etc/fancontrol config file to adapt it to your country / house environment and make the fan speed curve quieter. We are located in a hot and humid country, so maybe our default fan configuration is slightly aggressive. What matters, as you pointed out, is that the HDD temperature stays within this 20-40 degree range; the SoC itself is designed to run at much higher temperatures, so that doesn't really matter. Maybe it's true that we should have implemented fan control based on HDD temperature ;-)
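As a rough illustration of what adjusting the curve means: the fancontrol script from lm-sensors, as far as its documented settings go, holds MINPWM below MINTEMP, uses MAXPWM above MAXTEMP, and interpolates linearly starting from MINSTOP in between. The sketch below mimics that behaviour in simplified form. The function name fanCurve is hypothetical, and the numbers are example values only, not the Helios4 defaults.

```shell
# fanCurve TEMP MINTEMP MAXTEMP MINSTOP [MINPWM] [MAXPWM]
# Simplified model of a fancontrol-style curve (whole degrees, PWM 0-255).
fanCurve() {
    local temp=$1 mint=$2 maxt=$3 minstop=$4
    local minpwm=${5:-0} maxpwm=${6:-255}
    if [ "$temp" -le "$mint" ]; then
        # Below the lower threshold: fan held at its minimum value
        printf "%d\n" "$minpwm"
    elif [ "$temp" -ge "$maxt" ]; then
        # Above the upper threshold: full speed
        printf "%d\n" "$maxpwm"
    else
        # In between: linear ramp from MINSTOP up to MAXPWM
        printf "%d\n" $(( (temp - mint) * (maxpwm - minstop) / (maxt - mint) + minstop ))
    fi
}

# Example values (assumptions): ramp between 40 and 70 degrees, MINSTOP 30
fanCurve 35 40 70 30 20    # 20  (below MINTEMP, MINPWM applies)
fanCurve 50 40 70 30       # 105
fanCurve 80 40 70 30       # 255
```

Raising MINTEMP or MAXTEMP in such a curve shifts the ramp to higher temperatures, which is what makes the fans quieter in a cooler environment.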

 

Anyhow it's an interesting topic that we are happy to talk about.

 

 

Link to comment
Share on other sites

Quote

During U-Boot load you will see the DRAM line, which shows whether ECC mode is enabled.

It is then transparent to Linux.

I see. But does the Armbian Kernel actually make use of it? I am curious, because I know how to check for ECC functionality on arch x86, but unfortunately not on arm.

 

Air Flow: My apologies for being too focused on my own narrow use case. A bit naive and unprofessional an approach, I guess. I really didn't think of units placed in cupboards or other locations where dust and waste heat might cause trouble (mine are in an open 19" rack).

You were right, though, the first approach basically corresponded to a reversed fan plate. I have since changed that back to the original position, but with reversed fans. So they are now mounted exactly as shown in the Wiki, but with the labels facing inwards. As mentioned, that required slightly longer screws. I have used those from a Lian Li OF-01B anti-vibration fan mounting kit that was lying around.

 

This configuration shows even greater differences. Currently I see a 10°C divergence between the SoCs with a 3°C difference in ambient temperature. No relevant changes in disk temps. Now, I don't trust these numbers myself, so I will repeat the above tests at the weekend when there is more time and report back in this thread.

 

Fan noise was never the trouble, by the way; the default fancontrol settings were not changed. These lower fan speeds are just a result of the case interior keeping cooler. I think your settings are fine. Keeping the disks within their optimal temperature range is far more important, and in my environment the fans are barely audible in normal operation.

Link to comment
Share on other sites

I wonder if it's more a matter of where the sensors are mounted relative to the fans.  The sensors only tell the story at a very specific point in space, after all.

 

The Helios4 board itself is placed in the back half of the enclosure.  With reversed (intake) fans, you're blowing cool external air directly onto the board.

With standard (exhaust) fans, you're pulling internal air that has been preheated by the HDDs across the board.

Link to comment
Share on other sites

On 9/12/2019 at 4:18 AM, Bramani said:

I see. But does the Armbian Kernel actually make use of it? I am curious, because I know how to check for ECC functionality on arch x86, but unfortunately not on arm.

It's the integrated memory controller of the SoC that is in charge of the ECC operation, not the operating system. That's why it is transparent to the OS. But yes, when you see it reported as enabled during U-Boot initialization, it means it is in use.
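For readers who still want to look from the Linux side: the kernel's EDAC subsystem exposes corrected and uncorrected error counters under /sys/devices/system/edac/mc when a driver for the memory controller is loaded. Whether the Armbian kernel for the Helios4 ships such a driver is an assumption not confirmed in this thread, so a missing directory does not mean ECC is off. A minimal sketch, with checkEdac being a made-up helper name:

```shell
# checkEdac [BASEDIR]
# Lists EDAC memory controllers and their error counters, if any exist.
checkEdac() {
    local base=${1:-/sys/devices/system/edac/mc}
    local mc found=0
    for mc in "$base"/mc[0-9]*; do
        # Skip when the glob did not match or the counter is unreadable
        [ -r "$mc/ce_count" ] || continue
        found=1
        printf "%s: corrected=%s uncorrected=%s\n" "${mc##*/}" \
            "$(cat "$mc/ce_count")" "$(cat "$mc/ue_count")"
    done
    [ "$found" -eq 1 ] || printf "%s\n" "No EDAC memory controller found under $base"
}

checkEdac
```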

 

On 9/12/2019 at 4:18 AM, Bramani said:

This configuration shows even greater differences. Currently I see a 10°C divergence between the SoCs with a 3°C difference in ambient temperature. No relevant changes in disk temps. Now, I don't trust these numbers myself, so I will repeat the above tests at the weekend when there is more time and report back in this thread.

I guess I would have to redo some tests to cross-check the delta you see between the two fan setups. But I think @devman makes a good point that sensor location might have a significant impact.

I think the main point of comparison should be the HDD temperature.

 

Well, that's the cool thing about Helios4: it gives people the flexibility to fine-tune their setup according to their needs / environment.

Link to comment
Share on other sites

@devman: Good point. And I think you are right. Out of curiosity I have placed another sensor in the case, directly beneath the disks. It is not very precise though, but I thought it might at least give a hint. I have no others at the moment (HomeMatic IP HmIP-STH).

 

@gprovost: Thanks for the clarification on ECC! That is good to know - and also very reassuring.

 

Regarding the delta, I think it was a temporary glitch. And yes, I agree that the HDD temps should be the main point. The new numbers are below, and they support your configuration. Come to think of it, I am not sure I would like to see the HDD temps rise by another five degrees or so.  The last summers were hot here, and room temperature could easily climb above 30°C.  I think ich will reverse these reversed fans again. But it was at least fun to test!

 

Yes, I like the Helios4 for its possibilities. - By the way, when you announced that you would ship each order with an OLED display (as I read it), I meant to ask for another one for the second unit. I was more than happy to learn that each one would be shipped with it. So, identical hardware. Small things, but they look cool, especially in the dark.

12 h idling:

Room temperature: 21,70°C    Humidity: 55%

Unit #1:                     Unit #2:
Fan J10 RPM:    74           Fan J10 RPM:    39
Fan J17 RPM:    74           Fan J17 RPM:    41

SoC core temp.: 48,456       SoC core temp.: 42,267
Ambient  temp.: 34,812       Ambient  temp.: 31,250
Case:           24,10        Case:           24,20
      Humidity: 48%                Humidity: 48%

/dev/sda temp.: 23           /dev/sda temp.: 23
/dev/sdb temp.: 23           /dev/sdb temp.: 24
/dev/sdc temp.: 23           /dev/sdc temp.: 24
/dev/sdd temp.: 23           /dev/sdd temp.: 25

/dev/sda state: standby      /dev/sda state: standby
/dev/sdb state: standby      /dev/sdb state: standby
/dev/sdc state: standby      /dev/sdc state: standby
/dev/sdd state: standby      /dev/sdd state: standby


30 min after reboot:

Room temperature: 22,00°C    Humidity: 54%

Unit #1:                     Unit #2:
Fan J10 RPM:    74           Fan J10 RPM:    52
Fan J17 RPM:    76           Fan J17 RPM:    49

SoC core temp.: 49,408       SoC core temp.: 44,171
Ambient  temp.: 35,500       Ambient  temp.: 32,500
Case:           23,80        Case:           25,10
      Humidity: 50%                Humidity: 49%

/dev/sda temp.: 27           /dev/sda temp.: 29
/dev/sdb temp.: 27           /dev/sdb temp.: 29
/dev/sdc temp.: 27           /dev/sdc temp.: 30
/dev/sdd temp.: 27           /dev/sdd temp.: 31

/dev/sda state: active/idle  /dev/sda state: active/idle
/dev/sdb state: active/idle  /dev/sdb state: active/idle
/dev/sdc state: active/idle  /dev/sdc state: active/idle
/dev/sdd state: active/idle  /dev/sdd state: active/idle

 

Link to comment
Share on other sites

Hello,

I am a bit confused about the fans and the RPM.

My Helios4 Batch3 is pretty loud.

 

I would like to check the RPM of the fans and the status of the HDDs.

How do I get such a nice overview as @Bramani's?

 

I read that the fans in Batch 3 cannot be shut off; should/could I switch the fans?

 

Thanks in advance,

Jeckyll

Link to comment
Share on other sites

Apologies again, I started this thread out of eagerness and curiosity, not to create confusion. That said, the fans should be relatively quiet once fancontrol has kicked in after boot, but that depends very much on your environment. You are right though, Batch 3 fans cannot be shut off (see Wiki). I am not sure whether I understand your second question correctly: are you considering reversing the fans (and the air flow)? Please don't. As gprovost wrote, the default direction should be the optimum for most use cases. Keeping the HDDs at safe operating temperatures is what really matters in a NAS.

 

The overview is just a few shell functions I whipped up to compare the two units. They have several flaws. For instance, I have used sd[a|b|c|d] names instead of proper UUIDs. The label "RPM" is wrong; it should read "PWM" instead. And I am not sure whether the temperature representation is correct. But nevertheless, here they are. Note that getCPUFreq and getCPUStats currently do not work on Buster, only on Stretch.

 

Add these to your non-privileged user's .bashrc and reload it with "source .bashrc" or log in again. After that, simply enter "getSysStatus" on the command line to print the overview.

# Print current CPU frequency
getCPUFreq() {
    local i freq
    for i in 0 1; do
        freq=$(cat /sys/devices/system/cpu/cpu$i/cpufreq/cpuinfo_cur_freq)
        printf "%s\n" "CPU $i freq.:    ${freq%???} MHz"
    done
}

# Print CPU statistics
getCPUStats() {
    local i stats
    for i in 0 1; do

        # This works, but it needs three expensive syscalls to external commands
        #stats="$(cpufreq-info -c $i | grep 'stats' | sed 's/,/;/;s/:/: /g;s/\./,/g')"

        # Same, but reduced by one
        stats="$(cpufreq-info -c $i | \
                 awk '/stats/{gsub(",",";");gsub(":",": ");gsub(/\./,",");print}')"

        # Cut front and end from string; this could be done in awk, too, but the 
        # resulting expression would be long and hard to decipher for non-awk users.
        # Using shell internal string functions should not be that expensive, either.
        stats="${stats#*: }"
        stats="${stats% *}"

        # Finally, print the resulting string, nicely formatted
        printf "%s\n" "CPU $i stats:   ${stats}"
    done
}

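# Print system fan speed
# Note: pwm1 holds the PWM duty value (0-255), not an RPM reading,
# so the "RPM" label below is a misnomer.
getFanSpeed() {
    local i j=3 speed
    for i in 10 17; do
        speed=$(cat /sys/devices/platform/j$i-pwm/hwmon/hwmon$j/pwm1)
        printf "%s\n" "Fan J$i RPM:    ${speed}"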
        ((j++))
    done
}

# Print SoC temperature
getSoCTemp() {
    local temp=$(cat /sys/devices/virtual/thermal/thermal_zone0/temp)
    printf "%s\n" "SoC core temp.: ${temp%???},${temp: -3}"
}

# Print ambient temperature
getAmbientTemp() {
    local temp=$(cat /dev/thermal-board/temp1_input)
    printf "%s\n" "Ambient  temp.: ${temp%???},${temp: -3}"
}

# Print temperature of all HDDs
getDriveTemps() {
    local i temp
    for i in /dev/sd[abcd]; do
        temp=$(sudo /usr/sbin/smartctl -a $i | awk '/^194/{print $10}')
        printf "%s\n" "$i temp.: ${temp}"
    done
}

# Print current power mode status of all HDDs
getDriveStates() {
    local i state
    for i in /dev/sd[abcd]; do
        state="$(sudo /sbin/hdparm -C $i)"
        printf "%s\n" "$i state: ${state##* }"
    done
}


# Print system status
getSysStatus() {
#    printf "\n"
#    getCPUStats
#    printf "\n"
#    getCPUFreq
    printf "\n"
    getFanSpeed
    printf "\n"
    getSoCTemp
    getAmbientTemp
    printf "\n"
    getDriveTemps
    printf "\n"
    getDriveStates
}
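
Since the value read from pwm1 is a PWM duty value rather than an RPM reading, a percentage may be easier to interpret. The following is a small sketch that assumes the standard hwmon range of 0 to 255; pwmToPercent is a hypothetical helper and not part of the functions above.

```shell
# Convert a PWM duty value (0-255) to a rounded percentage
pwmToPercent() {
    printf "%d\n" $(( ($1 * 100 + 127) / 255 ))
}

pwmToPercent 68     # 27
pwmToPercent 44     # 17
pwmToPercent 255    # 100
```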

 

Link to comment
Share on other sites

Hey,

 

thanks for your answer. 

@gprovost I checked fancontrol; everything is exactly as the Wiki says.

@Bramani thanks a lot for the script. Now I have to learn what you are saying about the .bashrc :P I don't know Linux very well.  You dropped the word "ich" some posts earlier, so I guess we share the same environment ;)

I don't want to reverse the fans; I was thinking about replacing them.

 

But today the Helios was extremely loud. I checked and touched it (I had to crouch under the table in the far corner of the room) and the noise stopped. Now I assume I have two "problems".

1. The plate the fans are attached to is not tight enough and starts vibrating or something like that; maybe I can fix that with a piece of paper.

2. The fans spin too fast. I will check this with the script from Bramani over the next few days.

 

And again, thanks a lot :)

Link to comment
Share on other sites

21 hours ago, Jeckyll said:

But today the Helios was extremely loud. I checked and touched it (I had to crouch under the table in the far corner of the room) and the noise stopped.

Couldn't it be the ribbon cables of the fans touching the fan blades?

Link to comment
Share on other sites

Hello,

 

No, I don't think so. I know the sound of "fan eats cable"; this is more like something vibrating. However, yesterday I put a piece of silicone under the Helios, and since then everything is fine.

The fans are spinning at the expected noise level. I still guess the case is not tight enough and starts vibrating.

My Helios stands on a wooden floor; maybe that's not a good combination.

 

Independently of my experience, maybe you could add a silicone ring for the HDD screws, between HDD and case, in the next batch.

 

Thanks for your support, I'm fine and happy now :)

Link to comment
Share on other sites

Guest
This topic is now closed to further replies.