FredK, posted November 23, 2020:

I ordered a replacement PSU, which was delivered and put into operation this morning. I'll give feedback in any case (spontaneous reboot or continuous operation).

Heisath, posted November 23, 2020:

19 hours ago, Mangix said: "I use qbittorrent in a docker container."

Okay, is there an easy way to install it? Then I might try it.

19 hours ago, Mangix said: "Again, it's a kernel issue. .66 is the last one that does not reboot. 8 hours uptime so far. With all future kernels, I can barely get 2 hours."

I believe you on the kernel issue. I'm just thinking it might be a user-space issue triggering a kernel issue.

Mangix (thread author), posted November 24, 2020:

I will need to do more testing. .65 rebooted multiple times today...

@Heisath so I actually install OpenMediaVault on top of Armbian with https://github.com/OpenMediaVault-Plugin-Developers/docs/blob/master/Adden-A-Installing_OMV5_on_Armbian.pdf

Under OMV, I install Portainer and then install https://hub.docker.com/r/linuxserver/qbittorrent

FredK, posted November 25, 2020 (edited):

On 23.11.2020 at 12:25, FredK wrote: "I ordered a replacement PSU which was delivered and put in operation this morning. I'll give feedback in any case (spontaneous reboot or continuous operation)."

Spontaneous reboot with the new PSU after 26 hours of operation.

EDIT: System now upgraded to 5.8.18-mvebu Buster 20.11 (coming from 5.8.16-mvebu Buster 20.08.22).

Mangix (thread author), posted November 26, 2020:

@Heisath how do I build dev kernels? compile.sh only shows current and legacy.

gprovost, posted November 26, 2020:

FYI, if it helps, you can find old Helios4 images here: https://archive.armbian.com/helios4/archive/

Mangix (thread author), posted November 26, 2020:

So without the cpufreq patches, I can't get my Helios4 to reboot spontaneously anymore. Looks like the problem is solved.

edit: that's with kernel 5.8.18. I'm curious about 5.9, but it looks like those cpufreq patches were the issue. They're not upstream and I only see Armbian carrying them.

gprovost, posted November 26, 2020:

@Mangix I might have missed something, but which patches are you referring to exactly?

Heisath, posted November 26, 2020:

5 hours ago, Mangix said: "@Heisath how do I build dev kernels? compile.sh only shows current and legacy."

You can build dev with ./compile.sh EXPERT=yes

26 minutes ago, gprovost said: "@Mangix I might have missed something, but which patches are you referring to exactly?"

I assume he is referring to the series of patches we have for DFS support:

https://github.com/armbian/build/blob/master/patch/kernel/mvebu-current/800-Add_Armada_38x_support_for_clk-cpu.patch
https://github.com/armbian/build/blob/master/patch/kernel/mvebu-current/801-Use_shorter_register_definition_in_pmsu_c.patch
https://github.com/armbian/build/blob/master/patch/kernel/mvebu-current/802-Made_the_dynamic_frequency_scaling_support_more_generic.patch
https://github.com/armbian/build/blob/master/patch/kernel/mvebu-current/803-Armada_38x_Add_dynamic_frequency_scaling_support_in_pmsu.patch
https://github.com/armbian/build/blob/master/patch/kernel/mvebu-current/804-Update_Armada_38x_DT_for_dynamic_frequency_scaling.patch

These have been in there for a long time and might indeed cause the crashes. They probably need to be adjusted (although they still apply fine).

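Editor's note: to check whether a local checkout of armbian/build still carries the DFS patches listed above, something like the following sketch could be used. The directory layout patch/kernel/mvebu-<branch>/ and the 800 to 804 numbering are taken from the URLs in the post; the function name and its arguments are made up for this illustration.

```shell
#!/bin/sh
# List the DFS (cpufreq) patches 800-804 found in a local armbian/build checkout.
# $1: path to the checkout (assumed to follow the layout seen in the URLs above)
# $2: kernel branch name, e.g. current, legacy or dev (defaults to current)
list_dfs_patches() {
    build_dir=$1
    branch=${2:-current}
    patch_dir="$build_dir/patch/kernel/mvebu-$branch"
    for f in "$patch_dir"/80[0-4]-*.patch; do
        # If nothing matches, the glob stays literal; skip that case.
        [ -e "$f" ] || continue
        basename "$f"
    done
}
```

Calling list_dfs_patches ~/build current prints one file name per line; empty output means that branch does not contain the patches (or the path is wrong).
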
gprovost, posted November 26, 2020:

I would have expected this patch to have made it into mainline already, but no :-/

Heisath, posted November 26, 2020:

Yeah, maybe we have to contact Gregory Clement (the original author) about updates? I find no trace of this patch in any kernel after about 4.4. Maybe they figured it was broken and thus have not mainlined it?

gprovost, posted November 26, 2020:

Are those the exact same patches as in Armbian: https://github.com/hnyman/openwrt/commit/90113cd70f33449a68827e63501dcc688c14d007 ?

Also, let's keep in mind that it has been measured that DFS for Armada 38x doesn't provide much gain in terms of power consumption and temperature, so I think we could also consider removing it.

Heisath, posted November 26, 2020:

Yeah, these seem to be exactly the same. The only difference is their way of disabling the global timer. We remove the DT node, they disable the compilation.

Our way: https://github.com/armbian/build/blob/master/patch/kernel/mvebu-current/fix_time_drift_remove_global_timer.patch

Their way: https://github.com/hnyman/openwrt/commit/90113cd70f33449a68827e63501dcc688c14d007#diff-b7a0f3497875655ca3abc14fb540ca45c913b347a8b2c1efa2ae91b4fa5d9b39

EDIT: From the commit msg: "Note: upstream messages mention possible instability under heavy I/O."

gprovost, posted November 26, 2020:

In view of the small benefit of having DFS on Armada 38x, I would suggest we completely remove it... at least for the time being.

Heisath, posted November 26, 2020:

Yeah, that might be the best way. If we just remove the patches on legacy & current, I assume all the existing systems will keep working, but at max clock? I'd like to leave the patches in on dev for further testing.

Heisath, posted November 26, 2020:

I have updated our DFS patches with the OpenWRT ones. There were some small differences (probably not functional ones). The build compiles, DFS works and there is no time drift.

As @Mangix has a reliable way of causing a hang, it would be great if you could build an image based on the PR and test whether the OpenWRT patches still cause the crashes.

Afterwards I will either make a PR to remove the patches for legacy & current or update the patches there to the OpenWRT ones as well.

Mangix (thread author), posted November 26, 2020:

I thought Hannu moved to developing for ipq806x. Interesting...

I love how he notes instability under heavy I/O. That's exactly what I experience.

From what I see, patch 806 accomplishes the same as fix_time_drift_remove_global_timer.patch in a cleaner way.

Anyway, I will be waiting to confirm 24 hours of uptime before I try anything else.

I also vote for removing these patches. We don't have these in OpenWrt. Stability is more important.

edit: on that last note, a PR like that for OpenWrt will be rejected. We have problems with having too many patches. We don't need any that have no chance of making it upstream eventually.

edit2: the Turris people have also sort of abandoned this patchset. They have it for their OpenWrt fork, but they use mainline OpenWrt in newer versions.

edit3: I will note, this device has fans. I don't think temperature is ever a problem.

Heisath, posted November 26, 2020:

Just in case you misunderstood: I do not want to send any PR to OpenWRT. I am only asking you (as you have a reliable way of crashing) to test once without any DFS patches at all, and once with the OpenWRT DFS patches, which I added to AR-526.

Mangix (thread author), posted November 26, 2020:

Ah, I see what you mean now. Any testing I do or don't do will have to wait. My gut tells me these *new* patches have the same problem.

Heisath, posted November 26, 2020:

I also assume these new patches have the same problem. But as some lines have changed, they might be better adjusted to newer kernels.

The DFS patches in general were already in 4.14, 4.19 etc., and you had no problems prior to a specific 4.19 version. So it is not "just" a problem with the patches, but rather a problem with the patches after a specific kernel version. I assume the DFS patches don't fit as well anymore.

In any case, I'd like to be sure that DFS is not stable for mainline (and not just because of outdated patches) before we remove it.

Mangix (thread author), posted November 26, 2020:

Related: https://forum.openwrt.org/t/cpu-frequency-scaling-driver-for-mvebu-wrt3200acm-etc/2808/91

Not looking good.

edit: I got 18 hours of uptime before I gave up. Now testing kernel 5.9 with that PR on GitHub. Hopefully this works. dmesg also shows this:

debugfs: Directory 'cpu1' with parent 'opp' already present!

edit2: it seems this dev 5.9 kernel has broken PWM. The fans are going at full speed. Otherwise, I went hard at it for ~3 hours and can't get it to reboot. We'll see if it survives 24 hours. Looks like the Turris people fixed something... or the last patch is what actually fixes things.

edit3: I got impatient. Flashed a freshly built kernel with a new dtb. The fan works correctly now.

edit4: bad news. Even these new patches cause freezing. It turns out this is easier to reproduce with btrfs scrub. It reboots within an hour.

Heisath, posted November 27, 2020:

I just merged the PR, so we now have DFS with the old patches on legacy & current and the new, apparently better patches on dev.

You mentioned these new patches also freeze, but only with btrfs scrub. Can you do one more test? Compare 'btrfs scrub' without DFS vs. 'btrfs scrub' with the new patches. This should then give a definitive answer.

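Editor's note: the comparison requested above could be scripted roughly as follows. This is only a sketch under assumptions, not what anyone in the thread actually ran. It assumes that a kernel built without the DFS patches registers no cpufreq policy, so the presence of /sys/devices/system/cpu/cpu0/cpufreq is used as a stand-in for "DFS on". The function names, the mount point and the log file are placeholders.

```shell
#!/bin/sh
# Return success if a cpufreq policy exists for cpu0.
# The optional argument is a root prefix (useful for dry runs against a fake
# tree); without it the live sysfs is checked.
dfs_active() {
    root=${1:-}
    [ -d "$root/sys/devices/system/cpu/cpu0/cpufreq" ]
}

# Run one foreground scrub and log start and end together with the DFS state.
# A "started" entry without a matching "finished" or "failed" entry after the
# next boot suggests the machine froze or rebooted during the scrub.
# $1: mount point of the btrfs filesystem, $2: log file (both placeholders)
scrub_once() {
    mountpoint=$1
    logfile=$2
    if dfs_active; then state="DFS on"; else state="DFS off"; fi
    echo "$(date) kernel $(uname -r), $state: scrub started" >> "$logfile"
    sync   # push the log entry to disk before the heavy I/O begins
    # -B keeps btrfs scrub in the foreground until it completes.
    if btrfs scrub start -B "$mountpoint"; then
        echo "$(date) scrub finished" >> "$logfile"
    else
        echo "$(date) scrub failed" >> "$logfile"
    fi
}
```

Running scrub_once once on a kernel with the patches and once on a kernel without them, against the same filesystem, gives two log entries that can be compared directly. The log file should live on a filesystem that survives a hard reset.
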
Mangix (thread author), posted November 27, 2020:

I'm currently running btrfs scrub without the DFS patches. 2 hours of uptime and counting. Old or new DFS patches do not make a difference. They both cause freezing.

edit: I should mention that the reason I'm running btrfs scrub is all of these kernel freezes. I'm expecting to see errors. So far there are none. That's pretty impressive, as there have been 100+ freezes.

Anyway, I'm done with these DFS patches. Whether or not they get removed, I'm building my kernels without them.

Heisath, posted November 27, 2020:

Okay, if btrfs scrub works without the patches, the situation is clear. Thank you for your help!

This disables DFS on legacy and current: https://github.com/armbian/build/pull/2387

kratz00, posted November 27, 2020:

How does it work, or better: when will it be possible to run apt-get upgrade to update to a 5.18.* kernel without the problematic DFS patches?

Heisath, posted November 27, 2020:

Once the above PR has been merged and a bugfix release has been done. We can let you know then.

kratz00, posted November 27, 2020:

14 minutes ago, Heisath said: "Once the above PR has been merged and a bugfix release has been done. We can inform you then."

Sounds great, thanks in advance. I am looking forward to having a stable Helios4 again.

FredK, posted November 27, 2020:

FWIW: although there seems to be evidence that the DFS code is the cause of the spontaneous reboots, I want to let you know that my installation (see https://forum.armbian.com/topic/16038-random-system-reboots/?do=findComment&comment=113510) has now been running for more than two days since the upgrade to 5.8.18-mvebu.

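Editor's note: uptime reports like the one above depend on noticing a reboot that happened unattended. One possible way to detect that, sketched here under assumptions, is to compare the kernel's boot ID (/proc/sys/kernel/random/boot_id, which changes on every boot) with a value saved earlier. The function name and the state file location are invented for this example; nobody in the thread describes using this method.

```shell
#!/bin/sh
# Compare the current boot ID with the one saved by the previous call.
# Prints "first run", "same boot" or "rebooted", then saves the current ID.
# $1: state file (placeholder path chosen by the caller)
# $2: boot ID source, defaults to the kernel's boot_id file
check_reboot() {
    state_file=$1
    boot_id_file=${2:-/proc/sys/kernel/random/boot_id}
    current=$(cat "$boot_id_file")
    if [ ! -f "$state_file" ]; then
        echo "first run"
    elif [ "$(cat "$state_file")" = "$current" ]; then
        echo "same boot"
    else
        echo "rebooted"
    fi
    printf '%s\n' "$current" > "$state_file"
}
```

Called periodically (for example from cron) with a state file on persistent storage, this reports "rebooted" the first time it runs after any restart, planned or not.
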
Heisath, posted November 27, 2020:

Yeah, that is the reason why it was not removed until now. No one complained. Armbian has been past LK 4.19 for quite a while (hell, there even was a complete 5.4 release) and no one seems to have had any issues.

Igor, gprovost and I all use Helios4 / clearfogpro on a daily basis (as NAS or whatever) and do not have, or are unable to reproduce, these problems... I think the DFS stuff only causes problems under a combination of specific factors.

Mangix (thread author), posted November 29, 2020:

2 days of uptime without the DFS patches. No issues to report. Marked as solved. I assume new kernels will be out, if they're not already.
