
Mangix

Members
  • Posts

    66
Posts posted by Mangix

  1. Is that btrfs raid5?

     

    edit: I just had a thought. Since the Helios4 is underclocked, I wonder if restoring the stock clocks would make the DFS patches work.

     

    Then again, I do remember my Turris Omnia freezing as well.

     

    Never mind.

  2. 15 hours ago, FredK said:

    OTOH, after the upgrade to 5.8.18, which contains the DFS patches, my installation has been up and running for 4 days now.

    Are there any negative implications to be expected after the release of the new kernels without DFS management?

    CPU temperatures 2 degrees higher were reported on a WRT3200ACM.

    That's a router with no fans. In other words, on the Helios4 (which has fans) I'd expect no difference.

  3. I'm currently running btrfs scrub without the DFS patches: 2 hours of uptime and counting. The old and new DFS patches make no difference; both cause freezing.

     

    edit: I should mention that I'm running btrfs scrub because of all these kernel freezes. I expected to see errors, but so far there are none. That's pretty impressive given that there have been 100+ freezes.

     

    Anyway, I'm done with these DFS patches. Whether or not they get removed, I'm building my kernels without them.

  4. Related: https://forum.openwrt.org/t/cpu-frequency-scaling-driver-for-mvebu-wrt3200acm-etc/2808/91

     

    Not looking good.

     

    edit: I got 18 hours of uptime before I gave up. Now testing kernel 5.9 with that PR on GitHub. Hopefully this works.

     

    dmesg also shows this:

     

    debugfs: Directory 'cpu1' with parent 'opp' already present!

     

    edit2: It seems this dev 5.9 kernel has broken PWM; the fans run at full speed. Otherwise, I pushed it hard for ~3 hours and can't get it to reboot. We'll see if it survives 24 hours. Looks like the Turris people fixed something... or the last patch is what actually fixes things.

     

    edit3: I got impatient and flashed a freshly built kernel with a new DTB. The fan works correctly now.

     

    edit4: Bad news. Even these new patches cause freezing. It turns out this is easier to reproduce with btrfs scrub: the board reboots within an hour.

  5. I thought Hannu moved to developing for ipq806x. Interesting...

     

    I love how he notes instability under heavy I/O. That's exactly what I experience.

     

    From what I can see, patch 806 accomplishes the same thing as fix_time_drift_remove_global_timer.patch, in a cleaner way.

     

    Anyway, I'll wait to confirm 24 hours of uptime before I try anything else.

     

    I also vote for removing these patches. We don't have these in OpenWrt. Stability is more important.

     

    edit: On that last note, a PR like that would be rejected by OpenWrt. We already have problems with carrying too many patches, and we don't need any that have no chance of eventually making it upstream.

     

    edit2: The Turris people have also more or less abandoned this patchset. They carry it in their OpenWrt fork, but their newer versions use mainline OpenWrt.

     

    edit3: I'll note that this device has fans, so I don't think temperature is ever a problem.

  6. So without the cpufreq patches, I can't get my Helios4 to reboot. Looks like the problem is solved.

     

    edit: That's with kernel 5.8.18. I'm curious about 5.9, but it looks like those cpufreq patches were the issue. They're not upstream, and Armbian is the only place I see them.

  7. I compiled my own 4.19.63. So far, it's not rebooting. Fingers crossed that it can survive a day.

     

    If it does, then kernels 4.19.64 and above are the problematic ones.

     

    The next step is to selectively revert potentially problematic commits and figure out which one is causing reboots.

     

    edit: I just realized that I never deleted the old cpufreq patches...

  8. I use qBittorrent in a Docker container, and it easily reboots the Helios4. Again, it's a kernel issue. .66 is the last one that does not reboot: 8 hours of uptime so far. With every later kernel, I can barely get 2 hours.

     

    edit: It just rebooted.

     

    I'm out of ideas at this point. I have a feeling it's a kernel configuration issue.

     

    I have no idea what config that 4.19.63 version has.

  9. Progress update: kernel 4.19.70 fails. .65 works. Testing .67 now.

     

    edit: .66 has not crashed yet. Will wait to see if it can stay alive for 12 hours.

     

    I'm trying to compile kernels from a specific commit, but it doesn't seem to work. I'm trying:

     

    ```

    --- a/config/sources/families/mvebu.conf
    +++ b/config/sources/families/mvebu.conf
    @@ -10,7 +10,7 @@ fi
     case $BRANCH in
            legacy)
     
    -               KERNELBRANCH='tag:v4.19.66'
    +               KERNELBRANCH='commit:46b306f3cd7b47901382ca014eb1082b4b25db4a'
     
            ;;
    ```

     

    Which gives:

     

    ```

    [ error ] ERROR in function compile_kernel [ compilation.sh:379 ]
    [ error ] Error kernel menuconfig failed 
    ```

     

    I'm trying to work out which commit is responsible for the failure, based on the stable log at https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=v4.19.158&ofs=9800

     

    Current theory is this commit: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v4.19.158&id=46b306f3cd7b47901382ca014eb1082b4b25db4a

     

    It says it's for 32-bit.

  10. With the watchdog disabled, it does not reboot. I have a serial log captured while running journalctl -f, but I can't see anything interesting in it.

     

    Anyway, I now conclude this is an upstream kernel issue. Given that I know kernel 4.19.84 works and .104 is broken, I will try to narrow the issue down.

     

    edit: Nope, .84 reboots as well. Since I know .63 works (I had multiple months of uptime with it), I'll try versions between .63 and .84, starting with .70.

     

    FFS, this kernel rebooted while I was installing a new one. Now I have a brick, and I forgot how to reinstall it; something with chroot.

     

    @gprovost Western Digital Blues.

  11. The more I think about this, the more I think it's the kernel and not the power supply. I have 4 laptop hard drives connected to my Helios4, and those only use 5V. I also only started having issues when I swapped out the kernel.

     

    Looks like it's time to figure out how to build an old kernel.

  12. Sigh, false alarm. It still happens. I've learned that I can reproduce it by downloading with qBittorrent while watching a video over a Samba share. This actually reminds me of the time I managed to reboot my Turris Omnia just by watching a video through a Samba share. I wonder if the same thing is happening here...

     

    When I mentioned that I could do this on mvebu with Samba but not with ksmbd, the DD-WRT developer told me that if a userspace program can crash the kernel, that's a serious kernel issue.

     

    I only installed linux-image-current-mvebu_20.11.0-trunk_armhf.deb with dpkg -i.

     

    I'm thinking of collecting a serial log again. Should I be running journalctl -f while doing so?

     

    @gprovost Yes, the watchdog is running.

  13. I found the issue. It's some local Armbian patch that's messing things up.

     

    I recently cloned https://github.com/armbian/build and removed 5 pointless patches; that PR was merged. I built that and got the same issue.

     

    Then I deleted a bunch of patches from the mvebu-current directory.

     

    So far with this kernel, I am not getting any issues. My git status currently is this:

     

    ```

        deleted:    patch/kernel/mvebu-current/0044-gpio-report-all-gpios-in-debugfs.patch
        deleted:    patch/kernel/mvebu-current/40-pci-add-irq-change-handler-sspl.patch
        deleted:    patch/kernel/mvebu-current/402-sfp-display-SFP-module-information.patch
        deleted:    patch/kernel/mvebu-current/412-ARM-dts-armada388-clearfog-emmc-on-clearfog-base.patch
        deleted:    patch/kernel/mvebu-current/92-mvebu-gpio-add_wake_on_gpio_support.patch
        deleted:    patch/kernel/mvebu-current/92-mvebu-gpio-remove-hardcoded-timer-assignment-2.patch
        deleted:    patch/kernel/mvebu-current/92-mvebu-gpio-remove-hardcoded-timer-assignment.patch
        deleted:    patch/kernel/mvebu-current/dts-disable-spi-flash-on-a388-microsom.patch
        deleted:    patch/kernel/mvebu-current/fix_time_drift_remove_global_timer.patch
        deleted:    patch/kernel/mvebu-current/general-increasing_DMA_block_memory_allocation_to_2048.patch
        deleted:    patch/kernel/mvebu-current/unlock_atheros_regulatory_restrictions.patch

    ```

     

    My theory is that the pci patch or one of the GPIO ones is causing the issue.
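The version hunt across these posts (4.19.63 known good, 4.19.84 known bad, first test at 4.19.70) is essentially a manual bisection over stable point releases. Below is a minimal Python sketch of that search. It assumes a single regression: every release before the first bad one is good and every later one is bad. It also relies on a hypothetical `is_good(patch_level)` callback that builds and boots 4.19.<patch_level> and reports whether the board survived the uptime target. A strict midpoint between .63 and .84 would be .73 rather than .70. Starting at .70 also works; it just splits the range less evenly.

```python
from typing import Callable


def first_bad_release(good: int, bad: int, is_good: Callable[[int], bool]) -> int:
    """Binary-search 4.19.x stable patch levels for the first bad release.

    Assumes a single regression (all levels below the first bad one are good,
    all levels from it onward are bad). `is_good` is a hypothetical callback
    that builds/boots the given patch level and reports whether it stayed up.
    """
    if good >= bad:
        raise ValueError("expected good < bad")
    # Invariant: `good` is a known-good level, `bad` is a known-bad level.
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_good(mid):
            good = mid
        else:
            bad = mid
    return bad


if __name__ == "__main__":
    # Simulated run: pretend the regression landed in 4.19.67.
    print(first_bad_release(63, 84, lambda level: level < 67))  # → 67
```

Because these freezes are intermittent, a "good" verdict really only means "hasn't crashed yet". .66 looked fine for 8 hours and then rebooted. Each step therefore needs a long soak, such as the 24-hour target used here. Each build must also be free of unrelated patches, since the leftover cpufreq patches confounded these results. Otherwise the search converges on the wrong release.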
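The same halving idea applies to the patch list in post 13. Instead of deleting all eleven patches at once, you could restore half of the candidates per build. The sketch below assumes exactly one patch causes the freeze on its own, with no interactions between patches. It also relies on a hypothetical `reproduces(subset)` callback that builds a kernel with only `subset` of the deleted patches restored, runs the qBittorrent/Samba or btrfs scrub workload, and reports whether the freeze comes back. With 11 candidates, this takes at most four builds instead of eleven.

```python
from typing import Callable, Sequence


def find_culprit_patch(patches: Sequence[str],
                       reproduces: Callable[[Sequence[str]], bool]) -> str:
    """Narrow a list of kernel patches down to the one that triggers the freeze.

    Assumes exactly one patch is responsible on its own. `reproduces` is a
    hypothetical callback: build with only `subset` restored, run the
    workload, and return True if the freeze comes back.
    """
    candidates = list(patches)
    if not candidates:
        raise ValueError("no candidate patches")
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        # Only one half is tested; under the single-culprit assumption,
        # a clean result means the culprit is in the other half.
        candidates = first_half if reproduces(first_half) else candidates[mid:]
    return candidates[0]


if __name__ == "__main__":
    # A subset of the deleted patches, with a made-up culprit for illustration.
    deleted = [
        "0044-gpio-report-all-gpios-in-debugfs.patch",
        "40-pci-add-irq-change-handler-sspl.patch",
        "92-mvebu-gpio-add_wake_on_gpio_support.patch",
        "fix_time_drift_remove_global_timer.patch",
    ]
    culprit = "40-pci-add-irq-change-handler-sspl.patch"
    print(find_culprit_patch(deleted, lambda subset: culprit in subset))
    # prints 40-pci-add-irq-change-handler-sspl.patch
```

If the freeze actually needs two patches together (say, the pci patch plus a GPIO one), the single-culprit assumption breaks. Both halves could then test clean, and the result would be meaningless.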
