prahal

Everything posted by prahal

  1. @snakekick Thanks, that seems to confirm my findings from a few months ago. Adding a 5ms delay in the test case did not prevent the crash, though the system load could be at play. Maybe adding a delay at the kernel level would do. PCIe is pinned to the big CPUs, so the SATA disks seem to matter (as does the ethernet port).

     One could try in emergency mode by passing "emergency" to the kernel (I do it with "setenv extraboardargs emergency" after halting u-boot with a key press, then entering "boot"). You will have just the root partition mounted read-only (so no network connection; a serial console is required). Then run the test.

     Also note that the design of the GPU regulator has the same issue as the CPU b one (for my tests I blacklisted panfrost, ie the GPU driver).

     After looking at the RockPro64 board schematics, the design around the CPU b regulator is not just similar but exactly the same as the helios64 one (the RockPro64 uses a tcs4545 regulator for cpu b and a tcs4546 for the GPU). I wonder if the easiest fix would not be to pay someone to desolder the syr837 regulator and solder a tcs4545 instead, and likewise a tcs4546 instead of the syr838 for the GPU regulator... except that these chips from Torch Chip seem nowhere to be found. Maybe rip them from a RockPro64 board.

     @aprayoga can you confirm the Helios64 design for the rk3399 big cpu and gpu regulators is the same as the RockPro64 one? Would it make sense (and would it fix the unstable cpu_b) to desolder the syr837/syr838 and replace them with tcs4545/tcs4546? Ie the tcs4535 datasheet I found (I am still unable to find the tcs4545 datasheet) says the chip has internal pulldowns for VSEL and EN, which the syr837 does not. The syr837 datasheet requires a 22uF capacitor for VIN, but the helios64 has a 10uF one, like the RockPro64 has for the tcs4545. The SW pin of the helios64 has a 0.47uH inductor with 4 x 22uF capacitors, like the RockPro64 has for the tcs4545 (and like the typical application in the tcs4535 datasheet, which has a 0.47uH inductor with two 22uF capacitors).

     Do you know a replacement for the TCS4545/TCS4546 that has closer specs than the syr837/syr838? I cannot seem to find the TCS4545/TCS4546 for sale (maybe I could buy a RockPro64 to desolder them, at least for a test...). Or could you check on your side, with a helios64 board, that the cpufreq-switching-2-b test above crashes with the syr837 but not with the tcs4545, with vanilla rk3399 opp definitions in the dts?

     Sadly the Helios64 filled a market that is left unfilled: people who do not have the know-how to go full low-wattage DIY NAS and who also cannot afford to pay 1K€ for a NAS (and who might need two NASs, to make things worse). In the meantime, I spend a lot of time learning about DIY NAS, but it is still hard to get the wattage at full load (reviews tend to give only idle power usage). I will probably end up gambling, buying one build and praying... but with the Helios64 I had the metrics before buying.

     I found that the Rock960 has the same design for the cpu_b and gpu regulators, except for the inductor, which is 0.240uH on the Rock960 and 0.470uH on the Helios64. But it is hard to tell if the Rock960 is stable with my cpufreq switching test for the big cpus of the rk3399. It might be that typical use of that board does not stress it as much as a raid10 on the helios64 PCIe SATA, which is pinned to cpu_b (initially it was 4 x 3TB WD Red, the old CMR model WD30EFRX-68EUZN0, from the Helios4 setup as advertised by the Kobol wiki for the Helios4). The board crashed on first boot after assembly with this raid setup.

     Mind I found that the Pinebook Pro also has the same design as the Helios64, this time around the syr837/syr838... I begin to wonder if either they are all broken (it could be that the amount of stress from a NAS's ethernet or raid10 over PCIe is not that common) or if this is not the issue at stake.
  2. I wonder if upping the voltage was the correct fix (and if it would always work). From other rk3399 board schematics and the TCS4525 datasheet, it seems the Kobol team designed the board for the TCS4525 regulator, which is used in front of the big CPU cluster in a lot of designs, and replaced the TCS4525 with the SYR837 later on, without taking into account the different recommendations for the SYR837 (ie a 22uF capacitor on VIN instead of the 10uF one used with the TCS4525). All the components around the SYR837 in the helios64 schematics match the reference design for the TCS4525 (from Torch Chip, datasheet behind Chinese paywalls). I don't know if replacing the VIN capacitor would be enough to get stable big CPUs...
  3. This should take time; how long did it take to complete? Could you paste the last 10 lines of output from the command (or even from a single run)? And maybe run the test with "time for i in $(seq 1 100); do ./cpufreq-switching-2-b ; done" to get the time it took at the end (but if it took ages it ran fine, and it is not required to run the 100 iterations anew).

     It might be that the test runs fine on your hardware. That would be interesting. But as I said, since it crashed once, I doubt it. One option is that on the first attempt you tried: and on the second you tried https://gist.github.com/prahal/8fab73325eb0d7091ad7c4627bf8e25a which has a delay of 5 milliseconds between cpub frequency transitions, while the first has no delay at this point (again, sorry, I did not notice the github gist one had this 5msec delay, which I added to test if a delay would help). To check, you can replace "#define TRANSITION_DELAY 5000" with "#define TRANSITION_DELAY 0" and see if it crashes. If it does, that points to an issue with the delay between switching operating points for the big CPU.

     Do you know which kernel was running when your box crashed? Also, do you know which u-boot you have? (This requires serial console output.) Mind you don't need to paste cpufrequtils data, because the test case bypasses the cpufrequtils settings and manages the chosen frequencies and how to switch them on its own.

     Note: if you want to quote text from this forum, select it with your mouse; a "Quote selection" popover box will appear above the selection, click on it. You can quote more than one selection in the same post.
  4. @BipBip1981 you mean you have no crash running cpufreq-switching-2-b five times with 6.6.16 and 6.6.28? No, that is not what I expected, especially since you told me the first time it crashed and rebooted. So with which kernel did it crash? Did you rebuild cpufreq-switching-2-b between the test that crashed and the ones that did not? You can run the test in a loop 100 times with:
         for i in $(seq 1 100); do ./cpufreqswitching/cpufreq-switching-2-b; done
     With only one opp not upped by 75mV I have seen tests crash only after 80 runs, but without any changes it seems unlikely. Could you paste the last 10 lines from a cpufreq-switching-2-b run? I could imagine that some boards have defective components... but then why did your board crash once and then no more?

     By the way, do not compile cpufreq-switching.c; as I told you previously, it was not the correct code for the test case. It is of no use for the issue at stake. It was a first attempt, because the Kobol team said the crash could be due to too fast frequency switching, so I tried the extremes only. But it turns out these are the most stable and likely the only ones that survive without upping the opp voltage by 75mV.

     Yes, because it sets the governor to userspace to be able to force the frequency switch via code. After the run, it does not restore the cpufrequtils governor (/etc/default/cpufrequtils); "systemctl restart cpufrequtils.service" should restore it for you.

     You mean cpufreq-switching-2-b, which outputs thousands of lines for each run, built with "gcc -o cpufreq-switching-2-b cpufreq-switching-2.c", from: and https://gist.github.com/prahal/8fab73325eb0d7091ad7c4627bf8e25a (note there is a small diff between the two, the "usleep(50);", which should not matter).
  5. @BipBip1981 I don't understand what was not crashing on the second try just after reboot. Still, thank you for running my test case v2 (and again, sorry for first pasting you the v1, which was not the correct one to reproduce the crash). The v2 is expected to crash the board quite fast. Even if it survives a run, you should test a few runs (at least 5). Next, we need to find someone to look into the schematics to find out if upping the voltage is the best course of action. If so, ship the upped opp voltages in the helios64 dts.
  6. If the issue is that the cpu frequency is switched too fast, and I can reproduce the crash with a regulator-ramp-delay of 1000, then there is no point in testing anything above 1000: that would make the issue worse. regulator-ramp-delay is badly named. It is not a delay, it is a divider for the delay. The greater the regulator-ramp-delay, the faster the transition (I believe the Kobol team made this mistake, but as I also believe the issue could be something other than the delay between transitions, this is not a big deal). I still have not tried a regulator-ramp-delay value lower than 1000 (ie without tweaking the opp voltages as I am currently doing).
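To make the "divider" point concrete, here is a minimal sketch in C of how long a voltage change takes for a given regulator-ramp-delay, assuming the property really is a slew rate in uV/us. The function name `ramp_time_us` and the round-up behaviour are my own choices for illustration, not code taken from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Time, in microseconds, needed to slew between two voltages when the
 * regulator ramps at `ramp_uv_per_us` microvolts per microsecond (the
 * unit of regulator-ramp-delay). The result is rounded up.
 * Hypothetical helper for illustration only.
 */
uint32_t ramp_time_us(uint32_t from_uv, uint32_t to_uv, uint32_t ramp_uv_per_us)
{
    assert(ramp_uv_per_us > 0); /* a zero slew rate makes no sense here */

    uint32_t delta = from_uv > to_uv ? from_uv - to_uv : to_uv - from_uv;

    return (delta + ramp_uv_per_us - 1) / ramp_uv_per_us;
}
```

With these assumptions, a swing from 825000 uV to 1200000 uV takes 375 us at a ramp delay of 1000 and only about 10 us at 40000, so the larger value means the shorter wait.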
  7. @ebin-dev I believe initramfs messages are not written to syslog. @Trillien do you see that message on the serial console? /usr/share/initramfs-tools/scripts/local-bottom/mdadm is part of the mdadm package, which is packaged by Debian (see "dpkg -S /usr/share/initramfs-tools/scripts/local-bottom/mdadm" and "apt policy mdadm"). Though it could be that the generated initramfs lacking /bin/rm is armbian specific. You might want to open a bug against armbian or at least open a topic in the forum. But it is nothing helios64 specific as far as I know. It could even be a Debian bug. I don't even know if we ought to fix this missing /bin/rm for mdadm at the board level, even as a workaround.
  8. Could you try my older test case code: It turns out I did not compile my test case anew before pasting it to a github gist, and it could be that the new one I pasted there is not testing what I expected (in that I may have changed it to test CPU frequency changes from max to min instead of each step). Mind that for my own tests I use a binary of the test case that I built long ago, which corresponds to the one in the link above. I did not feel that sharing a binary test case was a good idea. I prefer you to be able to audit the code (or have someone audit it for you). I did not have much time to devote to sharing my findings, so I checked that the source was fine, but not that the test was the same as the one I used on my side to stress test the big cpu. Sorry.

     It looks normal to me that the test case I shared with you works fine, as as far as I know 1.8GHz at 1.2V and 408MHz at 825mV are pretty stable. They could crash, I am not sure of that, but it would take more than 50 runs of the test for it to happen (at least it took 80 of them for 600MHz to fail at 825mV).

     Mind you should do at least 5 runs of the above test case to be somewhat confident you cannot get cpu b to crash. The fact that it does not crash is not the point of the test. Its usefulness is that it nearly always crashes the big cpu on the first run.

     EDIT: the previous gist I gave you as a test case was my v1. The current test case is https://gist.github.com/prahal/8fab73325eb0d7091ad7c4627bf8e25a which is in the other thread I linked in this comment.
  9. @ebin-dev nearly, I do not change the max value of the voltage (only the min and the central value). Change
         opp-microvolt = <0xc96a8 0xc96a8 0x1312d0>;
     to
         opp-microvolt = <0xdbba0 0xdbba0 0x1312d0>;
     though you might be able to increase the max value too; I just don't know if it is safe, or how to find out. Note that in the edited dts (be it via armbian-config or otherwise) you can replace the hex numbers with decimals, ie you can write:
         opp-microvolt = <900000 900000 0x1312d0>;
     It is way easier than computing the hex of the initial voltage with 75mV added.

     @BipBip1981 best is to have a reproducible way to trigger the crash. Then you can tell when the issue is gone. My test case is https://gist.github.com/prahal/316111da0a9b8cc0d0791d26659dc682 If you can run it without a crash with any kernel, that is news to me (I believe I even got the first helios64 kernel, linux 4.4, to crash with this test case).

     With this patch to increase the min and "central" voltage (I believe it is the requested voltage) by 75mV, I cannot get my above test case to crash the helios64 (mind there are other helios64 crashers, so it is best to run the test case in systemd emergency mode, but I managed to run it 100 times in "full" session mode).

     EDIT: this patch is incomplete: since then I have added opp-00 and opp-01 with the same values as opp-02 (ie 900000 and the appropriate frequencies).

         diff --git a/arch/arm64/boot/dts/rockchip/rk3399-kobol-helios64.dts b/arch/arm64/boot/dts/rockchip/rk3399-kobol-helios64.dts
         index 77844650e2fe..34d94e4d6ada 100644
         --- a/arch/arm64/boot/dts/rockchip/rk3399-kobol-helios64.dts
         +++ b/arch/arm64/boot/dts/rockchip/rk3399-kobol-helios64.dts
         @@ -1160,10 +1160,36 @@ &cluster0_opp {
              /delete-node/ opp06;
          };
          
          &cluster1_opp {
              /delete-node/ opp08;
         +
         +    /delete-node/ opp02;
         +    /delete-node/ opp03;
         +    /delete-node/ opp04;
         +    /delete-node/ opp05;
         +    /delete-node/ opp06;
         +    opp02 {
         +        opp-hz = /bits/ 64 <816000000>;
         +        opp-microvolt = <900000 900000 1250000>;
         +    };
         +    opp03 {
         +        opp-hz = /bits/ 64 <1008000000>;
         +        opp-microvolt = <950000 950000 1250000>;
         +    };
         +    opp04 {
         +        opp-hz = /bits/ 64 <1200000000>;
         +        opp-microvolt = <1025000 1025000 1250000>;
         +    };
         +    opp05 {
         +        opp-hz = /bits/ 64 <1416000000>;
         +        opp-microvolt = <1100000 1100000 1250000>;
         +    };
         +    opp06 {
         +        opp-hz = /bits/ 64 <1608000000>;
         +        opp-microvolt = <1175000 1175000 1250000>;
         +    };
          };
          
          &cpu_thermal {
              trips {
                  cpu_warm: cpu_warm {

     Mind this patch will not apply (the cpu_thermal context is from another patch of mine). But it gives you an idea of what you should write.

     Also, you should take into account that crashes might be related to the load, or to the speed of transitions under load. So a kernel version might help, but it will merely hide the crash or render it less frequent. That is not even a workaround; it merely makes the crash more or less frequent. It might be that there is still a bug in the kernel that only affects the helios64, but it is unlikely.

     I think my helios64 always crashed (even on the first boot after I assembled the box) because I have an mdadm raid10 with ext4 setup. The raid10 stresses the board (and especially the big cpus). If you could try my stress test with your stable kernel, that would help decipher whether this kernel is really stable with regard to the big cpus.

     Mind that even with this cpu-b 75mV workaround I still get crashes from my board, but not with my test case, and way less often. I don't have a test case for these remaining crashes yet, nor do I know what triggers them.

     Also, the fact that upping the voltage by 75mV works around crashes when cpufreq switching the big cpus does not mean it fixes the root cause. I am not able to analyze the schematic on my own. We would need someone to do so to get a clearer idea of why this helps and why it could be required. Finally, the rk3399 is said to be very robust. So it could be that it sometimes works with invalid voltages, but not all the time.
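For anyone who would rather not do the hex arithmetic by hand, here is a small sketch in C of the 75mV bump applied to a single opp-microvolt value. The helper name `bump_uv` and the clamping to the max value are my own assumptions for illustration; the posts above only state that the first two values are raised and the max is left alone.

```c
#include <assert.h>
#include <stdint.h>

#define OPP_BUMP_UV 75000u /* the 75mV increase discussed above */

/*
 * Return `uv` raised by 75mV, never exceeding `max_uv`.
 * Hypothetical helper: the clamp is an assumption added for safety,
 * it is not part of the patch quoted above.
 */
uint32_t bump_uv(uint32_t uv, uint32_t max_uv)
{
    assert(uv <= max_uv); /* the input must already respect the max */

    uint32_t bumped = uv + OPP_BUMP_UV;

    return bumped > max_uv ? max_uv : bumped;
}
```

For the 816MHz entry this turns 0xc96a8 (825000 uV) into 900000 uV, which is 0xdbba0, while 0x1312d0 (1250000 uV) stays as the max.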
  10. Can you provide the exact commands you run to get the crash? Also, for most of the instability (big CPU cluster) see my comment above, that is, up the opp-table-1 voltages by 75mV. Otherwise I will post the DTS block to up the opp-table-1 voltages by 75mV in a few days at most, I hope.
  11. @ebin-dev I discussed this on IRC #u-boot, and I believe a board designer told me that there could be an issue with the regulator hardware design (CPU big). After looking at the schematics (which are available in the wiki, in the documents section of the left pane), he suggested I raise the voltage to the max to see if it fixed my crashes, and this indeed fixed lots of them. That is, I first tried every opp-table-1 (ie cpu-b) entry at 1.2V, then I tried voltages closer to the vanilla rk3399 ones. In the end I was able to run the cpufreq switching test I gave you 100 times without a crash by upping all the opp voltages for cpu-b by 75mV. Some of the opps ran mostly stable with only 50mV, but I still had crashes. So upping by 75mV looks fine.

      I still have a crash around once a day, but not with my cpufreq test case as far as I know. I am now on the ondemand cpufreq governor with frequencies from 408MHz to 1.8GHz. Still, I would really like to be able to reproduce the crashes I still get. They might come from the gpu opp voltages, as the GPU has the same hardware design as the CPU big. Or from something else. But I doubt the kernel is involved, except that some kernel versions might stress the board less.

      For one, I had added a big delay between CPU-b frequency switches and still had crashes, so I doubt the speed has anything to do with it. And in my test I tried with all cpu-b OPP voltages to 1.2V except even the lowest one and was still able to get random crashes with my cpufreq test case, so I doubt this had anything to do with high frequencies. Only that 1.6GHz was the one most sensitive to running without the 75mV increase over the upstream rk3399 OPP voltage values. And 408/600 were the least likely to crash, but still crashed from time to time.

      I don't know if you know how to redefine the opp voltage values for cpu-b. I will try to post you my patch asap (currently on my phone).
  12. @TDCroPower invalid free and segfault: likely https://github.com/armbian/build/issues/4761 ie you need rockchip DDR bin in u-boot. Nothing OMV7 related (but python3 related which OMV7 makes use of).
  13. @Juriom might be related to the hs400es issue from mainline
  14. @ebin-dev what I meant was that the Helios64-specific code is in the rk3399 section. But I believe that nowadays (you can check the value of BOARDFAMILY in /etc/armbian-release) this rk3399 section is not read on a helios64 install. The "rockchip64" section above the "rk3399" one is used instead (line 203 of /usr/lib/armbian/armbian-hardware-optimization for the armbian-bsp-cli-helios64-edge package). You can check the logs of the systemd service:
          systemctl status armbian-hardware-optimize.service
      Maybe you already tweak end0 in the rockchip64 section, though you could then see that the Helios64 BOARD_NAME section is under rk3399, not rockchip64.

      @ebin-dev You seem to have the Helios64 code that is applied: Can you give your /etc/armbian-release content? Especially the BOARDFAMILY value? And the "dpkg -l armbian-bsp-cli-helios64-*" output?
  15. @TDCroPower the kernel packages are armbian BRANCH based (current, edge, etc). So they are not bound to the OS release ("current" will always be the latest armbian stable release), but to the armbian channel. You can have multiple branches installed (in this case, if the current kernel debs are updated after the edge ones, current will be the new default kernel loaded at boot even if edge is newer; I believe we are not supposed to install both edge and current, but nothing prevents us from doing so).

      You can list your installed kernel packages with:
          dpkg -l "linux-*-rockchip64" | grep "^.i"
      Here I have:
          hi  linux-dtb-edge-rockchip64      24.2.0-trunk  arm64  Armbian Linux edge DTBs in /boot/dtb-6.6.11-edge-rockchip64
          hi  linux-headers-edge-rockchip64  24.2.0-trunk  arm64  Armbian Linux edge headers 6.6.11-edge-rockchip64
          hi  linux-image-edge-rockchip64    24.2.0-trunk  arm64  Armbian Linux edge kernel image 6.6.11-edge-rockchip64
      so it means I only have the edge branch kernel installed. To put these packages on hold (they already are, thus the leading "hi", for on hold and installed), I do:
          sudo apt-mark hold linux-dtb-edge-rockchip64 linux-headers-edge-rockchip64 linux-image-edge-rockchip64
      If your installed Linux kernel packages are named linux-*-current-rockchip64, you instead do:
          sudo apt-mark hold linux-dtb-current-rockchip64 linux-headers-current-rockchip64 linux-image-current-rockchip64

      Mind I replaced armbian-bsp-cli-helios64-current with armbian-bsp-cli-helios64-edge and noticed that the edge version has BRANCH defined as "current" in the /etc/armbian-release that it ships. You can check the version of your armbian-bsp-cli package with:
          dpkg -l "armbian-bsp-cli-helios64-*"
      I believe the fact that the "edge" version defines BRANCH="current" in the /etc/armbian-release shipped by this package is a bug, and it would explain why armbian-config fails to freeze your kernel.
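As a side note on reading the first column of "dpkg -l": the first letter is the desired state (h for hold, i for install, and so on) and the second is the current status (i for installed, n for not installed, and so on). The sketch below, in C, checks for the "hi" combination seen above; the function name is hypothetical and the check deliberately ignores the optional third error-flag column.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if the two-letter dpkg status prefix means
 * "desired: hold" and "status: installed" (ie "hi"), else 0.
 * Hypothetical helper for illustration only.
 */
int dpkg_held_and_installed(const char *flags)
{
    assert(flags != NULL);

    if (strlen(flags) < 2)
        return 0; /* too short to carry both letters */

    return flags[0] == 'h' && flags[1] == 'i';
}
```

So a line starting with "hi" is both installed and on hold, while "ii" is installed but free to be upgraded.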
  16. @ebin-dev @OdyX I found that the Helios64-specific code in /usr/lib/armbian/armbian-hardware-optimization is not run. This helios64-specific code is under the rk3399 BOARDFAMILY section, while armbian-bsp-cli-helios64-current 23.11.1 (bookworm) ships an /etc/armbian-release in which BOARDFAMILY is rockchip64... I found out by checking /proc/interrupts and seeing ahci spread across little and big cores and xhci only on little cores. You might also notice that armbian-hardware-optimization applies settings to eth0 while we only have end0 and eth1.
  17. @ebin-dev I am currently cleaning a backup archive on the helios64. I will test values below 1000 asap, but I do not expect much (I already had the regulator-ramp-delay set at 1000 for months and it is not stable. Though it could be that this regulator-ramp-delay is not the issue... I already tried adding "regulator-settling-time-us = 5000", no better). I will also try with my test program only asking for a frequency switch every 5 seconds instead of every 50 microseconds. I will also try skipping some frequencies to test whether only specific frequencies are at play.

      At least with a reliable crasher (the above test program), it is easier to tell if a setting helps or not (rather than "it did not crash for a week so it is better", when the trigger for the crasher might simply not have happened that week). The test program helps, but I am out of ideas as to what other settings to try. If it turned out that this test program also crashes other rk3399 boards (or even knowing that it does not), that would help.

      I would also like to test with the xhci and ahci interrupts removed from the big cores. This is the main difference with other boards.
  18. @ebin-dev about regulator-ramp-delay, you should take as a reference the rationale in the commit that introduced this setting in the kernel, not the comment from the Kobol team commit (which states that increasing this value slows down the frequency switching; in my understanding they misunderstood the Odroid post https://forum.odroid.com/viewtopic.php?t=30303 which was about speeding up the transitions, not slowing them down, because the poster wanted faster transitions and he tested that even with a faster transition, ie a greater regulator-ramp-delay, the CPU was still stable). As the Linux mainline commit states, regulator ramp delay is in uV per us, that is, the greater it is, the more voltage is switched per unit of time.

      I already reverted it to its previous value of 1000, but as it was already unstable before being increased to 40000, I am not surprised it is still unstable (though my program ran longer than yours, but it might be random). I will try to decrease it on the next attempt. Still, to me, something else must be at play, otherwise I do not understand why the same CPU would require a very slow transition on the helios64 and a very fast one on the Odroid N1 😕 At best, if lowering regulator-ramp-delay works, this would be a workaround in my opinion.

      I begin to doubt the correctness of the dts nodes set by the Kobol team (thinking they could have set the wrong regulator type for vdd_cpu_b or the like, or maybe the wrong pinctrl definition for this regulator... all things that cannot be confirmed as they did not provide the schematics). I found a picture of the board without the heatsink (from the Kobol team on Twitter https://twitter.com/kobol_io/status/1281088456391667713) but I believe the picture is not detailed enough to see the marking on the syr827 regulator for cpu_b. And it will not show the wiring and pulldowns. Maybe we could ask @aprayoga, as he said in September 2021 that he would still be around (https://forum.armbian.com/topic/18844-kobol-team-is-pulling-the-plug/?do=findComment&comment=128364).

      And I do not exclude DDR timings, even though from the previous DDR issue (which led me to revert to the rockchip DDR setting blob in u-boot) it seems to me that such an issue also affects userspace, and with the current instability I do not get user space programs crashing, only kernel errors (but this is based on a single experience of a DDR setting issue).

      I also want to try other things, like an ATX power supply plugged into the board instead of the power adapter (even though my multimeter showed above 12V on the board with the power adapter, power is a common cause of kernel issues on SBCs).
  19. @ebin-dev @OdyX if time permits, you could try changing "CPUL" to "1" and "CPUB" to "0" in my above code ("#define CPUL 1" for example). Running the program on cpu_l (the 4 slower CPUs) should not crash. You can then compile it as:
          gcc -o cpufreq-switching-2-l cpufreq-switching-2.c
      and run it.

      @ebin-dev if yours crashes in one second, it seems my hardware is as stable as your board... sad. If it was a matter of soldering a component or even a new RK3399 CPU, I would have tried. I believe that the fact that some have the issue often and others sporadically has to do with the load (and maybe the mains power and ground could make it even more frequent, but that is just a guess). I believe something is wrong with the cpu_b regulator or the voltage it is fed. I tested the 12V input voltage on the board and it was fine.

      Note that CPU big (CPU 4 and 5) loads are related to PCI/SATA and r8152 (in the armbian build repository): (I believe the r8152 assignment to CPU4 is an assignment of the whole USB3, not of r8152 only).

      From the last Kobol team posts, the cause of the instability is unknown. Some said it was DFS, ie the instability would not come from the frequencies per se that are set during the transitions, but from the speed of these transitions (from the odroid forum post https://forum.odroid.com/viewtopic.php?t=30303 the bigger the frequency switch at once, the more unstable). However, it remains to be confirmed that this is what makes the big CPU on our board unstable; the Odroid N1 post and @piter75's patchset for the NanoPi M4V2 were about the little cores, not the big ones. That is, they added "max-buck-steps-per-change = 4;" to help with instability, but this setting applies to the rk808-D regulator, which to me only affects the little CPU cluster (I have not yet tried whether the little cluster is unstable without this setting though), ie not the a72 CPUs. As confirmed by the patch submitter @piter75, these max-buck-steps-per-change were to fix the little cores:

      I believe the big CPU cluster is stable on other rk3399 boards (even those with the same syr827 regulator), though it is just a guess. If someone could try my cpufreq-switching-2-b test on another rk3399 board, that would help.

      The Kobol team also took a patch from the Odroid team repository (https://forum.odroid.com/viewtopic.php?t=30303) which switches the vdd_cpu_b regulator-ramp-delay from 1000 to 40000 to improve stability... though I believe they misunderstood it (the aim of the odroid patch was to speed up the transition, because it was tested as still stable). Increasing this regulator-ramp-delay does not lengthen the delay between frequency transitions but shortens it, thus doing the opposite of what they meant to do to fix the instability, that is, slowing down frequency switching. Ie, per https://patchwork.ozlabs.org/project/uboot/patch/20190216094548.911-7-krzk@kernel.org/ the regulator-ramp-delay is in uV/us, which means it is the number of uV that are switched per us. Increasing it switches faster. Maybe we could try the opposite, that is, lower this value and retry the test program.
  20. @ebin-dev @OdyX note that the instability I noticed is related to the big CPU cluster (the a72) and that the SATA and XHCI (usb3 including r8152) are bound to these a72 big cpus in /usr/lib/armbian/armbian-hardware-optimization (though I do not understand this code yet):

          case ${BOARD_NAME} in
              "Helios64")
                  for i in $(awk -F":" "/xhci/ {print \$1}" < /proc/interrupts | sed 's/\ //g'); do
                      echo 10 > /proc/irq/$i/smp_affinity
                  done
                  for i in $(awk -F":" "/ahci/ {print \$1}" < /proc/interrupts | sed 's/\ //g'); do
                      echo 30 > /proc/irq/$i/smp_affinity
                  done
                  ;;

      I believe raid resync or rebuild triggers this instability via the SATA activity (even at boot); I kept my HDD setup from the Helios4 install instructions, ie raid10. So not using RAID could help keep the board stable, even with a heavy load.
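The values written to smp_affinity are hexadecimal CPU bitmasks, so "10" is bit 4 (CPU4) and "30" is bits 4 and 5 (CPU4 and CPU5), which are the two a72 cores. A small sketch in C that decodes such a mask; the function names are hypothetical and only masks that fit in an unsigned long are handled.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse a hex smp_affinity string such as "10" or "30" into a bitmask. */
unsigned long parse_affinity_mask(const char *hex)
{
    assert(hex != NULL);

    return strtoul(hex, NULL, 16);
}

/* Return 1 if CPU number `cpu` is allowed by `mask`, else 0. */
int cpu_in_mask(unsigned long mask, unsigned int cpu)
{
    assert(cpu < sizeof(mask) * 8); /* keep the shift well defined */

    return (int)((mask >> cpu) & 1UL);
}
```

With this reading, the xhci interrupts go to CPU4 only and the ahci interrupts to CPU4 and CPU5, and neither mask includes the little cores 0 to 3.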
  21. @TDCroPower I use "apt-mark hold <package>", and then on "apt upgrade" the packages marked as on hold are reported as kept back:
          The following packages have been kept back:
            linux-dtb-edge-rockchip64 linux-headers-edge-rockchip64 linux-image-edge-rockchip64
          The following packages will be upgraded:
            openmediavault-compose
      Still, "apt list --upgradable" lists the on-hold packages as upgradable. I mean "apt-mark showhold" is the way to tell which packages will be kept back on upgrade.

      For the second question, why you do not have linux-dtb-* and linux-image-* on hold on your second box, I can only guess from the armbian-config code (ie /usr/lib/armbian-config/jobs.sh, search for "Freeze") that armbian-config only freezes the installed packages whose name includes the BRANCH value from "grep BRANCH /etc/armbian-release". I believe that if you have linux-dtb-edge-rockchip64 installed with BRANCH=current in /etc/armbian-release, then armbian-config will fail to mark the package as on hold, because its name contains "edge" instead of "current".
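To illustrate the guess above (this is not the armbian-config code, only a sketch of the suspected matching rule), here is a C function that reports whether a kernel package name contains a given BRANCH value as a dash-delimited component. The function name and the 64-byte buffer are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if `pkg` contains "-<branch>-", else 0.
 * Hypothetical model of the suspected name-based Freeze selection.
 */
int name_matches_branch(const char *pkg, const char *branch)
{
    char needle[64];

    assert(pkg != NULL && branch != NULL);
    assert(strlen(branch) + 3 <= sizeof(needle)); /* two dashes + NUL */

    snprintf(needle, sizeof(needle), "-%s-", branch);

    return strstr(pkg, needle) != NULL;
}
```

Under that assumption, linux-dtb-edge-rockchip64 would not be selected when BRANCH is "current", which would match the behaviour described above.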
  22. @OdyX can you try this testcase with 5.15.93 (it should crash)
  23. @ebin-dev can you confirm your box crashed before completing this program:

      cpufreq-switching-2.c

          #include <stdio.h>
          #include <stdint.h>
          #include <stdlib.h>
          #include <string.h>
          #include <fcntl.h>
          #include <malloc.h>
          #include <unistd.h>
          #include <sys/mman.h>

          #define MAIN_LOOPS (100)
          #define TRIALS_PER_TOGGLE (10)
          #define MAX_MEGS (64)

          /* Set to 1 to exercise the little (CPUL) or big (CPUB) cluster. */
          #define CPUL 0
          #define CPUB 1

          const char *cpul_freqs[] = {
              "408000", "600000", "816000", "1008000", "1200000", "1416000"
          };
          const char *cpub_freqs[] = {
              "408000", "600000", "816000", "1008000", "1200000", "1416000",
              "1608000", "1800000"
          };

          uint32_t *megs[MAX_MEGS];

          /* Open a sysfs file read-write, or exit with an error message. */
          int checked_open(char *name)
          {
              int fd = open(name, O_RDWR);
              char err[128];

              if (fd < 0) {
                  snprintf(err, 128, "cannot open %s", name);
                  perror(err);
                  exit(1);
              }
              return fd;
          }

          #define SCALING_PATHL "/sys/devices/system/cpu/cpu0/cpufreq/"
          #define SCALING_PATHB "/sys/devices/system/cpu/cpu4/cpufreq/"

          /* Step each enabled cluster to the next frequency, bouncing at both ends. */
          void browse_freq(int *cpul_index, int *cpub_index, int *cpul_step, int *cpub_step)
          {
              static int inited = 0;
              int freql_target_len;
              int freqb_target_len;
              int freqfd;
              int cpul_freqs_count = 0;
              int cpub_freqs_count = 0;

              cpul_freqs_count = sizeof(cpul_freqs)/sizeof(cpul_freqs[0]);
              cpub_freqs_count = sizeof(cpub_freqs)/sizeof(cpub_freqs[0]);

              /* On first call, switch to the userspace governor. */
              if (!inited) {
          #if CPUL
                  freqfd = checked_open(SCALING_PATHL "scaling_governor");
                  write(freqfd, "userspace", 9);
                  close(freqfd);
          #endif
          #if CPUB
                  freqfd = checked_open(SCALING_PATHB "scaling_governor");
                  write(freqfd, "userspace", 9);
                  close(freqfd);
          #endif
                  inited = 1;
              }

              if (*cpul_index >= cpul_freqs_count - 1)
                  *cpul_step = -1;
              if (*cpul_index <= 0)
                  *cpul_step = 1;
              if (*cpub_index >= cpub_freqs_count - 1)
                  *cpub_step = -1;
              if (*cpub_index <= 0)
                  *cpub_step = 1;

              *cpul_index += *cpul_step;
              *cpub_index += *cpub_step;

          #if CPUL
              printf("cpul_freq %s\n", cpul_freqs[*cpul_index]);
              freql_target_len = strlen(cpul_freqs[*cpul_index]);
              freqfd = checked_open(SCALING_PATHL "scaling_setspeed");
              write(freqfd, cpul_freqs[*cpul_index], freql_target_len);
              close(freqfd);
          #endif
          #if CPUB
              printf("cpub_freq %s\n", cpub_freqs[*cpub_index]);
              freqb_target_len = strlen(cpub_freqs[*cpub_index]);
              freqfd = checked_open(SCALING_PATHB "scaling_setspeed");
              write(freqfd, cpub_freqs[*cpub_index], freqb_target_len);
              close(freqfd);
          #endif
          }

          void write_test_data(int nmegs, int toggle)
          {
              int cpul_index = 0;
              int cpub_index = 0;
              int cpul_step = 1;
              int cpub_step = 1;

              while (nmegs--) {
                  browse_freq(&cpul_index, &cpub_index, &cpul_step, &cpub_step);
              }
          }

          void check_test_data(int nmegs, int toggle)
          {
              int cpul_index = 0;
              int cpub_index = 0;
              int cpul_step = 1;
              int cpub_step = 1;

              while (nmegs--) {
                  browse_freq(&cpul_index, &cpub_index, &cpul_step, &cpub_step);
              }
          }

          int main(int argc, char **argv)
          {
              int nmegs = MAX_MEGS;
              printf("allocated %dMB\n", nmegs);

              int nloop, ntoggle, ntrial;

              printf("test: toggle freq before write\n");
              for (nloop = 0; nloop < MAIN_LOOPS; nloop++) {
                  printf("\r%d/%d ", nloop, MAIN_LOOPS);
                  fflush(stdout);
                  write_test_data(nmegs, 1);
                  usleep(50);
                  check_test_data(nmegs, 0);
              }
              printf("\n");

              printf("test: toggle freq before read\n");
              for (nloop = 0; nloop < MAIN_LOOPS; nloop++) {
                  write_test_data(nmegs, 0);
                  usleep(50);
                  for (ntrial = 0; ntrial < TRIALS_PER_TOGGLE; ntrial++) {
                      printf("\r%d/%d, %d/%d ", ntrial, TRIALS_PER_TOGGLE, nloop, MAIN_LOOPS);
                      fflush(stdout);
                      check_test_data(nmegs, 1);
                  }
              }
              printf("\n");

              return 0;
          }

      Compile it with:
          gcc -o cpufreq-switching-2-b cpufreq-switching-2.c
      then run it:
          sudo ./cpufreq-switching-2-b

      I was able to reproduce the crash even with linux-u-boot-edge-helios64_22.02.1_arm64.deb, that is, with the rockchip ddr binary and atf and u-boot 2021.07, as well as with the current one. Your box being pretty stable and mine not lasting long, that would help me decipher whether my board has a hardware issue or whether the load I apply to the board is at fault (the electrical environment my helios64 lives in could be at play too, but that is another topic).
  24. @ebin-dev I confirm that the latest u-boot from https://fi.mirror.armbian.de/beta/pool/main/l/linux-u-boot-helios64-edge/ has the rockchip DDR.

      ! You might want to wait, as it seems u-boot compiling is broken in armbian !

      You could test with the rockchip ATF blob too (which I guess is what is inside `linux-u-boot-edge-helios64_22.02.1_arm64.deb`). To do so, edit `config/boards/helios64.csc` in your armbian build clone and replace `BOOT_SCENARIO="tpl-blob-atf-mainline"` with `BOOT_SCENARIO="spl-blobs"` (if you want details, check the comments in `config/sources/families/include/rockchip64_common.inc`). Then build the u-boot deb with:
          ./compile.sh uboot BOARD=helios64 BRANCH=edge RELEASE=bookworm
      After installing the deb, you can install the u-boot to the emmc (even if your OS is on SD, u-boot is read from the emmc first by the helios64, except if you set the jumper) with:
          source /usr/lib/u-boot/platform_install.sh
          write_uboot_platform $DIR /dev/mmcblk0
      (where /dev/mmcblk0 is the emmc). That would help confirm whether your r8152 issue is related to mainline ATF vs rockchip ATF.
  25. @ebin-dev I will test the latest u-boot from https://fi.mirror.armbian.de/beta/pool/main/l/linux-u-boot-helios64-edge/ and tell you if it has the rockchip DDR as soon as I can. Your current u-boot, linux-u-boot-edge-helios64_22.02.1_arm64.deb, has the same rockchip DDR blob that I put back in the latest merge request. But your ATF (which is called by the Linux kernel at runtime) is way older (version 1.3 from July 2020, while the current ATF LTS is version 2.8) and seems to have rockchip tweaks. Your u-boot is v2021.07. My Helios64 suffers from random crashes at runtime. I will try with the ATF you have. Thanks for having provided your version. Do you get any Linux oops, say once a month, or is the helios64 perfectly stable with your setup? (I mean apart from the r8152 triggering the netdev watchdog, that is, a plain crash that requires a reboot to restore functionality.)