Guy Posted May 9, 2020
Armbianmonitor: http://ix.io/2lnj
I started compiling a project (snowboy) on the NanoPi Duo2, as I do on FriendlyCore 16.04, and the board suddenly stopped working and no longer boots. The red LED blinks fast. The same SD card works on other boards, and FriendlyCore does not boot on this board either. Regarding "provide a log from serial console": I don't have a serial cable yet.
This has happened to me twice, on Armbian 18.04 and on 20.04. The first time it destroyed the "IoT-2G Application Carrier Board for NanoPi Duo2" that it was mounted on, and the carrier had a burnt smell; the network cable and power were connected. On one of the boards the red LED blinks fast, on the other the LED does not blink at all (I can upload a video of the red LED if it helps). Only the last time was it compiling snowboy; I don't remember what the other one was doing. I am uploading the Armbianmonitor output of the last SD card, taken on a different board.
Igor Posted May 9, 2020
It is possible that thermal throttling does not work very well. This needs to be inspected ... Can you try limiting the max CPU speed and switching the CPU governor? Set min and max to 1 GHz (armbian-config -> system) and retry what you were doing.
9 minutes ago, Guy said: "This has happened to me twice, on Armbian 18.04 and on 20.04."
The kernel is identical; what you are switching is userland applications and scripts. Kernels are legacy (4.19.y), current (5.4.y), dev (5.6.y) ... and they can be switched in armbian-config.
xwiggen Posted May 20, 2020
On 5/9/2020 at 2:16 PM, Igor said: "It is possible that thermal throttling does not work very well. This needs to be inspected ... Can you try limiting the max CPU speed and switching the CPU governor? Set min and max to 1 GHz (armbian-config -> system) and retry what you were doing. The kernel is identical; what you are switching is userland applications and scripts. Kernels are legacy (4.19.y), current (5.4.y), dev (5.6.y) ... and they can be switched in armbian-config."
Out of the box (i.e. without armbian-config) the governor is ondemand with a frequency range of 480 MHz to 1.37 GHz (5.4.41). Stressing all CPUs quickly raises the temperature to 56 °C, at which point I stopped. Is there any way to make the system not overclock by default?
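For anyone who wants to reproduce the numbers above before and during a stress run, here is a minimal sketch that reads the governor, the frequency limits and the temperature from the standard Linux sysfs files. It assumes the CPU sensor is exposed as thermal_zone0 and that cpu0 is representative of all cores; both are assumptions and may differ on a given board or kernel. The function name `read_cpu_state` is made up for this example.

```python
from pathlib import Path


def read_cpu_state(sysfs_root="/sys", cpu=0, zone=0):
    """Return governor, frequency limits (kHz) and temperature (deg C).

    Assumption: the CPU temperature sensor is thermal zone `zone`
    (zone 0 here); check the zone's `type` file on your board.
    """
    root = Path(sysfs_root)
    cpufreq = root / "devices" / "system" / "cpu" / f"cpu{cpu}" / "cpufreq"
    thermal = root / "class" / "thermal" / f"thermal_zone{zone}"

    def read(path):
        return path.read_text().strip()

    return {
        "governor": read(cpufreq / "scaling_governor"),
        "min_khz": int(read(cpufreq / "scaling_min_freq")),
        "max_khz": int(read(cpufreq / "scaling_max_freq")),
        # The kernel reports zone temperature in millidegrees Celsius.
        "temp_c": int(read(thermal / "temp")) / 1000.0,
    }
```

Calling `read_cpu_state()` on the board returns a dict; running it in a loop while the stress test is active shows whether the maximum frequency ever drops as the temperature climbs.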
Igor Posted May 20, 2020
1 minute ago, xwiggen said: "Is there any way to make the system not overclock by default?"
What do you mean by default? That you would build the image with the CPU speed limited to 1 GHz?
xwiggen Posted May 20, 2020
5 minutes ago, Igor said: "What do you mean by default? That you would build the image with the CPU speed limited to 1 GHz?"
I mean that a newbie deploying an image should not end up with a CPU that is allowed to overclock and possibly damage hardware (the SD card runs very hot too). [Considering the faulty temperature sensors on Orange Pi boards, this might be even preferable.]
The solution IMO would be not to touch the kernel image (as it's simply a hardware capability) but to enable the cpufreq daemon with safer defaults in the distro images (e.g. armbian-firstrun?). Then it's still possible to overclock through armbian-config if you want.
Igor Posted May 20, 2020
5 minutes ago, xwiggen said: "I mean that a newbie deploying an image should not end up with a CPU that is allowed to overclock and possibly damage hardware"
Damage control is alright, it's working, but the board is getting powered off, which is not exactly a desired way of normal operation. I am aware of the problem, but we are looking for the best solution. A possible workaround would be to add CPUMAX=1080000 to the configs of all smaller boards that might be affected by this problem: https://github.com/armbian/build/blob/master/config/boards/
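To illustrate what a cap like CPUMAX=1080000 (kHz) means at runtime, here is a rough sketch that picks the highest operating point not exceeding the cap and renders the matching settings in the format of Debian's /etc/default/cpufrequtils. The frequency list in the usage note is made up for illustration and is not the Duo2's real table; the function names are hypothetical, and whether a given image actually reads that file is an assumption.

```python
def pick_capped_freq(available_khz, cap_khz):
    """Return the highest available frequency (kHz) that is <= cap_khz."""
    candidates = [f for f in available_khz if f <= cap_khz]
    if not candidates:
        raise ValueError("no available frequency at or below the cap")
    return max(candidates)


def render_cpufrequtils(min_khz, max_khz, governor="ondemand"):
    """Render settings in the style of /etc/default/cpufrequtils.

    Assumption: the image uses cpufrequtils and reads this file at boot.
    """
    lines = [
        "ENABLE=true",
        f"MIN_SPEED={min_khz}",
        f"MAX_SPEED={max_khz}",
        f"GOVERNOR={governor}",
    ]
    return "\n".join(lines) + "\n"
```

With a hypothetical list such as `[480000, 648000, 816000, 1008000, 1200000, 1368000]`, `pick_capped_freq(..., 1080000)` selects 1008000. On a real board the list can usually be read from `scaling_available_frequencies` under the cpufreq sysfs directory.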
5kft Posted May 20, 2020
@Igor - in case this is helpful, might this be fixed for the Duo2 by making the same or a similar change to the one I made for the R1? See my change here: https://github.com/armbian/build/commit/88023e0eccbf25c8a22d6365d20d9bc4df78003b. I believe the Duo2 uses the same power circuit as the R1. With the correct regulator entry for the MP2143DJ added, the correct CPU clock values will be used. (I didn't make this change originally as I don't have a Duo2 to verify the changes on.)
Igor Posted May 20, 2020
1 hour ago, 5kft said: "adding the correct regulator entry"
Thanks. I will check that when possible. Can such a symptom also be related to wrong regulator settings?
5kft Posted May 20, 2020
35 minutes ago, Igor said: "Thanks. I will check that when possible. Can such a symptom also be related to wrong regulator settings?"
I wouldn't think that would be the case... I would really need to refresh my memory on this area, but in doing a quick check of the DTs it seems that the cooling maps are much smaller/simpler than they were previously (?) - e.g.,

    thermal-zones {
        cpu_thermal: cpu-thermal {
            polling-delay-passive = <0>;
            polling-delay = <0>;
            thermal-sensors = <&ths 0>;

            trips {
                cpu_hot_trip: cpu-hot {
                    temperature = <80000>;
                    hysteresis = <2000>;
                    type = "passive";
                };

                cpu_very_hot_trip: cpu-very-hot {
                    temperature = <100000>;
                    hysteresis = <0>;
                    type = "critical";
                };
            };

            cooling-maps {
                cpu-hot-limit {
                    trip = <&cpu_hot_trip>;
                    cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
                                     <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
                                     <&cpu2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
                                     <&cpu3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
                };
            };
        };
    };

IIRC, there used to be more passive trips that would keep the CPU cooler earlier, before hitting critical, weren't there? I was thinking of going back in time and looking at some older ones to compare... I figure starting there might help provide some insight.
Igor Posted May 20, 2020
32 minutes ago, 5kft said: "IIRC, there used to be more passive trips that would keep the CPU cooler earlier, before hitting critical, weren't there?"
Yes, that's what I recall too ... https://github.com/armbian/build/blob/master/patch/kernel/sunxi-legacy/ths-29-add-correct-h5-thermal-zone.patch
5kft Posted May 21, 2020
2 hours ago, Igor said: "Yes, that's what I recall too ..."
Yes, that's it! The "cooling-device" entries map to the CPU clock frequencies to use - this is what makes it clock down and cool off. With these missing, it's just going to overheat and hit critical...
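One way to check on a running system how many passive trips the kernel actually exposes is to list the trip points of the CPU thermal zone in sysfs. The sketch below assumes the CPU zone is thermal_zone0 and that trip points appear as the usual `trip_point_N_temp` and `trip_point_N_type` files; the function names are invented for this example.

```python
from pathlib import Path


def read_trip_points(zone_dir="/sys/class/thermal/thermal_zone0"):
    """Return a list of (index, type, temperature in deg C) for a thermal zone.

    Assumption: zone_dir is the CPU thermal zone on this board.
    """
    zone = Path(zone_dir)
    trips = []
    index = 0
    # Trip points are numbered from 0; stop at the first missing index.
    while (zone / f"trip_point_{index}_temp").exists():
        temp_mc = int((zone / f"trip_point_{index}_temp").read_text())
        kind = (zone / f"trip_point_{index}_type").read_text().strip()
        trips.append((index, kind, temp_mc / 1000.0))  # millidegrees -> deg C
        index += 1
    return trips


def count_passive(trips):
    """Count the trips whose type is "passive" (the ones that throttle)."""
    return sum(1 for _, kind, _ in trips if kind == "passive")
```

With the device tree quoted earlier in the thread, one would expect a single passive trip at 80 °C and a critical trip at 100 °C; a kernel carrying the older thermal-zone patch should report more passive entries.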