Guy

NanopiDuo2 not booting after intense processing task


Armbianmonitor:

I started compiling a project (snowboy) on the NanoPi Duo2, as I do on FriendlyCore 16.04, and suddenly the board stopped working/booting.
The red LED blinks fast. The same SD card works on other boards, and FriendlyCore does not boot on this board either.
Regarding "provide a log from serial console": I don't have a serial cable yet.

 

This happened to me twice, on Armbian 18.04 and on 20.04.

 

The first time, it destroyed the "IoT-2G Application Carrier Board for NanoPi Duo2" that it was on, without the board, and the carrier had a burnt smell. The network cable and power were connected.

For one of them the red LED blinks fast; for the other the LED does not blink at all (I can upload a video of the red LED if it helps).

Only the last time was it compiling snowboy; I don't remember what was running the other times.

 

I am uploading the armbianmonitor output from the last SD card, but taken on a different board.

 


It is possible that thermal throttling does not work very well. This needs to be inspected ... Can you try limiting the max CPU speed and switching the CPU governor? Set min and max to 1 GHz (armbian-config -> system) and retry what you were doing.
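
For testing without armbian-config, the same cap can be applied through the kernel's cpufreq sysfs files. This is only a minimal sketch: the policy path below is an assumption (it can differ between kernels), the helper names are made up for illustration, it has to run as root, and the setting is lost on reboot.

```python
from pathlib import Path

# Assumed location of the cpufreq policy; check your own system.
POLICY = Path("/sys/devices/system/cpu/cpufreq/policy0")


def pick_cap(available_khz, limit_khz):
    """Return the highest available frequency that does not exceed limit_khz."""
    candidates = [f for f in available_khz if f <= limit_khz]
    if not candidates:
        raise ValueError(f"no available frequency at or below {limit_khz} kHz")
    return max(candidates)


def pin_frequency(policy_dir, limit_khz):
    """Set scaling_max_freq and scaling_min_freq to the same capped value."""
    policy_dir = Path(policy_dir)
    available = [
        int(f)
        for f in (policy_dir / "scaling_available_frequencies").read_text().split()
    ]
    cap = pick_cap(available, limit_khz)
    # Write the maximum first, then bring the minimum up to meet it.
    (policy_dir / "scaling_max_freq").write_text(str(cap))
    (policy_dir / "scaling_min_freq").write_text(str(cap))
    return cap
```

Calling pin_frequency(POLICY, limit) with a limit in kHz picks the nearest step at or below that limit. The step sold as "1 GHz" may be listed at a slightly different value, so it is worth looking at scaling_available_frequencies before choosing the limit.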

 

9 minutes ago, Guy said:

This happened to me twice, on Armbian 18.04 and on 20.04.

 

The kernel is identical - what you are switching is userland applications and scripts. Kernels are legacy (4.19.y), current (5.4.y), dev (5.6.y) ... they can be switched in armbian-config.

On 5/9/2020 at 2:16 PM, Igor said:

It is possible that thermal throttling does not work very well. This needs to be inspected ... Can you try limiting the max CPU speed and switching the CPU governor? Set min and max to 1 GHz (armbian-config -> system) and retry what you were doing.

 

 

The kernel is identical - what you are switching is userland applications and scripts. Kernels are legacy (4.19.y), current (5.4.y), dev (5.6.y) ... they can be switched in armbian-config.

Out of the box (i.e. without armbian-config) the governor is ondemand with a frequency range of 480 MHz-1.37 GHz (5.4.41). Stressing all CPUs quickly raises the temperature to 56 °C, at which point I stopped. Is there any way I could make the system not overclock by default?
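
For anyone who wants to reproduce these readings, here is a minimal sketch that reads the governor, the frequency limits and the SoC temperature from sysfs. The two default paths are assumptions and may differ on your image, and it assumes the mainline convention of reporting temperature in millidegrees Celsius; the function names are only illustrative.

```python
from pathlib import Path


def read_temp_c(zone_dir="/sys/class/thermal/thermal_zone0"):
    """Return the zone temperature in degrees Celsius (sysfs value is millidegrees)."""
    raw = (Path(zone_dir) / "temp").read_text().strip()
    return int(raw) / 1000.0


def read_policy(policy_dir="/sys/devices/system/cpu/cpufreq/policy0"):
    """Return (governor, min_khz, max_khz) for one cpufreq policy."""
    p = Path(policy_dir)
    governor = (p / "scaling_governor").read_text().strip()
    min_khz = int((p / "scaling_min_freq").read_text())
    max_khz = int((p / "scaling_max_freq").read_text())
    return governor, min_khz, max_khz
```

Polling read_temp_c() every second or so while a stress run is active shows how fast the temperature climbs and whether the frequency limits change once it gets hot.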

1 minute ago, xwiggen said:

Is there any way I could make the system not overclock by default?


What do you mean by default? That you would build the image with the CPU speed limited to 1 GHz?

5 minutes ago, Igor said:


What do you mean by default? That you would build the image with the CPU speed limited to 1 GHz?

I mean that an image deployed by a newbie should not allow the CPU to overclock and possibly damage hardware (the SD card will run very hot too).

[considering the faulty temperature sensors on Orange Pi boards, this might even seem preferable]

The solution IMO would not be to touch the kernel image (as it's simply a hardware capability) but to enable the cpufreq daemon with safer defaults in the distro images (e.g. armbian-firstrun?). Then it's still possible to overclock through armbian-config if you want.

 

 

5 minutes ago, xwiggen said:

I mean that an image deployed by a newbie should not allow the CPU to overclock and possibly damage hardware

 

Damage control is alright, it's working, but the board gets powered off, which is not exactly a desired mode of normal operation. I am aware of the problem, but we are still looking for the best solution.

A possible workaround would be to add:

CPUMAX=1080000

to all smaller board configs that might be affected by this problem:

 

https://github.com/armbian/build/blob/master/config/boards/
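
To see which board configs would need that line, a checkout of the build repository can be scanned for files that do not set CPUMAX yet. A rough sketch, assuming every regular file in the given directory is a board config and that the variable is set on a line of its own; the function name is hypothetical.

```python
import re
from pathlib import Path

# Matches an uncommented CPUMAX assignment at the start of a line.
CPUMAX_RE = re.compile(r"^\s*CPUMAX=", re.MULTILINE)


def boards_without_cpumax(boards_dir):
    """Return the names of config files in boards_dir that do not set CPUMAX."""
    missing = []
    for path in sorted(Path(boards_dir).iterdir()):
        if not path.is_file():
            continue
        text = path.read_text(errors="replace")
        if not CPUMAX_RE.search(text):
            missing.append(path.name)
    return missing
```

Running boards_without_cpumax("config/boards") from the root of a checkout would list the candidates. Which of them are actually "smaller" boards affected by the problem still has to be decided by hand.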


@Igor - in case this is helpful: might this be fixed for the Duo2 by making the same or a similar change as I did for the R1?  See my change here: https://github.com/armbian/build/commit/88023e0eccbf25c8a22d6365d20d9bc4df78003b.  I believe the Duo2 uses the same power circuit as the R1.  With the correct regulator entry added for the MP2143DJ, the correct CPU clock values will be used.  (I didn't make this change originally as I don't have a Duo2 to verify the changes on.)

 

1 hour ago, 5kft said:

adding the correct regulator entry


Thanks. I will check that when possible. Can such a symptom also be related to wrong regulator settings?

35 minutes ago, Igor said:

Thanks. I will check that when possible. Can such a symptom also be related to wrong regulator settings?

 

I wouldn't think that would be the case...  I would really need to refresh my memory on this area, but in doing a quick check of the DTs it seems that the cooling maps are much smaller/simpler than they were previously (?) - e.g.,

 

	thermal-zones {
		cpu_thermal: cpu-thermal {
			polling-delay-passive = <0>;
			polling-delay = <0>;
			thermal-sensors = <&ths 0>;

			trips {
				cpu_hot_trip: cpu-hot {
					temperature = <80000>;
					hysteresis = <2000>;
					type = "passive";
				};

				cpu_very_hot_trip: cpu-very-hot {
					temperature = <100000>;
					hysteresis = <0>;
					type = "critical";
				};
			};

			cooling-maps {
				cpu-hot-limit {
					trip = <&cpu_hot_trip>;
					cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
							 <&cpu1 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
							 <&cpu2 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
							 <&cpu3 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
				};
			};
		};
	};

IIRC, there used to be more passive trips that would keep the CPU cooler earlier, before hitting critical, weren't there?  I was thinking of going back in time and looking at some older ones to compare... I figure starting there might help provide some insight.
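
One way to compare kernels without reading each DT is to list the trip points the running kernel exposes under sysfs. A minimal sketch, assuming the CPU zone is thermal_zone0 and that the trip_point_N_temp / trip_point_N_type files are present; the function name is made up.

```python
import re
from pathlib import Path

TRIP_RE = re.compile(r"trip_point_(\d+)_temp")


def list_trips(zone_dir="/sys/class/thermal/thermal_zone0"):
    """Return a sorted list of (index, type, temperature in deg C) for one zone."""
    zone = Path(zone_dir)
    trips = []
    for temp_file in zone.glob("trip_point_*_temp"):
        match = TRIP_RE.fullmatch(temp_file.name)
        if match is None:
            continue  # skip anything that does not follow the expected naming
        index = int(match.group(1))
        kind = (zone / f"trip_point_{index}_type").read_text().strip()
        milli_c = int(temp_file.read_text())
        trips.append((index, kind, milli_c / 1000.0))
    return sorted(trips)
```

With the DT excerpt above, this should report one passive trip at 80 degrees and one critical trip at 100 degrees; an older kernel with more passive trips would show more entries.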

 

2 hours ago, Igor said:

Yes, that's my IIRC too ...

 

Yes, that's it!  The "cooling-device" entries map to the CPU clock frequencies to use - this is what makes it clock down and cool off.  With these missing, it's just going to overheat and hit critical...
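
As far as I understand the kernel's cpufreq cooling device, each cooling state is simply an index into the frequency table sorted from highest to lowest: state 0 means no throttling, and the highest state means the lowest clock. A toy model of that mapping, not kernel code; the function name and the frequencies in the usage note are only for illustration.

```python
def freq_for_cooling_state(available_khz, state):
    """Return the frequency cap for a cooling state (0 = highest frequency)."""
    ordered = sorted(set(available_khz), reverse=True)
    if not 0 <= state < len(ordered):
        raise ValueError(f"state must be between 0 and {len(ordered) - 1}")
    return ordered[state]
```

For a hypothetical table of 480000, 1008000 and 1368000 kHz, state 0 leaves the cap at 1368000 and state 2 drops it to 480000. A cooling map tells the thermal framework which range of these states it may use once a trip point is crossed.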

