
stability with orangepi lite, one and zero


dottgonzo


Hi,
 
For months I've been using Raspberry Pi boards to monitor, control, log, and send commands to remote stations. I fought with Wi-Fi devices that stopped working until I tried the Raspberry Pi 3 and the Edimax dongles, fixed the SD card errors (I eventually decided to run the boards with a read-only filesystem), installed the (software) watchdog, etc. Now everything has been working perfectly for months on more than 20 boards equipped with good power supplies and SD cards.
 
Over the last months I've tried to replace the Raspberry Pi board with the Orange Pi Lite board and Armbian Jessie 5.20 server. With the older Orange Pi PC, connected by Ethernet, everything works fine, but the Orange Pi Lite sometimes doesn't boot after a hard reset. I've tried better power supplies (2-3 models of good 3 A, 5 V supplies) with no luck. Now I finally saw something happen while a board was connected to a monitor. I don't know if this is the same issue that occurs when it doesn't boot, but the system was frozen while systemd was showing "set cpufreq kernel... etc..". Maybe this confirms what I've read in other posts: something doesn't work well between the voltage regulator and the CPU governor.
 
Can someone confirm this issue? Every time I've read about it, it was in Orange Pi One topics; is it the same issue on the Orange Pi Lite? Can someone post a link to a guide or a discussion where I can find a fix for this (and/or other) problems, and if a fix exists, does anyone know whether it is definitive? Does this issue also affect the Orange Pi Zero?
 
Thanks for your attention, and many thanks to anyone working with these boards :) I hope to solve these problems and start using the Lite/Zero boards for my projects; otherwise I'll have to wait for C.H.I.P.


While doing some tests I noticed that the H3 SoC temperature differed depending on the power cable used. Just an assumption, but VDD_CPUX may depend on input voltage.

 

Are you able to measure this on the test points? I fear undervoltage problems. It would also be worth trying to adjust this line in the fex file to 816 MHz and then report back (requires the use of bin2fex/fex2bin).


Thanks for your answer.

 

I'm able to measure whatever is necessary, but the problem is that these boards (when they aren't on my desk) must be deployed in remote rural areas, powered by mains that is totally uncontrolled and unstable from day to day. So if the Lite/Zero is affected by power stability problems, I need a fix that works in the absence of stable power.

 

Can I change the fex file so that everything is always undervolted and underclocked, favoring stability over clock speed?

 

Another solution could be adding some panic countdown during the boot phase to force the board to reboot when there is an issue. I've put kernel.panic = 3 in /etc/sysctl.conf and set up a watchdog, but these only work once the system is running, not during the boot phase.

 

In the meantime, I'm reading about the sunxi tools for managing the fex file. This is the first time I've dealt with the boot process on an ARM device: is the bin file simply written to boot/bin or somewhere on the boot partition, or is it like firmware that must be flashed onto every device I want to use?


OK, I've converted /boot/bin/orangepilite.bin to a fex file with bin2fex, modified the line you suggested, converted it back with fex2bin, and finally replaced the original orangepilite.bin with the modified one.
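The round trip described above could be scripted roughly as follows. This is only a sketch: it assumes the decompiled fex text contains the switch point as an LVn_freq entry with the value 912000000 (Hz), which should be checked against the actual file first, and lower_switch_point is a hypothetical helper name. bin2fex and fex2bin come from sunxi-tools.

```shell
#!/bin/sh
# Sketch of the bin2fex -> edit -> fex2bin round trip (sunxi-tools).
# Assumption: the 1.1 V -> 1.3 V switch point appears in the decompiled
# fex text as a line like "LV2_freq = 912000000" (value in Hz).

# lower_switch_point FEXFILE
# Rewrites every LVn_freq entry that is exactly 912 MHz to 816 MHz, in place.
# In the sed replacement, \1 is the captured "LVn_freq = " prefix and the
# digits after it are the literal new value 816000000.
lower_switch_point() {
    sed -i 's/^\(LV[0-9]*_freq[[:space:]]*=[[:space:]]*\)912000000$/\1816000000/' "$1"
}

# Typical use on the board (as root), keeping a backup of the original:
#   cp /boot/bin/orangepilite.bin /boot/bin/orangepilite.bin.orig
#   bin2fex /boot/bin/orangepilite.bin /tmp/orangepilite.fex
#   lower_switch_point /tmp/orangepilite.fex
#   fex2bin /tmp/orangepilite.fex /boot/bin/orangepilite.bin
#   reboot
```

Keeping the .orig copy makes it easy to go back if the modified file turns out to make no difference.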

 

Could a safer fex file solve the voltage regulation issue? For example, limiting the clock speed to 816 MHz as the maximum frequency (what is called max_freq and LV1_freq here) and disabling the ability to scale above 1.1 V (the LV3_freq and volt entries)?


Some background info: http://linux-sunxi.org/Orange_Pi_One#Compatibility

 

On the small H3 boards voltage can only be adjusted between 1.1V and 1.3V. It might be possible that this voltage somehow depends on input voltage; at least with the NanoPi NEO I believe I had different thermal readouts based on different input voltages (crappy power cable vs. good one).

 

In the so-called fex file one can define several DVFS operating points, but essentially only two are needed on these devices, and the only interesting one defines at which cpufreq to switch from 1.1V to 1.3V. By default we do this above 912 MHz, but maybe that's already too high, which is why I suggested changing it to 816 MHz instead.

 

There should be a test point near U53; it would be great if you could check the voltage available there. By adjusting /etc/default/cpufrequtils and restarting the service you can specify minimum and maximum clock speeds. Installing RPi-Monitor is also a great idea (sudo armbianmonitor -r), since I added functionality to also monitor VDD_CPUX voltage. So you could increment/decrement the maximum cpufreq and run sysbench each time to see when instabilities occur, and especially with which cpufreq/voltage combination.
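A minimal sketch of what /etc/default/cpufrequtils could look like when capping the clock at 816 MHz. The variable names are the ones read by the Debian cpufrequtils init script; the concrete values (in kHz) and the governor are illustrative assumptions, not necessarily what Armbian ships.

```shell
# /etc/default/cpufrequtils
# Sourced by the cpufrequtils init script, so plain shell assignments.
# Assumption: values below are examples only; speeds are given in kHz.
ENABLE=true
MIN_SPEED=480000      # lowest clock the governor may select (example value)
MAX_SPEED=816000      # cap at 816 MHz so the board stays on the lower voltage
GOVERNOR=interactive  # example; use a governor your kernel actually offers

# Apply the new limits without rebooting (run as root):
#   service cpufrequtils restart
```

The governors available on a given kernel are listed in /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors.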


I've read a lot of documentation since your first answer, including this. As I understand it, I can specify as many scaling steps as I want, but for voltage I can only choose between 1100 mV and 1300 mV, so I'll start some tests in the next few days. I need to get a power supply with adjustable voltage so I can simulate some crazy power problems and see whether the Pi Lite stops working. Maybe I should first pin the CPU governor to a fixed speed, to be sure I'm testing the issue at the chosen frequency. So it's a bit hard to test, I think, but I have to try. My question is: since I don't need much speed or a monitor, and the clock speed can always stay at the minimum (even at boot), would it be safer, and maybe a definitive solution, to run the board at a minimum clock speed (500~800 MHz) at 1100 mV and disable scaling up of speed and voltage (even at boot)?

 

In addition, have you noticed whether the DRAM clock can cause problems like this, so that I'd have to underclock the RAM as well?

 

By the way, I don't think the power supply or cable was at fault when I used the Orange Pi on site (I've tried many, industrial-grade power supplies included), but I'm sure the power company that serves the area has daily issues.


Well, there's our h3consumption tool. And if mains power is an issue you could put a power bank in between, but take care to get one that can charge while providing power to a device. Also important when running in a remote location: the power bank should start providing power without the need to hold a button first.

 

Just install RPi-Monitor as suggested and adjust stuff with h3consumption. And report back please. I've been thinking about switching back from 912 MHz to 816 MHz for some time. I wanted to discuss that first, but without some measurements and reports it's quite useless to discuss this stuff :)


OK, I'll stop testing the OPi Lite for today. Over the next days I'll post results as you asked. I will try with:

 

 

h3consumption -D 132 -c4 -g off -u off -m 816

 

as you suggested for the OPi Lite in other topics, plus -m 816 to stay at a maximum of 1100 mV and 816 MHz, keeping all 4 cores active for now.

 

Which speed can I select for DRAM? 132 MHz seems too low. Can I set any arbitrary value above 132? Do I have to respect a multiplier?

 

I've considered the power bank option, but if I need a power bank, maybe a better option is the C.H.I.P., which can run from a battery (everything needed to power the device from a battery is already included on the board). For now my goal is to get the same stability on the Orange Pi Lite/Zero as on the Raspberry/OPi PC, and to use the C.H.I.P. only where the device must stay up when there is no power, or where I can't use a read-only filesystem and need to shut the board down safely.

 

Thanks for your help, and sorry to everyone for my English.


Which speed can I select for DRAM? 132 MHz seems too low. Can I set any arbitrary value above 132? Do I have to respect a multiplier?

 

Shouldn't matter at all so you can stay with Armbian defaults. I have a couple of H3 boards with DRAM clocked down to 132 MHz but this is more or less to minimize consumption. If it's about stability as in your case you still should just try to isolate undervoltage problems (modify fex file --> 1.1V to 1.3V switch).

 

Since your application is not that performance critical I would downclock DRAM to 408 MHz (above it's 24 MHz steps, below 12 MHz)
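The step rule mentioned above can be turned into a small helper that rounds a requested DRAM clock down to the nearest allowed value. This is a sketch under the assumption that valid clocks are exact multiples of the step size (12 MHz up to 408 MHz, 24 MHz above); snap_dram_mhz is a hypothetical name and not part of h3consumption.

```shell
#!/bin/sh
# snap_dram_mhz MHZ
# Rounds a requested DRAM clock down to the step grid described above:
# 12 MHz steps up to 408 MHz, 24 MHz steps above 408 MHz.
# Assumption: allowed clocks are exact multiples of the step size.
snap_dram_mhz() {
    if [ "$1" -gt 408 ]; then
        step=24
    else
        step=12
    fi
    # Integer division drops the remainder, multiplying back snaps to the grid.
    echo $(( $1 / step * step ))
}

snap_dram_mhz 132   # prints 132
snap_dram_mhz 400   # prints 396
snap_dram_mhz 420   # prints 408
```

The snapped value could then be passed to h3consumption -D, as in the command earlier in the thread.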


Nope, just do some testing and please report back. We're particularly interested in whether exchanging 912 MHz with 816 MHz in the fex file makes a difference for you (since I wasted many, many hours of my life but found none on all those H3 devices that use the more primitive voltage regulator).


Hello,

 

I have the same problem with an Orange Pi One.

 

The difference is that I am using a stable power supply. It is a self-built supply with a backup battery and a stable stepped-up output of 5.2 V. An RPi 3, which consumed more power, ran stably in this same place on this same supply for months.

 

The OPi One was installed 2-3 weeks ago, doing the same tasks, and crashes every 2-3 days.

 

The tasks: reading 3x 1-Wire Dallas temperature sensors every 20 seconds and writing the values with wget to another system.

It also reads around 10 values from a power meter every 5 seconds. The meter is connected via RS-485 and a USB adapter (the same adapter used on the RPi 3 before). After 4 readings it adds up the values and divides by 4, so it also gets average values every 20 seconds. It then writes these 10 values via wget to the external data logger.

These jobs are run via cron job every minute.

Every 20 minutes a Python script receives the German DCF77 time signal. This procedure takes 2-3 minutes, and afterwards the correct time is written to an external RTC connected via I2C.

It also serves as an NTP server.

 

Via a sh script started at boot, the watchdog is triggered every 3 seconds. It cats a > /dev/watchdog.

 

Sometimes the OPi only reboots (perhaps triggered by the watchdog), but other times it crashes completely. LED behaviour is normally standard: the green LED switches on at boot, blinks a few times once booted, and goes solid after a few seconds.

 

But when the Pi has crashed, the green LED is off and both LAN LEDs are solid, not blinking even with network traffic.

 

My Settings are:

 

root@orangepione:~# uname -a
Linux orangepione 3.4.112-sun8i #10 SMP PREEMPT Sun Oct 23 16:06:55 CEST 2016 armv7l GNU/Linux
root@orangepione:~# h3consumption -p
Active settings:

cpu       912 mhz allowed, 1200 mhz possible, 2 cores active

dram      132 mhz

hdmi/gpu  off

usb ports active

eth0      100Mb/s/Full, Link: yes
 

For this task the speed is fully sufficient and consumption is really low.

 

How can I figure out what's wrong with the OPi?

 

Thanks in advance

 

Dirk

 

EDIT: I noticed something very interesting:

I am also logging the CPU temperature of the OPi, and there seems to be a temperature peak every day around 6:50; the crashes on all 3 days also seem to have happened around 6:50.

 

Is there any default scheduled task in Armbian at this time (6:50 German winter time)?


Maybe doing more monitoring will help. It's just a 'sudo armbianmonitor -r' to get a full RPi-Monitor installation. And you could check the cron.d tasks to see what's happening every night (you didn't say which distro you use; please also check my signature).


@tkaiser

 

I am now installing armbianmonitor. Perhaps I can see something happening before the crash.

crontab -e (as root) shows me exactly the 5 tasks I created myself. Perhaps there are other crontabs belonging to other users?

 

I am using armbian server 5.20 upgraded to 5.23 for the orangepi one.

 

@dottgonzo

With a USB adapter as well?

I ordered an RS-485 to UART adapter from China so I can disable the USB port, but the adapter hasn't arrived yet. I will also test switching from USB to UART to save a little power.


Perhaps there are other crontabs belonging to other users?

 

I am using armbian server 5.20 upgraded to 5.23 for the orangepi.

Well, it's either Debian Jessie or Ubuntu Xenial Xerus. And yes, there are other cron jobs running, depending on the distro you use. Since freezes happen when the board is busy I would assume you're also running into undervoltage issues. You could lower cpufreq to 816 MHz and see whether that makes the difference.

 

Also it would be great if you can measure VDD_CPUX voltage on the test point and report back.


@tkaiser

 

well done

 

h3consumption -p prints now

 

root@orangepione:~# h3consumption -p
Active settings:
 
cpu       816 mhz allowed, 1200 mhz possible, 2 cores active
 
dram      132 mhz
 
hdmi/gpu  off
 
usb ports active
 
eth0      100Mb/s/Full, Link: yes
 
I found a TP: TP1 next to connector 1, reading 1125 mV. So it should be OK if it is the correct TP.
 
edit:
Sorry mate, I forgot to say: I'm running Jessie, the currently downloadable distro. Downloaded and upgraded 2-3 weeks ago....
 
Where can I look for other cron jobs?

I found a TP: TP1 next to connector 1, reading 1125 mV. So it should be OK if it is the correct TP.

I had hoped a voltage lower than 1100 mV would be reported. Please leave the cron jobs as they are now and report back in 2 weeks, or sooner if nothing changes. Maybe we should lower the operating point where we switch from 1.1V to 1.3V on the small H3 boards, since it seems a few boards get into trouble. But to do this correctly we would also have to test through the throttling settings, which is time consuming. @zador: what do you think?


Okay, I will test with 816 MHz and report in a few days whether it runs fine or crashes again.

 

I will not change any cron jobs, but for my information, where can I look for other jobs running in this Armbian Jessie distro? The weird thing really is that this is at least the 3rd time it crashed around 6:50.

 

When the CPU crashes because of undervolting, shouldn't it be reset by the watchdog? I start triggering the watchdog during the boot process, and from then on every three seconds.

 

Sorry, I said it wrong: I am echoing an "a" > /dev/watchdog...

When I stop this echoing, the OPi reboots within 6 seconds.

 

Or is there some overflow because of the echo a > /dev/watchdog?

 

I am using exactly the same procedure and scripts in my RPi 3 - it is fully stable.

 

Thank you for support ;)


When the CPU crashes because of undervolting, shouldn't it be reset by the watchdog? I start triggering the watchdog during the boot process, and from then on every three seconds.

If hardware watchdog is correctly set up, board should be reset on kernel crash, but undervolting is another issue (still not confirmed though)

 

Sorry, I said it wrong: I am echoing an "a" > /dev/watchdog...

When I stop this echoing, the OPi reboots within 6 seconds.

 

Or is there some overflow because of the echo a > /dev/watchdog?

Watchdog should be reset from systemd or tools from "watchdog" package

 

I will not change any cron jobs, but for my information, where can I look for other jobs running in this Armbian Jessie distro? The weird thing really is that this is at least the 3rd time it crashed around 6:50.

System logs (if they are written to the card before the crash)? I would also guess that there are APT periodic tasks enabled by default, which includes updating packages list and installing security related updates if there are any.
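To see which scheduled jobs exist besides root's own crontab, the usual Debian cron locations can be listed in one go. This is a sketch: list_cron_sources is a hypothetical helper, the optional ROOT prefix exists only to make it easy to try against a test directory, and reading /var/spool/cron/crontabs needs root.

```shell
#!/bin/sh
# list_cron_sources [ROOT]
# Prints every regular file in the standard Debian cron locations.
# ROOT defaults to "" (the live system).
list_cron_sources() {
    root="${1:-}"
    for path in "$root/etc/crontab" \
                "$root"/etc/cron.d/* \
                "$root"/etc/cron.hourly/* \
                "$root"/etc/cron.daily/* \
                "$root"/etc/cron.weekly/* \
                "$root"/etc/cron.monthly/* \
                "$root"/var/spool/cron/crontabs/*; do
        # Unmatched globs stay literal, so only print paths that really exist.
        [ -f "$path" ] && echo "$path"
    done
    return 0
}

# Usage on the board (as root):
#   list_cron_sources
#   grep -v '^#' /etc/crontab    # shows at what time cron.daily etc. are started
```

On Debian the start times of the cron.daily, cron.weekly, and cron.monthly directories are defined in /etc/crontab, so comparing those times with the 6:50 peak is a quick first check.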


@chaoswk: I can confirm the exact same peak of activity around 7 in the morning. Even the CPU temperature rises.

Certainly updates or something.

In this context I consider your watchdog practice very unsafe: If the computer is busy, your script will lag a little,

and --pouf-- bye,bye.

Please try to install and configure watchdog package, or leave this out for some time.

best, gnasch


@chaoswk: I can confirm the exact same peak of activity around 7 in the morning. Even the CPU temperature rises.

Certainly updates or something.

In this context I consider your watchdog practice very unsafe: If the computer is busy, your script will lag a little,

and --pouf-- bye,bye.

Please try to install and configure watchdog package, or leave this out for some time.

best, gnasch

 

Or disable "updates or something". (try journalctl | grep apt)


hello,

 

As promised, I'll leave everything as it is; too many changes would give unclear results ;)

The only things I've done are lowering the max MHz (816 instead of 912) and installing armbianmonitor.

 

Today I watched CPU temperature and load, and as you can clearly see in the armbianmonitor graphs (attached image), CPU load rises to 50% from 6:51 to 6:54. As I set the CPU to 2 cores, this may mean that 1 core is fully busy for ca. 180 seconds.

 

journalctl shows me nothing strange from 6:50 to 6:52.

 

 

post-3822-0-77157200-1480318156_thumb.png


Today I watched CPU temperature and load, and as you can clearly see in the armbianmonitor graphs (attached image), CPU load rises to 50% from 6:51 to 6:54. As I set the CPU to 2 cores, this may mean that 1 core is fully busy for ca. 180 seconds.

 

Which is laughable in other words. :)

 

We tortured those H3 gems with cpuburn-a7 over days and they can cope with that.

 

So back to nailing the problem down: please eliminate this watchdog thingie and use sysbench, once with max cpufreq set to 816 MHz and then with 912 MHz allowed. If your board freezes at 912 MHz then IMO it's obvious that we're talking about undervoltage here (either 1.1V is too low for your specific H3 SoC at 912 MHz, or something else is going on on your board, which only you can tell by measuring TP1 again while sysbench is running, so better test this while still at 816 MHz).
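The two-step test suggested above could be run with a small helper like the following. It is a sketch under several assumptions: the standard cpufreq sysfs path applies to this kernel, sysbench is the 0.4.x version packaged with Debian Jessie (hence the --test=cpu syntax), the script runs as root, and set_max_mhz and stress_at are hypothetical names.

```shell
#!/bin/sh
# Assumption: standard cpufreq sysfs layout; override CPUFREQ_DIR if it differs.
CPUFREQ_DIR="${CPUFREQ_DIR:-/sys/devices/system/cpu/cpu0/cpufreq}"

# set_max_mhz MHZ
# Caps the maximum CPU clock. The sysfs file expects the value in kHz.
set_max_mhz() {
    echo $(( $1 * 1000 )) > "$CPUFREQ_DIR/scaling_max_freq"
}

# stress_at MHZ
# Caps the clock, then loads all online cores with the sysbench CPU test.
stress_at() {
    set_max_mhz "$1" || return 1
    sysbench --test=cpu --cpu-max-prime=20000 --num-threads="$(nproc)" run
}

# Usage (as root): test the lower setting first, then the suspected one.
#   stress_at 816 && stress_at 912
```

Measuring TP1 while the 816 MHz run is in progress gives the voltage reading asked for before the 912 MHz run risks a freeze.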

