
ARISC Errors have returned


jungle_roger


Hi,

I noticed today that I'm getting those ARISC Errors again.

I'm running the 5.05 server version, on which I ran apt-get upgrade. I got the same errors when I tested the new 5.10 image as well. (Upgrading to 5.10 with 3.4.112, or using the new image, won't let me run 'make scripts' either.)


[ 1424.305736] [ARISC ERROR] :message process error
[ 1424.305763] [ARISC ERROR] :message addr   : f004b840
[ 1424.305778] [ARISC ERROR] :message state  : 5
[ 1424.305792] [ARISC ERROR] :message attr   : 2
[ 1424.305805] [ARISC ERROR] :message type   : 30
[ 1424.305818] [ARISC ERROR] :message result : ff
[ 1424.305831] [ARISC WARING] :callback not install
[ 1424.305849] [cpu_freq] ERR:set cpu frequency to 648MHz failed!

Is this something I can fix myself, or does it need to be fixed when compiling the kernel?

 

Thanks

Jungle


I've experienced ARISC errors on Armbian_5.10 server for OPI ONE. The culprit was a faulty script.bin setting.

 

Fix for OPI ONE :

 

Login as root

 

cp /boot/bin/orangepione.bin /boot/bin/orangepione.bin.old

bin2fex /boot/bin/orangepione.bin /boot/bin/orangepione.fex

nano /boot/bin/orangepione.fex

------------------------
[dvfs_table]
pmuic_type = 1
pmu_gpio0 = port:PL06<1><1><2><1>
pmu_level0 = 11300
pmu_level1 = 1100
max_freq = 1200000000
min_freq = 648000000
LV_count = 2
LV1_freq = 1200000000
LV1_volt = 1300
LV2_freq = 648000000
LV2_volt = 1100
------------------------

fex2bin /boot/bin/orangepione.fex /boot/bin/orangepione.bin

 

 

et voilà !   ( settings valid only for OPI ONE )
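
A quick way to check whether the errors are gone after rebooting with the new script.bin is to count the ARISC error lines in the kernel log. This is only a sketch: count_arisc_errors is a made-up helper name, and it simply matches the "[ARISC ERROR]" text shown in the log above.

```shell
#!/bin/sh
# count_arisc_errors: count ARISC error lines in kernel log text on stdin.
# Hypothetical helper name, not part of Armbian.
count_arisc_errors() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # that number is 0, so "|| true" keeps the function's status at 0.
    grep -c 'ARISC ERROR' || true
}

# Usage on the board (reading the kernel log may need root):
#   dmesg | count_arisc_errors
```

A result of 0 after some uptime under load suggests the cpufreq transitions no longer fail.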


see previous ( 0.23567 seconds earlier ) post ;)

 

Nope, see the real solution. The assumptions I made back then when I defined the initial dvfs settings were wrong (maybe also caused by a leading zero, as now again). A far better idea is to adopt zador's dvfs table, which will be the 'fix' later: it allows lower clockspeeds when not needed, leads to a cooler board and also gives more performance (when used without a heatsink or in a small enclosure). The 2 relevant changes are:

  • 480 MHz min instead of 648 MHz
  • switch between 1.1V and 1.3V at 816 MHz and not 648 MHz as before
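
For anyone who wants to try these two changes by hand before the update arrives, here is a rough sketch of patching the decompiled fex file. set_fex_key is a hypothetical helper, and mapping "switch at 816 MHz" to LV2_freq is my reading of the two-level table posted above, not something confirmed from zador's actual table.

```shell
#!/bin/sh
# set_fex_key FILE KEY VALUE
# Rewrite "KEY = ..." inside the [dvfs_table] section of a text .fex file.
# Hypothetical helper; it edits the decompiled text, not script.bin itself.
# KEY and VALUE must not contain "/" since they are pasted into a sed script.
set_fex_key() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    # The address range limits the substitution to the lines from
    # "[dvfs_table]" up to the next section header.
    sed "/^\[dvfs_table\]/,/^\[/ s/^${key}[[:space:]]*=.*/${key} = ${value}/" \
        "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage (as root, after backing up the original .bin):
#   bin2fex /boot/bin/orangepione.bin /boot/bin/orangepione.fex
#   set_fex_key /boot/bin/orangepione.fex min_freq 480000000
#   set_fex_key /boot/bin/orangepione.fex LV2_freq 816000000
#   fex2bin /boot/bin/orangepione.fex /boot/bin/orangepione.bin
```

Check the resulting [dvfs_table] by eye before converting back, since the helper silently does nothing when the key is missing.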

 

Nope, see the real solution. The assumptions I made back then when I defined the initial dvfs settings were wrong (maybe also caused by a leading zero, as now again). A far better idea is to adopt zador's dvfs table, which will be the 'fix' later: it allows lower clockspeeds when not needed, leads to a cooler board and also gives more performance (when used without a heatsink or in a small enclosure). The 2 relevant changes are:

  • 480 MHz min instead of 648 MHz
  • switch between 1.1V and 1.3V at 816 MHz and not 648 MHz as before

 

 

 

I've actually tested your proposed solution on OPI ONE and still got ARISC errors.

 

I then tested the fix I posted on OPI ONE and got NO ARISC errors. 

 

Have you actually tested your solution on OPI ONE ?


I then tested the fix I posted on OPI ONE and got NO ARISC errors. 

 

The 'fix you posted' is simply the older (and wrong) set of values I came up with some months ago when I added support for Orange Pi One. Nope, I have not tested it myself since my Orange Pi One arrived today at zador's home. And I have absolutely no doubt that if he says the new settings he came up with are ok, then they are ok. The next update will fix this.


@zador @tkaiser

 

I've tested different dvfs-settings ( 2 - 5 levels ) for OPI ONE  ( HDMI graphics session, remote graphical x2go-session with youtube-video running in iceweasel, htop average load 2.5 ).

 

The following settings produced the lowest temperatures under comparable loads

 

/etc/default/cpufrequtils
------------------------
ENABLE=true
MIN_SPEED=480000
MAX_SPEED=1200000
GOVERNOR=interactive
------------------------

 

/boot/bin/orangepione.fex
------------------------
[dvfs_table]
pmuic_type = 1
pmu_gpio0 = port:PL06<1><1><2><1>
pmu_level0 = 11300
pmu_level1 = 1100
max_freq = 1200000000
min_freq = 480000000
LV_count = 3
LV1_freq = 1200000000
LV1_volt = 1300
LV2_freq = 1008000000
LV2_volt = 1100
LV3_freq = 480000000
LV3_volt = 1100
------------------------

 

cpufreq-info
 

--------------------

.....

analyzing CPU 3:
  driver: cpufreq-sunxi
  CPUs which run at the same hardware frequency: 0 1 2 3
  CPUs which need to have their frequency coordinated by software: 0 1 2 3
  maximum transition latency: 2.00 ms.
  hardware limits: 480 MHz - 1.20 GHz
  available cpufreq governors: conservative, ondemand, userspace, powersave, interactive, performance
  current policy: frequency should be within 480 MHz and 1.20 GHz.
                  The governor "interactive" may decide which speed to use
                  within this range.
  current CPU frequency is 480 MHz.
  cpufreq stats: 60.0 MHz:0.00%, 120 MHz:0.00%, 240 MHz:0.00%, 312 MHz:0.00%, 408 MHz:0.00%, 480 MHz:66.91%, 504 MHz:0.02%, 600 MHz:0.08%, 648 MHz:0.00%, 720 MHz:0.01%, 816 MHz:0.00%, 912 MHz:0.00%, 1.01 GHz:20.89%, 1.10 GHz:0.38%, 1.20 GHz:11.71%, 1.30 GHz:0.00%, 1.34 GHz:0.00%, 1.44 GHz:0.00%, 1.54 GHz:0.00%  (2783)
--------------------

 

There were no ARISC messages with these settings.
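
The "cpufreq stats" line above can also be reproduced without cpufreq-info. The sketch below assumes the kernel exposes the usual cpufreq-stats file time_in_state with one "<kHz> <ticks>" pair per line; freq_usage is a made-up helper name.

```shell
#!/bin/sh
# freq_usage FILE: print the share of time spent at each cpufreq step.
# FILE is expected to hold "<kHz> <ticks>" pairs, one per line.
# Steps that were never used are skipped.
freq_usage() {
    LC_ALL=C awk '
        { freq[NR] = $1; ticks[NR] = $2; total += $2 }
        END {
            if (total == 0) exit
            for (i = 1; i <= NR; i++)
                if (ticks[i] > 0)
                    printf "%d MHz: %.2f%%\n", freq[i] / 1000, 100 * ticks[i] / total
        }
    ' "$1"
}

# Usage:
#   freq_usage /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state
```

Comparing this output for two governors under the same load shows which frequencies each one actually uses.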


HDMI graphics session, remote graphical x2go-session with youtube-video running in iceweasel, htop average load 2.5

 

Unfortunately this pretty lightweight workload isn't relevant when trying to push hardware limits. I for example have one Orange Pi PC that runs stable with these settings even when torturing it: http://linux-sunxi.org/User:Tkaiser#Headless_without_connected_USB_peripherals_and_4.5V_DC-IN

 

The very same settings lead to bit flips on a colleague's Orange Pi PC and are therefore not usable as general settings. So while your settings might work for you, we have to take into account that other devices behave differently, and always leave some safety headroom.

 

Regarding torturing: At least this procedure has to work flawlessly to be sure that your settings work for your device: http://linux-sunxi.org/Hardware_Reliability_Tests#CPU

 

Bit flips are nasty especially when they only occur under special circumstances. So we have to take sane defaults with higher voltages at specific dvfs operating points than those we tested to work reliably on our own devices.


@zotac

 

The new default Armbian dvfs_table looks perfectly fine as I've already successfully tested these settings.

 

@tkaiser

 

Your use cases of "torturing poor little boards" of course merit safer default settings. So far I've not seen any problems in running typical target loads with the simpler settings for OPI ONE ( might be different for OPI PC ). Thanks for the links, I'm tempted to start a little bonfire with a spare board. ;)

 

 

Thank you both for your excellent work in converting OrangePi bricks into something useful.


Ah that explains my blistering hot One ;) after upgrade indeed, it's ok now with the 2 LV_count fix from Rodolfo, thanks !

So when will it be automatically rolled out when I want to upgrade another OPiO ? How does that work ?


@Set3

 

Glad to hear your OPI ONE stopped burning. The "fix" is actually just adjusting the minimum frequency and reverting to the previous ( tkaiser ) settings. The next Armbian release will feature a generally optimized dvfs_table for OPI ONE. It should be fixed with your next upgrade. Search for "script.bin" or "fex" on this or the sunxi.org site for further info on board definitions.


Ah that explains my blistering hot One ;) after upgrade indeed, it's ok now with the 2 LV_count fix from Rodolfo, thanks !

 

C'mon guys. The whole issue is exactly none. Idle temperature increases by 10°C, temperatures under load aren't affected at all, a few error messages were thrown out. That's all. And people start to get crazy and spread bullshit like 'burning boards'. Really annoying!

 

And you didn't fix anything, you just reverted back to before the update and now use wrong settings, since they are based on (my) wrong assumptions made a few months ago. The new settings you could've adopted instead will show superior behaviour. Stop whining now, use your boards as you did before, the next update will fix it.


I've retested different settings for [dvfs_table] and actually noticed no different behaviour for 2 or 5 levels. There are only two voltages anyway, so the min, max and voltage change frequencies are all that matter. I've put some more load on the running system and tested the impact of the governor.

 

By default, the governor is set to "interactive", a supposedly optimized "ondemand" governor adjusting frequency levels to loads. The "interactive" governor running on the OPI ONE behaved more like "performance", hardly ever switching away from max. frequency and producing rather high cpu temperatures ( no drama at all, I was just joking about the towering inferno boards, mea culpa, mea maxima culpa ..... ).

 

A very noticeable change happened when switching to "ondemand" governor. Suddenly, the whole frequency range was being used, temperature dropped by 9°C and current draw was reduced by 50mA.

 

With previously mentioned ( or soon to be released in Armbian ) fex-settings and a small heatsink on the OPI ONE  the board behaves very nicely under the testing load ( x2go-server, flash dongle file-server with 2 parallel 4G scp network file-copies wireless and wired , modest browsing in iceweasel )

 

temperatures 39°C (idle), 59°C (typical load) and 79°C (lots of stress) when running governor "interactive"

 

temperatures 39°C (idle), 50°C (typical load) and 77°C (lots of stress) when running governor "ondemand"

 

To test governor "ondemand" :

 

echo ondemand | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

 

To revert to governor "interactive" :

 

echo interactive | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

 

To configure default governor "ondemand" :

 

sudo nano /etc/default/cpufrequtils
------------------------
ENABLE=true
MIN_SPEED=480000        
MAX_SPEED=1200000
GOVERNOR=ondemand             
------------------------

 

To check board temperature and current frequency :

 

sudo watch cat /sys/class/thermal/thermal_zone0/temp

 

sudo watch -n .1 cpufreq-info -w
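
If you want the temperature as a plain number for logging rather than watching it, something like the sketch below may help. It rests on an assumption that you should check on your own kernel: values of 1000 or more are treated as millidegrees, smaller values as whole degrees Celsius. read_temp_c is a made-up name.

```shell
#!/bin/sh
# read_temp_c FILE: print the temperature in whole degrees Celsius.
# Assumption: values of 1000 or more are millidegrees, smaller ones degrees.
read_temp_c() {
    raw=$(cat "$1") || return 1
    if [ "$raw" -ge 1000 ]; then
        echo $((raw / 1000))
    else
        echo "$raw"
    fi
}

# Usage:
#   read_temp_c /sys/class/thermal/thermal_zone0/temp
```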


I've retested different settings for [dvfs_table] and actually noticed no different behaviour for 2 or 5 levels. There are only two voltages anyway, so the min, max and voltage change frequencies are all that matter.

 

Sure, adding more dvfs operating points when only 2 voltages are available is a bit pointless. Maybe we revert back to just 2 (but still using 816 MHz to switch between voltages and not 648 MHz as before). Since you were able to use 1008 MHz with 1.1V and others also reported they get 1200 MHz stable with 1.1V I already thought about providing a script to automatically test this out.

 

Main problem: users really love to fool themselves, and reliably testing higher clockspeeds using cpuburn-a7 and cpufreq-ljt-stress-test (or Linpack -- see below) would already require a small fan, since otherwise throttling might occur. Such a script therefore has to monitor /sys/devices/system/cpu/cpu0/cpufreq/stats/total_trans to confirm whether throttling occurred and to inform the testing user when his setup is inappropriate.
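
The total_trans check could look roughly like this. It is only a sketch: transitions_during is a hypothetical helper, and it assumes the clockspeed has been pinned beforehand (for example min and max set to the same value), so that any transition counted during the run points to throttling.

```shell
#!/bin/sh
# transitions_during STATS_FILE COMMAND [ARGS...]
# Print how many cpufreq transitions happened while COMMAND ran.
# COMMAND's stdout is sent to stderr so only the count lands on stdout.
transitions_during() {
    stats=$1
    shift
    before=$(cat "$stats") || return 1
    "$@" >&2
    after=$(cat "$stats") || return 1
    echo $((after - before))
}

# Usage (the workload name is a placeholder for whatever stress test is used):
#   n=$(transitions_during /sys/devices/system/cpu/cpu0/cpufreq/stats/total_trans ./stress-workload)
#   [ "$n" -eq 0 ] || echo "clockspeed changed $n times, improve cooling and retest"
```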

 

When we optimised dvfs operating points for A64 back in March we found a specific Linpack benchmark to be the perfect tool to detect undervoltage (the board does not crash yet, but the Linpack bench throws errors because it detects data corruption).

 

So we would need a kernel that can also use 1248, 1152, 1104, 1056, 960, 912 and 864 MHz as cpufreq steps (since we want to check reliability in steps as fine-grained as possible) and a script that

  • installs Linpack
  • patches the fex file so that one time only 1.1V and the other time only 1.3V will be used
  • restarts the board automatically after adding itself to /etc/rc.local
  • then starts reliability testing at 816 MHz using Linpack and increases the clockspeed in 48 MHz steps until Linpack fails or /sys/devices/system/cpu/cpu0/cpufreq/stats/total_trans shows throttling
  • then exchanges script.bin (now using 1.3V), starts at the clockspeed where 1.1V no longer works reliably and tests through the remaining cpufreq operating points until Linpack fails or throttling occurs

If throttling occurs the user has to be informed that it's necessary to ensure appropriate heat dissipation (using heatsink and most probably an additional fan) since it's really necessary to test through the clockspeeds under full load.
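
To make the stepping concrete, here is a minimal sketch of the list of clockspeeds such a script would walk through. freq_steps is a made-up helper, and the upper bound of 1296 MHz is just the highest value mentioned in this thread, not a tested limit.

```shell
#!/bin/sh
# freq_steps START_MHZ END_MHZ STEP_MHZ
# Print the clockspeeds to test, one per line, END_MHZ included.
freq_steps() {
    mhz=$1
    while [ "$mhz" -le "$2" ]; do
        echo "$mhz"
        mhz=$((mhz + $3))
    done
}

# Usage: walk from 816 MHz upwards in 48 MHz steps.
#   for f in $(freq_steps 816 1296 48); do
#       echo "would test ${f} MHz here"
#   done
```

Starting at 816 MHz with a 48 MHz step passes through 864, 912, 960, 1056, 1104, 1152 and 1248 MHz, which are the extra cpufreq steps the kernel would need to offer.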

 

As a result you get 

  • the ideal cpufreq to switch between 1.1V and 1.3V
  • the maximum cpufreq useable with just 1.3V (I would suspect most Orange Pi One/Lite and NanoPi M1 out there would be able to run reliably with 1248 or even 1296 MHz)

And then it's up to the user. If he thinks data corruption and bit flips are nice, he can use these values; otherwise subtracting 48 MHz from both dvfs operating points is the smarter move. But this way we could provide a mechanism that lets every user check his specific board for the real hardware limits (the VDD_CPUX voltage accuracy might differ between boards, and of course so might the H3's ability to run reliably at different dvfs operating points).

 

The same might be possible with the other H3 boards that use the fine-grained programmable SY8106A voltage regulator, to generate a board-specific, 100% optimal dvfs table. And maybe switching between 1.1V and 1.3V on the One is possible by writing to registers, without exchanging script.bin and rebooting.

 

As for your observations regarding the cpufreq governor: 2 years ago I also preferred ondemand (on A20 boards), but only with the necessary settings regarding IO activity. In the meantime I came to the conclusion that interactive works better. But maybe different BSP kernels behave differently here (sun7i vs. sun8i for example). IMO it's impossible to draw any conclusions without setting up real monitoring first. That's easy with Armbian since it takes only 5-8 minutes to get RPi-Monitor installed.


@tkaiser

 

Thanks for detailing the rationale for H3 optimizations. IMHO the OPI ONE/LITE merit simple foolproof settings appropriate to running the boards within safe limits. A well-performing cheap little powerhouse without overclocking, overvolting, throttling, fans... running on stable Armbian.

 

OPI ONE ( and, judging from the specs, probably LITE ) run well with 2 dvfs operating points. Switching between 1.1V and 1.3V at 816 MHz ( lower than the tested 1008 MHz ) should cover any board quality. As I've mentioned before, using governor "ondemand" instead of "interactive" seems appropriate for OPI ONE when testing with typical loads, and even more so when testing with higher loads.

 

Boards with a more sophisticated voltage/frequency control definitely merit appropriate testing/monitoring/optimization tools. For OPI ONE/LITE it is probably an overkill. By keeping it simple and using the simple boards well within their limits, the true power of these tiny jewels is unleashed : pleasing performance at a ridiculous price.


For OPI ONE/LITE it is probably an overkill.

 

Quite the opposite. These boards (NanoPi M1 as well) would benefit the most from such reliability testing, allowing them both to exceed 1200 MHz as maximum cpufreq and to enter the 1.1V state earlier. On passively cooled H3s this might lead to a performance increase of probably 25% under load. The only problem with this 'auto tuning' approach might be a user not willing to understand that he temporarily has to increase heat dissipation to test through all available dvfs operating points, even if he does not want to use a fan in normal operation mode (for example have a look at what I had to do to test out higher cpufreqs with A64 -- a simple fan should suffice to test H3 with heatsink and up to 1296 MHz).

 

Adding a few more cpufreq operating points shouldn't hurt at all, so I opened a Github issue to discuss adding more with the other devs (based on experiences with Pine64 this might already increase full load performance by ~5% absolutely for free, since throttling starts to work better).

 

BTW: I also disagree regarding your cpu governor findings :)


Adding a few more cpufreq operating points shouldn't hurt at all, so I opened a Github issue to discuss adding more with the other devs (based on experiences with Pine64 this might already increase full load performance by ~5% absolutely for free, since throttling starts to work better).

 

BTW: I also disagree regarding your cpu governor findings :)

 

Adding a few more cpufreq operating points sounds great. Governor "ondemand" used more frequencies than "interactive" when tested, leading to cooler running and less current draw. :)


Governor "ondemand" used more frequencies than "interactive" when tested, leading to cooler running and less current draw. :)

 

Unfortunately this is one of the many areas where tests and real world situations might differ. The problem with ondemand is that it takes some time to adjust the clockspeeds in both directions so while the cpufreq driver is still busy deciding whether to increase clockspeed or not with interactive you would've been already at maximum cpufreq (and the specific task maybe already done). Same applies to clocking down.

 

And again: when using ondemand it's important to set io_is_busy to 1, otherwise IO intensive tasks will be way slower. It's also very important to set the minimum cpufreq to a rather high value, otherwise the system will always feel sluggish since crawling from the lowest cpufreq to the highest takes too much time.
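
As a rough sketch, those two ondemand settings could be applied like this. The io_is_busy location under cpufreq/ondemand/ is an assumption that fits kernels with global governor tunables and may differ elsewhere; tune_ondemand is a made-up helper, and 648000 in the usage line is only a placeholder for "a rather high value".

```shell
#!/bin/sh
# tune_ondemand CPU_SYSFS MIN_KHZ
# CPU_SYSFS is normally /sys/devices/system/cpu. It is a parameter so the
# function can be tried against a scratch directory first.
tune_ondemand() {
    cpu=$1 min_khz=$2
    # Select ondemand first so that its tunables directory exists.
    echo ondemand   > "$cpu/cpu0/cpufreq/scaling_governor" || return 1
    # Count time spent waiting for IO as busy time.
    echo 1          > "$cpu/cpufreq/ondemand/io_is_busy"   || return 1
    # Raise the floor so ramping up does not start from the lowest step.
    echo "$min_khz" > "$cpu/cpu0/cpufreq/scaling_min_freq" || return 1
}

# Usage (in a root shell):
#   tune_ondemand /sys/devices/system/cpu 648000
```

These writes do not survive a reboot, so a permanent setup still needs MIN_SPEED and GOVERNOR in /etc/default/cpufrequtils.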

 

Please see also the 'race to idle' concept mentioned here: http://linux-sunxi.org/Cpufreq#The_.22performance.22_governor


I do not know why, but after applying the update YESTERDAY the temperature increased by several degrees; nothing else changed ... (only aptitude upgrade... )

strange ? or ?

 

No, quite normal since I introduced a bug with 5.10 -- I moved your post from the 'Security Alert' thread into this thread so you can follow the manual fix instructions as outlined here and get also a lot of basic knowledge for free above. Issue will be fixed with next Armbian release.


OK, I decided to start my Orange Pi One story with Armbian, and the latest version of it is not working due to these cpufreq looping issues.

 

Luckily, I was able to find the previous version (5.05) and it's running ideally with this board (thanks to the developers). So I'm waiting for an official fix to be included in the latest distro.


This topic is now closed to further replies.