
Posted

This is somewhat off-topic, but still relevant to the Orange Pi Zero 3.

 

When the Orange Pi Zero 3 is operated in a warm climate (e.g. a room temperature of 30 °C), it can at times run at up to about 60 °C.
This is in open, still air.

[attached chart]

 

Adding a fan blowing at it reduces that by some 20 °C, to about 40 °C!

And this is my makeshift fan setup: no fancy case, no heatsink, nothing, just a single long machine screw that lifts it up :)

 

Checking temperatures is easy:

$ armbianmonitor -m
Stop monitoring using [ctrl]-[c]
Time        CPU    load %cpu %sys %usr %nice %io %irq   Tcpu  C.St.

18:03:39   480 MHz  0.00   0%   0%   0%   0%   0%   0%  40.8 °C  0/7
^C
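If you'd rather log temperatures from a script than watch armbianmonitor, here is a minimal Python sketch. It assumes the standard Linux thermal sysfs interface (`/sys/class/thermal/thermal_zoneN/type` and `temp`, the latter in millidegrees Celsius). Which zone corresponds to the CPU on the H618 is an assumption to check on your own board, so it simply prints every zone it can read.

```python
from pathlib import Path

# Assumption: standard Linux thermal sysfs layout, where each thermal_zoneN
# directory has a "type" name and a "temp" value in millidegrees Celsius.
# Which zone is the CPU on the H618 may vary, so all zones are reported.
THERMAL_ROOT = Path("/sys/class/thermal")


def millideg_to_celsius(raw: str) -> float:
    """Convert a sysfs reading such as '40812\\n' to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_zone_temps(root: Path = THERMAL_ROOT) -> dict[str, float]:
    """Return {'thermal_zoneN (type)': degrees C} for every readable zone."""
    temps: dict[str, float] = {}
    for zone in sorted(root.glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            temps[f"{zone.name} ({kind})"] = millideg_to_celsius((zone / "temp").read_text())
        except (OSError, ValueError):
            continue  # zone unreadable or not reporting a number
    return temps


if __name__ == "__main__":
    for name, celsius in read_zone_temps().items():
        print(f"{name}: {celsius:.1f} °C")
```

Run it in a loop (or from cron) to get a log similar to the chart above, fan on versus fan off.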


Strictly speaking, 60 °C is nothing to scream about; I have an RPi 4 that hits 80 °C and throttles.
There too, a fan blowing at it plus a heatsink over the CPU drastically reduces running temperatures.

For occasional use, I don't think it is necessary to have a fan blowing at the Orange Pi Zero 3.

I think it may be feasible to run at lower temperatures by disabling or underclocking the GPU and HDMI, but for now I'm not sure how to go about doing that.
Initially I thought the Wi-Fi might be causing it, but now I don't think so; it is moderately likely the GPU is heating it up a bit.
And still air doesn't seem to dissipate heat very well.




 

fan.jpg

Posted

Just like to say that the recent images work just fine:

    _             _    _                                         _ _        
   /_\  _ _ _ __ | |__(_)__ _ _ _    __ ___ _ __  _ __ _  _ _ _ (_) |_ _  _ 
  / _ \| '_| '  \| '_ \ / _` | ' \  / _/ _ \ '  \| '  \ || | ' \| |  _| || |
 /_/ \_\_| |_|_|_|_.__/_\__,_|_||_|_\__\___/_|_|_|_|_|_\_,_|_||_|_|\__|\_, |
                                 |___|                                 |__/ 
 v25.8 rolling for Orange Pi Zero3 running Armbian Linux 6.12.35-current-sunxi64

 Packages:     Debian stable (bookworm)
 Support:      for advanced users (rolling release)
 IPv4:         (LAN) xxx.xxx.xxx.xxx (WAN) yyy.yyy.yyy.yyy
 IPv6:         fd00:xxxx:xxxx::xxxx:xxxx (WAN) xxxx:xxxx::yyyy:yyyy
 WiFi AP:      SSID: (ssid), 

 Performance:  

 Load:         2%                Uptime:       3:50
 Memory usage: 4% of 3.83G  
 CPU temp:     41°C              Usage of /:   3% of 58G    
 RX today:     7 MiB        
 Commands: 

 Configuration : armbian-config
 Monitoring    : htop


 

Posted
On 7/12/2025 at 3:27 AM, ag123 said:

and a recent 'success story'

 

My story is definitely NOT a success story :(

I see the media.patches in the cache folder, and I compiled Armbian edge, but the image didn't contain the cedrus + V4L2 kernel modules I need for decoding acceleration.

Posted
Quote

* mpv plays most mp4s VERY SMOOTHLY  BUT WITH 100% CPU

Oops, I missed that 100% CPU, but that's OK, it is an A53 after all 😅

 

I'd guess video is still 'difficult' on the Z3. Reportedly there is some GPU support for vector graphics, but I'd guess mostly just triangles.

Video decoding can be done with just NEON (vector computation), but I'd guess access to the video decoding hardware is still limited.

Using NEON is likely to give that 100% CPU reading, as the CPU is literally busy. Using the real video hardware would be 'invisible' in a sense: the CPU usage may look low, and one won't see that the video hardware itself may be running at 100%.
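One way to see this effect is to measure CPU time directly. A minimal sketch, assuming the Linux `/proc/stat` format (per-CPU jiffies in the order `user nice system idle iowait irq softirq steal guest guest_nice`): it reports how busy each core was over one second. NEON decoding counts as ordinary CPU time and shows up here, whereas work done by a separate decode engine would not. `cpu_busy_percent` is a hypothetical helper name.

```python
import time

# /proc/stat "cpu" lines: user nice system idle iowait irq softirq steal [guest guest_nice].
# guest/guest_nice are already included in user/nice, so only the first 8 are summed.


def parse_cpu_line(line: str) -> tuple[int, int]:
    """Return (busy, total) jiffies from one 'cpu' line of /proc/stat."""
    fields = [int(x) for x in line.split()[1:9]]
    idle = fields[3] + fields[4]  # idle + iowait
    total = sum(fields)
    return total - idle, total


def cpu_busy_percent(before: str, after: str) -> float:
    """Percentage of time the CPU was busy between two snapshots of the same line."""
    busy0, total0 = parse_cpu_line(before)
    busy1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    return 100.0 * (busy1 - busy0) / elapsed if elapsed else 0.0


def read_cpu_lines() -> dict[str, str]:
    """Map 'cpu', 'cpu0', 'cpu1', ... to their current /proc/stat lines."""
    with open("/proc/stat") as f:
        return {line.split()[0]: line for line in f if line.startswith("cpu")}


if __name__ == "__main__":
    try:
        first = read_cpu_lines()
        time.sleep(1.0)
        second = read_cpu_lines()
        for name in first:
            if name in second:
                print(f"{name}: {cpu_busy_percent(first[name], second[name]):5.1f}% busy")
    except OSError as err:
        print("could not read /proc/stat:", err)
```

Note that time spent at a raised nice level (as with the miner further down) counts as busy too.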

 

Posted (edited)
12 hours ago, ag123 said:

oops, I missed reading that 100% cpu, but it is ok it is a a53 after all 😅

 


 

Thank you for replying.

 

I run xscreensaver-gl in a window and it always gets 30 FPS with <10% CPU on my OPi Z3; I even display 3D models in F3D. So I am getting 3D Mesa acceleration on both HDMI and SPI-LCD displays.

 

If video decoding on my OPi Z3 is using ARM NEON instructions, then I am fortunate to have at least that :) (note: this is possible without needing ffmpeg-v4l2request)

 

I will have to re-check how I succeeded with 1080p H.264 acceleration last year (I was even getting temporary glitches and pink hues sometimes).

Edited by robertoj
Posted

Arm NEON is quite a thing: it is SIMD (single instruction, multiple data).

https://developer.arm.com/documentation/102159/latest/

https://github.com/thenifty/neon-guide

and, accordingly, AArch64 cores (e.g. Cortex-A53, A55, A72, A75, A76, i.e. Armv8-A onwards) all have it:

https://developer.arm.com/documentation/102474/0100/Fundamentals-of-Armv8-Neon-technology

 

The H618 uses Cortex-A53 cores and hence should have it.

 

It is a good 'replacement' for proprietary hardware, since, like Intel's SSE and AVX, these SIMD instructions are defined and standardized, in this case by Arm.

Hence they work as long as programs are written and compiled to use them. By contrast, the proprietary video hardware is still undocumented (at least not publicly), and most of the work on it is reverse-engineered and incomplete.

 

Apps written to use NEON SIMD would, however, 'just work' and be accelerated by virtue of being SIMD.
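To confirm that a board actually exposes NEON to user space, a small sketch that reads `/proc/cpuinfo`. It assumes the usual Linux convention that ARM kernels list CPU features on `Features` lines, with AArch64 reporting NEON as `asimd` (32-bit ARM kernels use `neon`). On x86 there is no `Features` line, so it reports no NEON.

```python
def simd_flags(cpuinfo_text: str) -> set[str]:
    """Collect the flags listed on 'Features' lines (ARM) of /proc/cpuinfo."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            flags.update(value.split())
    return flags


def has_neon(flags: set[str]) -> bool:
    # AArch64 kernels report NEON as "asimd"; 32-bit ARM kernels call it "neon".
    return "asimd" in flags or "neon" in flags


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            found = simd_flags(f.read())
        print("NEON/ASIMD available:", has_neon(found))
        print("flags:", " ".join(sorted(found)))
    except OSError as err:
        print("could not read /proc/cpuinfo:", err)
```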

 

 

Posted

OK, we have a cheap SBC, the Z3 with its H618, but we'd still like to run it like a supercomputer.

https://linux-sunxi.org/Benchmarks#Linpack

 

Download

https://www.netlib.org/benchmark/linpackc.new

and save it as linpack.c.

 

Makefile:

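all: linpack-noopt linpack-o3

# note: recipe lines must be indented with a TAB, not spaces
linpack-noopt: linpack.c
	gcc -o $@ $^

linpack-o3: linpack.c
	gcc -O3 -o $@ $^ -lm -mcpu=cortex-a53 -march=armv8-a -ftree-vectorize -funsafe-math-optimizations

# clean must not depend on the binaries, or it would build them just to delete them
clean:
	rm -f linpack-noopt linpack-o3

.PHONY: all clean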

 

OK, for your convenience it is in the attached zip file. To unzip it you may need (as sudo):

apt install zip unzip

 

For the compilers you may need:

apt install build-essential

 

$ make
gcc -o linpack-noopt linpack.c
gcc -O3 -o linpack-o3 linpack.c -lm -mcpu=cortex-a53 -march=armv8-a -ftree-vectorize -funsafe-math-optimizations 

$ ./linpack-noopt 
Enter array size (q to quit) [200]:  
Memory required:  315K.


LINPACK benchmark, Double precision.
Machine precision:  15 digits.
Array size 200 X 200.
Average rolled and unrolled performance:

    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
----------------------------------------------------
      32   0.68  88.14%   2.66%   9.20%  71117.671
      64   1.36  88.13%   2.66%   9.21%  71103.230
     128   2.72  88.14%   2.66%   9.20%  71118.447
     256   5.44  88.14%   2.66%   9.20%  71117.368
     512  10.89  88.14%   2.66%   9.20%  71118.505

Enter array size (q to quit) [200]:  q

$ ./linpack-o3 
Enter array size (q to quit) [200]:  
Memory required:  315K.


LINPACK benchmark, Double precision.
Machine precision:  15 digits.
Array size 200 X 200.
Average rolled and unrolled performance:

    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
----------------------------------------------------
     128   0.53  86.33%   2.89%  10.78%  374433.231
     256   1.05  86.33%   2.88%  10.79%  374573.654
     512   2.10  86.34%   2.88%  10.79%  374443.201
    1024   4.21  86.32%   2.88%  10.80%  374574.751
    2048   8.42  86.32%   2.88%  10.80%  374612.768
    4096  16.83  86.33%   2.88%  10.79%  374574.926

Enter array size (q to quit) [200]:  q

 

This is a single-core benchmark; apparently gcc -O3 makes use of NEON SIMD (auto-vectorization).
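To compare runs like the two above without eyeballing the tables, a sketch that pulls the KFLOPS column out of pasted linpackc output and compares the averages. The regular expression assumes the exact six-column row layout shown above; `kflops_rows` and `speedup` are hypothetical helper names.

```python
import re
from statistics import mean

# One result row of the linpackc table:
#   Reps  Time(s)  DGEFA%  DGESL%  OVERHEAD%  KFLOPS
ROW = re.compile(r"^\s*\d+\s+[\d.]+\s+[\d.]+%\s+[\d.]+%\s+[\d.]+%\s+([\d.]+)\s*$")


def kflops_rows(output: str) -> list[float]:
    """Extract the KFLOPS column from pasted linpack output (header lines are skipped)."""
    return [float(m.group(1)) for line in output.splitlines() if (m := ROW.match(line))]


def speedup(baseline: str, optimised: str) -> float:
    """Ratio of mean KFLOPS; raises statistics.StatisticsError if a run has no rows."""
    return mean(kflops_rows(optimised)) / mean(kflops_rows(baseline))
```

Fed the -O3 and unoptimised tables above, `speedup` comes out at roughly 5.27, i.e. about a 5x gain from compiler flags alone on one core.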

 

 

 

 

linpack.zip

Posted (edited)

For those who are trying to enable video decoding: I've been able to get it working with these libs:

 

https://www.elektroda.pl/rtvforum/topic4018092.html#20840047

 

Download h618_hwdec.tar.gz, replace the libs, and add "extraargs=cma=256M" to your /boot/armbianEnv.txt. mpv works flawlessly with --hwdec=drm --profile=fast, and scrcpy is also fast with minimal CPU usage.

I'm also using the rolling edge kernel and the latest Mesa 25.2 built from source.
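If `/boot/armbianEnv.txt` already has an `extraargs=` line, it is probably safest to merge `cma=256M` into it rather than add a second line. A minimal sketch, assuming the file is plain `key=value` lines and that `extraargs` holds space-separated kernel arguments; `add_kernel_arg` is a hypothetical helper, and it only prints the proposed file for review instead of writing it.

```python
from pathlib import Path


def add_kernel_arg(text: str, arg: str) -> str:
    """Add a kernel argument (e.g. 'cma=256M') to the extraargs= line,
    replacing any existing argument with the same name, or append the line."""
    name = arg.split("=", 1)[0]
    lines = text.splitlines()
    for i, line in enumerate(lines):
        key, sep, value = line.partition("=")
        if sep and key.strip() == "extraargs":
            kept = [a for a in value.split() if a.split("=", 1)[0] != name]
            lines[i] = "extraargs=" + " ".join(kept + [arg])
            break
    else:
        lines.append(f"extraargs={arg}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    env = Path("/boot/armbianEnv.txt")
    if env.exists():
        # Print instead of writing: review it, then copy it into place as root.
        print(add_kernel_arg(env.read_text(), "cma=256M"), end="")
```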

 

 

 

 

libva-v4l2-request-HACK_HEVC.zip

mpv.png

Edited by Gabriel Negrisiolo Righi
include info
Posted

@robertoj

Quote

I don't even have a reference point; what should I start comparing?

 

I read claims that python3-numpy and python3-opencv are highly optimized, but I never researched HOW optimized.

 

I have also heard that DRM can help accelerate machine learning https://www.youtube.com/watch?v=NQz6VqvtehI&t=5m7s

 

Well, NEON SIMD isn't just useful for matrix math; it can also serve, e.g., as a video decoder/encoder in place of specialized on-chip video hardware. That could partly explain the 'better performance' of mpv (https://mpv.io/).

E.g. if mpv is built with -O3, or uses a library that is optimized with NEON SIMD, it could plausibly get performance comparable to the on-chip proprietary video hardware, which is not publicly documented.

That would show up as an apparent 100% CPU usage if all 4 CPU cores are busy with NEON SIMD.

 

I think I once came across an RPi forum comment about moving code to NEON SIMD instead of using proprietary video hardware, partly because these 'small' chips have 'limited' on-chip video processing capabilities.

It isn't really a bad thing if we use, say, an OPi Z3 as a 'dedicated' video streamer. One catch is that at 100% CPU, non-compute threads may struggle to get a slot to run at times; it may take setting 'nice' levels so that some threads get a higher priority.
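For the 'nice' idea, the usual shell route is `nice -n 10 <command>` at start-up, or `renice` on a running process. The same thing from Python, as a sketch: `run_niced` (a hypothetical helper) raises the child's nice value just before it starts, so a long-running compute job yields CPU time to other threads. Raising niceness needs no root; lowering it below 0 does.

```python
import os
import subprocess
import sys


def run_niced(cmd: list[str], increment: int = 10) -> subprocess.CompletedProcess:
    """Run cmd with its nice value raised by `increment` (i.e. lower priority).

    os.nice() runs in the child just before exec, so only the child is
    de-prioritised; the calling process keeps its own nice value.
    """
    return subprocess.run(
        cmd,
        preexec_fn=lambda: os.nice(increment),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Example: start a child that reports its own niceness.
    result = run_niced([sys.executable, "-c", "import os; print(os.nice(0))"])
    print("child niceness:", result.stdout.strip())
```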

 

I've been thinking about running a (crypto coin) miner on it, and probably will some time. These cores certainly don't come close to even a Haswell, a Ryzen or a 'low-end' GPU, but they are faster than the 'older', 'smaller' chips.

For comparison, the quoted 'old' figures:

https://linux-sunxi.org/Benchmarks#Linpack

-mcpu=cortex-a8 -march=armv7-a -mfpu=neon -mfloat-abi=hard -funsafe-math-optimizations -fno-fast-math

Memory required:  315K.


LINPACK benchmark, Double precision.
Machine precision:  15 digits.
Array size 200 X 200.
Average rolled and unrolled performance:

    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
----------------------------------------------------
      16   0.61  88.52%   6.56%   4.92%  37885.057
      32   1.21  85.12%   2.48%  12.40%  41459.119
      64   2.43  93.83%   2.47%   3.70%  37561.254
     128   4.86  91.77%   2.47%   5.76%  38381.368
     256   9.70  92.06%   2.89%   5.05%  38173.000
     512  19.41  91.29%   2.47%   6.23%  38634.432

-mcpu=cortex-a8 -mtune=cortex-a8 -march=armv7-a -mfpu=neon -mfloat-abi=hard -funsafe-math-optimizations -fomit-frame-pointer -ffast-math -funroll-loops -funsafe-loop-optimizations

Memory required:  315K.

LINPACK benchmark, Double precision.
Machine precision:  15 digits.
Array size 200 X 200.
Average rolled and unrolled performance:

    Reps Time(s) DGEFA   DGESL  OVERHEAD    KFLOPS
----------------------------------------------------
      16   0.53  90.57%   1.89%   7.55%  44843.537
      32   1.05  90.48%   3.81%   5.71%  44390.572
      64   2.13  90.14%   2.35%   7.51%  44615.905
     128   4.23  90.54%   3.07%   6.38%  44390.572
     256   8.46  90.19%   2.84%   6.97%  44672.596
     512  17.03  90.55%   2.76%   6.69%  44250.892

 

Compared with those, the Z3 result above is roughly an 8x to 10x improvement on a single core.
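A quick check of that range, dividing the Z3's -O3 result by the best and the worst of the quoted Cortex-A8 runs:

```latex
\[
\frac{374\,575\ \text{KFLOPS}}{44\,844\ \text{KFLOPS}} \approx 8.35,
\qquad
\frac{374\,575\ \text{KFLOPS}}{37\,561\ \text{KFLOPS}} \approx 9.97
\]
```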

 

 

 

Posted

Tried mining Feathercoin:

 

git clone https://github.com/ghostlander/cpuminer-neoscrypt

 

There are lots of missing dependencies to build it:

apt install automake autoconf-archive pkg-config libtool libcurl4-openssl-dev 

 

But once those are installed, it is just autogen.sh, configure, make.

 

Next, register on https://www.mining-dutch.nl/

Then run:

./minerd -D --algo=neoscrypt --url=stratum+tcp://mining-dutch.nl:9993 -u username.worker1 -p d=10


 


 

Hash:   020E9F4B68201E05469CC87039286A9EEFAFB6525E9CDDAD40E4DCFC6D950000x0
Target: 0000000000000000000000000000000000000000000000000000000098990100x0
[2025-07-17 20:07:06] thread 2: 537 hashes, 1.097 KH/s
[2025-07-17 20:07:06] accepted: 14/14 (100.000%), 4.398 KH/s (yay!!!)
[2025-07-17 20:07:14] DEBUG (little endian): hash <= target
Hash:   FD8193D5659404573894CC22F8A37859A2347842E382CCF151747A0E34220100x0
Target: 0000000000000000000000000000000000000000000000000000000098990100x0
[2025-07-17 20:07:14] thread 1: 9481 hashes, 1.100 KH/s
[2025-07-17 20:07:14] accepted: 15/15 (100.000%), 4.397 KH/s (yay!!!)
[2025-07-17 20:07:18] DEBUG (little endian): hash <= target
Hash:   9B92D3883EA6CE4BC66BE97CF9CDC6E8C51B0D28B8C3D7F2BA28B8C58D310100x0
Target: 0000000000000000000000000000000000000000000000000000000098990100x0
[2025-07-17 20:07:18] thread 2: 13983 hashes, 1.100 KH/s
[2025-07-17 20:07:18] accepted: 16/16 (100.000%), 4.400 KH/s (yay!!!)
[2025-07-17 20:07:32] DEBUG (little endian): hash <= target
Hash:   3FC00FD320605941571915FC723A4CBB2C9E798552E309C980F333CA76B40000x0
Target: 0000000000000000000000000000000000000000000000000000000098990100x0
[2025-07-17 20:07:32] thread 1: 20346 hashes, 1.101 KH/s
[2025-07-17 20:07:32] accepted: 17/17 (100.000%), 4.401 KH/s (yay!!!)
[2025-07-17 20:07:34] thread 3: 66030 hashes, 1.100 KH/s
[2025-07-17 20:07:36] DEBUG (little endian): hash <= target
Hash:   787E9A8E6CF0D6293659419BA9963ED809F7D606A2AD8E879968A5B075620000x0
Target: 0000000000000000000000000000000000000000000000000000000098990100x0
[2025-07-17 20:07:36] thread 1: 4421 hashes, 1.100 KH/s
[2025-07-17 20:07:36] accepted: 18/18 (100.000%), 4.400 KH/s (yay!!!)
[2025-07-17 20:07:42] DEBUG (little endian): hash <= target
Hash:   27163C9021EAE5D4D5D458F384751990AE336F19BD3AC8D980AA97C55A940000x0
Target: 0000000000000000000000000000000000000000000000000000000098990100x0
[2025-07-17 20:07:42] thread 3: 8654 hashes, 1.101 KH/s
[2025-07-17 20:07:42] accepted: 19/19 (100.000%), 4.400 KH/s (yay!!!)
[2025-07-17 20:08:00] thread 0: 65995 hashes, 1.100 KH/s
[2025-07-17 20:08:18] thread 2: 65977 hashes, 1.100 KH/s
[2025-07-17 20:08:36] thread 1: 65992 hashes, 1.100 KH/s
[2025-07-17 20:08:37] DEBUG (little endian): hash <= target
Hash:   037AD1DB8D7C7FAB0E022F8541BA34D67A7E9B3695B396FAF84C6D47AD950000x0
Target: 0000000000000000000000000000000000000000000000000000000098990100x0
[2025-07-17 20:08:37] thread 0: 40427 hashes, 1.100 KH/s
[2025-07-17 20:08:37] accepted: 20/20 (100.000%), 4.401 KH/s (yay!!!)
[2025-07-17 20:08:39] DEBUG (little endian): hash <= target
Hash:   5FBB0159477A05E1324B6A3D240B89DAEDF34A83A8AA78FE155FA86DE3140100x0
Target: 0000000000000000000000000000000000000000000000000000000098990100x0
[2025-07-17 20:08:39] thread 3: 62568 hashes, 1.101 KH/s
[2025-07-17 20:08:39] accepted: 21/21 (100.000%), 4.401 KH/s (yay!!!)

 

A whopping 1.1 KH/s on each core (about 4.4 KH/s in total); not very impressive, but it mines :)

I think this is not using NEON SIMD.
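To total up the per-thread rates from a minerd log like the one above, a sketch that keeps the most recent `thread N: ... KH/s` figure for each thread and sums them. The regular expression assumes the log format shown above; `total_hashrate` is a hypothetical helper name.

```python
import re
import sys

# Matches minerd per-thread lines such as:
#   [2025-07-17 20:07:06] thread 2: 537 hashes, 1.097 KH/s
THREAD_LINE = re.compile(r"thread (\d+): \d+ hashes, ([\d.]+) KH/s")


def total_hashrate(log: str) -> float:
    """Sum the most recent KH/s figure reported by each thread."""
    latest: dict[int, float] = {}
    for m in THREAD_LINE.finditer(log):
        latest[int(m.group(1))] = float(m.group(2))  # later lines overwrite earlier ones
    return sum(latest.values())


if __name__ == "__main__":
    # Usage: python3 hashrate.py miner.log
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            print(f"{total_hashrate(f.read()):.3f} KH/s")
```

On the last line per thread in the log above (1.100, 1.100, 1.100 and 1.101 KH/s) it gives 4.401 KH/s, matching the miner's own total.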

 

Stop monitoring using [ctrl]-[c]
Time        CPU    load %cpu %sys %usr %nice %io %irq   Tcpu  C.St.

20:22:02  1416 MHz  3.90 100%   0%   0%  99%   0%   0%  53.2 °C  0/7
20:22:07  1416 MHz  3.91 100%   0%   0%  99%   0%   0%  53.2 °C  0/7
20:22:12  1416 MHz  3.92 100%   0%   0%  99%   0%   0%  53.4 °C  0/7

^ This is with the fan on.

 

Optimise it a little in the Makefile:

#CFLAGS = -g -O2
CFLAGS =

minerd_CPPFLAGS = -O3 -mcpu=cortex-a53 -march=armv8-a -ftree-vectorize -funsafe-math-optimizations 

 

 


 

[2025-07-17 20:41:42] DEBUG (little endian): hash <= target
Hash:   02D285CA9C499E30195BD3EC4F4DF544D03EDDC7A00DFC0D255EB1E7BA160000x0
Target: 0000000000000000000000000000000000000000000000000000008099190000x0
[2025-07-17 20:41:42] thread 1: 19364 hashes, 1.127 KH/s
[2025-07-17 20:41:42] accepted: 1/1 (100.000%), 1.127 KH/s (yay!!!)
[2025-07-17 20:41:54] thread 3: 32766 hashes, 1.128 KH/s
[2025-07-17 20:41:54] thread 0: 32766 hashes, 1.127 KH/s
[2025-07-17 20:41:54] thread 2: 32766 hashes, 1.124 KH/s
[2025-07-17 20:42:42] thread 1: 67603 hashes, 1.127 KH/s
[2025-07-17 20:42:53] thread 3: 67650 hashes, 1.128 KH/s
[2025-07-17 20:42:54] thread 0: 67639 hashes, 1.127 KH/s
[2025-07-17 20:42:54] thread 2: 67439 hashes, 1.124 KH/s
[2025-07-17 20:43:42] thread 1: 67627 hashes, 1.128 KH/s

 

 


Well, just a very minor improvement of about 0.025 KH/s per core (roughly 2.5%). Perhaps it already uses NEON SIMD, or it needs 'hand optimization', which is hard.

 

 

 

Posted (edited)

If you are having trouble with hardware decoding, you need to disable the compositor. Follow jock's setup instructions in the link below, and use this command to restart xfwm4 with the compositor disabled:

 

killall xfwm4 && xfwm4 --compositor=off &

 

 

Edited by Nick A
