lucass Posted 3 hours ago

Hey folks,

I've been daily-driving the OrangePi 6 / 6B with the latest Armbian noble-edge (6.12-rc kernel + rknpu 0.9.9-8) for local LLM inference. Even after manually loading the closed-source RKNPU DDK 0.9.9, switching to the "new" rknpu.ko from Rockchip's 5.10 BSP, and forcing INT4 quantization via llama.cpp built with -DRKNN_RT=ON, the NPU utilization reported by rk_nn_tool tops out around 36-42% on Q4_K_M 70B-class models, and throughput barely hits ~19-21 t/s. The CPU is almost idle, there's no thermal throttling, and all 16GB of LPDDR4X is available.

Is this still the known "18 TOPS theoretical vs. real ~7-8 TOPS usable" ceiling, or has anyone managed to push past 65% utilization on mainline-ish kernels in late 2025? Happy to share my build scripts and rk_nn_tool logs if anyone wants to dig deeper; a trimmed sketch of my setup is below.

Bonus question: has anyone successfully coerced the NPU into handling two or more concurrent contexts without utilization dropping to zero (the infamous "second model load kills the first" bug)?
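In case it helps anyone reproduce the numbers, here is roughly what I'm doing. Assumptions to flag: the module path is specific to my box, -DRKNN_RT=ON comes from my local llama.cpp build rather than any documented upstream option, and the debugfs load node is the one the Rockchip rknpu driver exposes in my experience.

```bash
# Swap in the rknpu.ko pulled from the vendor 5.10 BSP
# (module path is from my box, adjust to taste)
sudo rmmod rknpu 2>/dev/null
sudo insmod ./rknpu-bsp/rknpu.ko

# Build llama.cpp with my local RKNN flag
# (-DRKNN_RT=ON is from my build setup, not an upstream llama.cpp option)
cmake -B build -DRKNN_RT=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build -j"$(nproc)"

# Poll NPU load once a second; needs root since it lives in debugfs
sudo watch -n1 cat /sys/kernel/debug/rknpu/load
```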
Werner Posted 2 hours ago

Orange Pi 5 and 6 have nothing in common; double-check what you are actually talking about. Also, there is no 0.9.9 rknpu release, and the 5.10 BSP was abandoned long ago.