RK3588(S) on Armbian Noble/Jammy - actual RKNPU utilization still capped at ~35-42% under INT8/INT4 for Llama.cpp / MLC-LLM in 2025H2?


Recommended Posts

Posted

Hey folks,

I've been daily-driving an OrangePi 6 / 6B with the latest Armbian noble-edge (6.12-rc kernel + rknpu 0.9.9-8) for local LLM inference. Even after manually loading the closed-source RKNPU DDK 0.9.9, switching to the "new" rknpu.ko from Rockchip's 5.10 BSP, and forcing INT4 quantization via llama.cpp built with -DRKNN_RT=ON, the NPU utilization reported by rk_nn_tool tops out around 36-42% when running Q4_K_M 70B-class models (tokens/s barely hits ~19-21 t/s). The CPU is almost idle, there is no thermal throttling, and the full 16GB of LPDDR4X is available.

Is this still the known "18 TOPS theoretical vs real ~7-8 TOPS usable" ceiling, or has anyone managed to push past 65%+ utilization on mainline-ish kernels in late 2025? Happy to share my build scripts and rk_nn_tool logs if anyone wants to dig deeper.

Bonus question: has anyone successfully coerced the NPU into handling 2+ concurrent contexts without utilization dropping to zero (the infamous "second model load kills the first" bug)?
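As a sanity check, the reported utilization and the "usable TOPS" ceiling are consistent with each other. A minimal sketch of that arithmetic, using only the numbers from the post (the 18 TOPS figure is the commonly quoted RK3588 NPU spec value; nothing here queries real hardware):

```python
# Rough arithmetic behind the "18 TOPS theoretical vs ~7-8 TOPS usable" claim.
# All numbers come from the post above; this is not a hardware measurement.

THEORETICAL_TOPS = 18.0  # commonly quoted RK3588 NPU spec figure (INT8, all cores)

def effective_tops(utilization_pct: float) -> float:
    """Effective throughput implied by a reported utilization percentage."""
    return THEORETICAL_TOPS * utilization_pct / 100.0

# The 36-42% utilization range reported by rk_nn_tool maps to roughly
# 6.5-7.6 TOPS, matching the "~7-8 TOPS usable" ceiling mentioned above.
for pct in (36, 42):
    print(f"{pct}% utilization -> {effective_tops(pct):.2f} TOPS effective")
```

So the ~36-42% readings and the "~7-8 TOPS usable" folklore are the same claim expressed two ways; the open question is whether that ceiling can actually be raised.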

Posted

OrangePi 5 and 6 have nothing in common. Double-check what you are actually talking about.

Also, there is no 0.9.9 rknpu release.

The 5.10 BSP was abandoned long ago.
