lucass Posted 3 hours ago

Hey folks,

I've been daily-driving the OrangePi 6 / 6B with the latest Armbian noble-edge (6.12-rc kernel + rknpu 0.9.9-8) for local LLM inference. Even after manually loading the closed-source RKNPU DDK 0.9.9, switching to the "new" rknpu.ko from Rockchip's 5.10 BSP, and forcing INT4 quantization via llama.cpp built with -DRKNN_RT=ON, the NPU utilization reported by rk_nn_tool tops out around 36-42% on Q4_K_M 70B-class models, and throughput barely hits ~19-21 t/s. The CPU is almost idle, there's no thermal throttling, and all 16GB of LPDDR4X is available.

Is this still the known "18 TOPS theoretical vs. real ~7-8 TOPS usable" ceiling, or has anyone managed to push past 65% utilization on mainline-ish kernels in late 2025? Happy to share my build scripts and rk_nn_tool logs if anyone wants to dig deeper; a trimmed sketch of my setup is below.

Bonus question: has anyone successfully coerced the NPU into handling two or more concurrent contexts without utilization dropping to zero (the infamous "second model load kills the first" bug)?
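In case it helps anyone reproduce the numbers, here is roughly what I'm doing. Assumptions to flag: the module path is specific to my box, -DRKNN_RT=ON comes from my local llama.cpp build rather than any documented upstream option, and the debugfs load node is the one the Rockchip rknpu driver exposes in my experience.

```bash
# Swap in the rknpu.ko pulled from the vendor 5.10 BSP
# (module path is from my box, adjust to taste)
sudo rmmod rknpu 2>/dev/null
sudo insmod ./rknpu-bsp/rknpu.ko

# Build llama.cpp with my local RKNN flag
# (-DRKNN_RT=ON is from my build setup, not an upstream llama.cpp option)
cmake -B build -DRKNN_RT=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build -j"$(nproc)"

# Poll NPU load once a second; needs root since it lives in debugfs
sudo watch -n1 cat /sys/kernel/debug/rknpu/load
```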
Werner Posted 2 hours ago

Orange Pi 5 and 6 have nothing in common; double-check what you are actually talking about. Also, there is no 0.9.9 rknpu release, and the 5.10 BSP was abandoned long ago.