I made some patches so that the unified driver / TIM-VX stack for the NPU works on the A733. I am also building an MLIR pipeline that emits TIM-VX code, so hopefully we will get more flexibility running ML models on this SBC soon.
https://github.com/MaverickLong/Radxa-A733-NPU-Unified-Driver-Support-Package
@qq20739111 thanks, is the official community on GitHub? https://github.com/radxa-docs/docs/
I asked Claude to write a summary of my NPU project:
- Toolchain: the vendor's ACUITY toolkit runs inside a Docker container (ubuntu-npu:v2.0.10.1) and converts ONNX models → quantized .nb files.
- Pipeline: onnxsim → pegasus_import → pegasus_quantize → pegasus_export, with uint8 quantization; the resulting .nb files are deployed to ~/npu_models/ via push.sh (driver sketch after this list).
- Runtime: a custom npu_server.c (pre-allocated input buffer, no per-call mmap) that the Python app talks to (client sketch below).
- Hard lesson #1, quantization: only pure Conv+BN+ReLU stacks survive uint8. Attention, SE blocks, hard-swish, and LayerNorm all collapse to constant outputs, so no MobileNetV3+ and no transformers (collapse check below).
- Hard lesson #2, concurrency hang: NPU IRQs get lost when the camera ISP's DMA runs in parallel (they share the memory bus). Fix: suppress GStreamer buffer copies during inference (probe sketch below). Never STREAMOFF/STREAMON the sunxi-vin driver: instant kernel crash.
- Result: 12 models running (9 NPU + 3 CPU) at ~40ms/inference.
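
A minimal sketch of how the conversion steps chain together, assuming it runs inside the ubuntu-npu container with onnxsim and the pegasus_* wrappers on PATH. The argument lists are placeholders for illustration, not the wrappers' real flags; check the ACUITY docs for yours:

```python
#!/usr/bin/env python3
"""Hypothetical driver for the ONNX -> .nb conversion pipeline."""
import subprocess
import sys

def run(cmd):
    print(">>", " ".join(cmd))
    subprocess.run(cmd, check=True)  # abort the pipeline on any failure

def convert(model, name):
    # 1. Simplify the ONNX graph (folds constants, removes dead nodes).
    simplified = f"{name}_sim.onnx"
    run(["onnxsim", model, simplified])
    # 2. Import into ACUITY's internal representation (args are placeholders).
    run(["pegasus_import", simplified])
    # 3. uint8 post-training quantization (needs a calibration dataset).
    run(["pegasus_quantize", name])
    # 4. Export the compiled .nb network binary for the NPU.
    run(["pegasus_export", name])
    # 5. Deploy to the board: push.sh copies to ~/npu_models/
    #    (the filename argument here is an assumption).
    run(["./push.sh", f"{name}.nb"])

if __name__ == "__main__":
    convert(sys.argv[1], sys.argv[2])
```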
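On the runtime side, the Python app only has to push raw uint8 tensors at npu_server. Everything concrete below (socket path, tensor shape, the raw-bytes framing) is an assumed stand-in for whatever protocol npu_server.c actually speaks:

```python
import socket
import numpy as np

# Hypothetical wire protocol: Unix socket, fixed-size uint8 tensor in,
# fixed-size raw output blob back.
SOCK_PATH = "/tmp/npu_server.sock"   # placeholder path
IN_SHAPE = (224, 224, 3)             # uint8 input matching the quantized model
OUT_BYTES = 1001                     # placeholder output size

def infer(frame: np.ndarray) -> bytes:
    assert frame.dtype == np.uint8 and frame.shape == IN_SHAPE
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(SOCK_PATH)
        # Server reads straight into its pre-allocated input buffer,
        # so the client just streams the raw tensor bytes.
        s.sendall(frame.tobytes())
        out = bytearray()
        while len(out) < OUT_BYTES:
            chunk = s.recv(OUT_BYTES - len(out))
            if not chunk:
                raise ConnectionError("npu_server closed early")
            out.extend(chunk)
        return bytes(out)
```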
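The quantization collapse is easy to catch automatically: run the quantized model on a few distinct random inputs and flag near-zero output variance. run_quantized is a hypothetical callable wrapping whatever inference entry point you have:

```python
import numpy as np

def looks_collapsed(run_quantized, n_probes=8, atol=1e-6):
    """Heuristic collapse check: a model whose outputs barely change
    across random inputs has probably degenerated to a constant
    (what I saw with attention / SE / hard-swish / LayerNorm in uint8).

    run_quantized: placeholder, uint8 image -> float np.ndarray.
    """
    outs = [run_quantized(np.random.randint(0, 256, (224, 224, 3),
                                            dtype=np.uint8))
            for _ in range(n_probes)]
    # Max per-element standard deviation across the probe outputs.
    spread = np.std(np.stack(outs), axis=0).max()
    return spread < atol
```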
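And the concurrency fix, roughly: instead of STREAMOFF/ON, keep the stream alive and drop buffers with a pad probe while the NPU is busy. This is a generic GStreamer sketch under assumed element names, not the exact code from my app:

```python
import threading
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
inference_busy = threading.Event()  # set by the app around each NPU call

def drop_during_inference(pad, info):
    # Dropping buffers here suppresses the downstream copies without
    # ever stopping the stream (STREAMOFF/ON on sunxi-vin crashes the kernel).
    if inference_busy.is_set():
        return Gst.PadProbeReturn.DROP
    return Gst.PadProbeReturn.OK

# "camerasrc" and the pipeline string are placeholders for the real
# capture pipeline on the board.
pipeline = Gst.parse_launch("v4l2src name=camerasrc ! videoconvert ! fakesink")
src_pad = pipeline.get_by_name("camerasrc").get_static_pad("src")
src_pad.add_probe(Gst.PadProbeType.BUFFER, drop_during_inference)
```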