tiobily

Reputation Activity

  1. Like
    tiobily got a reaction from qq20739111 in Radxa Cubie A7A/A7Z - Allwinner a733   
    @qq20739111 thanks, official community = GitHub? https://github.com/radxa-docs/docs/

    I asked Claude to summarize my NPU project:

    We use the vendor's ACUITY toolkit inside a Docker container (ubuntu-npu:v2.0.10.1) to convert ONNX → quantized .nb files.

      • Pipeline: onnxsim → pegasus_import → pegasus_quantize → pegasus_export, uint8 quantization, deployed to ~/npu_models/ via push.sh.
      • Runtime: custom npu_server.c (pre-allocated input buffer, no per-call mmap) that the Python app talks to.
      • Hard lesson #1, quantization: only pure Conv+BN+ReLU survives uint8. Attention, SE blocks, hard-swish, and LayerNorm all collapse to constant outputs, so no MobileNetV3+ and no transformers.
      • Hard lesson #2, concurrency hang: NPU IRQs get lost when camera ISP DMA runs in parallel (shared memory bus). Fix: suppress GStreamer buffer copies during inference. Never STREAMOFF/ON the sunxi-vin driver: instant kernel crash.
      • Result: 12 models running (9 NPU + 3 CPU) at ~40 ms/inference.
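    To illustrate the "pre-allocated input buffer, no per-call mmap" idea from the runtime bullet: the actual npu_server.c and its wire format are not shown in the post, so the framing protocol, buffer size, and function names below are my assumptions. The sketch keeps one long-lived buffer and reads each request directly into it with recv_into, instead of allocating or mapping memory per inference call. A mock echo "server" thread stands in for the NPU so the example is self-contained.

    ```python
    # Hypothetical sketch, NOT the post's real protocol: a persistent server
    # loop with a buffer allocated once and reused for every request.
    import socket
    import struct
    import threading

    FRAME_BYTES = 224 * 224 * 3  # assumed uint8 input tensor size

    def serve(conn: socket.socket) -> None:
        buf = bytearray(FRAME_BYTES)   # allocated once, reused per call
        view = memoryview(buf)
        while True:
            hdr = conn.recv(4)          # 4-byte big-endian length prefix
            if len(hdr) < 4:
                break
            (n,) = struct.unpack("!I", hdr)
            got = 0
            while got < n:              # read straight into the buffer
                r = conn.recv_into(view[got:n])
                if r == 0:
                    return
                got += r
            # a real server would hand `buf` to the NPU here; we reply
            # with a checksum so the round trip is observable
            conn.sendall(struct.pack("!I", sum(buf) & 0xFFFFFFFF))

    def infer(conn: socket.socket, frame: bytes) -> int:
        conn.sendall(struct.pack("!I", len(frame)) + frame)
        return struct.unpack("!I", conn.recv(4))[0]

    if __name__ == "__main__":
        a, b = socket.socketpair()
        threading.Thread(target=serve, args=(b,), daemon=True).start()
        print(infer(a, bytes(FRAME_BYTES)))  # prints 0 (all-zero frame)
    ```

    The design point is the single `bytearray` plus `recv_into`: per-call allocation and per-call mmap disappear, which is what the C server reportedly does on the NPU side.
    
    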