https://github.com/MaverickLong/MLIR-TIM-VX
This is an MLIR-based lowering path from the TOSA v1 dialect to TIM-VX, VeriSilicon's OpenVX-based GPU/NPU ML framework.
It includes a lowering from TOSA v1 to a custom timvx dialect, which mirrors the TIM-VX C++ semantics, and a full lowering to C++ source.
It is currently roughly on par with the vendor's ACUITY compiler pipeline in inference speed, while still giving you full control at the graph level.
In my own testing on a Radxa Cubie A7Z with an Allwinner A733, ResNet-50 inference takes 8.0 ms, about 10% slower than the 7.3 ms that Radxa reports for the ACUITY-compiled baseline.
It all started with Radxa/VeriSilicon falsely advertising that the NPU supports MLIR. It turns out we just don't have that yet, so I made my own.
I have only tested the pipeline on the A733, but it should be extensible to other VIP9000 variants as well.