usual user

Members
  • Posts

    527

Posts posted by usual user

  1. 10 hours ago, mircsicz said:

    So now I need to ask @usual user what specific kernel are you running?

    I am currently at 7.0.0-rc1. I can upload my jump-start image so you can check if my kernel build works with your device. If you like what you see, it is only a 'prepare-jump-start ${target-mount-point}' away to install the kernel package alongside your existing system.

    1 hour ago, flappyjet said:

    I found an interesting project https://github.com/NotPunchnox/rkllama

    I know about it, but since it is just another non-mainline solution with yet another dependency mess, I am not particularly interested.

  2. On 3/2/2026 at 2:22 AM, flappyjet said:

    Would you be generous to post your NPU device info?

    Since the hardware support for Rockchip SoCs in the mainline kernel is already outstanding and is still being actively developed, the only SBCs with integrated NPUs that I own are based on them. Among them are ODROID-M2, NanoPC-T6, and ROCK-5-ITX. But since the NPU is an integral part of the SoC, the board manufacturer and the design of the SBC are not necessarily important.

     

    On 3/2/2026 at 2:22 AM, flappyjet said:

    I'm also looking for some w8a8 llm models working on NPU (the GEMM capability is perfect, isn't it).

    As far as I understand, edge-class NPUs are best suited for computer vision tasks.
    I am therefore engaged in object detection:

    [Image: Object-detector_6_simultaneous_executions]

    and super-resolution:

    [Image: 0851x4-crop-sumary-4-esrgan_quant.tflite]

  3. 8 hours ago, flappyjet said:

    Do you have rocket kernel with mesa that enable NPU working now?

    This is what my software stack looks like:

    [Image: NPU-software-stack.png]

    My kernel is built as a generic one, hence my OS works on any device equipped with a VeriSilicon VIPNano, a Rockchip RK3588 or an Arm Ethos-U65/U85 NPU.
    The application can be written NPU-agnostic, as long as a model.tflite file suitable for the NPU is used.

  4. 8 hours ago, BoringName said:

    drm has been changed to v4l2request.

    The patches that were available out-of-tree for a long time were a kind of hack using the DRM subsystem for decoding. For inclusion in mainline, they were further developed into a more correct request method.

     

    8 hours ago, BoringName said:

    I'm going to assume you have hwdec=v4l2request in your mpv.conf file.

    It is hwdec=v4l2request-copy in fact, because the stateless decoder is an m2m device and the scan-out is still carried out via the DRM subsystem. However, the copy is cheap because it is executed via dmabuf as zero copy.
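A minimal sketch of how that setting could be put into place from a shell. The helper name `set_mpv_hwdec` is made up for illustration, and it assumes the default per-user config location `~/.config/mpv/mpv.conf`; the `v4l2request-copy` value itself is only available in mpv builds that carry the patches mentioned here.

```bash
#!/bin/bash
# Sketch (hypothetical helper): ensure an mpv config file contains
# exactly one hwdec= line with the requested value.
# Usage: set_mpv_hwdec VALUE [CONFIG_FILE]
set_mpv_hwdec() {
    value=$1
    conf=${2:-$HOME/.config/mpv/mpv.conf}   # assumed default location
    mkdir -p "$(dirname "$conf")"
    touch "$conf"
    # Keep every line except an existing hwdec= entry ...
    grep -v '^hwdec=' "$conf" > "$conf.tmp" || true
    # ... then append the requested one and swap the file into place.
    printf 'hwdec=%s\n' "$value" >> "$conf.tmp"
    mv "$conf.tmp" "$conf"
}
```

Called as `set_mpv_hwdec v4l2request-copy`, it leaves all other options in the file untouched.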

  5. 6 hours ago, BoringName said:

    Run mpv --hwdec=help

    If drm is not listed as one of the options it will not work. You need to find a build with drm support.

    mpv_--hwdec=help.log is what I get, and everything works as expected, but I am on current mainline releases with in-flight patches for mpv and ffmpeg on top. GStreamer-based applications work out-of-the-box.
    The log entries that contain the 'request' component are the ones that matter.
    But you're right, it can still take a while before current mainline releases are declared stable by some distributions and adopted. But this is not the fault of mainline development, which continues to progress and does not take outdated versions into account any longer. 
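To pick out the entries that matter from the help output, something like the following could be used. This is only a sketch: the function names are made up, and it simply filters for the string 'request' without assuming anything else about the output format.

```bash
#!/bin/bash
# Sketch: keep only the hwdec entries that mention "request".
# Reads the text of `mpv --hwdec=help` on stdin.
filter_request_hwdecs() {
    grep -i 'request'
}

# Exit status 0 if at least one such entry is present, non-zero otherwise.
has_request_hwdec() {
    filter_request_hwdecs > /dev/null
}
```

Typical use would be `mpv --hwdec=help | filter_request_hwdecs`. If nothing is printed, the build has no request-based decoder support.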

  6. Lately, I've been playing around a bit with computer vision detection. I managed to patch together a PoC script with which I conducted some tests. The results are quite promising. The frame rate is just based on the round trip time of my test script, so it only roughly reflects the inference time. The throughput includes all additional overhead but is sufficiently informative for a relative comparison.

     

    Inference on a single CPU core delivers an image throughput of about 4 images:

    [Image: Object-detector_single_core_CPU.png]

     

    Inference on a single NPU core delivers an image throughput of about 17 images:

    [Image: Object-detector_single_core_NPU.png]

     

    Inference on eight CPU cores delivers an image throughput of about 21 images. But all eight cores run at over 80% during this, and after a short time the fan kicks in. The headroom is also quite limited, e.g., for performing other tasks concurrently. Running several similar inference tasks concurrently immediately results in a proportional drop in frame rate per task.


    When six similar inference tasks are executed simultaneously with NPU delegates, they are distributed across the three available NPU cores, and the SoC utilization is moderate enough that the fan doesn't even turn on. The throughput does not degrade and the CPU cores remain available for other tasks as well:

    [Image: Object-detector_6_simultaneous_executions]

     

    For my tests, I used a random video clip. For the inference, I used a model pre-trained with the COCO dataset. With its 4.1MB memory size and its 80 object classes, it delivers surprisingly good results. Using the NPU hardware not only reduces the load on the CPU cores but also provides additional acceleration of processing. But the best part is that only current mainline code is required for use. No dependencies on proprietary implementations or outdated software stacks. It just works out-of-the-box, you just need to know how to use it.
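Starting several identical inference tasks at once, as in the six-task test, can be sketched from a shell like this. The helper `run_parallel` is made up for illustration, and the PoC script itself is not shown in this post, so any command name passed to it is a placeholder.

```bash
#!/bin/bash
# Sketch (hypothetical helper): start N copies of a command in the
# background and wait until all of them have finished.
# Usage: run_parallel N command [args...]
run_parallel() {
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" &              # launch one copy in the background
        i=$((i + 1))
    done
    wait                    # block until every background copy is done
}
```

For example, `run_parallel 6 ./object-detector.sh clip.mp4` would start six instances, where both `object-detector.sh` and `clip.mp4` are placeholder names.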

  7. 51 minutes ago, fever_wits said:

    I'll probably wait for the new kernel and test it.

    Waiting for others to do the work is a good strategy. If everything goes well, it will land in 6.20 at the earliest; if improvements are needed, 7.x is more likely. When the corresponding kernel will be included in the chosen distribution is the next data point. You are clearly aware of what you have to wait for.
    Until then, I will simply continue using the necessary modifications as a self-written DTBO, which I have been doing for some time.
    But maybe you'll get lucky and someone else takes on the burden of applying the patches as an early adoption, so you can just sit back, do nothing, and still enjoy the upcoming availability.


  8. The rk3588 NPU support has been working as expected for me for some time now, but I don't use legacy software.

    $ ./classification
    Loading external delegate from /usr/lib64/libteflon.so with args: {}
    INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
    0.901961: military uniform
    0.023529: Windsor tie
    0.011765: mortarboard
    0.007843: bulletproof vest
    0.003922: cornet
    time: 15.706ms
    $ ./classification
    INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
    0.901961: military uniform
    0.023529: Windsor tie
    0.007843: bulletproof vest
    0.007843: mortarboard
    0.003922: cornet
    time: 70.607ms

    versus
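The two `time:` lines in the output above (first run with the Teflon delegate, second run on the CPU only) can be put into relation with a small helper. This is just a sketch; the function names are made up, and the parsing assumes the `time: <value>ms` format shown in the log.

```bash
#!/bin/bash
# Sketch: pull the millisecond values out of "time: 15.706ms" style lines.
extract_times() {
    awk '$1 == "time:" { sub(/ms$/, "", $2); print $2 }'
}

# Sketch: print how many times shorter FAST is compared to SLOW.
# Usage: speedup FAST_MS SLOW_MS
speedup() {
    awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}
```

With the values from the log, `speedup 15.706 70.607` prints `4.50`, i.e. the delegate run is roughly 4.5 times faster for this model.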

  9. On 12/8/2025 at 10:11 PM, gpupoor said:

    I'm stuck with how to use Fedora as a root filesystem with Armbian.

    That's not really possible either. The only thing that is possible is using my kernel build in an Armbian environment to see how it performs for you. With my jumpstart image, this is actually quite easy to implement.
    For this, both root filesystems just need to be mounted and

    extlinux/prepare-jump-start ${target-rootfs-mount-point}

    needs to be executed in the jumpstart rootfs.
    If the target system is now being booted with firmware that uses mainline U-Boot as the payload, nothing stands in the way of booting with my kernel.
    And don't worry, prepare-jump-start only adds files to the target rootfs. Nothing will be overwritten or deleted.
    If you want to give it a whirl, speak up and I will upload a current image.

     

    On 12/8/2025 at 10:11 PM, gpupoor said:

    I'm understanding it will work successfully with 6.18, but I think the expectation is to start with the raw distro

    But you will be disappointed, because the stock kernel only provides the functionalities available with the officially released kernel. Both Armbian and my kernel build have already applied patches that may appear in a future official release.

     

    On 12/8/2025 at 10:11 PM, gpupoor said:

    start with Fedora and build it all from the ground up.


    I doubt that you will succeed, because even the Fedora organization only does a full rebuild once per release cycle. There is no advantage in rebuilding an unmodified component, since that leads to the same result as the package already provided. It only makes sense to do this when branching off the release version in order to create a synchronized basis for further development.
    Only modified packages will be replaced or upgraded. My weekly upgrade gives me several gigabytes of new packages each time because I'm on Rawhide. Somehow, I can no longer manage to stick to any official release versions.

  10. 3 hours ago, gpupoor said:

    I haven't been able to figure out how you got there.

    I don't really do anything special. Since the architecture of the devices I use is quite up-to-date, development for their support is also at the bleeding edge. It is therefore essential to use the latest software releases. My chosen distribution provides me with these quite promptly. But that's where it ends. I receive no support at all for using my devices there. It doesn't even provide me with firmware for my devices to start the system.
    The kernel provided by my distribution is only the one based on the currently released mainline source code. So if I want to use functionalities whose development is still in progress, I have to build the kernel myself with the appropriate patches, which I do regularly (e.g., I'm currently building one so I can play around with RGA3).
    I haven't done much work in user space for a long time, but recently I've been building the FFmpeg package myself again since the availability of RKVDEC2. v4l-request works out-of-the-box with the GStreamer framework, but for FFmpeg, it will probably take some time until support is available in a release version.

    OK, the kernel is done. Now I have to deal with another video device:

    lrwxrwxrwx 1 root root 12 Dec  7 22:44 platform-fdb50000.video-codec-video-index0 -> ../../video3
    lrwxrwxrwx 1 root root 12 Dec  7 22:44 platform-fdb60000.rga-video-index0         -> ../../video2
    lrwxrwxrwx 1 root root 12 Dec  7 22:44 platform-fdb80000.rga-video-index0         -> ../../video0
    lrwxrwxrwx 1 root root 12 Dec  7 22:44 platform-fdba0000.video-codec-video-index0 -> ../../video4
    lrwxrwxrwx 1 root root 12 Dec  7 22:44 platform-fdc38100.video-codec-video-index0 -> ../../video1
    lrwxrwxrwx 1 root root 12 Dec  7 22:44 platform-fdc70000.video-codec-video-index0 -> ../../video5

    v4l2-compliance-odroid-m2.log
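A listing like the one above can be reduced to a plain name-to-device mapping with a few lines of shell. This is a sketch under the assumption that the symlinks live in `/dev/v4l/by-path` (the usual udev location; the directory is not named in the listing itself), and `list_video_nodes` is a made-up name.

```bash
#!/bin/bash
# Sketch: print "link-name -> device" for every symlink in a directory.
# Usage: list_video_nodes [DIR]   (DIR defaults to /dev/v4l/by-path)
list_video_nodes() {
    dir=${1:-/dev/v4l/by-path}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue      # skip anything that is not a symlink
        printf '%s -> %s\n' \
            "$(basename "$link")" \
            "$(basename "$(readlink "$link")")"
    done
}
```

The by-path names stay stable across boots, while the `videoN` numbers may change when another video device shows up, so scripts are better off referring to the former.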

  11. FWIW, on my rk3588 devices the NPUs are working with recent mainline releases:

    [    5.967316] [drm] Initialized rocket 0.0.0 for rknn on minor 0
    [    5.975499] rocket fdab0000.npu: Rockchip NPU core 0 version: 1179210309
    [    5.978652] rocket fdac0000.npu: Rockchip NPU core 1 version: 1179210309
    [    5.985602] rocket fdad0000.npu: Rockchip NPU core 2 version: 1179210309

    This script runs the Mesa example with the latest available working versions:

    Spoiler
    #!/bin/bash
    IMAGE="grace_hopper.bmp"
    WORKBENCH="."
    ENVIRONMENT="${WORKBENCH}/python/3.11"
    [ "${1}" == "setup" ] || [ ! -f ${ENVIRONMENT}/bin/activate ] && BOOTSTRAP="true"
    [ -v BOOTSTRAP ] && python3.11 -m venv ${ENVIRONMENT}
    source ${ENVIRONMENT}/bin/activate
    [ -v BOOTSTRAP ] && pip install numpy==1.26.4
    [ -v BOOTSTRAP ] && pip install pillow==12.0.0
    [ -v BOOTSTRAP ] && pip install tflite-runtime==2.14.0
    TEFLON_DEBUG=verbose ETNA_MESA_DEBUG=ml_dbgs python ${WORKBENCH}/classification-tflite.py \
              -i ${WORKBENCH}/${IMAGE} \
              -m ${WORKBENCH}/mobilenet_v1_1_224_quant.tflite \
              -l ${WORKBENCH}/labels_mobilenet_quant_v1_224.txt \
              -e /usr/lib64/libteflon.so
    deactivate

     

    And with this script, the Mesa example runs, with a small adjustment, also with the TFLite successor LiteRT:

    Spoiler
    #!/bin/bash
    IMAGE="grace_hopper.bmp"
    WORKBENCH="."
    ENVIRONMENT="${WORKBENCH}/python/3.13"
    [ "${1}" == "setup" ] || [ ! -f ${ENVIRONMENT}/bin/activate ] && BOOTSTRAP="true"
    [ -v BOOTSTRAP ] && python3.13 -m venv ${ENVIRONMENT}
    source ${ENVIRONMENT}/bin/activate
    [ -v BOOTSTRAP ] && pip install pillow
    [ -v BOOTSTRAP ] && pip install ai-edge-litert-nightly
    TEFLON_DEBUG=verbose ETNA_MESA_DEBUG=ml_dbgs python ${WORKBENCH}/classification-litert.py \
              -i ${WORKBENCH}/${IMAGE} \
              -m ${WORKBENCH}/mobilenet_v1_1_224_quant.tflite \
              -l ${WORKBENCH}/labels_mobilenet_quant_v1_224.txt \
              -e /usr/lib64/libteflon.so
    deactivate

     

    A MediaPipe sample can also be set up easily:

    Spoiler
    #!/bin/bash
    WORKBENCH="."
    ENVIRONMENT="${WORKBENCH}/python/3.12"
    [ "${1}" == "setup" ] || [ ! -f ${ENVIRONMENT}/bin/activate ] && BOOTSTRAP="true"
    [ -v BOOTSTRAP ] && python3.12 -m venv ${ENVIRONMENT}
    source ${ENVIRONMENT}/bin/activate
    [ -v BOOTSTRAP ] && pip install mediapipe
    [ -v BOOTSTRAP ] && pip install pillow
    [ -v BOOTSTRAP ] && pip install ai-edge-litert-nightly
    python ${WORKBENCH}/detect.py  --model efficientdet_lite0.tflite

     

    But unfortunately, the MediaPipe framework does not support the extended delegate functionality of LiteRT (TFLite).
    And therefore no NPU support.

    classification-3.11-tflite.log
    classification-3.13-litert.log
    object_detection-3.12-litert.log

  12. 4 hours ago, Mark Umina said:

    May I ask where to retrieve `idbloader.img`?

    `idbloader.img` is device-specific code that is created from firmware build artifacts with U-Boot as payload using proprietary tools available only in binary form.
    SBC providers rarely offer ready-to-use code.
    I therefore prefer to load my firmware from microSD cards.
    Since it has apparently become increasingly common lately to provide only the MASKROM mode as the sole reliable recovery method, I have started building my firmware for Rockchip devices as RAM images as well. I can then simply upload them using MASKROM mode and start my work from there without having to rely on proprietary, binary-only tools.

  13. 10 hours ago, iav said:

    What could be the reason?

    I see no significant differences:

    ********************************************************************************
    ssd-006
    Hardkernel ODROID-N2Plus
    CPU 0-1: performance 1000 MHz - 2016 MHz
    CPU 2-5: performance 1000 MHz - 2400 MHz
    GPU: performance 124 MHz - 799 MHz
    6.16.0-0.rc1.17.fc43.aarch64 #1 SMP PREEMPT_DYNAMIC Sat Jun 14 11:19:02 CEST 2025
    ********************************************************************************
    7z b
    
    7-Zip 24.09 (arm64) : Copyright (c) 1999-2024 Igor Pavlov : 2024-11-29
     64-bit arm_v:8-A locale=en_US.UTF-8 Threads:6 OPEN_MAX:1024
    
    Compiler:  ver:15.2.1 20250924 (Red Hat 15.2.1-2) GCC 15.2.1 : UNALIGNED
    Linux : 6.16.0-0.rc1.17.fc43.aarch64 : #1 SMP PREEMPT_DYNAMIC Sat Jun 14 11:19:02 CEST 2025 : aarch64
    PageSize:4KB THP:madvise hwcap:8FF:CRC32:SHA1:SHA2:AES:ASIMD
    arm64
    
    1T CPU Freq (MHz):  2092  2387  2384  2390  2361  2389  2388
    3T CPU Freq (MHz): 282% 2239   296% 2352..
    6T CPU Freq (MHz): 538% 2040   497% 1893..
    
    RAM size:    3740 MB,  # CPU hardware threads:   6
    RAM usage:   1334 MB,  # Benchmark threads:      6
    
                           Compressing  |                  Decompressing
    Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
             KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS
    
    22:       7107   508   1362   6914  |     141426   488   2473  12058
    23:       6459   494   1332   6581  |     136949   489   2421  11847
    24:       6268   508   1327   6740  |     132118   485   2390  11593
    25:       5695   515   1264   6503  |     127085   483   2343  11310
    ----------------------------------  | ------------------------------
    Avr:      6382   506   1321   6684  |     134394   486   2407  11702
    Tot:             496   1864   9193
    ********************************************************************************
    ssd-006
    Hardkernel ODROID-N2Plus
    CPU 0-1: performance 1000 MHz - 2016 MHz
    CPU 2-5: performance 1000 MHz - 2400 MHz
    GPU: performance 124 MHz - 799 MHz
    6.18.0-0.rc3.30.fc44.aarch64 #1 SMP PREEMPT_DYNAMIC Mon Oct 27 21:17:35 CET 2025
    ********************************************************************************
    7z b
    
    7-Zip 24.09 (arm64) : Copyright (c) 1999-2024 Igor Pavlov : 2024-11-29
     64-bit arm_v:8-A locale=en_US.UTF-8 Threads:6 OPEN_MAX:1024
    
    Compiler:  ver:15.2.1 20250924 (Red Hat 15.2.1-2) GCC 15.2.1 : UNALIGNED
    Linux : 6.18.0-0.rc3.30.fc44.aarch64 : #1 SMP PREEMPT_DYNAMIC Mon Oct 27 21:17:35 CET 2025 : aarch64
    PageSize:4KB THP:madvise hwcap:8FF:CRC32:SHA1:SHA2:AES:ASIMD
    arm64
    
    1T CPU Freq (MHz):  2365  2380  2383  2390  2389  2391  2388
    3T CPU Freq (MHz): 277% 2162   274% 2095..
    6T CPU Freq (MHz): 533% 2021   508% 1926..
    
    RAM size:    3737 MB,  # CPU hardware threads:   6
    RAM usage:   1334 MB,  # Benchmark threads:      6
    
                           Compressing  |                  Decompressing
    Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
             KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS
    
    22:       6924   491   1373   6737  |     136783   473   2465  11662
    23:       6586   495   1356   6711  |     141391   506   2418  12232
    24:       6240   494   1359   6709  |     138310   510   2382  12137
    25:       5912   507   1330   6750  |     127792   487   2337  11373
    ----------------------------------  | ------------------------------
    Avr:      6415   497   1355   6727  |     136069   494   2401  11851
    Tot:             495   1878   9289
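To put a number on the difference between the two runs, the ratings can be compared as a relative change. A small sketch; `pct_change` is a made-up helper name.

```bash
#!/bin/bash
# Sketch: relative change from OLD to NEW in percent, two decimals.
# Usage: pct_change OLD NEW
pct_change() {
    awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%.2f\n", (new - old) / old * 100 }'
}
```

For the total ratings above, `pct_change 9193 9289` prints `1.04`, so the newer kernel is about one percent ahead, which is well within what run-to-run variation could explain.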

     

  14. 1 hour ago, specs said:

    To run mesa 25.3 with Debian you'll have to install mesa using source-packages.

    I was just about to rebuild the source package, but today's upgrade delivered everything turnkey since my development team was faster and had done everything for me.
    The previously referenced description does provide a working result, but it does not use the most current available releases. This script runs the NPU with the latest working releases:

    #!/bin/bash
    WORKBENCH="."
    python3.11 -m venv ${WORKBENCH}/python/3.11
    source ${WORKBENCH}/python/3.11/bin/activate
    pip install numpy==1.26.4
    pip install pillow==12.0.0
    pip install tflite-runtime==2.14.0
    TEFLON_DEBUG=verbose ETNA_MESA_DEBUG=ml_dbgs python ${WORKBENCH}/classification.py \
              -i ${WORKBENCH}/grace_hopper.bmp \
              -m ${WORKBENCH}/mobilenet_v1_1_224_quant.tflite \
              -l ${WORKBENCH}/labels_mobilenet_quant_v1_224.txt \
              -e /usr/lib64/libteflon.so
    deactivate

    classification-3.11.log

  15. 2 hours ago, robertoj said:

    i am not trying to backport it.

    Official development for inclusion upstream is carried out on the project's main branch.
    That's exactly what this pull request branch does by tracking the project's master tree.
    Applying commits to anything else is backporting.

     

    2 hours ago, robertoj said:

    I want to make it work in trixie.

    Backporting is possible, but you will have to deal with the consequences yourself that arise from your outdated versions lacking the functions required for your backport, and you may need to backport those as well.
    Since this project is a user program and not a framework with libraries that other programs depend on, it is solely the user who has to deal with it as a dependency in its use.
    At first, I also tried to use the sources of the current release version as a base, but since the commits couldn't be applied without errors, I simply built the current master branch.
    For the FFmpeg framework, this is a whole different ballgame. There are many programs, of which mpv is just one, that depend on the libraries and need to be rebuilt accordingly in case of an upgrade.
    Since my chosen distribution recently released ffmpeg 8.0 and thus made all dependent programs available accordingly, I just had to rebuild the ffmpeg package with the v4l2request support commits and put that in place. And since the commits could be applied to the current release sources without any errors, I didn't even need to switch to the master branch.
