Posts posted by usual user

  1. A few days ago I rebuilt my complete firmware package to try something with another device. An HC4 firmware binary is also produced automatically as part of this process.
    If you like, you can put it on a microSD card (dd bs=512 seek=1 conv=notrunc,fsync if=u-boot-meson.bin of=/dev/${entire-device-to-be-used}), place the prepared microSD card in your HC4 and start it with the boot button pressed.
    Check whether it meets your expectations, and if all tests are successful, you can transfer it to the SPI flash.
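For readers who want to see what the dd options above do, here is a minimal Python sketch of the same write. The function name `write_firmware` and its parameters are hypothetical and not part of any firmware tooling. It writes the binary starting at byte offset 512 (bs=512, seek=1), leaves the rest of the target untouched (conv=notrunc) and syncs before returning (fsync).

```python
import os

SECTOR_SIZE = 512  # corresponds to bs=512 in the dd command


def write_firmware(firmware_path, target_path, seek_sectors=1):
    """Copy the firmware into the target at sector `seek_sectors` without truncating.

    Hypothetical helper mirroring: dd bs=512 seek=1 conv=notrunc,fsync
    Returns the number of bytes written.
    """
    with open(firmware_path, "rb") as src:
        payload = src.read()
    # "r+b" opens the target for in-place update, so existing data is kept.
    with open(target_path, "r+b") as dst:
        dst.seek(seek_sectors * SECTOR_SIZE)  # leave sector 0 untouched
        written = dst.write(payload)
        dst.flush()
        os.fsync(dst.fileno())  # make sure the data has reached the medium
    return written
```

The target can be an image file or, with sufficient privileges, the whole-card device node. As with dd, writing to the wrong device destroys its content.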

  2. 1 hour ago, aleksandriusz said:

    Could anyone subscribed to this thread confirm that's still the case?

    Since I haven't restarted the M1 for some time, I am currently still at:

    # uptime
     12:56:23 up 115 days,  1:51,  5 users,  load average: 1.76, 1.26, 0.92
    # uname -a
    Linux micro-015 6.18.0-65.fc44.aarch64 #1 SMP PREEMPT_DYNAMIC Sun Dec  7 20:40:45 CET 2025 aarch64 GNU/Linux

    I still get:

    # iperf3 -c odroid-m1
    Connecting to host odroid-m1, port 5201
    [  5] local odroid-m2 port 38866 connected to odroid-m1 port 5201
    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  5]   0.00-1.00   sec   114 MBytes   959 Mbits/sec    0    553 KBytes
    [  5]   1.00-2.00   sec   113 MBytes   946 Mbits/sec    0    648 KBytes
    [  5]   2.00-3.00   sec   112 MBytes   941 Mbits/sec    0    684 KBytes
    [  5]   3.00-4.00   sec   111 MBytes   930 Mbits/sec    0    809 KBytes
    [  5]   4.00-5.00   sec   113 MBytes   949 Mbits/sec    0    809 KBytes
    [  5]   5.00-6.00   sec   112 MBytes   943 Mbits/sec    0    809 KBytes
    [  5]   6.00-7.00   sec   112 MBytes   940 Mbits/sec    0    809 KBytes
    [  5]   7.00-8.00   sec   112 MBytes   940 Mbits/sec    0    809 KBytes
    [  5]   8.00-9.00   sec   112 MBytes   935 Mbits/sec    0    850 KBytes
    [  5]   9.00-10.00  sec   112 MBytes   942 Mbits/sec    0    850 KBytes
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate         Retr
    [  5]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec    0            sender
    [  5]   0.00-10.01  sec  1.09 GBytes   940 Mbits/sec                  receiver
    
    iperf Done.
    
    # iperf3 -R -c odroid-m1
    Connecting to host odroid-m1, port 5201
    Reverse mode, remote host odroid-m1 is sending
    [  5] local odroid-m2 port 50806 connected to odroid-m1 port 5201
    [ ID] Interval           Transfer     Bitrate
    [  5]   0.00-1.00   sec  92.2 MBytes   773 Mbits/sec
    [  5]   1.00-2.00   sec  89.0 MBytes   746 Mbits/sec
    [  5]   2.00-3.00   sec  88.1 MBytes   739 Mbits/sec
    [  5]   3.00-4.00   sec  73.6 MBytes   618 Mbits/sec
    [  5]   4.00-5.00   sec  89.5 MBytes   751 Mbits/sec
    [  5]   5.00-6.00   sec  89.9 MBytes   754 Mbits/sec
    [  5]   6.00-7.00   sec  77.5 MBytes   650 Mbits/sec
    [  5]   7.00-8.00   sec  83.5 MBytes   701 Mbits/sec
    [  5]   8.00-9.00   sec  84.6 MBytes   709 Mbits/sec
    [  5]   9.00-10.00  sec  84.0 MBytes   705 Mbits/sec
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate         Retr
    [  5]   0.00-10.00  sec   853 MBytes   715 Mbits/sec  214            sender
    [  5]   0.00-10.00  sec   852 MBytes   715 Mbits/sec                  receiver
    
    iperf Done.

    So nothing to complain about.

  3. 22 hours ago, Sharkam said:

    can we achieve in browser video hardware acceleration now, and how?

    I haven't looked at this use case for a very long time. I can no longer remember since when it has worked out-of-the-box for me. Since decoder support has been part of the GStreamer framework for a very long time, hardware-supported video decoding works for all browsers that use this framework with the standard packages of the distribution of my choice.
    When v4lrequest support was still implemented with the out-of-tree patches using the stateful method, it also worked with Firefox out-of-the-box. Just an accordingly patched FFmpeg framework was required. This is likely no longer going to work with the current patches for the FFmpeg framework and requires an additional implementation in Firefox. I suspect, however, that this will only happen after the official inclusion of v4lrequest support in the FFmpeg framework, as is also the case with MPV. To what extent patches for Firefox are already available is unknown to me. For the distribution of my choice, I have in any case rebuilt the FFmpeg and MPV packages with the corresponding patches.
    I have to confess that I usually use Firefox and the video decoding works flawlessly for my use cases. However, I cannot say whether this is actually hardware-accelerated, because the SBCs I use with a graphical desktop are powerful enough to cope with software decoding alone.
    I'm just taking the lazy way here and waiting for it to land in mainline. For SBCs that need hardware acceleration, I simply use a browser that uses the GStreamer framework.

  4. 10 hours ago, mircsicz said:

    So now I need to ask @usual user what specific kernel are you running?

    I am currently at 7.0.0-rc1. I can upload my jump-start image so you can check if my kernel build works with your device. If you like what you see, it is only a 'prepare-jump-start ${target-mount-point}' away to install the kernel package alongside your existing system.

    1 hour ago, flappyjet said:

    I found an interesting project https://github.com/NotPunchnox/rkllama

    I know about it, but since it is just another non-mainline solution with another dependency mess, I am not particularly interested.

  5. On 3/2/2026 at 2:22 AM, flappyjet said:

    Would you be generous to post your NPU device info?

    Since the hardware support for Rockchip SoCs in the mainline kernel is generally already outstanding and its further development is also being actively pursued, I only have SBCs with integrated NPUs that are based on them. Among them are ODROID-M2, NanoPC-T6, and ROCK-5-ITX. But since the NPU is an integral part of the SoC, the board manufacturer and the design of the SBC are not necessarily of importance.

     

    On 3/2/2026 at 2:22 AM, flappyjet said:

    I'm also looking for some w8a8 llm models working on NPU (the GEMM capability is perfect, isn't it).

    As far as I understand, edge-class NPUs are best suited for computer vision tasks.
    I am therefore engaged in object detection:
    [image: Object-detector_6_simultaneous_executions.png]

    and super-resolution:
    [image: 0851x4-crop-sumary-4-esrgan_quant.tflite.png]

  6. 8 hours ago, flappyjet said:

    Do you have rocket kernel with mesa that enable NPU working now?

    This is what my software stack looks like:

    [image: NPU-software-stack.png]

    My kernel is built as a generic one, hence my OS works on any device equipped with a VeriSilicon VIPNano, a Rockchip RK3588 or an Arm Ethos-U65/U85 NPU.
    The application can be written NPU-agnostic, as long as a model.tflite file suitable for the NPU is used.
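A minimal sketch of what NPU-agnostic can look like in application code, assuming the Python `tflite-runtime` or `ai-edge-litert` package and the Teflon delegate path `/usr/lib64/libteflon.so` that appears elsewhere in this thread. The helper names `find_delegate` and `make_interpreter` are hypothetical. The application only decides whether a delegate library is available; which NPU sits behind it is left to the delegate.

```python
import os

TEFLON_PATH = "/usr/lib64/libteflon.so"  # delegate path as used in this thread


def find_delegate(candidates=(TEFLON_PATH,)):
    """Return the first delegate library that exists, or None for CPU-only."""
    for path in candidates:
        if path and os.path.isfile(path):
            return path
    return None


def make_interpreter(model_path, delegate_path=None):
    """Build an interpreter; a delegate is used only if a path is given."""
    try:
        # TFLite runtime package
        from tflite_runtime.interpreter import Interpreter, load_delegate
    except ImportError:
        # LiteRT, the TFLite successor
        from ai_edge_litert.interpreter import Interpreter, load_delegate
    delegates = [load_delegate(delegate_path)] if delegate_path else []
    interpreter = Interpreter(model_path=model_path,
                              experimental_delegates=delegates)
    interpreter.allocate_tensors()
    return interpreter
```

A call such as `make_interpreter("model.tflite", find_delegate())` would then fall back to the CPU when no delegate library is installed. Whether a given model actually runs on the NPU still depends on the operations the delegate supports.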

  7. 8 hours ago, BoringName said:

    drm has been changed to v4l2request.

    The patches that were available out-of-tree for a long time were a kind of hack using the DRM subsystem for decoding. For inclusion in mainline, they were further developed into a more correct request method.

     

    8 hours ago, BoringName said:

    I'm going to assume you have hwdec=v4l2request in your mpv.conf file.

    It is hwdec=v4l2request-copy in fact, because the stateless decoder is an m2m device and the scan-out is still carried out via the DRM subsystem. However, the copy is cheap because it is executed via dmabuf as zero copy.

  8. 6 hours ago, BoringName said:

    Run mpv --hwdec=help

    If drm is not listed as one of the options it will not work. You need to find a build with drm support.

    mpv_--hwdec=help.log is what I get, and everything works as expected, but I am on current mainline releases with in-flight patches for mpv and FFmpeg on top. GStreamer-based applications work out-of-the-box.
    The log entries that contain the 'request' component are the ones that matter.
    But you're right, it can still take a while before current mainline releases are declared stable by some distributions and adopted. But this is not the fault of mainline development, which continues to progress and does not take outdated versions into account any longer. 
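To pick out the relevant entries without reading the whole log, a small filter like the following could be used. It is only a sketch: the function names are hypothetical, and it assumes nothing about the output format beyond the entries of interest containing the string 'request'.

```python
import subprocess


def request_entries(help_text):
    """Return the lines of `mpv --hwdec=help` output that mention 'request'."""
    return [line.strip() for line in help_text.splitlines() if "request" in line]


def query_mpv_hwdecs(mpv="mpv"):
    """Run mpv and return its hwdec help text (requires mpv to be installed)."""
    result = subprocess.run([mpv, "--hwdec=help"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `request_entries(query_mpv_hwdecs())` returns an empty list on a build without request support.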

  9. Lately, I've been playing around a bit with computer vision detection. I managed to patch together a PoC script with which I conducted some tests. The results are quite promising. The frame rate is just based on the round trip time of my test script, so it only roughly reflects the inference time. The throughput includes all additional overhead but is sufficiently informative for a relative comparison.

     

    Inference on a single CPU core delivers an image throughput of about 4 images per second:
    [image: Object-detector_single_core_CPU.png]

     

    Inference on a single NPU core delivers an image throughput of about 17 images per second:
    [image: Object-detector_single_core_NPU.png]

     

    Inference on eight CPU cores delivers an image throughput of about 21 images per second. However, all eight cores run at over 80% load during this, and after a short time the fan kicks in. There is also little headroom left, e.g., for performing other tasks concurrently. Running several similar inference tasks concurrently immediately results in a proportional drop in frame rate per task.


    When six similar inference tasks are executed simultaneously with NPU delegates, they are distributed across the three available NPU cores, and the SoC utilization is moderate enough that the fan doesn't even turn on. The throughput does not degrade and the CPU cores remain available for other tasks as well:
    [image: Object-detector_6_simultaneous_executions.png]

     

    For my tests, I used a random video clip. For the inference, I used a model pre-trained on the COCO dataset. With its 4.1 MB size and its 80 object classes, it delivers surprisingly good results. Using the NPU hardware not only reduces the load on the CPU cores but also speeds up processing. But the best part is that only current mainline code is required. There are no dependencies on proprietary implementations or outdated software stacks. It just works out-of-the-box; you just need to know how to use it.
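The round-trip-based measurement described in this post can be sketched as follows. This is not the PoC script itself, which was not posted; `measure_throughput` and its arguments are hypothetical, and `infer` stands for whatever callable processes one frame.

```python
import time


def measure_throughput(infer, frames):
    """Run `infer` on every frame and return (images_per_second, mean_ms_per_image).

    The timing covers the whole round trip of each call, so it includes
    pre- and post-processing overhead, not only the inference itself.
    """
    count = 0
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
        count += 1
    elapsed = time.perf_counter() - start
    if count == 0 or elapsed <= 0:
        return 0.0, 0.0
    return count / elapsed, 1000.0 * elapsed / count
```

Because the same overhead is included in every run, the numbers are mainly useful for relative comparisons such as CPU versus NPU, which is how they are used here.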

  10. 51 minutes ago, fever_wits said:

    I'll probably wait for the new kernel and test it.

    Waiting for others to do the work is a good strategy. If everything goes well, it will land in 6.20 at the earliest; if improvements are needed, 7.x is more likely. When the corresponding kernel will be included in the chosen distribution is the next data point. You are clearly aware of what you have to wait for.
    Until then, I will simply continue using the necessary modifications as a self-written DTBO, which I have been doing for some time.
    But maybe you'll get lucky and someone else will take on the burden of applying the patches as an early adopter, so you can just sit back, do nothing, and still enjoy the upcoming availability.


  11. The rk3588 NPU support has been working as expected for me for some time now, but I don't use legacy software.

    $ ./classification
    Loading external delegate from /usr/lib64/libteflon.so with args: {}
    INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
    0.901961: military uniform
    0.023529: Windsor tie
    0.011765: mortarboard
    0.007843: bulletproof vest
    0.003922: cornet
    time: 15.706ms
    $ ./classification
    INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
    0.901961: military uniform
    0.023529: Windsor tie
    0.007843: bulletproof vest
    0.007843: mortarboard
    0.003922: cornet
    time: 70.607ms
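Assuming the first run above (the one that loads the external delegate) executes on the NPU and the second on the CPU only, the speedup can be read directly from the two `time:` lines. The following sketch does the arithmetic; the helper names are hypothetical.

```python
import re

TIME_PATTERN = re.compile(r"^\s*time:\s*([0-9.]+)\s*ms\s*$", re.MULTILINE)


def inference_times_ms(log_text):
    """Extract every 'time: <value>ms' figure from the program output."""
    return [float(value) for value in TIME_PATTERN.findall(log_text)]


def speedup(baseline_ms, accelerated_ms):
    """How many times faster the accelerated run is than the baseline."""
    return baseline_ms / accelerated_ms


# Timings copied from the two runs quoted in this post.
npu_ms, cpu_ms = inference_times_ms("time: 15.706ms\ntime: 70.607ms\n")
print(f"NPU delegate is about {speedup(cpu_ms, npu_ms):.1f}x faster")
# prints: NPU delegate is about 4.5x faster
```

The predicted classes and scores are practically identical in both runs, so the difference is in latency only.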

    versus

  12. On 12/8/2025 at 10:11 PM, gpupoor said:

    I'm stuck with how to use Fedora as a root filesystem with Armbian.

    That's not really possible either. The only thing that is possible is using my kernel build in an Armbian environment to see how it performs for you. With my jumpstart image, this is actually quite easy to implement.
    For this, both root filesystems just need to be mounted and

    extlinux/prepare-jump-start ${target-rootfs-mount-point}

    needs to be executed in the jumpstart rootfs.
    If the target system is now being booted with firmware that uses mainline U-Boot as the payload, nothing stands in the way of booting with my kernel.
    And don't worry, prepare-jump-start only adds files to the target rootfs. Nothing will be overwritten or deleted.
    If you want to give it a whirl, speak up and I will upload a current image.

     

    On 12/8/2025 at 10:11 PM, gpupoor said:

    I'm understanding it will work successfully with 6.18, but I think the expectation is to start with the raw distro

    But you will be disappointed, because the stock kernel only provides the functionalities available with the officially released kernel. Both Armbian and my kernel build have already applied patches that may appear in a future official release.

     

    On 12/8/2025 at 10:11 PM, gpupoor said:

    start with Fedora and build it all from the ground up.


    I doubt that you will succeed, because even the Fedora project only does a full rebuild once per release cycle. There is no advantage in rebuilding an unmodified component when the result is the same as the package already provided. It only makes sense to do this when branching off the release version in order to create a synchronized basis for further development.
    Only modified packages will be replaced or upgraded. My weekly upgrade gives me several gigabytes of new packages each time because I'm on Rawhide. Somehow, I can no longer manage to stick to any official release versions.

  13. 3 hours ago, gpupoor said:

    I haven't been able to figure out how you got there.

    I don't really do anything special. Since the architecture of the devices I use is quite up-to-date, development for their support is also at the bleeding edge. It is therefore essential to use the latest software releases. My chosen distribution provides me with these quite promptly. But that's where it ends. I receive no support at all for using my devices there. It doesn't even provide me with firmware for my devices to start the system.
    The kernel provided by my distribution is only the one based on the currently released mainline source code. So if I want to use functionalities whose development is still in progress, I have to build the kernel myself with the appropriate patches, which I do regularly (e.g., I am building one right now so I can play around with RGA3).
    I haven't done much work in user space for a long time, but recently I've been building the FFmpeg package myself again since the availability of RKVDEC2. v4l-request works out-of-the-box with the GStreamer framework, but for FFmpeg, it will probably take some time until support is available in a release version.

    OK, the kernel is done. Now I have to deal with another video device:

    lrwxrwxrwx 1 root root 12 Dec  7 22:44 platform-fdb50000.video-codec-video-index0 -> ../../video3
    lrwxrwxrwx 1 root root 12 Dec  7 22:44 platform-fdb60000.rga-video-index0         -> ../../video2
    lrwxrwxrwx 1 root root 12 Dec  7 22:44 platform-fdb80000.rga-video-index0         -> ../../video0
    lrwxrwxrwx 1 root root 12 Dec  7 22:44 platform-fdba0000.video-codec-video-index0 -> ../../video4
    lrwxrwxrwx 1 root root 12 Dec  7 22:44 platform-fdc38100.video-codec-video-index0 -> ../../video1
    lrwxrwxrwx 1 root root 12 Dec  7 22:44 platform-fdc70000.video-codec-video-index0 -> ../../video5

    v4l2-compliance-odroid-m2.log
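To keep track of which device node belongs to which hardware block, the symlinks can be resolved programmatically. The sketch below assumes the listing above comes from `/dev/v4l/by-path`, which the post does not state explicitly; the function names are hypothetical, and the grouping relies only on the `video-codec` and `rga` name components visible in the listing.

```python
import os


def list_video_nodes(by_path_dir="/dev/v4l/by-path"):
    """Map each by-path symlink name to the device node it resolves to."""
    nodes = {}
    for name in sorted(os.listdir(by_path_dir)):
        link = os.path.join(by_path_dir, name)
        if os.path.islink(link):
            nodes[name] = os.path.realpath(link)
    return nodes


def group_by_kind(nodes):
    """Split nodes into 'video-codec', 'rga' and 'other' entries by link name."""
    groups = {"video-codec": [], "rga": [], "other": []}
    for name, target in nodes.items():
        if ".video-codec-" in name:
            groups["video-codec"].append(target)
        elif ".rga-" in name:
            groups["rga"].append(target)
        else:
            groups["other"].append(target)
    return groups
```

Since the `videoN` numbering can change between boots or kernel builds, addressing devices through the stable by-path names is usually the safer choice.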

  14. FWIW, on my rk3588 devices the NPUs are working with recent mainline releases:

    [    5.967316] [drm] Initialized rocket 0.0.0 for rknn on minor 0
    [    5.975499] rocket fdab0000.npu: Rockchip NPU core 0 version: 1179210309
    [    5.978652] rocket fdac0000.npu: Rockchip NPU core 1 version: 1179210309
    [    5.985602] rocket fdad0000.npu: Rockchip NPU core 2 version: 1179210309

    This script runs the Mesa example with the latest available working versions:

    #!/bin/bash
    IMAGE="grace_hopper.bmp"
    WORKBENCH="."
    ENVIRONMENT="${WORKBENCH}/python/3.11"
    [ "${1}" == "setup" ] || [ ! -f ${ENVIRONMENT}/bin/activate ] && BOOTSTRAP="true"
    [ -v BOOTSTRAP ] && python3.11 -m venv ${ENVIRONMENT}
    source ${ENVIRONMENT}/bin/activate
    [ -v BOOTSTRAP ] && pip install numpy==1.26.4
    [ -v BOOTSTRAP ] && pip install pillow==12.0.0
    [ -v BOOTSTRAP ] && pip install tflite-runtime==2.14.0
    TEFLON_DEBUG=verbose ETNA_MESA_DEBUG=ml_dbgs python ${WORKBENCH}/classification-tflite.py \
              -i ${WORKBENCH}/${IMAGE} \
              -m ${WORKBENCH}/mobilenet_v1_1_224_quant.tflite \
              -l ${WORKBENCH}/labels_mobilenet_quant_v1_224.txt \
              -e /usr/lib64/libteflon.so
    deactivate

     

    With a small adjustment, the Mesa example also runs with the TFLite successor LiteRT, using this script:

    #!/bin/bash
    IMAGE="grace_hopper.bmp"
    WORKBENCH="."
    ENVIRONMENT="${WORKBENCH}/python/3.13"
    [ "${1}" == "setup" ] || [ ! -f ${ENVIRONMENT}/bin/activate ] && BOOTSTRAP="true"
    [ -v BOOTSTRAP ] && python3.13 -m venv ${ENVIRONMENT}
    source ${ENVIRONMENT}/bin/activate
    [ -v BOOTSTRAP ] && pip install pillow
    [ -v BOOTSTRAP ] && pip install ai-edge-litert-nightly
    TEFLON_DEBUG=verbose ETNA_MESA_DEBUG=ml_dbgs python ${WORKBENCH}/classification-litert.py \
              -i ${WORKBENCH}/${IMAGE} \
              -m ${WORKBENCH}/mobilenet_v1_1_224_quant.tflite \
              -l ${WORKBENCH}/labels_mobilenet_quant_v1_224.txt \
              -e /usr/lib64/libteflon.so
    deactivate

     

    A MediaPipe sample can also be set up easily:

    #!/bin/bash
    WORKBENCH="."
    ENVIRONMENT="${WORKBENCH}/python/3.12"
    [ "${1}" == "setup" ] || [ ! -f ${ENVIRONMENT}/bin/activate ] && BOOTSTRAP="true"
    [ -v BOOTSTRAP ] && python3.12 -m venv ${ENVIRONMENT}
    source ${ENVIRONMENT}/bin/activate
    [ -v BOOTSTRAP ] && pip install mediapipe
    [ -v BOOTSTRAP ] && pip install pillow
    [ -v BOOTSTRAP ] && pip install ai-edge-litert-nightly
    python ${WORKBENCH}/detect.py  --model efficientdet_lite0.tflite

     

    Unfortunately, the MediaPipe framework does not support the extended delegate functionality of LiteRT (TFLite), so there is no NPU support with it.

    classification-3.11-tflite.log
    classification-3.13-litert.log
    object_detection-3.12-litert.log
