JMCC

  • Content Count

    651

Reputation Activity

  1. Like
    JMCC reacted to Werner in Armbian 20.11 Tamandua   
    Release info:
    https://www.armbian.com/newsflash/armbian-20-11-tamandua/
     
    Downloads:
    https://www.armbian.com/download/
     

  2. Like
    JMCC reacted to edrex in SBC with proper software support for hardware video transcoding   
    Hi @JMCC, I happened upon this thread as I was experimenting with HW encoding for one of my two HC1s, so I was very happy to read your posts and test out your custom ffmpeg build. I'm seeing a substantial speed boost (1.8x vs 0.9x) compared to the Armbian build.
     
    My encode params are `-vcodec h264_v4l2m2m -num_capture_buffers 32  -pix_fmt nv21`. Any additional suggestions? Thanks
     
  3. Like
    JMCC got a reaction from TRS-80 in AMD Threadripper 3990X Armbian Build Server Review   
    Okay, another use case. This one will bring some surprises.
     
    Let us imagine we want to compile armhf/arm64 binaries natively. Like, for example, making the new Armbian multimedia packages that we will announce very soon.
     
    In this case, the Threadripper will be at a clear disadvantage, since it needs to emulate the ARM CPU through QEMU. But will it be able to make up for that with core count and sheer processing power? Here are the numbers. We will compare the Threadripper with the Ampere ARM server, and with my highly optimized Odroid XU4 (good cooling and slight overclock).
     
    First, a single thread 7-zip bench (Decompressing MIPS, higher is better):
    $ 7z b -mmt1
    Threadripper (native amd64):      4793
    Threadripper (emulating armhf):   1529
    Ampere ARM server (native armhf): 2889
    Odroid XU4 (native armhf):        2160
     
    As you can see, the single-core performance of the Threadripper is reduced to 1/3 of its native performance when emulating through QEMU, leaving it well below the Odroid XU4 and the Ampere.
     
    Now, a real-world use case: let us compile our customized version of Kodi for armhf (compilation time, lower is better):
    $ time cmake --build . -- -j$(nproc --all)
    Threadripper (emulating armhf): 18m9.696s
    Ampere (native armhf):          5m50.033s
    Odroid XU4 (native armhf):      45m50.711s
     
    Here the 32-core ARM server beats the 64C/128T AMD server, with a compile time more than three times shorter. And the Odroid XU4 takes about two and a half times as long as the AMD. If we factor in power consumption, it becomes very clear that compiling in an emulated environment is very suboptimal.
     
    Now, we must remember that for building Armbian images we don't emulate, but instead cross-compile. In that case, the AMD is working natively, and that is another story: neither the ARM server nor anything else I have ever tested is a match for it. We will probably post numbers about this on another occasion.
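     
    The ratios quoted in this post can be checked with a few lines of Python. This is only a sketch: the helper name `to_seconds` is made up for illustration, and the figures are copied from the results above.

```python
import re

def to_seconds(stamp: str) -> float:
    """Convert a `time`-style duration such as '18m9.696s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized duration: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

# Kodi build times copied from the post (lower is better).
build_times = {
    "Threadripper (emulating armhf)": "18m9.696s",
    "Ampere (native armhf)": "5m50.033s",
    "Odroid XU4 (native armhf)": "45m50.711s",
}

# Single-thread 7-zip decompression MIPS copied from the post (higher is better).
mips = {"native amd64": 4793, "emulating armhf": 1529}

baseline = to_seconds(build_times["Threadripper (emulating armhf)"])
for machine, stamp in build_times.items():
    ratio = to_seconds(stamp) / baseline
    print(f"{machine}: {ratio:.2f}x the emulated Threadripper build time")

emulation_slowdown = mips["native amd64"] / mips["emulating armhf"]
print(f"QEMU single-thread slowdown: {emulation_slowdown:.2f}x")
```

    A ratio below 1 means the machine finished the build faster than the emulated Threadripper; above 1 means slower.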
  4. Like
    JMCC got a reaction from piter75 in AMD Threadripper 3990X Armbian Build Server Review   
  5. Like
    JMCC got a reaction from NicoD in AMD Threadripper 3990X Armbian Build Server Review   
  6. Like
    JMCC got a reaction from Werner in Emby Server with hardware transcoding in XU4/HC1/HC2 Armbian Stretch   
    The new version of ffmpeg, compiled for XU4 with HW acceleration, will come soon (hopefully) to the Armbian repos. In the meantime, you can download it from this link. It is completely static, so you can install it on any distro. After that, just install Jellyfin and enter "/opt/ffmpeg-xu4/bin/ffmpeg" (or the symlink, "/usr/local/bin/ffmpeg-xu4") in the "FFmpeg path" field.
  7. Like
    JMCC got a reaction from TRS-80 in Emby Server with hardware transcoding in XU4/HC1/HC2 Armbian Stretch   
  8. Like
    JMCC got a reaction from TRS-80 in Emby Server with hardware transcoding in XU4/HC1/HC2 Armbian Stretch   
    As a result of all the work that Armbian developers put into the upgrade to kernel 4.14 for the XU4 board family, now we can enjoy many new features. One of them is the access to the SoC video encoding capabilities.
     
    Emby Media Server can take advantage of the Exynos 5422 MFC video engine for transcoding. That means lower CPU usage, lower temperatures, and the possibility of encoding in real time higher resolutions or more simultaneous streams. In my tests, I've been able to transcode one HEVC 1080p and one 480p at the same time, or five 480p (though it will depend on the bitrate of the source material).
     
    However, the ffmpeg version shipped with official Emby is quite unstable when using this feature. For that reason, I compiled a better and more stable version from @memeka's repo. I've been using it for over a month without a single crash.
     
    So this is a step-by-step guide on how to make everything work:
     
    0. [PREREQUISITE]: You must be running an Armbian Stretch XU4 "Next" image, like the one you can download here.
     
    1. >> DOWNLOAD the emby and ffmpeg packages from this link <<
    2. Install them (Note: this will install Emby Server version 3.5.3, which is the latest at the time of writing this tutorial. It has been tested to work with this version, and may or may not work with any other):
       $ tar xvf emby-server-stretch-xu4_1.0.tar.xz
       $ sudo dpkg -i ffmpeg/*.deb
       $ sudo dpkg -i emby-server/*.deb
       $ sudo apt -f install
    3. Hold the ffmpeg packages, so they don't get upgraded:
       $ sudo apt-mark hold ffmpeg-doc ffmpeg libavcodec-dev libavcodec-extra libavdevice-dev libavfilter-dev libavfilter-extra libavformat-dev libavresample-dev libavutil-dev libmysofa-dev libmysofa-utils libmysofa0 libpostproc-dev libswresample-dev libswscale-dev
    4. Add the user "emby" to the video group, so it can have access to the transcoding engine:
       $ sudo usermod -aG video emby
    5. Modify the emby executable to use our custom ffmpeg (Note: you will need to repeat this step every time you update the emby deb package):
       $ sudo nano /opt/emby-server/bin/emby-server
       # Change the following line:
       ffmpeg $APP_DIR/bin/ffmpeg \
       # to:
       ffmpeg /usr/bin/ffmpeg \
    6. Restart the service:
       $ sudo service emby-server restart
    7. Now you can open the web browser, point it to your Emby server (e.g. http://odroidxu4.local:8096), and configure it as described in the official tutorial (https://github.com/MediaBrowser/Wiki/wiki/Installation).
    8. Finally, you need to enable hardware video transcoding in the web interface. The option is under the "Transcoding" submenu. Don't forget to click on "Save" when you are done:
     
     

     
    And that's it!
     
    As an additional tip, I recommend disabling UPnP in Emby, because it causes the program to crash frequently when enabled (this is just a general recommendation, it has nothing to do with hardware encoding).
     
    Enjoy! And please, share your experiences and comments here.
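     
    Since the launcher edit has to be repeated after every Emby package update, it can be scripted. The following is a minimal sketch, not a tested tool: the function name `patch_launcher` and the sample text are hypothetical, and only the two ffmpeg lines are taken from the guide above.

```python
def patch_launcher(script_text: str) -> str:
    """Point the Emby launcher at the system ffmpeg instead of the bundled one."""
    bundled = "ffmpeg $APP_DIR/bin/ffmpeg"
    system = "ffmpeg /usr/bin/ffmpeg"
    if bundled not in script_text:
        # Already patched, or the launcher layout changed in a newer package.
        return script_text
    return script_text.replace(bundled, system)

# Hypothetical excerpt: the real launcher contains more lines than this.
sample = "some-option some-value \\\nffmpeg $APP_DIR/bin/ffmpeg \\\n"
patched = patch_launcher(sample)
print(patched)
```

    To apply it for real, you would read /opt/emby-server/bin/emby-server, pass its contents through the function, and write the result back as root, ideally after saving a backup copy.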
  9. Like
    JMCC reacted to sgjava in Cross platform high performance GPIO   
    So my whole goal with java-periphery was to create a decent cross platform userspace IO library, which I believe I have achieved. One of my other goals was to create a high performance GPIO API that's also cross platform. If you look around, there aren't any from what I can tell. Most high performance GPIO libraries support only a few boards. What I've done is use a well known interface (GPIO device) and some MMIO code to detect deltas and generate the required registers and masks from an input file. This surely beats doing this by hand, which I tried at first. The result is code that supports Allwinner H2+/H3 and H5/H6 by using just an input file. I'm going to look at other CPU types next, but this is promising. For more info check out High performance GPIO using MMIO.
  10. Like
    JMCC reacted to piter75 in Armbian v20.11 (Tamandua) Planning Thread   
    I took a shortcut and enabled SPI by default for ROCK Pi 4.
    I verified it with SPI CLK shorted on v1.4 but obviously it was not enough
    The fix is in master.
  11. Like
    JMCC reacted to NicoD in 32-core 3.3Ghz ARM Server Review   
    Today I had the pleasure of benchmarking an ARM64 server.
    This server has been made available for Armbian to test native ARM64 image building.
    I knew nothing about the server. Nobody told me any details.
    So everything was an adventure for me to find out. I got SSH access, so my research began.

    Running lscpu told me it had 32 cores, all clocked at 3.3GHz.
     
    cat /proc/cpuinfo confirmed these 32 cores
     
    Checking on what kernel we're on. Ubuntu Focal 5.4.0-52-generic. 
    And how much memory. 128GB RAM.

    So the first thing I wanted to know: how does one core perform in the 7-zip benchmark?
    The record I had seen until now was from the A73 cores of the Odroid N2+ clocked at 2.4GHz: 2504MIPS decompression.
    So:
    taskset -c 31 7z b
     
    This beats the A73 cores of the Odroid N2+ clocked at 2.4GHz: 2763 vs 2504MIPS decompression.
    This also tells me these cores do not perform as well per clock as a high-performance core.
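     
    The per-clock remark can be made concrete by dividing each score by its clock speed. A small sketch using only the two single-core results quoted here (the `mips_per_ghz` helper is just an illustrative name):

```python
# Single-core 7-zip decompression results and clock speeds quoted in the post.
results = {
    "32-core ARM server": {"mips": 2763, "ghz": 3.3},
    "Odroid N2+ (A73 core)": {"mips": 2504, "ghz": 2.4},
}

def mips_per_ghz(mips: float, ghz: float) -> float:
    """Normalize a benchmark score by clock frequency."""
    return mips / ghz

per_clock = {name: mips_per_ghz(**r) for name, r in results.items()}
for name, value in per_clock.items():
    print(f"{name}: {value:.0f} MIPS per GHz")
```

    The server core wins on absolute score, but the A73 gets more work done per GHz.
     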
    While doing the single core benchmark I checked the sensors to know the wattage and temperature.
     
    CPU power is about 20W for single-core tasks.
    Without a load the CPU consumes between 10W and 15W. So in total it consumes a bit more than 20W at idle.
    Temperature never went under 49C, even after more than 5 minutes at idle.
     
    Of course, the next thing to do is an all-core 7zip benchmark. 
    This gives an amazing result. Way higher than anything I had ever seen on ARM.
     
    85975MIPS decompression. This is amazing.
    The best I had seen was 11000MIPS from the Odroid N2+. So this server does about 8 x better than the N2+.
    I must say, though, that 7-zip does badly with unequal clusters. The N2+ has a great difference in cluster frequencies, so it performs worse than expected here.

    The wattage went a lot higher, up to 110W. And the temperature rose quickly up to 75C in seconds.
     
    To test the internet connection I downloaded an Armbian image multiple times. Sometimes it was as low as 3MB/s. 
    Highest average speed I've seen was 12.5MB/s
     
    Next test. BMW Blender render benchmark. 
    Here the fastest I had ever seen was by the Khadas VIM3. That did it in 42m51s.
    I haven't done this yet with the N2+ in Armbian. In Odroid's Ubuntu it was a little slower. I expect it to be a little faster than the VIM3 in Armbian Bionic. 
    This is a tile based test. So every core gets its own task, until all tiles are done. 

    Well, this ARM64 server did this in 8m27s. 
    5 x faster compared to the Khadas VIM3. 

    For this the wattage didn't go over 85W. But the temperature did rise to 83C. So it started to throttle. 

    @lanefu already had done SBC-Bench on it when it was free. So this I didn't have to do myself.
    http://ix.io/2Dcc
    Here we see a lot. For example, CPUMiner:
    This server                   : 81.0 kH/s
    Odroid N2+                    : 14 kH/s       (5.7 x less)
    RK3399 (maximum)              : 10.23 kH/s    (8 x less)
    Odroid C2 clocked at 1.75GHz  : 4.65 kH/s     (17 x less)

    So this server clearly can move a lot of bits around. 
    Now, what is this server? Ask google if nobody else tells me. "32 core ARM server 3.3Ghz"
    First answer : https://www.theregister.com/2018/09/18/ampere_shipping/
    That looks like it is this CPU. But still I can't find the exact name. 
    2nd answer : https://www.servethehome.com/ampere-32-core-64-bit-arm-chip-x-gene-3-ip/
    So this is the Ampere 32-core 64-bit from X-Gene 3 IP.

    Here the wikichip : https://en.wikichip.org/wiki/apm/x-gene/apm883832-x3?fbclid=IwAR0ljCQ61DY8Zwh_VyZd0fQH43dmPUTJA-CGLiQKYqU2fWwszFm1CPjH6Zo

    This supports up to 1TB RAM. 8 channels @ 2666MHz, with a maximum memory bandwidth of 158.95 GiB/s.
    42 lanes of PCIe Gen 3, with 8 controllers
    – x16 or two x8/x4
    – x16 or two x8/x4
    – x8 or two x4
    – Two x1

    4 x SATA Gen 3 ports, 2 x USB2, and a TDP of 125W.
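
    The quoted memory bandwidth follows from the channel count and transfer rate. The sketch below reproduces it under two assumptions that are not stated in the post: each channel is 64 bits (8 bytes) wide, and "2666" stands for the nominal DDR4-2666 rate of 2666.67 MT/s.

```python
CHANNELS = 8                          # from the post
TRANSFERS_PER_SECOND = 8000e6 / 3     # nominal DDR4-2666 rate, 2666.67 MT/s (assumption)
BYTES_PER_TRANSFER = 8                # 64-bit channel width (assumption)

bandwidth_bytes = CHANNELS * TRANSFERS_PER_SECOND * BYTES_PER_TRANSFER
bandwidth_gib = bandwidth_bytes / 2**30   # bytes/s -> GiB/s
print(f"{bandwidth_gib:.2f} GiB/s")
```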

    For me this is just an awesome thing to behold. I use ARM for almost everything.
    The NanoPi M4V2 is my main desktop computer.
    It isn't as powerful as my PC, but does the task for 10 x less power consumption, while being completely silent.

    But when I need a big CPU, it isn't enough.
    Even the more powerful Odroid N2+ isn't powerful enough to render long (20+ minute) 1440p videos, for example for my YouTube channel.
    So then I need to use my x86/amd64 PC.

    Today I have seen and tasted the future. 
    This doesn't use the most modern Cortex cores/clusters, and it is only 16nm.
    So there is still a lot of room for improvement in performance and power consumption.

    ARM for desktop is possible, and ARM servers for big datacenters are possible (AWS). I have seen the future, and I loved every second of it.

    Here benchmarks compared to my SBCs

     

    Greetings, NicoD
  12. Like
    JMCC got a reaction from gounthar in SBC with proper software support for hardware video transcoding   
    Excellent! Good luck with it.
     
    However, I want to make some clarifications for people following the thread. As I said above, when I talk about transcoding, I am always thinking about HEVC sources, since h.264 sources don't need transcoding. XU4, therefore, will use SW to decode the HEVC, and HW to encode into h.264. That is how acceleration works.
     
    As a matter of fact, if you force ffmpeg on the command line to do both HW decoding and encoding (e.g., from h.264 to h.264 on the XU4), it will have a negative impact on performance, versus using SW decoding and HW encoding. Again, we are always talking about the XU4 v4l2-m2m encoder, AFAIK the only ARM encoder currently supported by ffmpeg besides RPi.
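     
    To illustrate the difference between the two modes, here is a sketch that only assembles ffmpeg argument lists (it does not run anything). The function name and file names are hypothetical; it relies on ffmpeg's convention that a codec option placed before `-i` selects the decoder for that input, while one placed after it selects the encoder.

```python
from typing import List

def build_transcode_command(src: str, dst: str, hw_decode: bool = False) -> List[str]:
    """Assemble an ffmpeg argument list that encodes with the v4l2-m2m H.264 encoder."""
    cmd = ["ffmpeg"]
    if hw_decode:
        # Before -i: force the hardware H.264 decoder (only valid for H.264 sources).
        cmd += ["-c:v", "h264_v4l2m2m"]
    cmd += ["-i", src]
    # After -i: pick the hardware H.264 encoder and pass the audio through untouched.
    cmd += ["-c:v", "h264_v4l2m2m", "-c:a", "copy", dst]
    return cmd

# Default: ffmpeg picks its software decoder for the source, HW is used for encoding.
print(" ".join(build_transcode_command("input.mkv", "output.mp4")))
# Forced HW decode and HW encode, the slower combination described above.
print(" ".join(build_transcode_command("input.mkv", "output.mp4", hw_decode=True)))
```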
  13. Like
    JMCC got a reaction from gounthar in SBC with proper software support for hardware video transcoding   
    Good news! I managed to compile a highly optimized ffmpeg binary from current git, with XU4 tweaks, that is able to encode two simultaneous 1080p@25 streams, as long as the source bitrate is not excessive.
     
    You can download it from here and give it a try. It is completely static, so you can install it on any distro. It will install the binary in "/opt/ffmpeg-xu4/bin/ffmpeg", and a symlink "/usr/local/bin/ffmpeg-xu4". Therefore, it can be installed alongside the system ffmpeg without conflicts.
  14. Like
    JMCC got a reaction from TRS-80 in SBC with proper software support for hardware video transcoding   
  15. Like
    JMCC got a reaction from TRS-80 in SBC with proper software support for hardware video transcoding   
    TL;DR: Odroid XU4/HC1/MC1

    Sorry, I didn't see this post before. I have a Jellyfin server on an Odroid HC1 that has been working like a charm for a long time. HW transcoding can do 1080p with no problem. I have special ffmpeg packages for Armbian Buster, in case you want them.



    Sent from my moto g(6) plus using Tapatalk

  16. Like
    JMCC got a reaction from Werner in SBC with proper software support for hardware video transcoding   
  17. Like
    JMCC got a reaction from Werner in SBC with proper software support for hardware video transcoding   
  18. Like
    JMCC got a reaction from devman in SBC with proper software support for hardware video transcoding   
  19. Like
    JMCC got a reaction from lanefu in SBC with proper software support for hardware video transcoding   
  20. Like
    JMCC got a reaction from Igor in odroid-N2 wol missing   
    The short answer is that it works with kernel 4.9, but not with 5.9. If you want that feature, you can download the "legacy" version with the proper kernel: https://redirect.armbian.com/odroidn2/Buster_legacy.torrent
  21. Like
    JMCC got a reaction from gounthar in SBC with proper software support for hardware video transcoding   
    The only HW acceleration that is going to work among Armbian-supported boards is XU4. Besides that, it's Intel or RPi. Or, of course, SW transcoding.
  22. Like
    JMCC got a reaction from Igor in Armbian Donations   
    Nice!
    BTW, someone might have noticed my new "aggressive" signature. In case someone wants to use the same image, or modify it (change the color, etc.), it is available here: https://users.armbian.com/jmcc/images/
  23. Like
    JMCC got a reaction from lanefu in Armbian Donations   
  24. Like
    JMCC reacted to Igor in Armbian Donations   
    Board is here 
     

  25. Like
    JMCC got a reaction from gounthar in SBC with proper software support for hardware video transcoding   
    Wait, you mentioned h264. You are aware that h264 normally doesn't need transcoding, right? The server just feeds the stream to the client, so the limit is set by your network connection and/or your storage device. You can stream many h264 streams of any resolution at once, even 4k.
     
    Transcoding is only needed when the file is in a format that the client cannot handle natively. So, for example, if you are using the Chrome browser as client, it cannot handle HEVC, and therefore the file needs transcoding. But if your client is Kodi with the Jellyfin plugin, then you won't need to transcode HEVC either, since Kodi can handle it.
     
    Since almost all clients support h264, it will normally not be transcoded. And, if you want, you can convert all your videos to h264, and then you can use any SBC as server, even with a weak CPU.
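     
    The rule described in this post boils down to a simple lookup. Here is a rough sketch of it; the table only covers the two clients mentioned above, and treating unknown clients as H.264-only is an assumption made for the example, not how any particular media server decides.

```python
# Codec support per client, limited to the examples given in the post.
CLIENT_CODECS = {
    "chrome": {"h264"},
    "kodi": {"h264", "hevc"},
}

def needs_transcoding(source_codec: str, client: str) -> bool:
    """Return True when the server must transcode instead of streaming the file as-is."""
    # Assumption: a client we know nothing about can at least play H.264.
    supported = CLIENT_CODECS.get(client.lower(), {"h264"})
    return source_codec.lower() not in supported

for codec, client in [("h264", "chrome"), ("hevc", "chrome"), ("hevc", "kodi")]:
    action = "transcode" if needs_transcoding(codec, client) else "direct stream"
    print(f"{codec} -> {client}: {action}")
```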