tparys Posted 1 hour ago

Sorry in advance, but this is a bit of a dive. But I haven't seen this fully captured and explained anywhere, and I think this would be a useful thing to know for anyone who's doing arm64 development on a PC, which seems very relevant here ...

I started pulling this thread when moving some software from Ubuntu Jammy to Noble, where the arm64 build process went from 20 minutes to 50 minutes. When I was done, it was down to 15 minutes, and all I did was poke some flags for qemu's emulated CPU.

The TL;DR version is that setting the environment variable QEMU_CPU to something like "cortex-a53" or "max,pauth-impdef=on" or even what I settled on, "max,pauth=off", saves massive amounts of time, and should be used anywhere you are running debootstrap, apt, or python in an emulated environment. However, you can't blindly set it everywhere, because these flags are not defined for other architectures and will cause a hard stop when emulating PC or RISC-V (there's a sketch of how to set it conditionally at the end of this post). This has also shown up in the Armbian Build System.

If you want to see this first hand, use the example here. Just save the following as test.sh and mark it executable:

#!/bin/bash
echo -n "testing... "
for i in $(seq 2 10000); do
    is_prime=1
    j=2
    while (( j*j <= i )); do
        if (( i % j == 0 )); then
            is_prime=0
            break
        fi
        ((j++))
    done
    if (( is_prime == 1 )); then
        echo $i > /dev/null
    fi
done
echo "done"

And then run it in docker after ensuring a few things are installed ...

~ $ sudo apt install binfmt-support qemu-user-static

~ $ time docker run --rm -it --platform linux/arm64 -v .:/test ubuntu:noble /test/test.sh
testing... done

real    0m37.620s
user    0m0.013s
sys     0m0.022s

~ $ time docker run --rm -it --platform linux/arm64 -v .:/test ubuntu:jammy /test/test.sh
testing... done

real    0m4.700s
user    0m0.011s
sys     0m0.023s

And we can wrestle that performance back by fiddling with QEMU flags ...

~ $ time docker run --rm -it --platform linux/arm64 -v .:/test --env QEMU_CPU=max,pauth=off ubuntu:noble /test/test.sh
testing... done

real    0m4.694s
user    0m0.011s
sys     0m0.024s

So, qemu bug? Not quite. The qemu emulator is a host application, and it is the same for both the jammy and noble docker images. I think the root cause was found here, and it first appears in Ubuntu Lunar (23.04). The short version is that gcc's stack protection logic wasn't operating as expected, because the arm64 stack layout is a little different than it is on PC; a CVE was filed, and the "fix" is now stressing a slow code path in qemu. For the record, my heart goes out to "steev" and his slow, hours-per-bisect march to the answer.

Pulling that thread a bit more, the QEMU documentation has the following to say on the subject of arm64 pointer authentication:

    "The architected QARMA5 and QARMA3 algorithms have good cryptographic properties, but can be quite slow to emulate. The impdef algorithm used by QEMU is non-cryptographic but significantly faster."

The qemu docs also suggest that the impdef algorithm is the default, but I've not seen this on my version. It's possible this is addressed in a much newer version of qemu, but that's not available in the Noble repos. It could be overridden via tonistiigi/binfmt (but I've not yet tested that).

For what it's worth, it's not possible to just add a simple wrapper around qemu to fix this either, as seen in this Github Example, and that's due to the Linux Kernel Binfmt Interface.
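If you want to see what the kernel actually has registered, binfmt_misc exposes it under /proc. The path below is the usual mount point, and I'm assuming the entry name qemu-aarch64, which is what Ubuntu's qemu-user-static package registers; it may differ on other distros:

~ $ cat /proc/sys/fs/binfmt_misc/qemu-aarch64

The output includes the interpreter path and a "flags:" line, which is where the 'F' discussed next shows up.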
The 'F' flag forces the kernel to store a handle to the specified emulator and make it available in chroot and Docker contexts, and that doesn't help if it's only the wrapper it grabs and not the emulator binary. Similarly, it's not possible to pass additional flags via this interface, so without a change to the qemu binary, the QEMU_CPU environment variable may be the only way to work around this immediately.

The other curious bit is: what does QEMU_CPU=cortex-a53 enable to recover qemu speed? The answer is absolutely nothing. It just turns CPU features off. At a glance, I'm not sure if that's something qemu is doing indirectly, or something glibc conditionally enables at runtime. If anyone knows better than I do here, please drop a comment. The curious can check via:

~ $ docker run --rm -it --platform linux/arm64 ubuntu:noble cat /proc/cpuinfo
processor       : 0
model name      : ARMv8 Processor rev 0 (v8l)
BogoMIPS        : 100.00
Features        : fp asimd aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svef32mm svef64mm svebf16 i8mm bf16 rng bti mte mte3 sme smei16i64 smef64f64 smei8i32 smef16f32 smeb16f32 smef32f32 smefa64 mops hbc
CPU implementer : 0x00
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0x051
CPU revision    : 0

~ $ docker run --rm -it --platform linux/arm64 --env QEMU_CPU=cortex-a53 ubuntu:noble cat /proc/cpuinfo
processor       : 0
model name      : ARMv8 Processor rev 4 (v8l)
BogoMIPS        : 100.00
Features        : fp asimd aes pmull sha1 sha2 crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd03
CPU revision    : 4
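As promised above, here's roughly how I'd apply the workaround in a build script without breaking non-arm64 targets. This is just a sketch: TARGET_ARCH stands in for whatever variable your build system uses to name the architecture being emulated.

#!/bin/bash
# Only set QEMU_CPU when the emulated target is arm64. The pauth
# properties are not defined for other qemu targets (x86, riscv, ...)
# and qemu will refuse to start if it sees them there.
if [ "${TARGET_ARCH}" = "arm64" ] || [ "${TARGET_ARCH}" = "aarch64" ]; then
    export QEMU_CPU="max,pauth=off"
fi

For a one-off container, passing --env QEMU_CPU=max,pauth=off to docker run (as in the timing example above) does the same thing.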