Jump to content

Recommended Posts

Posted

Sorry in advance, but this is a bit of a dive. But, I haven't seen this fully captured and explained anywhere, and think this would be a useful thing to know for anyone who's doing arm64 development on a PC, which seems very relevant here ...

 

I started pulling this thread when moving some software from Ubuntu Jammy to Noble, where the arm64 build process went from 20 mins to 50 mins. When I was done, it below to 15 min, and all I did was poke some flags for qemu's emulated CPU.

 

The TL;DR version is that setting the environment variable QEMU_CPU to something like "cortex-a53" or "max,pauth-impdef=on" or even what I settled on, "max,pauth=off" saves massive amounts of time, and should be used anywhere you are running debootstrap, apt, or python in an emulated environment. However, you can't blindly set it everywhere because these flags are not defined for other architectures and will cause a hard stop when emulating PC or RISC-V. This has also shown up in the Armbian Build System.

 

If you want to see this first hand, use the example here. Just save the following in test.sh:

 

echo -n "testing...  "
for i in $(seq 2 10000); do
    is_prime=1
    j=2
    while ((j*j <= i)); do
        if (( i % j == 0)); then
            is_prime=0
            break
        fi
        ((j++))
    done
    if (( is_prime == 1 )); then
        echo $i > /dev/null
    fi
done
echo "done"

 

And then run it in docker after ensuring a few things are installed ...

 

~ $ sudo apt install binfmt-support qemu-user-static

~ $ time docker run --rm -it --platform linux/arm64 -v .:/test ubuntu:noble /test/test.sh
testing...  done

real	0m37.620s
user	0m0.013s
sys	0m0.022s

~ $ time docker run --rm -it --platform linux/arm64 -v .:/test ubuntu:jammy /test/test.sh
testing...  done

real	0m4.700s
user	0m0.011s
sys	0m0.023s

 

And we can wrestle that performance back by fiddling with QEMU flags ...

 

~ $ time docker run --rm -it --platform linux/arm64 -v .:/test --env QEMU_CPU=max,pauth=off ubuntu:noble /test/test.sh
testing...  done

real	0m4.694s
user	0m0.011s
sys	0m0.024s

 

So qemu bug? Not quite. The qemu emulator is a host application, and is the same both jammy and noble docker images, and I think the root cause was found here, and first appears in Ubuntu Lunar (23.10). The short version looks like gcc's stack protection logic wasn't operating as expected as the stack layout is a little different than it is on PC, a CVE was filed, and the "fix" is now stressing a slow code path in qemu. For the record, my heart goes out to "steev" and his slow, hours-per-bisect march to the answer.

 

Pulling that thread a bit more, the QEMU Documentation has the following to say on the subject of arm64 pointer authentication:

 

Quote

The architected QARMA5 and QARMA3 algorithms have good cryptographic properties, but can be quite slow to emulate. The impdef algorithm used by QEMU is non-cryptographic but significantly faster.

 

The qemu docs also suggest that the qemu impdef algorithm is the default, but I've not seen this on my version. It's possible this may be addressed in a much newer version of qemu, but that's not available in the Noble repos. It could be overridden via  tonistiigi/binfmt (but I've not yet tested that).

 

For what it's worth, it's not possible to just add a simple wrapper to qemu fix this either, as seen in this Github Example, and that's due to the Linux Kernel Binfmt Interface. The 'F' flag forces the kernel to store a handle to the specified emulator, and make it available in chroot and Docker contexts, and that doesn't help if it's only the wrapper it grabs and not the emulator binary. Similarly, it's not possible to pass additional flags via this interface, so without a change to the qemu binary, the QEMU_CPU environment variable may be the only way to work around this immediately.

 

The other curious bit is what does QEMU_CPU=cortex-a53 enable to recover qemu speed? The answer is absolutely nothing. it just turns CPU features off. At a glance, I'm not sure if that something qemu is doing indirectly, or glibc conditionally enables at runtime. If anyone knows better than I here, please drop a comment.

 

The curious can check can via:

 

~ $ docker run --rm -it --platform linux/arm64 ubuntu:noble cat /proc/cpuinfo
processor	: 0
model name	: ARMv8 Processor rev 0 (v8l)
BogoMIPS	: 100.00
Features	: fp asimd aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svef32mm svef64mm svebf16 i8mm bf16 rng bti mte mte3 sme smei16i64 smef64f64 smei8i32 smef16f32 smeb16f32 smef32f32 smefa64 mops hbc
CPU implementer	: 0x00
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0x051
CPU revision	: 0

~ $ docker run --rm -it --platform linux/arm64 --env QEMU_CPU=cortex-a53 ubuntu:noble cat /proc/cpuinfo
processor	: 0
model name	: ARMv8 Processor rev 4 (v8l)
BogoMIPS	: 100.00
Features	: fp asimd aes pmull sha1 sha2 crc32 cpuid
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0xd03
CPU revision	: 4

 

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
×
×
  • Create New...

Important Information

Terms of Use - Privacy Policy - Guidelines