NicoD Posted November 20, 2020

Hi all. I again had the pleasure of working with an amazing server. This time it is the AMD Threadripper 3990X, with 64 cores and 128 threads. After working on a 32-core ARM server last week, I thought I had seen performance. This is again not comparable with anything before.

I again got private SSH access, so I opened three terminals: one with htop, another to check the sensors, and the third to run my benchmarks. The first thing I saw were the 128 threads. Being used to seeing 6, this was almost unbelievable.

With light loads it turbos up to 4.3 GHz. With all cores maxed out it ran at 3 GHz while consuming 400 W.

It reached a single-core 7-Zip decompression score of 4545 MIPS at 4.3 GHz. The Ampere 32-core ARM server reached 2763 at 3.3 GHz. This again shows the Ampere server doesn't use high-performance cores; it doesn't perform great per clock. Coming soon is a benchmark of an AWS server. It uses high-performance cores based on the ARM N1, a derivative of the A76. It reaches 3393 while clocked at only 2.5 GHz, so it does perform better per clock. Do know this is comparing pears with bananas (I don't want to confuse them with apples).

With default settings it scored 391809 MIPS in 7-Zip multi-core decompression. Then, with an all-core overclock to 3.9 GHz, it consumed over 600 W.
That overclocked run gave a 7-Zip decompression score of 433702 MIPS:

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,128 CPUs AMD Ryzen Threadripper 3990X 64-Core Processor (830F10),ASM,AES-NI)

AMD Ryzen Threadripper 3990X 64-Core Processor (830F10)
CPU Freq: - 64000000 64000000 - - - - - -

RAM size:  257677 MB,  # CPU hardware threads: 128
RAM usage:  28240 MB,  # Benchmark threads:    128

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:     272228 11672   2269 264824  |    5112130 12467   3499 435912
23:     165008 11150   1508 168123  |    5086525 12595   3496 440122
24:     121378 11578   1128 130506  |    4972483 12594   3467 436364
25:     103873 11805   1005 118599  |    4746979 12362   3419 422410
----------------------------------  | ------------------------------
Avr:           11551   1478 170513  |            12505   3470 433702
Tot:           12028   2474 302107

This is again many levels better than the Ampere 32-core ARM server, which got 85975 MIPS. 32 cores of the AWS Graviton2 do 110628. So when overclocked, this AMD server is up to 5 times more powerful than the Ampere 32-core server, while consuming 6 times as much power. In their normal configuration the two perform almost equally well in performance per watt.
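The performance-per-watt claim can be checked with some rough arithmetic. The sketch below only uses figures quoted in this post: 391809 MIPS at about 400 W stock, 433702 MIPS at about 600 W overclocked, and 85975 MIPS for the Ampere. The Ampere's "a bit more than 100 W" is rounded to 100 W here, which is an assumption, and all power values are approximate readings rather than calibrated measurements.

```python
# Rough performance-per-watt check using the figures quoted in this post.
# Each entry is (7-Zip multi-core decompression MIPS, approximate power in W).
# Assumption: the Ampere's "a bit more than 100 W" is taken as 100 W.
systems = {
    "Threadripper 3990X (stock)": (391809, 400),
    "Threadripper 3990X (overclocked)": (433702, 600),
    "Ampere 32-core": (85975, 100),
}

# MIPS delivered per watt for each configuration.
efficiency = {name: mips / watts for name, (mips, watts) in systems.items()}

for name, value in efficiency.items():
    print(f"{name}: {value:.0f} MIPS/W")

# How much faster, and how much hungrier, the overclocked machine is.
oc_mips, oc_watts = systems["Threadripper 3990X (overclocked)"]
arm_mips, arm_watts = systems["Ampere 32-core"]
speedup = oc_mips / arm_mips
power_ratio = oc_watts / arm_watts

print(f"Overclocked vs Ampere: {speedup:.2f}x the score at {power_ratio:.1f}x the power")
```

Under these assumptions the stock Threadripper lands at roughly 980 MIPS/W against roughly 860 for the Ampere, while the overclock drops to about 723 MIPS/W.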
7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,32 CPUs LE)

Ampere 32-core ARM Server
LE
CPU Freq: - - - - - - - - -

RAM size:  128285 MB,  # CPU hardware threads:  32
RAM usage:   7060 MB,  # Benchmark threads:     32

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:      58550  2735   2082  56958  |    1027823  3151   2782  87652
23:      56799  2761   2096  57872  |    1002539  3141   2762  86751
24:      54973  2821   2096  59107  |     976406  3142   2728  85702
25:      52913  2838   2129  60414  |     941588  3120   2685  83795
----------------------------------  | ------------------------------
Avr:            2789   2101  58588  |             3139   2739  85975
Tot:            2964   2420  72281

At idle the Threadripper consumed 100 W, which is a lot for doing nothing. The 32-core ARM server only consumed a bit more than 100 W maxed out, and about 20 W at idle.

The BMW Blender benchmark takes 29m23s on the fastest ARM SBC, the Odroid N2+. The Ampere ARM server did it in 8m27s. For the Threadripper this was far too light a load: it did it in 30s. Even when running this render 10 times in a row, the temperatures didn't rise much. The maximum I've seen was 50°C.

To try a heavier load I downloaded the Barber Shop Blender render, which has 6912 tiles to render. But again the Threadripper wasn't impressed by this load: 2m18.79s. The AWS server with 32 cores (of 64) did this in 8m28s. So this ARM server does compete well per clock with the Threadripper on a floating-point task.

ARM may be great, but AMD is mighty. Intel does not have anything to compete with this, certainly not in performance per watt. It was a pleasure benchmarking this server. I learned a lot, including that I need to find better tools for these amazing machines.
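The per-clock remark about the Barber Shop render can be made a bit more concrete by normalising the render time by core count and clock speed. This is only a sketch under several assumptions that the post does not confirm: that the Threadripper held its 3.0 GHz all-core clock and the AWS server its 2.5 GHz during the render, that "2m18s79" means 2 minutes 18.79 seconds, and that SMT threads can be ignored. The metric name `core_ghz_seconds` is made up for this illustration.

```python
# Normalise the Barber Shop render times by cores x clock.
# Assumptions (not confirmed in the post): all-core clocks of 3.0 GHz and
# 2.5 GHz were sustained, "2m18s79" is read as 2 min 18.79 s, SMT is ignored.

def to_seconds(minutes, seconds):
    """Convert a minutes/seconds pair to seconds."""
    return minutes * 60 + seconds

renders = {
    "Threadripper 3990X": {"time_s": to_seconds(2, 18.79), "cores": 64, "ghz": 3.0},
    "AWS 32-core": {"time_s": to_seconds(8, 28), "cores": 32, "ghz": 2.5},
}

def core_ghz_seconds(run):
    """Hypothetical cost metric: total core-GHz-seconds spent on the render."""
    return run["time_s"] * run["cores"] * run["ghz"]

cost = {name: core_ghz_seconds(run) for name, run in renders.items()}

for name, value in cost.items():
    print(f"{name}: {value:.0f} core-GHz-seconds")

# Values above 1 mean the ARM server needed more core-GHz-seconds than the Threadripper.
ratio = cost["AWS 32-core"] / cost["Threadripper 3990X"]
print(f"AWS / Threadripper: {ratio:.2f}")
```

With these numbers the ARM server needs about 1.5 times as many core-GHz-seconds as the Threadripper, which also benefits from two threads per core.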
The specs of this monster:
- ASRock Rack TRX40D8-2N2T
- AMD Ryzen Threadripper 3990X
- 256 GB ECC memory (8 x 32 GB)
- 2 x 1 TB PCIe 4.0 NVMe SSD
- Water cooling

The specs of the Threadripper 3990X:
- 64 cores, 128 threads
- AMD64, Zen 2 (Castle Peak)
- 2.9 GHz - 4.3 GHz
- 4-channel DDR4-3200
- 256 GB RAM
- 88 lanes PCIe 4.0
- TSMC's 7 nm process node
- 280 W - 400+ W
- 32 KB L1 per core (64x)
- 64 x 512 KB L2
- 256 MB L3 cache, shared

You can see my full review video here, greetings.
NicoD
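The single-thread per-clock comparison from the start of this post can be worked out directly. The sketch below divides each quoted 7-Zip single-core decompression score by the clock it was measured at; the numbers are the ones given in the post, and the simple MIPS-per-GHz metric is an illustration, not a rigorous IPC measurement.

```python
# Single-thread 7-Zip decompression score (MIPS) and the clock (GHz)
# at which each was measured, as quoted in the post.
results = {
    "Threadripper 3990X": (4545, 4.3),
    "Ampere 32-core": (2763, 3.3),
    "AWS Graviton2 (N1)": (3393, 2.5),
}

# Score per GHz as a crude per-clock efficiency figure.
per_ghz = {name: mips / ghz for name, (mips, ghz) in results.items()}

# Print from best to worst per-clock result.
ranking = sorted(per_ghz, key=per_ghz.get, reverse=True)
for name in ranking:
    print(f"{name}: {per_ghz[name]:.0f} MIPS per GHz")
```

This gives about 1357 MIPS per GHz for the N1-based AWS server, 1057 for the Threadripper and 837 for the Ampere, which matches the remark that the Ampere cores do not perform great per clock.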
JMCC Posted November 20, 2020

Okay, another use case. This one will bring some surprises.

Let us imagine we want to compile armhf/arm64 binaries natively, for example to build the new Armbian multimedia packages that we will announce very soon. In this case the Threadripper is at a clear disadvantage, since it needs to emulate the ARM CPU through Qemu. But will it be able to make up for that with core count and sheer processing power?

Here are the numbers. We will compare the Threadripper with the Ampere ARM server, and with my highly optimized Odroid XU4 (good cooling and a slight overclock).

First, a single-thread 7-Zip bench (decompressing MIPS, higher is better):

$ 7z b -mmt1

- Threadripper (native amd64): 4793
- Threadripper (emulating armhf): 1529
- Ampere ARM server (native armhf): 2889
- Odroid XU4 (native armhf): 2160

As you can see, the single-core performance of the Threadripper is reduced to 1/3 of its native performance when emulating through Qemu, leaving it well below the Odroid XU4 and the Ampere.

Now, a real-world use case: let us compile our customized version of Kodi for armhf (compilation time, lower is better):

$ time cmake --build . -- -j$(nproc --all)

- Threadripper (emulating armhf): 18m9.696s
- Ampere (native armhf): 5m50.033s
- Odroid XU4 (native armhf): 45m50.711s

Here the 32-core ARM server beats the 64C/128T AMD server with a compile time more than three times shorter, and the Odroid XU4 needs about two and a half times as long as the AMD. If we factor in power consumption, it becomes very clear that compiling in an emulated environment is very suboptimal.

Now, we must remember that for building Armbian images we don't emulate, but cross-compile instead. In that case the AMD is working natively, and that is another story: it is unmatched by the ARM server or anything else I have ever tested. We will probably post numbers about this on some other occasion.
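The ratios mentioned above can be reproduced from the quoted numbers. The following is a small sketch that parses the `time`-style durations and computes the emulation penalty and the compile-time ratios; the helper name `parse_time` is made up for this example.

```python
import re

def parse_time(text):
    """Convert a `time`-style duration such as '18m9.696s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

# Kodi compile times for armhf, as quoted in the post.
compile_times = {
    "Threadripper (emulating armhf)": parse_time("18m9.696s"),
    "Ampere (native armhf)": parse_time("5m50.033s"),
    "Odroid XU4 (native armhf)": parse_time("45m50.711s"),
}

# Single-thread 7-Zip score under Qemu relative to the native amd64 score.
emulation_ratio = 1529 / 4793

threadripper = compile_times["Threadripper (emulating armhf)"]
tr_vs_ampere = threadripper / compile_times["Ampere (native armhf)"]
xu4_vs_tr = compile_times["Odroid XU4 (native armhf)"] / threadripper

print(f"Emulated single-thread score: {emulation_ratio:.0%} of native")
print(f"Emulated Threadripper takes {tr_vs_ampere:.2f}x as long as the Ampere")
print(f"Odroid XU4 takes {xu4_vs_tr:.2f}x as long as the emulated Threadripper")
```

The emulated score comes out at about 32% of native, the Ampere is roughly 3.1 times faster at the Kodi build, and the Odroid XU4 takes roughly 2.5 times as long as the emulating Threadripper.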
johanvdw Posted December 22, 2020

Going a bit off-topic, but can you (or one of the server admins) post some info on the BIOS and kernel used? We have the same motherboard (with the 32-core Threadripper) and we see a lot of stability issues which I cannot explain yet. Have you seen anything similar?
Werner Posted December 22, 2020

1 hour ago, johanvdw said:
"post some info on the bios+kernel used?"

@Igor
Igor Posted December 22, 2020

1 hour ago, johanvdw said:
"We have the same motherboard (with the 32 core threadripper) and we see a lot of stability issues which I can not explain yet. Have you seen anything similar?"

We use the one and only / latest BIOS, v1.1. The OS is an unchanged Ubuntu MATE 20.04 LTS with Linux 5.4.0-58-generic, which overall proved to be fine. We don't experience general stability issues: the system is stable and doesn't crash. But this is the list of problems:
- the NVMe drives sometimes don't initialize in PCIe 4.0 mode and are then as slow as a normal SSD
- there was a plan to boot from dual SATA DOM in hardware RAID 1. It doesn't work, so we boot from one.
- UEFI boot doesn't work, legacy only
- fan control doesn't work at all
- the 2.5G network is unstable

I hope there will be a BIOS update, but at least we can use it fairly normally. The overclocking capabilities exceed expectations, unlike the +300 W power consumption.
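One way to spot the NVMe problem from the list above is to compare the negotiated PCIe link speed with the maximum the drive supports. The sketch below assumes a Linux system that exposes the standard PCI sysfs attributes `current_link_speed` and `max_link_speed` under `/sys/class/nvme/nvme*/device`; the exact string format varies between kernel versions, and the function names are made up for this example. PCIe 4.0 corresponds to 16 GT/s per lane and PCIe 3.0 to 8 GT/s.

```python
import glob
import os
import re

def parse_gts(text):
    """Extract the transfer rate in GT/s from a sysfs link-speed string."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s*GT/s", text)
    if match is None:
        raise ValueError(f"unrecognised link speed: {text!r}")
    return float(match.group(1))

def read_attr(path):
    """Read a single sysfs attribute as a stripped string."""
    with open(path) as handle:
        return handle.read().strip()

def check_nvme_links(pattern="/sys/class/nvme/nvme*/device"):
    """Return (controller, current GT/s, maximum GT/s) for each NVMe controller found."""
    reports = []
    for device in sorted(glob.glob(pattern)):
        try:
            current = parse_gts(read_attr(os.path.join(device, "current_link_speed")))
            maximum = parse_gts(read_attr(os.path.join(device, "max_link_speed")))
        except (OSError, ValueError):
            # Attribute missing or reported as unknown: skip this controller.
            continue
        name = os.path.basename(os.path.dirname(device))
        reports.append((name, current, maximum))
    return reports

if __name__ == "__main__":
    for name, current, maximum in check_nvme_links():
        status = "ok" if current >= maximum else "DEGRADED"
        print(f"{name}: {current:g} GT/s of {maximum:g} GT/s [{status}]")
```

A drive that reports 8 GT/s while supporting 16 GT/s has fallen back to PCIe 3.0 speed. Running this after each boot would show whether the fallback coincides with the slow runs.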
johanvdw Posted December 22, 2020

Thanks for your reply. We experienced problems with both the 1.10 BIOS and 1.12 (from the ASRock Rack FTP; its status is a bit vague, so I wouldn't use it if you have no issues). I see random crashes and kernel traces in dmesg. I think it may be bad memory, but I failed to get memtest86 or memtest86+ working. As I remembered you had a more or less similar setup, I was curious whether you had any similar issues. Luckily you don't :-)
Igor Posted December 22, 2020

11 minutes ago, johanvdw said:
"I think it may be bad memory"

We couldn't purchase the exact memory modules which we initially planned, but similar ones:

KSM32ED8/32ME Kingston Server Premier - DDR4 - 32 GB - DIMM 288-PIN
Kingston Technology KSM32ED8/32ME. Component for: PC / server, RAM capacity: 32 GB, internal memory type: DDR4, memory clock speed: 3200 MHz, memory form factor: 288-pin DIMM, CAS latency: 22, ECC

They are running at their stock speed. Nothing has been changed in the BIOS regarding memory. I was only playing with the AMD override, but that is now back on stock.
johanvdw Posted December 22, 2020

That's exactly what we have: KSM32ED8/32ME https://azerty.nl/product/kingston/4310306/server-premier-ddr4-module

It is actually quite interesting we came up with more or less the same build.
Igor Posted December 22, 2020

7 minutes ago, johanvdw said:
"It is actually quite interesting we came up with more or less the same build."

Then it must be the PSU. Downgrade the BIOS, reset to defaults and start over. Our setup: Ubuntu MATE (since it comes with the HWE kernel by default, server images don't), a SATA boot drive, and the NVMe drives are MP600s in software RAID 0.
johanvdw Posted December 22, 2020

I'm using Debian buster with kernel 5.9 (backports). I could try using Ubuntu. Anyway, everything works quite well, but after approximately one day at least one kernel trace ends up in the logs.

The PSU is not the one I wanted (ones with a lot of capacity are hard to find currently), but 850 W should still be sufficient, I guess (there is only one NVMe drive in the machine, a Samsung 970 EVO). Also, I would expect power problems to lead to reboots instead of just single programs failing.
Igor Posted December 22, 2020

8 minutes ago, johanvdw said:
"I'm using debian buster with kernel 5.9 (backports). I could try using ubuntu."

Do that first. I tried many variants, and also tried Arch out of despair when trying to get NVMe working at full speed, which could be a BIOS problem, since it actually works and just sometimes doesn't. In the end I settled on Ubuntu 20.04 MATE. A light desktop was planned anyway.

8 minutes ago, johanvdw said:
"but 850W should still be sufficient"

Absolutely. Ours is on a (quality) 650 W unit (a stronger one is in another machine and a swap was planned) and it didn't burst into flames when overclocked to the max. But the UPS started to sing an overload song when I tried that.
johanvdw Posted April 11, 2021

A final note in case someone else bumps into something similar: the stability problems persisted, so we returned the computer and the shop replaced both the CPU and the motherboard. We have been running without issues since then.