AMD Threadripper 3990X Armbian Build Server Review


NicoD

Hi all. 
I again had the pleasure of working with an amazing server. This time it is the AMD Threadripper 3990X, with 64 cores and 128 threads.
After last week working on a 32-core ARM server I thought I had seen performance.

This is again not comparable with anything before.

 

I again got private SSH access, so I opened three terminals: one with htop, another to check sensors, and a third to run my benchmarks.
The first thing I saw were the 128 threads. Being used to seeing 6, this was almost unbelievable.



Under light loads it boosts up to 4.3 GHz. With all cores maxed out it runs at 3 GHz while consuming 400 W.
It reaches a single-core 7-Zip decompression score of 4545 MIPS at 4.3 GHz.
The Ampere 32-core ARM server reached 2763 at 3.3 GHz.
This again shows the Ampere server doesn't use high-performance cores: it doesn't perform well per clock.

Coming soon is a benchmark of an AWS server. It uses high-performance cores based on the ARM Neoverse N1, a derivative of the Cortex-A76.
It reaches 3393 while clocked at only 2.5 GHz, so it does perform better per clock. Do note this is comparing pears with bananas (I don't want to confuse them with apples).
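
To make the per-clock comparison concrete, here is a minimal sketch that divides each single-core score by its clock. It assumes the clocks quoted above were really the clocks during the single-core run, which was not verified; the names `single_core` and `mips_per_ghz` are only illustrative.

```python
# Single-core 7-Zip decompression scores (MIPS) and clocks (GHz) quoted in this post.
# Assumption: each chip actually ran at the quoted clock during the test.
single_core = {
    "Threadripper 3990X": (4545, 4.3),
    "Ampere 32-core": (2763, 3.3),
    "AWS Graviton2 (N1)": (3393, 2.5),
}


def mips_per_ghz(mips, ghz):
    """Normalize a score by clock speed to get a rough per-clock figure."""
    return mips / ghz


per_clock = {name: mips_per_ghz(m, g) for name, (m, g) in single_core.items()}

# Print the systems from best to worst per-clock result.
for name, value in sorted(per_clock.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {value:7.0f} MIPS/GHz")
```

This is only a rough normalization: it ignores memory, cache and boost behaviour, so treat it as an indication and not as a measurement.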

At default settings the Threadripper scored 391809 MIPS in 7-Zip multi-core decompression.

Then, with an all-core overclock to 3.9 GHz, it consumed over 600 W and reached a 7-Zip decompression score of 433702 MIPS.

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,128 CPUs AMD Ryzen Threadripper 3990X 64-Core Processor  (830F10),ASM,AES-NI)

AMD Ryzen Threadripper 3990X 64-Core Processor  (830F10)
CPU Freq: - 64000000 64000000 - - - - - -

RAM size:  257677 MB,  # CPU hardware threads: 128
RAM usage:  28240 MB,  # Benchmark threads:    128

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:     272228 11672   2269 264824  |    5112130 12467   3499 435912
23:     165008 11150   1508 168123  |    5086525 12595   3496 440122
24:     121378 11578   1128 130506  |    4972483 12594   3467 436364
25:     103873 11805   1005 118599  |    4746979 12362   3419 422410
----------------------------------  | ------------------------------
Avr:           11551   1478 170513  |            12505   3470 433702
Tot:           12028   2474 302107

 

This is again many levels above the Ampere 32-core ARM server, which got 85975 MIPS. 32 cores of the AWS Graviton2 do 110628.
So when overclocked this AMD server is about 5 x as powerful as the Ampere 32-core server, while consuming 6 x as much power.
In the normal configuration the two are close in performance/watt.
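
The arithmetic behind those two statements can be sketched like this. The power numbers are the rounded figures from this post (400 W stock, 600 W overclocked, about 100 W for the Ampere), so the results are approximate; the helper name `mips_per_watt` is just for illustration.

```python
# Multi-core 7-Zip decompression (MIPS) and approximate power draw (W) from this post.
# The wattages are the rounded values quoted in the text, not precise measurements.
systems = {
    "Threadripper stock": (391809, 400),
    "Threadripper overclocked": (433702, 600),
    "Ampere 32-core": (85975, 100),
}


def mips_per_watt(mips, watts):
    """Rough efficiency figure: benchmark score per watt consumed."""
    return mips / watts


efficiency = {name: mips_per_watt(m, w) for name, (m, w) in systems.items()}

# How much faster the overclocked Threadripper is than the Ampere server.
speedup_oc = systems["Threadripper overclocked"][0] / systems["Ampere 32-core"][0]

for name, value in efficiency.items():
    print(f"{name:26s} {value:6.0f} MIPS/W")
print(f"Overclocked Threadripper vs Ampere: {speedup_oc:.2f}x")
```

With these inputs the stock Threadripper and the Ampere land within roughly 15% of each other per watt, while the overclock costs efficiency.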

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,32 CPUs LE)
Ampere 32-core ARM Server
LE
CPU Freq: - - - - - - - - -

RAM size:  128285 MB,  # CPU hardware threads:  32
RAM usage:   7060 MB,  # Benchmark threads:     32

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:      58550  2735   2082  56958  |    1027823  3151   2782  87652
23:      56799  2761   2096  57872  |    1002539  3141   2762  86751
24:      54973  2821   2096  59107  |     976406  3142   2728  85702
25:      52913  2838   2129  60414  |     941588  3120   2685  83795
----------------------------------  | ------------------------------
Avr:            2789   2101  58588  |             3139   2739  85975
Tot:            2964   2420  72281

 


At idle the Threadripper consumed 100 W, which is a lot for doing nothing.
The 32-core ARM server consumed only a bit more than 100 W maxed out, and about 20 W at idle.

The BMW Blender benchmark takes 29m23s on the fastest ARM SBC, the Odroid N2+. The Ampere ARM server did it in 8m27s.

For the Threadripper this was far too light a load; it did it in 30s.

Even when doing this render 10 times in a row the temperatures didn't rise much. The maximum I saw was 50°C.

To try a heavier load I downloaded the Barber Shop Blender render, which has 6912 tiles to render. But again the Threadripper wasn't impressed by this load: 2m18s79. The AWS server with 32 cores (of 64) did this in 8m28s. So this ARM server does compete well per clock with the TR on a floating-point task.
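
One rough way to put a number on "per clock" for the Barber Shop render is to multiply cores, clock and render time. This is only a sketch under several assumptions: the Threadripper is taken at its 3.0 GHz all-core stock clock and the AWS instance at 2.5 GHz, 2m18s79 is read as 138.79 s, and SMT on the Threadripper is ignored. The name `core_ghz_seconds` is made up for this example.

```python
# Barber Shop render: (cores used, assumed clock in GHz, render time in seconds).
# Assumptions: 3.0 GHz all-core for the Threadripper, 2.5 GHz for the AWS instance,
# and 2m18s79 read as 138.79 s. 8m28s is 508 s.
runs = {
    "Threadripper 3990X": (64, 3.0, 2 * 60 + 18.79),
    "AWS 32-core": (32, 2.5, 8 * 60 + 28),
}


def core_ghz_seconds(cores, ghz, seconds):
    """Core-clock budget spent on the render: lower means more work per core per clock."""
    return cores * ghz * seconds


cost = {name: core_ghz_seconds(*run) for name, run in runs.items()}
ratio = cost["AWS 32-core"] / cost["Threadripper 3990X"]

print(f"AWS needs about {ratio:.2f}x the core-GHz-seconds of the Threadripper")
```

Under these assumptions the AWS cores need roughly one and a half times the core-clock budget, which is in the same ballpark considering the Threadripper also runs two threads per core.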

ARM may be great, but AMD is mighty. Intel does not have anything to compete with this. Certainly not performance/watt. 
It was a pleasure benchmarking this server. 
I learned a lot, like that I need to find better tools for these amazing machines.  

 

The specs of this monster :
 

  • ASRock Rack TRX40D8-2N2T
  • AMD Ryzen Threadripper 3990X
  • 256GB ECC memory (8 x 32GB)
  • 2 x 1TB PCIe 4.0 NVMe SSD
  • Water Cooling

 

The specs of the Threadripper 3990X

 

  • 64 cores, 128 threads, AMD64
  • Zen 2 (Castle Peak)
  • 2.9 GHz - 4.3 GHz
  • 4-channel DDR4-3200
  • 256GB RAM
  • 88 lanes PCIe 4.0
  • TSMC's 7nm process node
  • 280W - +400W
  • 32 KB L1 per core (64x)
  • 64 x 512 KB L2
  • 256 MB L3 cache shared


You can see my full review video here, greetings.
NicoD


 


Okay, another use case. This one will bring some surprises.

 

Let us imagine we want to compile armhf/arm64 binaries natively. Like, for example, making the new Armbian multimedia packages that we will announce very soon ;)

 

In this case, the Threadripper is at a clear disadvantage, since it needs to emulate the ARM CPU through QEMU. But will it be able to make up for that with core count and sheer processing power? Here are the numbers. We will compare the Threadripper with the Ampere ARM server and with my highly optimized Odroid XU4 (good cooling and a slight overclock).

 

First, a single thread 7-zip bench (Decompressing MIPS, higher is better):

$ 7z b -mmt1

Threadripper (native amd64):		4793
Threadripper (emulating armhf):		1529
Ampere ARM server (native armhf):	2889
Odroid XU4 (native armhf):		2160

As you can see, the single-core performance of the Threadripper drops to about 1/3 of its native performance when emulating through QEMU, leaving it well below the Odroid XU4 and the Ampere.

 

Now, a real-world use case: let us compile our customized version of Kodi for armhf (compilation time, lower is better):

$ time cmake --build . -- -j$(nproc --all)

Threadripper (emulating armhf):		18m9.696s
Ampere (native armhf):			5m50.033s
Odroid XU4 (native armhf):		45m50.711s

Here the 32-core ARM server beats the 64C/128T AMD server with a compile time more than three times shorter. The Odroid XU4 takes about 2.5 times as long as the AMD. If we factor in power consumption, it becomes very clear that compiling in an emulated environment is far from optimal.
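
For anyone who wants to check the ratios, here is a small sketch that converts the `time` values above into seconds and compares them against the emulated Threadripper build. The helper `parse_time` is written for this example and only handles the `XmY.YYYs` form shown above.

```python
import re


def parse_time(text):
    """Convert a shell `time` value such as '18m9.696s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognized time format: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Kodi build times copied from the results above.
build_times = {
    "Threadripper (emulating armhf)": parse_time("18m9.696s"),
    "Ampere (native armhf)": parse_time("5m50.033s"),
    "Odroid XU4 (native armhf)": parse_time("45m50.711s"),
}

# Express every build time relative to the emulated Threadripper build.
baseline = build_times["Threadripper (emulating armhf)"]
relative = {name: t / baseline for name, t in build_times.items()}

for name, factor in relative.items():
    print(f"{name:32s} {build_times[name]:9.3f} s  ({factor:.2f}x the Threadripper time)")
```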

 

Now, we must remember that for building Armbian images we don't emulate but cross-compile. In that case the AMD is working natively, and that is another story: neither the ARM server nor anything else I have ever tested is a match for it. We will probably post numbers about this on another occasion.


1 hour ago, johanvdw said:

We have the same motherboard (with the 32 core threadripper) and we see a lot of stability issues which I can not explain yet. Have you seen anything similar?


We use the one and only / latest BIOS, v1.1. The OS is an unchanged Ubuntu Mate 20.04 LTS with Linux 5.4.0-58-generic, which has proved fine overall. We don't experience general stability issues (the system is stable and doesn't crash), but here is the list of problems:
- NVMe drives sometimes don't initialize in PCIe 4.0 mode and are then as slow as a normal SSD

- the plan was to boot from dual SATA DOMs in hardware RAID 1. That doesn't work, so we boot from one.

- UEFI boot doesn't work, legacy only

- fan control doesn't work at all

- 2.5G network is unstable

 

I hope there will be a BIOS update, but at least we can use it fairly normally.

 

Overclocking capabilities exceed expectations, unlike the +300W power consumption :)


Thanks for your reply.

We experienced problems with both the 1.10 BIOS and 1.12 (from the ASRock Rack FTP; its status is a bit vague, and I wouldn't use it if you have no issues). I see random crashes and kernel traces in dmesg.

I think it may be bad memory, but I failed to get memtest86 or memtest86+ working.

As I remembered you had more or less a similar setup I was curious if you had any similar issues. Luckily you don't :-)


11 minutes ago, johanvdw said:

I think it may be bad memory


We couldn't purchase the exact memory modules we initially planned on, but got similar ones:

KSM32ED8/32ME Kingston Server Premier - DDR4 - 32 GB - DIMM 288-PIN
Kingston Technology KSM32ED8/32ME. Component for: PC / server,
memory: 32 GB, internal memory type: DDR4, memory clock speed:
3200 MHz, memory form factor: 288-pin DIMM, CAS latency: 22, ECC

 

They are running at their stock speed. Nothing has been changed in the BIOS regarding memory. I was only playing with AMD override, but that is now back to stock.


7 minutes ago, johanvdw said:

It is actually quite interesting we came up with more or less the same build.


Then it must be the PSU :) 

 

Downgrade the BIOS, reset to defaults and start over. We run Ubuntu Mate (since it comes with the HWE kernel by default, server images don't) on a SATA boot drive, and the NVMe drives are MP600s in software RAID 0.


I'm using debian buster with kernel 5.9 (backports). I could try using ubuntu.

Anyway, everything works quite well, but after approx one day at least one kernel trace ends up in the logs.

 

The PSU is not the one I wanted (it is hard to find ones with a lot of capacity currently), but 850W should still be sufficient I guess (there is only an NVMe drive in the machine, a Samsung 970 Evo).

 

Also I would suspect that power problems would lead to reboots instead of just single programs failing.


8 minutes ago, johanvdw said:

I'm using debian buster with kernel 5.9 (backports). I could try using ubuntu.

 

Do that first. I tried many variants, and even tried Arch out of despair while trying to get NVMe working at full speed, which could be a BIOS problem, since it actually works and just sometimes doesn't. In the end I settled on Ubuntu 20.04 Mate. A light desktop was planned anyway.

 

8 minutes ago, johanvdw said:

but 850W should still be sufficient


Absolutely. Ours is on a (quality) 650W unit (a stronger one is in another machine and a swap was planned) and it didn't burst into flames when overclocked to the max. But the UPS started to sing an overload song when I tried that.
