CPU Tests: Microbenchmarks

Core-to-Core Latency

As the core count of modern CPUs grows, we are reaching a point where the time to access one core from another is no longer constant. Even before the advent of heterogeneous SoC designs, processors built on large rings or meshes could have different latencies when accessing the nearest core compared to the furthest core. This is especially true in multi-socket server environments.

But modern CPUs, even desktop and consumer CPUs, can have variable access latency from one core to another. For example, in the first-generation Threadripper CPUs, we had four chips on the package, each with 8 threads, and each with a different core-to-core latency depending on whether the access was on-die or off-die. This gets more complex with products like Lakefield, which has two different communication buses depending on which core is talking to which.

If you are a regular reader of AnandTech’s CPU reviews, you will recognize our core-to-core latency test. It’s a great way to show exactly how groups of cores are laid out on the silicon. This is a custom in-house test built by Andrei, and we know there are competing tests out there, but we feel ours is the most accurate measure of how quickly an access between two cores can happen.

On all our Threadripper Pro CPUs, we saw:

  • a thread-to-thread latency of 7 ns,
  • a core-to-core latency within the same CCX of 17-18 ns,
  • a core-to-core latency across CCXes that scales from 80 ns with no IO die hops to 113 ns with three IO die hops.

Here we can distinguish how long it takes for threads to ping back and forth between cores that are different numbers of hops apart across the IO die.
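AnandTech’s tool itself isn’t public, but the core mechanism of any such test is a ping-pong on a shared cache line between two threads pinned to specific cores. Below is a minimal sketch of that idea, assuming Linux and a GCC/Clang toolchain; the core IDs, iteration count, and memory orderings are placeholders rather than a reproduction of the in-house benchmark.

```cpp
// Minimal core-to-core "ping-pong" latency sketch: two threads, pinned to
// specific cores, bounce a flag held in a shared cache line.
// Placeholders: core IDs 0 and 1, one million iterations. Linux + pthreads.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>
#include <pthread.h>
#include <sched.h>

static void pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main() {
    constexpr int kIters = 1'000'000;
    std::atomic<int> flag{0};                 // 0: pinger's turn, 1: ponger's turn

    std::thread ponger([&] {
        pin_to_core(1);                       // placeholder: the "remote" core
        for (int i = 0; i < kIters; ++i) {
            while (flag.load(std::memory_order_acquire) != 1) { /* spin */ }
            flag.store(0, std::memory_order_release);
        }
    });

    pin_to_core(0);                           // placeholder: the "local" core
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < kIters; ++i) {
        flag.store(1, std::memory_order_release);
        while (flag.load(std::memory_order_acquire) != 0) { /* spin */ }
    }
    const auto stop = std::chrono::steady_clock::now();
    ponger.join();

    const double ns = std::chrono::duration<double, std::nano>(stop - start).count();
    // Each iteration is a full round trip, so halve it for a one-way estimate.
    std::printf("core 0 <-> core 1: ~%.1f ns one-way\n", ns / kIters / 2.0);
    return 0;
}
```

Compile with g++ -O2 -pthread; sweeping the two pinned core IDs over every pair of cores and tabulating the averages is what produces the familiar latency matrix.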

A y-Cruncher Sprint

The y-cruncher website has a large amount of benchmark data showing how long different CPUs take to compute pi to specific numbers of digits. For a few CPUs it also shows the time to compute as the target moves from 25 million digits to 50 million, 100 million, 250 million, and all the way up to 10 billion, to showcase how performance scales with digit count (assuming everything fits in memory). This range of results, from 25 million to 10 billion digits, is something I’ve dubbed a ‘sprint’.

I have written some code to perform a sprint on every CPU we test. It detects the installed DRAM, works out the largest digit count that can be calculated in that amount of memory, and works up from 25 million digits. For the tests that go up to ~25 billion digits, it only adds an extra 15 minutes to the suite for an 8-core Ryzen CPU.
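That driver isn’t published, but the logic described above is easy to sketch. The fragment below assumes Linux (reading /proc/meminfo for DRAM detection), a rough placeholder of ~5 bytes of RAM per decimal digit, and a placeholder command string where y-cruncher’s real arguments would go.

```cpp
// Sketch of a "sprint" driver: detect DRAM, estimate the largest pi size that
// fits, then step through the 25M/50M/100M/250M/... digit ladder.
// The bytes-per-digit figure and the command string are assumptions, not
// y-cruncher's real memory model or CLI.
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

// Read MemTotal (reported in kB) from /proc/meminfo; Linux-only.
static long long total_memory_bytes() {
    std::ifstream meminfo("/proc/meminfo");
    std::string key;
    long long kb = 0;
    while (meminfo >> key >> kb) {
        if (key == "MemTotal:") return kb * 1024LL;
        meminfo.ignore(256, '\n');            // skip the trailing " kB"
    }
    return 0;
}

int main() {
    // Placeholder assumption: ~5 bytes of working memory per decimal digit,
    // with 20% of DRAM held back for the OS and the binary itself.
    const double bytes_per_digit = 5.0;
    const long long usable = static_cast<long long>(total_memory_bytes() * 0.8);
    const long long max_digits = static_cast<long long>(usable / bytes_per_digit);

    // Build the 25M, 50M, 100M, 250M, 500M, 1B, ... ladder up to max_digits.
    std::vector<long long> sizes;
    const double step[3] = {2.0, 2.0, 2.5};
    long long digits = 25'000'000;
    for (int i = 0; digits <= max_digits; ++i) {
        sizes.push_back(digits);
        digits = static_cast<long long>(digits * step[i % 3]);
    }

    for (long long d : sizes) {
        // Placeholder invocation: substitute the real y-cruncher command line.
        std::string cmd = "./y-cruncher <args for " + std::to_string(d) + " digits>";
        std::printf("would run: %s\n", cmd.c_str());
        // std::system(cmd.c_str());  // uncomment once the command is filled in
    }
    return 0;
}
```

The 2x/2x/2.5x step pattern reproduces the 25M, 50M, 100M, 250M ladder of digit sizes listed on the y-cruncher site.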

With this test, we can see the effect of increasing memory requirements on the workload and the scaling factor for a workload such as this. We're plotting millions of digits calculated per second.

The 64C/64T processor reaches peak efficiency here, although as more digits are calculated, the memory requirements come into play.


98 Comments


  • mode_13h - Saturday, July 17, 2021

    Sometimes they do show it. I wonder why not, this time?

    One thing to note is how some of the same applications they benchmark in standalone tests are *also* included in SPEC17. So, those tests can get over-represented.
  • Blastdoor - Wednesday, July 14, 2021

    Re:

    "We are patiently waiting for AMD to launch Threadripper versions with Zen 3 – we hoped it would have been at Computex in June, but now we’re not sure exactly when."

    Maybe it will happen when Intel offers something remotely competitive in this market? Or maybe when supply constraints ease and AMD can fully meet demand?
  • mode_13h - Wednesday, July 14, 2021

    Chagall (Threadripper 5000-series) is rumored to launch in August.
  • Threska - Wednesday, July 14, 2021

    Long as AMD sticks to the same socket the platform should have longevity just like AM4.
  • Bernecky - Wednesday, July 14, 2021

    Your "AMD Comparison" shows Threadripper DRAM as: 4×DDR4-3200.
    This is incorrect: I have a 3970X running on an ASUS ROG Zenith Extreme II Alpha, with
    256GB: 8×DDR4-3600(OC slightly).

    The Alpha no longer appears on the ASUS web site. Not sure what happened to it.
  • JMC2000 - Wednesday, July 14, 2021

    The "4xDDR-3200" is referencing 4 channels @ a non-overclocked speed of 3200; what you have is 8 DDR4 DIMMs in 4 channels.
  • Railgun - Sunday, July 18, 2021

    Still here on the UK site.

    https://www.asus.com/uk/Motherboards-Components/Mo...
  • Oxford Guy - Wednesday, July 14, 2021

    ‘The only downside to EPYC is that it can only be used in single socket systems, and the peak memory support is halved (from 4 TB to 2 TB).’

    Eh?

    I assume you meant TR Pro. A big downside is that it’s Zen 2.
  • Thanny - Wednesday, July 14, 2021

    Ryzen and Threadripper support ECC memory just fine. It's only registered memory that isn't supported, which is why you can only get 128GB into a Ryzen platform and 256GB into a Threadripper platform (32GB is the largest unbuffered DIMM you can get).

    The motherboard must also support it, which not all Ryzen motherboards do. But all Threadripper boards support ECC. I'm using 128GB of unbuffered ECC right now with a 3960X.
  • willis936 - Thursday, July 15, 2021

    >Ryzen and Threadripper support ECC memory just fine

    A common misconception. Error reporting does not work with any AM4 chipset on non-pro AMD processors. Sure you have ECC, maybe. How do you know the soft error rate isn't massive?
