CPU Tests: Microbenchmarks

Core-to-Core Latency

As the core count of modern CPUs grows, we are reaching a point where the time it takes to access one core from another is no longer constant. Even before the advent of heterogeneous SoC designs, processors built on large rings or meshes could have different latencies when accessing the nearest core compared to the furthest core. This is especially true in multi-socket server environments.

But modern CPUs, even desktop and consumer CPUs, can have variable access latency between cores. For example, first-generation Threadripper CPUs had multiple eight-core dies on one package, and the core-to-core latency differed depending on whether the communication stayed on-die or had to go off-die. This gets more complex with products like Lakefield, which has two different communication buses depending on which core is talking to which.

If you are a regular reader of AnandTech’s CPU reviews, you will recognize our core-to-core latency test. It’s a great way to show exactly how groups of cores are laid out on the silicon. This is a custom in-house test built by Andrei, and we know there are competing tests out there, but we feel ours is the most representative of how quickly an access between two cores can happen.

On all our Threadripper Pro CPUs, we saw:

  • a thread-to-thread latency of 7 ns (two threads on the same core),
  • a core-to-core latency of 17-18 ns within the same CCX,
  • a core-to-core latency across CCXes scaling from 80 ns with no IO die hops to 113 ns with three IO die hops.

Here we can distinguish how long it takes for threads to ping back and forth between cores that are a different number of hops apart across the IO die.
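For readers curious about the mechanics, the general approach can be illustrated with a simple ping-pong loop: pin two threads to specific cores, bounce an atomic value between them, and average the round-trip time. The sketch below is an illustrative, Linux-only C++ example (using pthread_setaffinity_np), not Andrei's in-house tool; the core IDs, iteration count, and memory orderings are assumptions of this sketch.

    #include <atomic>
    #include <chrono>
    #include <cstdio>
    #include <cstdlib>
    #include <thread>
    #include <pthread.h>

    // Pin the calling thread to a single core (Linux-only).
    static void pin_to_core(int core) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    int main(int argc, char **argv) {
        const int core_a = argc > 1 ? std::atoi(argv[1]) : 0;   // first core to test
        const int core_b = argc > 2 ? std::atoi(argv[2]) : 1;   // second core to test
        const long iters = 1000000;                             // round trips to average
        std::atomic<long> flag{0};

        // Responder: waits for each odd value, replies with the next even value.
        std::thread responder([&] {
            pin_to_core(core_b);
            for (long i = 0; i < iters; ++i) {
                const long expect = 2 * i + 1;
                while (flag.load(std::memory_order_acquire) != expect) { /* spin */ }
                flag.store(expect + 1, std::memory_order_release);
            }
        });

        pin_to_core(core_a);
        const auto t0 = std::chrono::steady_clock::now();
        for (long i = 0; i < iters; ++i) {
            flag.store(2 * i + 1, std::memory_order_release);                        // ping
            while (flag.load(std::memory_order_acquire) != 2 * i + 2) { /* spin */ } // wait for pong
        }
        const auto t1 = std::chrono::steady_clock::now();
        responder.join();

        const double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
        std::printf("cores %d <-> %d: %.1f ns per round trip\n", core_a, core_b, ns / iters);
        return 0;
    }

Compiled with something like g++ -O2 -pthread, running it once for every pair of cores builds up the kind of latency matrix described above, although the absolute numbers will differ from our tool's, since details such as the spin method and memory ordering affect the result.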

A y-Cruncher Sprint

The y-cruncher website has a large amount of benchmark data showing how quickly different CPUs can calculate pi to a given number of digits. For a few CPUs, it also shows the time to compute 25 million digits, then 50 million, 100 million, 250 million, and so on all the way up to 10 billion, to showcase how performance scales with digit count (assuming everything stays in memory). This range of results is something I’ve dubbed a ‘sprint’.

I have written some code to perform a sprint on every CPU we test. It detects the installed DRAM, works out the largest digit count that can be calculated within that amount of memory, and works up from 25 million digits. For the tests that go up to the ~25 billion digit range, it only adds an extra 15 minutes to the suite for an 8-core Ryzen CPU.
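As a rough illustration of that planning step, the Linux-only C++ sketch below (not the actual script used for these results) reads MemTotal from /proc/meminfo, estimates the largest digit count that fits using an assumed ~4.7 bytes of working memory per decimal digit, and builds the 25M / 50M / 100M / 250M ladder up to that limit; the real script then launches and times a y-cruncher run at each step.

    #include <cstdio>
    #include <fstream>
    #include <string>
    #include <vector>

    // Total system memory in bytes, parsed from /proc/meminfo (MemTotal is reported in kB).
    static long long total_memory_bytes() {
        std::ifstream meminfo("/proc/meminfo");
        std::string key;
        long long kib = 0;
        while (meminfo >> key >> kib) {
            if (key == "MemTotal:") return kib * 1024;
            meminfo.ignore(4096, '\n');   // skip the rest of the line
        }
        return 0;
    }

    int main() {
        // Assumption: ~4.7 bytes of working memory per decimal digit, with 10% of RAM
        // held back for the OS. Neither figure is an official y-cruncher number.
        const double bytes_per_digit = 4.7;
        const double usable = total_memory_bytes() * 0.9;
        const long long max_digits = static_cast<long long>(usable / bytes_per_digit);

        // Build the digit ladder: 25M, 50M, 100M, 250M, ... (multipliers 2, 2, 2.5 repeating).
        std::vector<long long> ladder;
        long long digits = 25000000;
        for (int step = 0; digits <= max_digits; ++step) {
            ladder.push_back(digits);
            digits = (step % 3 == 2) ? digits * 5 / 2 : digits * 2;
        }

        for (long long d : ladder) {
            // The real script would launch and time a y-cruncher run of this size here.
            std::printf("planned run: %lld million digits\n", d / 1000000);
        }
        return 0;
    }

With those assumptions, a 128 GB machine would top out around the 25-billion-digit run, in line with the range mentioned above.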

With this test, we can see the effect of increasing memory requirements on the workload, and the scaling factor for a workload such as this. We're plotting millions of digits calculated per second.

The 64C/64T processor achieves peak efficiency here, although as more digits are calculated, the memory requirements start to come into play.

Comments

  • Thanny - Thursday, July 15, 2021 - link

    Your Blender results for the 3960X are off by a lot. I rendered the same scene with mine in 173 seconds. That's with PBO enabled, so it'll be a bit faster than stock, but not 20% faster.

    My guess is that you didn't warm Blender up properly first. When starting a render for the first time, it has to do some setup work, which is timed with the rest of the render, but only needs to be done once.

    I'd expect a stock 3960X to be in the neighborhood of 180 seconds.
  • 29a - Thursday, July 15, 2021 - link

    "Firstly, because we need an AI benchmark, and a bad one is still better than not having one at all."

    I 100% disagree with this statement. Bad data is worse than no data at all.
  • arashi - Saturday, July 17, 2021 - link

    But but but what about the few (<10) clicks they'd lose for not having lousy CPU based AI benchmarks!
  • willis936 - Thursday, July 15, 2021 - link

    Availability of entry level ECC CPUs (AMD pro and Intel Xeon E-2200/W) is really low. It's unfortunate. People don't have the cash for $10k systems right now but the need for ECC has only gone up. I hope for more editorials calling for mainstream ECC.
  • Threska - Thursday, July 15, 2021 - link

    Linus is mainstream enough.

    https://arstechnica.com/gadgets/2021/01/linus-torv...
  • Mikewind Dale - Thursday, July 15, 2021 - link

    At least mainstream desktop Ryzens tend to support ECC, even if not officially validated.

    What frustrates me is that laptop Ryzens don't support ECC at all - not even the Ryzen Pros.

    Every Ryzen Pro laptop I've seen lacks ECC support, and some of them even have non-ECC memory soldered to the motherboard.

    If you want an ECC laptop, it appears you have literally no choice at all but a Xeon laptop for $5,000.
  • mode_13h - Friday, July 16, 2021 - link

    > laptop Ryzens don't support ECC at all - not even the Ryzen Pros.

    It probably depends on the laptop. If its motherboard doesn't have the extra traces for the ECC bits, then of course it won't.
  • Mikewind Dale - Saturday, July 17, 2021 - link

    It depends on the laptop, yes. But I haven't found a single Ryzen Pro laptop from a single company that supports ECC.

    AMD's website ("Where to Buy AMD Ryzen™ PRO Powered Laptops") lists HP ProBook, HP EliteBook, and Lenovo Thinkpad. But none of them support ECC.
  • mode_13h - Saturday, July 17, 2021 - link

    > I haven't found a single Ryzen Pro laptop from a single company that supports ECC.

    Thanks for the datapoint. Maybe someone will buck the trend, but it's also possible they judged the laptop users who really care about ECC would also prefer a dGPU and therefore won't be using APUs.
  • mode_13h - Friday, July 16, 2021 - link

    > I hope for more editorials calling for mainstream ECC.

    You'll probably just get inferior in-band ECC.
