Core-to-Core Latency: Zen 5 Strix Point Vs. Zen 4 Phoenix Point

As the core count of modern CPUs grows, the time it takes one core to reach another is no longer constant. Even before the advent of heterogeneous SoC designs, processors built on large rings or meshes could have different latencies when accessing the nearest core compared to the furthest core. This holds true especially in multi-socket server environments.

But modern CPUs, even desktop and consumer CPUs, can have variable access latency to get to another core. For example, in the first generation Threadripper CPUs, we had four chips on the package, each with 8 threads, and each with a different core-to-core latency depending on whether the access was on-die or off-die. This gets more complex with products like Lakefield, which has two different communication buses depending on which core is talking to which.

If you are a regular reader of AnandTech’s CPU reviews, you will recognize our Core-to-Core latency test. It’s a great way to show exactly how groups of cores are laid out on the silicon. This is a custom in-house test, and we know there are competing tests out there, but we feel ours most accurately reflects how quickly an access between two cores can happen.
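
For readers curious about what a test of this kind measures, the general idea can be sketched as a ping-pong between two threads. This is not AnandTech's in-house tool, whose implementation is not described here; it is a minimal illustration, and the function name `pingPongNs` and its parameter are hypothetical. A real measurement would also pin each thread to a specific core (for example with `pthread_setaffinity_np` on Linux), which this sketch leaves out for portability, so it times whichever pair of cores the operating system happens to schedule.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical helper: bounces a flag between two threads through one shared
// cache line and returns the average one-way hand-off time in nanoseconds.
// Threads are NOT pinned here, so the OS chooses which cores take part.
double pingPongNs(int roundTrips) {
    // Keep the flag on its own cache line (64 bytes is an assumed line size).
    alignas(64) std::atomic<int> flag{0};

    // Responder: waits for the flag to become 1, then hands it back as 0.
    std::thread responder([&flag, roundTrips] {
        for (int i = 0; i < roundTrips; ++i) {
            while (flag.load(std::memory_order_acquire) != 1) { /* spin */ }
            flag.store(0, std::memory_order_release);
        }
    });

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < roundTrips; ++i) {
        // Initiator: sets the flag to 1, then waits for the responder's reply.
        flag.store(1, std::memory_order_release);
        while (flag.load(std::memory_order_acquire) != 0) { /* spin */ }
    }
    auto stop = std::chrono::steady_clock::now();
    responder.join();

    double totalNs =
        std::chrono::duration<double, std::nano>(stop - start).count();
    // Each round trip contains two hand-offs of the cache line.
    return totalNs / (2.0 * roundTrips);
}
```

Calling `pingPongNs(1000000)` once for every pair of pinned cores and collecting the results in a grid would produce the kind of matrix this test is charted from.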

In this core-to-core latency analysis, we'll be comparing the AMD Ryzen AI 9 HX 370, which uses a combination of full-size Zen 5 cores with the more compact Zen 5c cores, directly to its predecessor, the Zen 4 Ryzen 9 7940HS.

Looking at the core-to-core latencies of the AMD Ryzen AI 9 HX 370, we can see that AMD has changed the core structure compared to what we've seen on Zen 4 (Phoenix Point). For Strix Point, the full-sized Zen 5 cores and the smaller Zen 5c cores are placed on separate core complexes (CCXs). This departure from the layout of previous generations does affect the latency of inter-core communication.

Although they sit on a separate CCX from the full-sized Zen 5 cores, the compact Zen 5c cores are present on the same die. The inter-core latencies of the full-sized Zen 5 cores, which are shown as cores 0-7 in the above chart, range from 19.6 ns to 24.5 ns, which shows that the Zen 5 cores are communicating efficiently with each other within their core complex.

When communicating outside of the Zen 5 cluster to reach the Zen 5c cores, the latency increases to between 163.7 ns and 188.7 ns, roughly seven to ten times the 19.6 ns to 24.5 ns measured within the Zen 5 cluster. That is a significant latency penalty and much more than we anticipated. This can have a negative impact on workloads that require frequent cross-cluster data exchanges. Within its own complex, the Zen 5c cluster also shows higher latencies at each L1 access point, which points to overhead in the pathway when communicating cross-cluster.

Having multiple clusters of cores on the same die does impose significant penalties across the longer pathways, especially when hopping out of one cluster to contact cores on the other. The latencies vary massively depending on the length of the pathway. Each cluster (Zen 5 and Zen 5c) operates efficiently on its own, but combining them adds a level of complexity, which is the trade-off of using multiple core clusters on a single package.
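
The gap between the two latency ranges is what lets a latency grid reveal how cores are grouped. As a rough sketch of that idea, the function below assigns cores to clusters by treating any pair below a cutoff as belonging to the same cluster. The name `groupCores` and the cutoff are assumptions made for illustration: with 19.6-24.5 ns inside the Zen 5 cluster and 163.7-188.7 ns across clusters, any threshold between 24.5 ns and 163.7 ns would separate the two. The Zen 5c cluster's internal figures are not quoted in the text, so the sketch assumes they also fall below the chosen cutoff.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: gives each core a cluster id. Two cores land in the
// same cluster when their latency is below thresholdNs, either directly or
// through a chain of other cores. latencyNs must be a square matrix.
std::vector<int> groupCores(const std::vector<std::vector<double>>& latencyNs,
                            double thresholdNs) {
    const std::size_t n = latencyNs.size();
    std::vector<int> cluster(n, -1);  // -1 means "not assigned yet"
    int next = 0;

    for (std::size_t seed = 0; seed < n; ++seed) {
        if (cluster[seed] != -1) continue;  // already part of a cluster
        cluster[seed] = next;
        std::vector<std::size_t> pending{seed};

        // Flood outward from the seed core across all "fast" links.
        while (!pending.empty()) {
            const std::size_t cur = pending.back();
            pending.pop_back();
            for (std::size_t other = 0; other < n; ++other) {
                if (cluster[other] == -1 &&
                    latencyNs[cur][other] < thresholdNs) {
                    cluster[other] = next;
                    pending.push_back(other);
                }
            }
        }
        ++next;
    }
    return cluster;
}
```

Fed a full measured grid and a cutoff of, say, 100 ns (an arbitrary value inside the gap), the function should report two cluster ids for a chip laid out the way this section describes.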


72 Comments

  • Dante Verizon - Sunday, July 28, 2024 - link

    Why are you comparing an ultra-thin design to a CPU with PL2 at almost 90w? The notebookcheck tests show that the Zenbook runs up to 50% slower than the ProArt chassis.
  • Ryan Smith - Sunday, July 28, 2024 - link

    Sorry, which notebook are you referring to? We have multiple Zenbooks here.
  • Terry_Craig - Sunday, July 28, 2024 - link

    He's probably talking about the Zenbook in the review: https://www.notebookcheck.net/Asus-Zenbook-S-16-la...

    Strix performs much worse on the Zenbook than inside the ProArt, probably due to more aggressive power and temperature management.
  • Ryan Smith - Sunday, July 28, 2024 - link

    It's definitely not a high performance chassis, despite being 16-inches. The default TDP is just 17 Watts; AMD asked reviewers to bump it up to 28W.

    But this is what AMD sent out for review. Given the wide range of laptop TDPs out there, these review unit laptops can never cover the full spectrum. So it's more a reflection of what power level/form factor the chipmaker is choosing to prioritize in this generation.
  • The Hardcard - Sunday, July 28, 2024 - link

    What is the 90W laptop in this review? The other laptops are listed at 28W and 35W here. I did not see any indication of the power specifications of the ProArt on the other site, just some numbers provided. I strongly suspect that laptop is running at top TDP, 45-54W.

    So, like, a different comparison.
  • Terry_Craig - Sunday, July 28, 2024 - link

    https://www.notebookcheck.net/AMD-Ryzen-9-7940HS-P...

    Depending on the model, the 7940HS goes up to 100w.
  • The Hardcard - Monday, July 29, 2024 - link

    But, is the 7940HS pulling 100w in this review instead of the reported 35w? Otherwise, what is the point of the complaint?

    https://www.ultrabookreview.com/69005-asus-proart-...

    The HX 370 is in a different chassis pulling 80w sustained. Does that make the 35w vs 28w happening here more fair? I mean, if what the chips can draw elsewhere somehow matters here at all?
  • eastcoast_pete - Monday, July 29, 2024 - link

    It matters if one wants to look at the maximum performance possible, regardless of power draw. But, in addition to what Ryan wrote, the attraction of the HX Series to me is the strong performance at lower power draws. I would have actually liked to see performance comparisons at 17 W, which IMHO is of special interest in such thin and light notebooks. The higher end (> 50 W) will be of interest for Strix Halo, which as far as I can tell is supposed to take on notebooks with smaller dGPUs.
  • ET - Sunday, July 28, 2024 - link

    From the benchmarks here, the 370 looks somewhat disappointing on the CPU front, with some losses to 8 cores Zen 4. A hybrid architecture is always a problem. I wonder if future scheduling changes will help or if the small 8MB L3 for the Zen 5c cores is a problem that can't be overcome.

    The new GPU however looks like a good upgrade over the previous gen.
  • nandnandnand - Sunday, July 28, 2024 - link

    It's hybrid with different cache amounts, but it's also two CCXs instead of one after 3.5 generations of simple 8-cores. It's hard to say what's screwing it up.

    Phoronix's review was more positive for the CPU. The main benefit is power efficiency:
    https://www.phoronix.com/review/amd-ryzen-ai-9-hx-...

    The reviews I looked at didn't look too good for the GPU. Maybe it will do better with more power, but what it really needs is memory bandwidth. Hopefully AMD brings some Infinity Cache to its mainstream 128-bit APUs in the future.
