Memory Performance

Since the huge L3 cache and quad-channel memory interface are a big part of what makes Ivy Bridge E unique, I thought it made sense to look at memory latency and bandwidth. We'll start with memory latency, compared to Ivy Bridge, Haswell and Haswell + Crystalwell.

The larger L3 cache buys IVB-E lower latency accesses for a wider range of addresses, but once you exceed the 15MB of L3 cache, latency is about on par with everything else. Only Haswell + Crystalwell manages to hold out for longer. Unfortunately that's not really a part desktop enthusiasts can buy, so it's mostly an academic comparison.

The bandwidth story is an interesting one. Sandra maxes out bandwidth by driving all cores at the same time, so you get some uplift here simply because there are more cores under IVB-E's hood. But even if you divide out the number of cores, you get per-core cache bandwidth figures that are extremely high (at least outside of L1). The L3 cache in particular is quite bandwidth happy.

Going outside of the L3 cache, we also see a doubling of memory bandwidth - which is expected given the doubling of memory interface width. In reality the peak memory bandwidth advantage would be even larger as IVB-E officially supports DDR3-1866 (if you only populate 1 DIMM per channel, otherwise either 1333 or 1600 is officially supported).
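As a rough sanity check on that doubling, theoretical peak DRAM bandwidth is simply channels × transfer rate × bus width. The sketch below assumes the standard 64-bit DDR3 channel and nominal transfer rates. The function name and the choice of DDR3-1600 as the dual-channel baseline are illustrative assumptions, not figures taken from the Sandra results.

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_sec, bus_width_bits=64):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumes every channel is bus_width_bits wide (64 bits for DDR3) and
    transfers data on every cycle of the nominal rate, which real workloads
    never quite achieve.
    """
    bytes_per_transfer = bus_width_bits // 8
    return channels * mega_transfers_per_sec * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    # Assumed baseline: dual-channel DDR3-1600 on a mainstream desktop part.
    print(f"dual-channel DDR3-1600: {peak_bandwidth_gbs(2, 1600):.1f} GB/s")
    # Same memory speed, twice the interface width.
    print(f"quad-channel DDR3-1600: {peak_bandwidth_gbs(4, 1600):.1f} GB/s")
    # IVB-E's official top speed with one DIMM per channel.
    print(f"quad-channel DDR3-1866: {peak_bandwidth_gbs(4, 1866):.1f} GB/s")
```

At the same memory speed the quad-channel figure is exactly twice the dual-channel one (roughly 51.2 GB/s versus 25.6 GB/s), and moving to DDR3-1866 raises the theoretical ceiling to roughly 59.7 GB/s. Measured bandwidth will land below these ceilings.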

General Performance

I don't know that I've ever seen an Intel slide before that called out a performance degradation, but there's a first time for everything:

The problem with IVB-E vs. Haswell is that the extra large L3 cache and quad-channel memory interface are generally only useful in heavily threaded applications, which of course benefit from its 6-core configuration. In tests that aren't heavily threaded, however, IVB-E typically sees a single threaded performance deficit compared to Haswell. Given that the 4960X and Haswell based Core i7-4770K run at very similar frequencies, it's not surprising to see IVB-E take a backseat to Haswell in "everyday computing" tasks. Intel's slide above claims about an 18% reduction in "everyday computing" performance compared to the 4770K, but in practice I found the gap to be much narrower.

Although not the best indication of overall system performance, the SYSMark 2012 suite does give us a good idea of lighter workloads than we're used to testing.

SYSMark 2012 - Overall

There's pretty much no advantage to the 4960X over the 3970X here. Remember Ivy Bridge's architectural improvements were very limited on the CPU side. As clock speeds didn't really go up between the 3970X and 4960X, the performance parity here isn't surprising. Haswell manages a ~6% performance advantage over the 4960X at an obviously lower power and price point.

Although I retired SYSMark 2007 a while ago, I do have much older performance data here which lets us compare the 4960X back as far as the early Pentium 4 based Extreme Edition parts:

SYSMark 2007 - Overall

The Haswell advantage grows a bit here to around 8%, but the 4960X remains among the top three performers. It's very clear that for most users, there are far more cost effective ways of getting great performance than IVB-E.

Our final lightly threaded test is Mozilla's Kraken JavaScript benchmark. This test includes some forward-looking JavaScript code designed to showcase performance of future rich web applications on today's software and hardware. We run the test under IE10:

Windows 8 - Mozilla Kraken Javascript Benchmark

Ivy Bridge always had good single threaded performance, but once again these lightly threaded use cases are better served by an architecture with higher IPC. The Haswell advantage isn't huge, but it's a lower power/more cost effective way to get the best performance here.

If you are still on LGA-1366, you'll note that the performance gains here are good, but not earth shattering. Compared to Intel's first 6-core platform, the 4960X manages a 27% increase in performance over the Core i7-990X. That's a healthy gain, but it's still small enough that there's no immediate need to upgrade.

120 Comments

  • 1Angelreloaded - Tuesday, September 3, 2013

    Can you have a comparison chart please for the 4770K, the 8-core Xeon E5, and the 4960X, with benchmarks included? This makes little sense to me. X79 was behind on feature sets like full SATA3, when in reality a lot of these boards will be used as workstation/normal/gaming computers, and performance on those boards tends to suffer for lack of native support. Instead, third-party chips are used to add extra features, which have significant drawbacks. I understand using the socket for two generations in order to extend the life of boards; however, 1366 and the next leap to Haswell should have been taken, making a board last 2 years with the prime features that defined that generation. This just seems like Intel is ignoring its higher-end market due to lack of competition out there.
  • sabarjp - Tuesday, September 3, 2013

    Kind of depressing that 3 years of technology only took the compile of Firefox from 23 minutes to 20 minutes. The high-end isn't looking so high these days.
  • dgingeri - Tuesday, September 3, 2013

    So where's the 4820k review? I don't care much about more than 4 cores, but I need more I/O than Haswell offers. (crappy motherboards that offer either 8/4/4 or 8/8/2 are just unacceptable.) I'd like to know how the 4820k overclocks and handles I/O from dual and triple SLi/Crossfire.
  • Eidigean - Tuesday, September 3, 2013

    Visual Studio unfortunately does not compile in parallel the way you might think. In a solution you may have multiple projects. If one project depends on four other projects, those four will be compiled in parallel; one project per thread. Once the four dependencies are built, it can build the fifth; however, that last project will be built single-threaded.

    Xcode and native Android projects (with gcc) can actually build multiple files from one project in parallel. On an i7 with hyperthreading, all eight logical processors can build up to eight files simultaneously. This scales with more cores very nicely.

    In summary, VS builds multiple projects from one solution in parallel, while gcc builds multiple files from one project in parallel; the latter of which is much faster.

    I'm curious now to see the build times of Firefox for Mac on a rMBP with an i7. Eagerly waiting for a 12 core Mac Pro with 24 logical processors.
  • BrightCandle - Tuesday, September 3, 2013

    Visual Studio is a very poor parallel compilation test. GCC with make -j can really utilise a lot more cores, but it's not very Windows-like to use GCC (although I suspect many developers do that).

    I haven't found many Java builds doing well on multiple cores, nor Scala. It's the unit tests where I get the cores going. I could saturate hundreds of cores with unit tests if I had them, and since I run them in the background on every change I certainly do get a lot of usage out of the extra cores. But a clean compile is not one of those cases where I see any benefit from the 6 cores. Of course I would hope these days we don't do that very often.
  • althaz - Tuesday, September 3, 2013

    It is a poor parallel test, but it is a fantastic real-world test for a lot of devs.
  • madmilk - Tuesday, September 3, 2013

    About 25 minutes here on a 2.6GHz/16GB rMBP. Pretty much as expected for quad Ivy Bridge.
  • bminor13 - Tuesday, September 3, 2013

    Parallel file-level compilation is possible in VS2010 and up with the /MP project switch. I believe this is not enabled by default for compatibility reasons.
  • BSMonitor - Tuesday, September 3, 2013

    A Haswell-E will most likely bring a different pin-count, correct?? So this X79 is a dead end platform any way you look at it. Buying the Quad IVB-E makes almost no sense whatsoever.
  • Casper42 - Tuesday, September 3, 2013

    Most Intel chips use a Tick Tock release cycle. Tick Tock Tick Tock Tick Tock etc
    Tick is an Incremental upgrade. Same socket and largely same design, but reduced lithography (32nm down to 22nm for example). Sometimes new Instructions but often not.
    Tock is an Overhaul upgrade. Uses same Lithography as the previous gen, but is a new internal architecture, often a new Socket, and where most new Instruction sets show up.
    Then you get another Tick.

    Core 2/Conroe was a Tock and was 65nm
    Core 2/Penryn was a Tick and was 45nm
    Core iX/Nehalem was a Tock and was 45nm
    Core iX/Westmere was a Tick and was 32nm
    Core iX/Sandy Bridge was a Tock and was 32nm
    Core iX/Ivy Bridge is a Tick and is 22nm
    Core iX/Haswell is a Tock and is 22nm

    So to say that X79 is a dead platform should not really be a shock to anyone. They got Sandy and Ivy out of it. That's 1 Tock and 1 Tick and now it's time to move on. They do this exact same thing in the 2P Server market where people spend $10K or more per server. The fact of the matter is the server market has already pretty much learned. Don't bother upgrading that server/machine, just ride it for 3-4 years and then replace it completely. SATA, Memory and CPUs have all changed enough by then that you want to reset everything anyway.
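The difference between project-level and file-level build parallelism discussed in the comments above can be illustrated with a toy scheduling model. This is only a sketch under loud assumptions: the solution layout, the file counts, and the "one time unit per file" cost are all hypothetical, and neither function models what Visual Studio or make actually does internally.

```python
import math

# Hypothetical solution: project name -> (number of source files, dependencies).
# Four independent libraries plus one app that depends on all of them.
SOLUTION = {
    "lib_a": (40, []),
    "lib_b": (40, []),
    "lib_c": (40, []),
    "lib_d": (40, []),
    "app": (40, ["lib_a", "lib_b", "lib_c", "lib_d"]),
}


def project_level_makespan(solution):
    """One thread per project; a project starts once its dependencies finish.

    Assumes there are always enough threads for every ready project and that
    each file costs one time unit, so the result is the critical path length.
    """
    finish = {}

    def finish_time(name):
        if name not in finish:
            files, deps = solution[name]
            start = max((finish_time(dep) for dep in deps), default=0)
            finish[name] = start + files  # files compile serially on one thread
        return finish[name]

    return max(finish_time(name) for name in solution)


def file_level_makespan(solution, workers):
    """Projects build one after another; each project's files are spread
    evenly across `workers` parallel compile jobs."""
    return sum(math.ceil(files / workers) for files, _deps in solution.values())


if __name__ == "__main__":
    print("project-level:", project_level_makespan(SOLUTION), "time units")
    print("file-level, 8 workers:", file_level_makespan(SOLUTION, 8), "time units")
```

In this toy case the project-level build is bounded by the dependency chain (a library, then the app, each compiled serially), while the file-level build keeps all eight workers busy throughout. Real builds also pay for linking, I/O, and uneven file sizes, so the gap will differ in practice.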
