Memory Scaling on Haswell CPU, IGP and dGPU: DDR3-1333 to DDR3-3000 Tested with G.Skill
by Ian Cutress on September 26, 2013 4:00 PM EST

The major enthusiast memory manufacturers are all engaged in a MHz race, each trying to retail the highest frequency possible. These are pretty much all Hynix MFR-based kits, known for their high frequencies, but modules are still binned in their thousands to find the one or two that hit the high notes. With Haswell memory controllers happily taking DDR3-2800 in their stride, it all boils down to how useful the extra frequency is versus the price it costs to bin and produce.
In terms of competitive overclocking, the results are real – where every MHz matters even if real-world performance is not on the menu, enthusiasts are happy to push these high-end kits beyond 4000 MHz (recent news from G.Skill shows 4x4GB at DDR3-4072 on Ivy Bridge-E, the fastest quad-channel result ever). The reality is that these enthusiasts are often not purchasing the memory but being seeded it, taking cost-effectiveness out of the ratio.
In terms of real world usage, on our Haswell platform, there are some recommendations to be made.
Avoid DDR3-1333 (and DDR3-1600)
While memory speed did not necessarily affect our single GPU gaming results, for real-world or IGP use, memory speed above these two can afford a tangible (5%+) difference in throughput. Based on current pricing after the Hynix fire it may be worthwhile anyway, as memory kits above DDR3-1600 are now around the same price.
MHz Matters more than tCL, unless you compare over large MHz ranges
When discussing memory kits, there is often little difference in our testing when comparing different tCL numbers at the same frequency – the only issues came with DDR3-1333 C9 and DDR3-1600 C11, which we already suggest are best avoided. Even so, above DDR3-2400 the benefits are minimal at best, with perhaps a few percentage points afforded in multi-GPU setups.
However, tCL might play a role when comparing large MHz differences, such as 2400 C12 vs. 1866 C8. In order to provide an apt comparison, I mentally use a 'performance index' (PI), which is simply the MHz divided by the tCL.
As a general rule, below 2666 MHz, my Performance Index provides an extremely rough guide as to which kits offer more performance than others. In general we see 1333 C7 > 1333 C9, but 1333 C7 is worse than 2133 C11, for example.
When presented with two kits, calculate this Performance Index. If the kits are similar in number (within 10%), then take the kit with the higher MHz. If we take the results of Dirt 3 minimum frame rates in a CFX configuration, and plot against the Performance Index listed above, we plot the following curve:
The graph shows the trend of diminishing returns in this benchmark - as the PI is higher, we reach an asymptotic limit. Note the several results below the line at PIs of 167, 190 and 229 - these are the 1333 MHz memory kits, reinforcing the idea that at similar PI values, the higher MHz kit should be the one to go for.
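As a rough sketch of that rule of thumb in code (the 10% threshold is the one just described; the kit pairs are the examples from the text, and the exact cut-off is only illustrative):

```python
def performance_index(mhz: int, tcl: int) -> float:
    """The article's rough 'performance index': DDR3 speed (MHz) divided by CAS latency (tCL)."""
    return mhz / tcl


def pick_kit(kit_a, kit_b):
    """Rule of thumb from the text: if the PIs are within ~10% of each other,
    prefer the higher MHz kit; otherwise prefer the kit with the higher PI."""
    pi_a = performance_index(*kit_a)
    pi_b = performance_index(*kit_b)
    if abs(pi_a - pi_b) / min(pi_a, pi_b) <= 0.10:
        return kit_a if kit_a[0] >= kit_b[0] else kit_b
    return kit_a if pi_a > pi_b else kit_b


# Examples from the text (kit = (MHz, tCL)):
print(pick_kit((1333, 7), (1333, 9)))   # PIs ~190 vs ~148: not close, higher PI wins -> (1333, 7)
print(pick_kit((1333, 7), (2133, 11)))  # PIs ~190 vs ~194: within 10%, higher MHz wins -> (2133, 11)
```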
Of course, cost is a factor – price rises more with MHz than with tCL, and thus the 'work benefit' analysis comes in: if buying a kit boosts your productivity by x percent, how long will it take to recover the extra cost? In single GPU gaming the benefits seemed minimal for our setup, but for heavier workloads I can see the case for going for something faster than 1600 C9.
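As a purely hypothetical illustration of that sum (the extra cost, hourly rate and productivity gain below are invented for the example, not measured figures):

```python
def breakeven_hours(extra_cost: float, hourly_value: float, productivity_gain: float) -> float:
    """Hours of work before a faster kit pays for itself.
    productivity_gain is a fraction, e.g. 0.02 for a 2% throughput improvement."""
    return extra_cost / (hourly_value * productivity_gain)


# Hypothetical numbers: a kit costing $60 more, work valued at $30/hour,
# and a 2% speed-up on memory-bound tasks.
print(f"Break-even after {breakeven_hours(60, 30, 0.02):.0f} hours of work")  # -> 100 hours
```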
Remember the Order of Importance
As I mentioned at the beginning of this overview, the order of importance for memory should be:
1. Amount of memory
2. Number of sticks of memory
3. Placement of those sticks in the motherboard
4. The MHz of the memory
5. If XMP/AMP is enabled
6. The subtimings of the memory
I would always suggest a user buy more memory if they need it over buying a smaller amount of faster memory. For gamers, common forum advice is to take the sum of the memory on your GPUs and add 4GB or so for Windows 7/8 – that should be the minimum amount of memory in your system. For most single GPU gamers that puts the number at 8GB (for anything bar a Titan or a dual-GPU card), and for multi-GPU gamers, above 8GB (the suggestion is to stick to a power of two).
G.Skill 2x4GB DDR3-3000 12-14-14 1.65V Kit: Do we need it?
Firstly, many thanks to G.Skill for the memory kit on which these tests were performed – 3000 MHz on air can be a tough thing to do without a kit that is actually rated to do it. This kit has done the rounds on all the major Z87 overclocking motherboards, including the ASRock Z87 OC Formula used in this test.
On the face of it, investing in this kit means a small bump to BCLK, which brings additional performance gains purely from the bus speed even at a slightly lower CPU multiplier. Beyond this, there are only a very few scenarios where DDR3-3000 C12 beats kits costing a fifth as much – a couple of our IGP compute benchmarks, a couple of IGP gaming scenarios and a couple of tri-GPU CrossFire titles. However, the argument then becomes whether the extra cost of the memory would be better spent on a discrete GPU outright (or on better GPUs).
The memory kit has one thing going for it – overclockers aiming for high MHz love the stuff. In our testing we hit 3100 C12 stable across the kit for daily use, one of the sticks hit 3300 MHz on air at very loose timings, and fellow overclocker K404 got one of the sticks to 3500 MHz on liquid nitrogen. In the hands of overclockers with much more time on their hands (and knowledge of the memory subsystem), we have seen DDR3-4400 as well.
At $690 for a 2x4 GB kit, veteran system builders are laughing. It is a high price for a kit that offers little apart from a number parade. Perhaps the thing to remember is that plenty of memory manufacturers are also aiming at high MHz – Corsair, Avexir, TeamGroup, Apacer and others. If I had that money to spend on a daily Haswell system, I might plump for 4x8GB of DDR3-2400 C10 and upgrade the GPU with money left over.
Haswell Recommendations
For discrete GPU users, recommending any kit over another is a tough call. For daily workloads, a good DDR3-1866 C9 kit hits the right spot on the curve to remain cost effective. Users with a few extra dollars in their back pocket might look towards 2133 C9 or 2400 C10, which moves a little up the curve and offers headroom should a game come out that is heavily memory dependent. Ultimately the same advice applies to multi-GPU users as well as IGP: avoid 1600 MHz and below.
One relevant question is whether memory speed matters in the laptop space. It remains an untapped resource for memory manufacturers, mainly because it is an area where saving $5 here and there can mean the difference between a good-priced and a great-priced product. Yet even in $2000+ laptops, 1333 C9 and 1600 C9-11 still reign supreme. I have been told that XMP is often not even an option on many models, meaning there are few opportunities for the pumped up SO-DIMM kits that have recently hit the market.
Addendum:
One point I failed to address in the article: the XMP rating for a memory kit is made for the density of that kit – i.e. a 2x8 GB DDR3-2400 C11 memory kit might not be stable at 2400 C11 when you put two such kits together in the same system. If a kit has a lot of headroom it may be possible, but there is no guarantee. The only guarantee is that if you purchase a single kit (4x8 GB 2400 C11, for example) it is confirmed to run at the rated timings. This may be slightly more expensive in certain circumstances, but it saves the headache of discovering the kits you bought will not work at full density. I would certainly recommend buying a single kit rather than gambling with two lower density kits, even if they are from the same family. The rating on the kit is for the density of that kit.
89 Comments
ShieTar - Friday, September 27, 2013 - link
I think you would have to propose a software benchmark which benefits from actually running from a Ramdisk. Testing the RD itself with some kind of synthetic HD benchmark will not give you much different results than a synthetic memory benchmark, unless the software implementation is rubbish. So if you want to see this happen, I suggest you explain to everybody what kind of software you use in combination with your Ramdisk, and why it benefits from it. And hope that this software is sufficiently relevant to get a large number of people interested in this kind of benchmark.
ShieTar - Friday, September 27, 2013 - link
Two comments on the "Performance Index" used in this article:

1. It is calculated as the inverse of the actual access latency (in nanoseconds). Using the inverse of a physically meaningful number will always make the relationship exhibit much more of a "diminishing returns" look than using the physical attribute directly.
2. As no algorithm should care directly about the latency, but rather about the combined time to get the full data set it requested, it would be interesting to understand the typical size of the data sets affecting these benchmarks. If your software is randomly picking single bytes from memory, you expect performance to depend only on the latency. On the other hand, if the software is reading complete rows (512 bytes), the bandwidth becomes more relevant than the latency.
Of course figuring out the best performance metric for any kind of review can take a lot of time and effort. But when you do a review generating this large amount of data anyways, would it be possible to make the raw data available to the readers, so they can try to get their own understanding on the matter?
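To make the first point above concrete, a small sketch (using the article's PI definition and assuming the quoted MHz figure is the DDR3 transfer rate) of how PI relates to the absolute CAS access time:

```python
def cas_latency_ns(transfer_rate_mts: int, tcl: int) -> float:
    """Absolute CAS access time: tCL cycles of the memory clock,
    which runs at half the DDR3 transfer rate."""
    memory_clock_mhz = transfer_rate_mts / 2
    return tcl / memory_clock_mhz * 1000  # nanoseconds


for mts, tcl in [(1600, 9), (1866, 8), (2400, 10), (3000, 12)]:
    pi = mts / tcl
    print(f"DDR3-{mts} C{tcl}: PI = {pi:.0f}, CAS latency = {cas_latency_ns(mts, tcl):.2f} ns")

# PI is proportional to the inverse of latency: latency_ns = 2000 / PI
```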
Death666Angel - Friday, September 27, 2013 - link
First of all, great article and really good chart layout, very easy to read! :D

But one thing seems strange: in the WinRAR 3.93 test, 2800MHz/C12 performs better than 2800MHz/C11, but you call out ...C11 in the text as performing well, even though anyone can increase their latencies without incurring stability issues (that's my experience at least). Switched numbers? :)
willis936 - Friday, September 27, 2013 - link
I too thought this was strange. You could see higher latencies performing better clock for clock, which doesn't seem intuitive. I couldn't work out why those results were the way they were.

ShieTar - Friday, September 27, 2013 - link
In reality, there really should be no reason why a longer latency should increase performance (unless you are programming some real-time code which depends on algorithm synchronization). Therefore it seems safe to interpret the difference as the measurement noise of this specific benchmark.

Urbanos - Friday, September 27, 2013 - link
excellent article! i was waiting for one of these! great work, masterful :)

jaydee - Friday, September 27, 2013 - link
Great work, I'd like to see a future article look at single-channel vs dual-channel RAM in laptop/mITX/NUC configurations. With only two SO-DIMM slots, people have to really evaluate whether to fill both slots, knowing they'd have to replace both of them to upgrade but gaining dual-channel operation, or to go with a single SO-DIMM, losing dual channel but keeping an easier memory upgrade path down the road. Thanks and great work!

Hrel - Friday, September 27, 2013 - link
Hrel - Friday, September 27, 2013 - link
How do you get such nice screenshots of the BIOS? They look much nicer than when people just use a camera, so what did you use to take them?

merikafyeah - Friday, September 27, 2013 - link
Probably used a video capture card. These are also used to objectively evaluate GPU frame-pacing in a way that software like FRAPS cannot.

Rob94hawk - Saturday, September 28, 2013 - link
Modern BIOSes allow you to save screenshots to USB. My MSI Z87 Gaming does it. No more picture taking. It's a great feature long overdue!