Original Link: https://www.anandtech.com/show/16087/the-samsung-980-pro-pcie-4-ssd-review
The Samsung 980 PRO PCIe 4.0 SSD Review: A Spirit of Hope
by Billy Tallis on September 22, 2020 11:20 AM EST

It may be a bit later than originally planned, but Samsung's first consumer SSD to support PCIe 4.0 is here. The Samsung 980 PRO was first previewed at CES in January, but we didn't hear anything further until leaks started appearing towards the end of summer. Now the 980 PRO is set to kick off a new wave of PCIe 4.0 SSD releases, and it brings the most significant changes to Samsung's PRO SSD line since the debut of its first NVMe drive.
The Samsung 980 PRO PCIe 4.0 SSD
As we march into a world where PCIe 4.0 is offered to the vast majority of consumers in all segments of computing, the push is on to enable support for the new standard. Benefits such as increased peak speed and decreased power consumption are vital specifications that PCIe 4.0 brings to the table, so having optimized products to go along with it matters as each new generation trumps the old. Samsung's first PCIe 4.0 x4 offering for consumers is the 980 PRO, a series of M.2 drives with capacities up to 2.0 TB.
These new drives feature the latest in Samsung's controller design, but also mark the change from 2-bit cells to 3-bit cells for the PRO line of drives. This change allows for increased capacity and decreased cost, and thanks to Samsung's controller technology, the lower theoretical endurance we might expect from TLC is still covered by the warranty. Samsung's PRO drives have always been designed to impress, sitting in the upper echelons of the market for performance. That is what we're here to test in this review.
Samsung 980 PRO SSD Specifications

| | 250 GB | 500 GB | 1 TB | 2 TB |
|---|---|---|---|---|
| Interface | PCIe 4.0 x4, NVMe 1.3c | | | |
| Form Factor | M.2 2280, single-sided | | | |
| Controller | Samsung Elpis | | | |
| NAND | Samsung 128L 3D TLC | | | |
| LPDDR4 DRAM | 512 MB | 512 MB | 1 GB | 2 GB |
| SLC Write Cache (Min) | 4 GB | 4 GB | 6 GB | TBD |
| SLC Write Cache (Max) | 49 GB | 94 GB | 114 GB | TBD |
| Sequential Read | 6400 MB/s | 6900 MB/s | 7000 MB/s | TBD |
| Sequential Write (SLC) | 2700 MB/s | 5000 MB/s | 5000 MB/s | TBD |
| Sequential Write (TLC) | 500 MB/s | 1000 MB/s | 2000 MB/s | TBD |
| Random Read IOPS (4kB, QD1) | 22k | 22k | 22k | TBD |
| Random Read IOPS (4kB, Max) | 500k | 800k | 1000k | TBD |
| Random Write IOPS (4kB, QD1) | 60k | 60k | 60k | TBD |
| Random Write IOPS (4kB, Max) | 600k | 1000k | 1000k | TBD |
| Active Power (Read) | 5.0 W | 5.9 W | 6.2 W | TBD |
| Active Power (Write) | 3.9 W | 5.4 W | 5.7 W | TBD |
| Idle Power (APST) | 35 mW | | | |
| Idle Power (L1.2) | 5 mW | | | |
| Write Endurance | 150 TB (0.3 DWPD) | 300 TB (0.3 DWPD) | 600 TB (0.3 DWPD) | 1200 TB (0.3 DWPD) |
| Warranty | 5 years | | | |
| Launch MSRP | $89.99 (36¢/GB) | $149.99 (30¢/GB) | $229.99 (23¢/GB) | TBD |

Values listed once apply to all capacities.
Today we are testing the 250 GB and 1 TB models, representing the current minimum and maximum of what is on offer. The 2 TB model is set to come to retail at a later date, along with its respective specifications.
Two Waves of PCIe 4.0 Storage: Wave One
AMD kicked off the transition to PCIe 4.0 last year with the release of their Zen 2 family of CPUs. This began the first phase of PCIe 4.0 SSDs, led by Phison.
Phison was the only SSD controller vendor ready with a PCIe 4.0 solution at the time; its E16 controller has enjoyed over a year on the market as the only option for consumer PCIe 4.0 SSDs, and we've covered it extensively. But the Phison E16 was a bit of a rushed design, with a minimum of changes to their highly successful E12 controller to enable PCIe 4.0 support. That left the E16 with some notable shortcomings: it extracts only a modest increase in peak bandwidth from the upgrade to PCIe 4.0, and the extra performance comes with a lot of extra power consumption. The rest of the SSD industry decided to take the PCIe 4.0 transition a bit more slowly, preparing more mature controller designs manufactured on smaller process nodes that can provide the efficiency necessary to use the full speed of a PCIe 4.0 x4 link while staying within the thermal and power constraints of an M.2 drive.
All the major players in the SSD controller market have been preparing for this second wave of Gen4 drives, but Samsung is making the first move in this round. The 980 PRO introduces the new Samsung Elpis controller, built on their 8nm process and designed to double the peak performance offered by PCIe Gen3 SSDs.
Wave Two Starts With Samsung
In addition to the new Elpis controller, the 980 PRO introduces a new generation of 3D NAND flash memory from Samsung. Officially, Samsung isn't disclosing the layer count, but they've claimed it's 40% more than their previous 92L generation, so this should be 128L 3D NAND. Samsung isn't the first to market with 128L NAND (SK hynix beat them by less than a month), but it shows that layer counts are still climbing, and capacities should climb along with them.
Historically, the PRO line of SSDs has used some of Samsung's fastest and most durable NAND available; that is what earned the products the PRO name. This time around, Samsung is changing things to broaden the PRO line's customer base: it is abandoning the two bit per cell (MLC) memory that has been the hallmark of the PRO product line, and with the 980 PRO it is finally switching to three bit per cell (TLC) NAND flash memory. This change is not unprecedented so much as overdue: Samsung has been almost completely alone in its continued use of 2-bit MLC NAND, while the rest of the SSD industry (consumer and enterprise alike) has moved from MLC to TLC, even for leading edge designs.
The historical reasoning for choosing MLC NAND over TLC NAND has always boiled down to two main factors: MLC tends to be faster, and it has higher write endurance. TLC NAND may be slower than MLC NAND in general, but that doesn't mean TLC SSDs have to be slower than MLC SSDs. The performance advantages of MLC NAND for consumer use have been greatly reduced by the universal adoption of SLC write caching on TLC SSDs, and by the trend toward larger SLC caches. Unless a user's workload regularly blows well past the SLC cache, a TLC drive with an SLC cache performs comparably to an MLC drive while offering noticeably more capacity for the money.
Write endurance has always been an important issue to keep an eye on, but the SSD industry has successfully prevented it from becoming a serious problem for consumers. Improved error correction and the fundamental advantages of 3D NAND over older planar NAND have helped, but the biggest factor has been the growth of drive capacities. The total write endurance of an SSD scales roughly linearly with its capacity: a 2TB drive can handle about twice as many TB of writes over its lifespan as a 1TB drive. However, consumer storage needs don't grow in the same way as enterprise storage needs. When a consumer moves from a 512GB drive to 1TB or 2TB, most of the extra capacity that gets used is for relatively static data: photos, videos and games that aren't modified all that often, and certainly not multiple times a day.
The 980 PRO's use of TLC instead of MLC may be the end of an era for SSDs, but it doesn't necessarily mean that the drive isn't worthy of the "PRO" name; the 980 PRO is still very clearly at the high end of the consumer market.
In many ways, this drive could easily have been labeled the 980 EVO as a replacement for the 970 EVO Plus. Along with switching to TLC NAND, Samsung has cut the write endurance ratings in half to 0.3 DWPD and dropped the usable capacities down to the typical TLC/EVO levels of 250/500/1000 GB instead of 256/512/1024 GB. TLC means the 980 PRO now relies on SLC caching for its peak write speeds, and write performance will drop substantially if the SLC cache is ever filled. However, Samsung has offset this by configuring the 980 PRO with substantially larger SLC caches than their previous EVO drives, and that, more than anything else, is what earns it the PRO name. MSRPs are also now much lower, comparable to other TLC-based PCIe 4.0 SSDs.
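For reference, the TBW and DWPD ratings are two views of the same number. Spreading the 1TB model's 600 TB rating over its five-year warranty:

```
600 TB / (1 TB per drive write × 5 years × 365 days) ≈ 0.33 drive writes per day
```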
The basic layout and look of Samsung's M.2 NVMe SSDs has changed little over the years even as the components have been upgraded. The Elpis controller is their second to feature a metal heatspreader on the controller package. This is the third generation of drives to use copper foil in the label on the back of the drive as an additional heatspreader.
After the PCIe 4.0 support and 8nm fab process, the next most important feature of the new Samsung Elpis is its support for 128 IO queues, up from 32 in the previous Phoenix controller. The most common use case for multiple IO queues on an NVMe SSD is for the OS to assign one queue per CPU core, so that no core-to-core synchronization is required for software to submit new IO commands to the SSD. Now that CPU core counts have grown well beyond 32, it makes sense for Samsung to support more queues, especially since these NVMe controllers are also used in Samsung's entry-level enterprise and datacenter NVMe SSDs.
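On a Linux system with the standard nvme-cli tool, the number of queues a drive actually negotiates can be read back from the Number of Queues feature (feature ID 0x07). A quick sketch; the device path is an assumption for your system:

```bash
# Query the Number of Queues feature (FID 0x07) from the controller.
# /dev/nvme0 is an assumed device path; adjust to match your system.
sudo nvme get-feature /dev/nvme0 -f 0x07 -H
```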
Samsung Client/Consumer PCIe SSD Controller History

| Codename | Elpis | Phoenix | Polaris | UBX | UAX |
|---|---|---|---|---|---|
| Part Number | S4LV003 | S4LR020 | S4LP077 | S4LN058 | S4LN053 |
| Host Interface | PCIe 4.0 x4 | PCIe 3.0 x4 | PCIe 3.0 x4 | PCIe 3.0 x4 | PCIe 2.0 x4 |
| Protocol Support | NVMe 1.3c | NVMe 1.3 | NVMe 1.2 | NVMe 1.1 (or AHCI) | AHCI |
| Number of IO Queues | 128 | 32 | 7 | 8 | 1 |
| Max Queue Size | 16384 per queue | 16384 per queue | 16384 per queue | 16384 per queue | 32 |
| Fabrication Process | 8nm | 14nm | ? | ? | ? |
| DRAM Support | LPDDR4 | LPDDR4 | LPDDR3 | LPDDR3 | LPDDR2 |
| Retail Consumer Products | 980 PRO | 970 PRO, 970 EVO, 970 EVO Plus | 960 PRO, 960 EVO | 950 PRO | (None) |
| Client OEM Products | PM9A1 | SM981, PM981 | SM961, PM961 | SM951, PM951 | XP941 |
The NVMe protocol hasn't added any major must-have features since the version 1.1 used by the 950 PRO, but Samsung has maintained compliance with later versions and implemented some of the new optional features. The 980 PRO does not advertise compliance with the latest NVMe 1.4 specification and instead claims compliance with version 1.3c, but this has basically no practical impact.
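For the curious, the version a drive claims is visible in its identify data; on Linux with nvme-cli (device path assumed):

```bash
# Print the NVMe version ("ver") field from the controller identify data.
# A value of 0x10300 corresponds to NVMe 1.3. /dev/nvme0 is assumed.
sudo nvme id-ctrl /dev/nvme0 | grep '^ver'
```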
This Review
For today's review, we're focused specifically on high-end consumer SSDs. The drives to pay attention to are:
- Samsung 970 Pro (64L MLC)
- Samsung 970 Evo (64L TLC)
- Samsung 970 Evo Plus (92L TLC)
- Any Phison E16 Drive at PCIe 4.0, such as Seagate FireCuda 520 (96L TLC)
- Any Phison E12 Drive at PCIe 3.0, such as Seagate FireCuda 510 (64L TLC)
- Any Silicon Motion SM2262 PCIe 3.0 drive, such as Kingston KC2500
- Other Flagships: WD Black SN750, Intel Optane 905P, SK hynix Gold P31
We're also going to add in a PCIe 3.0 x8 enterprise drive, the Samsung PM1725a. The PM1725a is an interesting choice as it is a 6.4TB high-end enterprise SSD that's a few years old. It has as much PCIe bandwidth as the new PCIe 4.0 x4 SSDs, but it is tuned for enterprise use cases: its lack of SLC caching hurts peak write performance, but its read throughput is still impressive at over 6GB/s for sequential reads and over 1M IOPS for random reads. The downside is that it can require 20+ Watts to deliver that kind of performance. We shall see how its PCIe 3.0 x8 interface stacks up against drives with the same host bandwidth on a PCIe 4.0 x4 link.
As always, comparisons against other drives can be made using our Bench tool.
Testing PCIe 4.0
It's been over a year since the first consumer CPUs and SSDs supporting PCIe 4.0 hit the market, so we're a bit overdue for a testbed upgrade. Our Skylake system was adequate for even the fastest PCIe gen3 drives, but is finally a serious bottleneck.
We have years of archived results from the old testbed, which are still relevant to the vast majority of SSDs and computers out there that do not yet support PCIe gen4. We're not ready to throw out all that work quite yet; we will still be adding new test results measured on the old system until PCIe gen4 support is more widespread, or my office gets too crowded with computers—whichever happens first. (Side note: some rackmount cases for all these test systems would be greatly appreciated.)
AnandTech 2017-2020 Skylake Consumer SSD Testbed

| Component | Details |
|---|---|
| CPU | Intel Xeon E3 1240 v5 |
| Motherboard | ASRock Fatal1ty E3V5 Performance Gaming/OC |
| Chipset | Intel C232 |
| Memory | 4x 8GB G.SKILL Ripjaws DDR4-2400 CL15 |
| Software | Windows 10 x64, version 1709; Linux kernel version 4.14, fio version 3.6; Spectre/Meltdown microcode and OS patches current as of May 2018 |
- Thanks to Intel for the Xeon E3 1240 v5 CPU
- Thanks to ASRock for the E3V5 Performance Gaming/OC
- Thanks to G.SKILL for the Ripjaws DDR4-2400 RAM
- Thanks to Corsair for the RM750 power supply, Carbide 200R case, and Hydro H60 CPU cooler
- Thanks to Quarch for the HD Programmable Power Module and accessories
- Thanks to StarTech for providing a RK2236BKF 22U rack cabinet.
Since introducing the Skylake SSD testbed in 2017, we have made few changes to our testing configurations and procedures. In December 2017, we started using a Quarch XLC programmable power module (PPM), providing far more detailed and accurate power measurements than our old multimeter setup. In May 2019, we upgraded to a Quarch HD PPM, which can automatically compensate for voltage drop along the power cable to the drive. This allowed us to more directly measure M.2 PCIe SSD power: these drives can pull well over 2A from the 3.3V supply which can easily lead to more than the 5% supply voltage drop that drives are supposed to tolerate. At the same time, we introduced a new set of idle power measurements conducted on a newer Coffee Lake system. This is our first (and for the moment, only) SSD testbed that is capable of using the full range of PCIe power management features without crashing or other bugs. This allowed us to start reporting idle power levels for typical desktop and best-case laptop configurations.
Coffee Lake SSD Testbed for Idle Power

| Component | Details |
|---|---|
| CPU | Intel Core i7-8700K |
| Motherboard | Gigabyte Aorus H370 Gaming 3 WiFi |
| Memory | 2x 8GB Kingston DDR4-2666 |
On the software side, the disclosure of the Meltdown and Spectre CPU vulnerabilities at the beginning of 2018 led to numerous mitigations that affected overall system performance. The most severe effects were to system call overhead, which has a measurable impact on high-IOPS synthetic benchmarks. In May 2018, after the dust started to settle from the first round of vulnerability disclosures, we updated the firmware, microcode and operating systems on our testbed and took the opportunity to slightly tweak some of our synthetic benchmarks. Our pre-Spectre results are archived in the SSD 2017 section of our Bench database while the current post-Spectre results are in the SSD 2018 section. Of course, since May 2018 there have been many further related CPU security vulnerabilities found, and many changes to the mitigation techniques. Our SSD testing has not been tracking those software and microcode updates to avoid again invalidating previous scores. However, our new gen4-capable Ryzen test system is fully up to date with the latest firmware, microcode and OS versions.
AnandTech Ryzen PCIe 4.0 Consumer SSD Testbed

| Component | Details |
|---|---|
| CPU | AMD Ryzen 5 3600X |
| Motherboard | ASRock B550 Pro |
| Memory | 2x 16GB Mushkin DDR4-3600 |
| Software | Linux kernel version 5.8, fio version 3.23 |
Our new PCIe 4 test system uses an AMD Ryzen 5 3600X processor and an ASRock B550 motherboard. This provides PCIe 4 lanes from the CPU but not from the chipset. Whenever possible, we test NVMe SSDs with CPU-provided PCIe lanes rather than going through the chipset, so the lack of PCIe gen4 from the chipset isn't an issue. (We had a similar situation back when we were using a Haswell system that supported gen3 on the CPU lanes but only gen2 on the chipset.) Going with B550 instead of X570 also avoids the potential noise of a chipset fan. The DDR4-3600 is a big jump compared to our previous testbed, but is a fairly typical speed for current desktop builds and is a reasonable overclock. We're using the stock Wraith Spire 2 cooler; our current SSD tests are mostly single-threaded, so there's no need for a bigger heatsink.
For now, we are still using the same test scripts to generate the same workloads as on our older Skylake testbed. We haven't tried to control for all possible factors that could lead to different scores between the two testbeds. For this review, we have re-tested several drives on the new testbed to illustrate the scale of these effects. In future reviews, we will be rolling out new synthetic benchmarks that will not be directly comparable to the tests in this review and past reviews. Several of our older benchmarks do a poor job of capturing the behavior of the increasingly common QLC SSDs, but that's not important for today's review. The performance differences between new and old testbeds should be minor, except where the CPU speed is a bottleneck. This mostly happens when testing random IO at high queue depths.
More important for today is the fact that our old benchmarks only test queue depths up to 32 (the limit for SATA drives), and that's not always enough to use the full theoretical performance of a high-end NVMe drive—especially since our old tests only use one CPU core to stress the SSD. We'll be introducing a few new tests to better show these theoretical limits, but unfortunately the changes required to measure those advertised speeds also make the tests much less realistic for the context of desktop workloads, so we'll continue to emphasize the more relevant low queue depth performance.
Whole-Drive Fill
This test starts with a freshly-erased drive and fills it with 128kB sequential writes at queue depth 32, recording the write speed for each 1GB segment. This test is not representative of any ordinary client/consumer usage pattern, but it does allow us to observe transitions in the drive's behavior as it fills up. This can allow us to estimate the size of any SLC write cache, and get a sense for how much performance remains on the rare occasions where real-world usage keeps writing data after filling the cache.
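For readers who want to approximate this test, a minimal fio sketch along the same lines is below. This is an illustration rather than our actual test script: the device path is an assumption, and the command overwrites every byte on the target drive.

```bash
# Fill the whole drive with 128kB sequential writes at QD32, logging
# average bandwidth once per second to expose SLC cache transitions.
# WARNING: destroys all data on /dev/nvme0n1 (assumed device path).
sudo fio --name=drivefill --filename=/dev/nvme0n1 \
    --rw=write --bs=128k --iodepth=32 --ioengine=libaio --direct=1 \
    --write_bw_log=drivefill --log_avg_msec=1000
```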
Both tested capacities of the 980 PRO perform more or less as advertised at the start of the test: 5GB/s writing to the SLC cache on the 1TB model and 2.6GB/s writing to the cache on the 250GB model (the 1TB model only hits 3.3GB/s when limited to PCIe 3.0). Surprisingly, the apparent size of the SLC caches is larger than advertised, and larger when testing on PCIe 4.0 than on PCIe 3.0: the 1TB model's cache (rated for 114GB) lasts about 170GB at Gen4 speeds and about 128GB at Gen3 speeds, and the 250GB model's cache (rated for 49GB) lasts for about 60GB on Gen4 and about 49GB on Gen3. If anything, it seems the rated SLC cache sizes describe PCIe 3.0 operation; under PCIe 4.0, the drive may be able to free up portions of the cache in the background while later writes are still landing in SLC, effectively enlarging the cache.
An extra twist for the 1TB model is that partway through the drive fill process, performance returns to SLC speeds and stays there just as long as it did initially: another 170GB written at 5GB/s (124GB written at 3.3GB/s on Gen3). Looking back at the 970 EVO Plus and 970 EVO we can see similar behavior, but it's impressive Samsung was able to continue this with the 980 PRO while providing much larger SLC caches—in total, over a third of the drive fill process ran at the 5GB/s SLC speed, and performance in the TLC writing phases was still good in spite of the background work to flush the SLC cache.
Average Throughput for last 16 GB | Overall Average Throughput |
On the Gen4 testbed, the overall average throughput of filling the 1TB 980 PRO is only slightly slower than filling the MLC-based 970 PRO, and far faster than the other 1TB TLC drives. Even when limited by PCIe Gen3, the 980 Pro's throughput remains in the lead. The smaller 250GB model doesn't make good use of PCIe Gen4 bandwidth during this sequential write test, but it is a clear improvement over the same capacity of the 970 EVO Plus.
Working Set Size
Most mainstream SSDs have enough DRAM to store the entire mapping table that translates logical block addresses into physical flash memory addresses. DRAMless drives only have small buffers to cache a portion of this mapping information. Some NVMe SSDs support the Host Memory Buffer feature and can borrow a piece of the host system's DRAM for this cache rather than needing lots of on-controller memory.
When accessing a logical block whose mapping is not cached, the drive needs to read the mapping from the full table stored on the flash memory before it can read the user data stored at that logical block. This adds extra latency to read operations and in the worst case may double random read latency.
We can see the effects of the size of any mapping buffer by performing random reads from different sized portions of the drive. When performing random reads from a small slice of the drive, we expect the mappings to all fit in the cache, and when performing random reads from the entire drive, we expect mostly cache misses.
When performing this test on mainstream drives with a full-sized DRAM cache, we expect performance to be generally constant regardless of the working set size, or for performance to drop only slightly as the working set size increases.
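A rough way to reproduce this kind of sweep with fio is to hold everything constant except the size of the region being read. A hypothetical sketch, assuming a Linux system, a drive that already contains data across the tested range, and the device path shown:

```bash
# QD1 4kB random reads over progressively larger regions of the drive.
# If mapping-table cache misses start occurring, the reported average
# read latency rises as the region grows.
for span in 1g 4g 16g 64g 256g; do
    sudo fio --name=ws-$span --filename=/dev/nvme0n1 --direct=1 \
        --ioengine=libaio --rw=randread --bs=4k --iodepth=1 \
        --size=$span --runtime=30 --time_based --randrepeat=0
done
```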
Since these are all high-end drives, we don't see any of the read performance drop-off we expect from SSDs with limited or no DRAM buffers. The two drives using Silicon Motion controllers show a little bit of variation depending on the working set size, but ultimately are just as fast when performing random reads across the whole drive as they are reading from a narrow range. The read latency measured here for the 980 PRO is an improvement of about 15% over the 970 EVO Plus, but is not as fast as the MLC-based 970 PRO.
Note: Our SSD testbed is currently producing suspiciously slow scores for The Destroyer, so those results have been omitted pending further investigation.
Note 2: We are currently in the process of testing these benchmarks in PCIe 4.0 mode. Results will be added as they finish.
AnandTech Storage Bench - Heavy
Our Heavy storage benchmark is proportionally more write-heavy than The Destroyer, but much shorter overall. The total writes in the Heavy test aren't enough to fill the drive, so performance never drops down to steady state. This test is far more representative of a power user's day to day usage, and is heavily influenced by the drive's peak performance. The Heavy workload test details can be found here. This test is run twice, once on a freshly erased drive and once after filling the drive with sequential writes.
Average Data Rate
Average Latency | Average Read Latency | Average Write Latency
99th Percentile Latency | 99th Percentile Read Latency | 99th Percentile Write Latency
Energy Usage |
The 250GB Samsung 980 PRO is a clear improvement across the board relative to the 970 EVO Plus. It still has some fairly high latency scores, especially for the full drive test run, but that's to be expected for this capacity class. The 1TB model seems to have sacrificed a bit of its full drive performance in favor of a slight increase in empty-drive performance; the enlarged SLC caches are probably a contributing factor here.
Both drives show a significant reduction in energy usage compared to the older generation of Samsung M.2 NVMe drives, but there's still a ways to go before Samsung catches up to the most efficient 8-channel drives.
AnandTech Storage Bench - Light
Our Light storage test has relatively more sequential accesses and lower queue depths than The Destroyer or the Heavy test, and it's by far the shortest test overall. It's based largely on applications that aren't highly dependent on storage performance, so this is a test more of application launch times and file load times. This test can be seen as the sum of all the little delays in daily usage, but with the idle times trimmed to 25ms it takes less than half an hour to run. Details of the Light test can be found here. As with the ATSB Heavy test, this test is run with the drive both freshly erased and empty, and after filling the drive with sequential writes.
Average Data Rate
Average Latency | Average Read Latency | Average Write Latency
99th Percentile Latency | 99th Percentile Read Latency | 99th Percentile Write Latency
Energy Usage |
The Samsung 980 PRO does not bring any significant improvements to performance on the Light test. Peak performance from most high-end NVMe drives is essentially the same, and the only meaningful differences are on the full-drive test runs. Aside from a relatively high 99th percentile write latency from the 250GB 980 PRO, neither capacity has any trouble with the full-drive test run.
Samsung has made significant improvements to energy efficiency with the 980 PRO. Samsung's previous generation of M.2 NVMe drives were among the most power-hungry in this segment, with their performance potential largely wasted on such a light workload. The 980 PRO cuts energy usage by a third compared to the 970 generation drives, bringing them more into competition with other high-end M.2 drives. But as with the Heavy test, there's still a lot of room for improvement as illustrated by drives like the WD Black SN750.
Note: All our previous testing has been on an Intel test bed. Because of the move to PCIe 4.0, we have upgraded to Ryzen. Devices tested under Ryzen in time for this review are identified in the charts.
Random Read Performance
Our first test of random read performance uses very short bursts of operations issued one at a time with no queuing. The drives are given enough idle time between bursts to yield an overall duty cycle of 20%, so thermal throttling is impossible. Each burst consists of a total of 32MB of 4kB random reads, from a 16GB span of the disk. The total data read is 1GB.
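In fio terms, the structure of this test looks roughly like the sketch below. Our actual script differs in details; the thinktime values are illustrative stand-ins for the 20% duty cycle, and the device path is an assumption.

```bash
# QD1 4kB random reads over a 16GB span, 1GB total, issued in 32MB bursts:
# after every 8192 blocks (32MB of 4kB reads) fio idles for a few seconds,
# approximating the low-duty-cycle bursts described above.
sudo fio --name=burst-randread --filename=/dev/nvme0n1 --direct=1 \
    --ioengine=libaio --rw=randread --bs=4k --iodepth=1 \
    --size=16g --io_size=1g \
    --thinktime=4000000 --thinktime_blocks=8192
```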
The burst random read latency of Samsung's 128L TLC as used in the 980 PRO is faster than their earlier TLC, but still lags behind some of the competition—as does their 64L MLC used in the 970 PRO. Our new Ryzen testbed consistently imposes a bit more overhead on drives than our older Skylake-based testbed, and PCIe Gen4 bandwidth is no help here.
Our sustained random read performance is similar to the random read test from our 2015 test suite: queue depths from 1 to 32 are tested, and the average performance and power efficiency across QD1, QD2 and QD4 are reported as the primary scores. Each queue depth is tested for one minute or 32GB of data transferred, whichever is shorter. After each queue depth is tested, the drive is given up to one minute to cool off so that the higher queue depths are unlikely to be affected by accumulated heat build-up. The individual read operations are again 4kB, and cover a 64GB span of the drive.
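The queue depth sweep can be approximated with a simple loop; again a sketch under assumptions (Linux, fio, assumed device path) rather than our exact script:

```bash
# Sweep queue depths 1 through 32 for 4kB random reads over a 64GB span,
# one minute per depth, with a cool-down pause between steps.
for qd in 1 2 4 8 16 32; do
    sudo fio --name=randread-qd$qd --filename=/dev/nvme0n1 --direct=1 \
        --ioengine=libaio --rw=randread --bs=4k --iodepth=$qd \
        --size=64g --runtime=60 --time_based --randrepeat=0
    sleep 60   # give the drive time to cool off between queue depths
done
```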
On the longer random read test that adds in some slightly higher queue depths, the 980 PRO catches up to the 970 PRO's performance while the Phison-based Seagate drives fall behind. The SK hynix drive and the two drives with the Silicon Motion SM2262EN controller remain the fastest flash-based drives. Our Ryzen testbed has a slight advantage over the old Skylake testbed here, but it's due to the faster CPU rather than the extra PCIe bandwidth: the PCIe Gen3 drives benefit as well.
Power Efficiency in MB/s/W | Average Power in W |
Samsung's power efficiency on the random read test is clearly improved with the new NAND and the 8nm Elpis controller, but that really only brings the 980 PRO's efficiency up to par at best. The SMI-based drives from Kingston and ADATA are a bit more efficient, and the SK hynix Gold P31 with its extremely efficient 4-channel controller is still far beyond the 8-channel competitors.
Compared to the 970 EVO Plus, the 980 PRO's random read performance is increased across the board and power consumption is reduced, but the differences are slight. The 980 PRO's improvement is greatest at the highest queue depths we test, and running on our new Ryzen testbed helps a bit more—but the PCIe 3 SK hynix Gold P31 gets most of the same benefits at high queue depths and matches the QD32 random read throughput of the 980 PRO. We would need to use multiple CPU cores for this test in order to reach the performance levels where the 980 PRO could build a big lead over the Gen3 drives. The 250GB model shows more significant improvement than the 1TB model, but again this is mainly at high queue depths.
The random read performance and power consumption of the 980 PRO start out in mundane territory for the lower queue depths. At the highest queue depths tested, it is largely CPU-limited and stands out only because we haven't tested many drives on our new, faster Ryzen testbed. The PCIe Gen3 SK hynix Gold P31 is still able to keep pace with the 980 PRO under these conditions, and it still uses far less power than the competition.
Random Write Performance
Our test of random write burst performance is structured similarly to the random read burst test, but each burst is only 4MB and the total test length is 128MB. The 4kB random write operations are distributed over a 16GB span of the drive, and the operations are issued one at a time with no queuing.
The burst random write performance of the Samsung 980 PRO is an improvement over its predecessors, but Samsung's SLC write cache latency is still significantly slower than many of their competitors. PCIe Gen4 support isn't a factor for the 980 PRO here at QD1, and the two capacities of the 980 PRO seem to disagree about whether the other differences between our old and new testbeds help or hurt. Meanwhile, the Phison E16-based Seagate FireCuda 520 does seem to be able to benefit significantly on our Gen4 testbed, where it takes a clear lead.
As with the sustained random read test, our sustained 4kB random write test runs for up to one minute or 32GB per queue depth, covering a 64GB span of the drive and giving the drive up to 1 minute of idle time between queue depths to allow for write caches to be flushed and for the drive to cool down.
On the longer random write test with some higher queue depths, it's more clear that our new Ryzen testbed performs a bit better than our old Skylake testbed, and that PCIe Gen4 support is only responsible for part of that advantage. Even using PCIe Gen4, the 1TB 980 PRO is not able to establish a clear lead over the PCIe Gen3 drives and is a bit slower than the Phison E16 drive, but the smaller 250GB 980 PRO is a big improvement over the 970 EVO Plus thanks to the larger SLC cache (now up to 49GB, compared to 13GB).
Power Efficiency in MB/s/W | Average Power in W |
The 980 PRO brings significant and much-needed power efficiency improvements over its predecessors, and takes first place among the high-end 8-channel SSDs. But the 4-channel SK hynix Gold P31 still has a wide lead, since its random write performance is very competitive while its power requirements are much lower.
The 1TB 980 PRO offers basically the same performance profile on this test as its predecessors from the 970 generation: performance tops out around QD4, where CPU overhead becomes the limiting factor. (This test is single-threaded, so higher throughput could be achieved on either testbed using a multi-threaded benchmark, but real-world applications need a lot of CPU power left over to actually *do something* with the data they're shuffling around.)
The 250GB 980 PRO briefly reaches the same peak performance as the 1TB model, but in the second half of the test it still overflows the SLC cache. It's a big improvement over the 250GB 970 EVO Plus, but the low capacity still imposes a significant performance handicap when writing a lot of data.
At QD1 the 980 PRO's random write performance is still in SATA territory, but it quickly moves to much higher performance ranges without much increase in power consumption. At the highest queue depths tested, the 1TB 980 PRO's performance is tied with the other CPU-limited drives and its power consumption is about midway between the fairly power-hungry Phison E16 drive and the stunningly efficient SK hynix drive.
Note: All our previous testing has been on an Intel test bed. Because of the move to PCIe 4.0, we have upgraded to Ryzen. Devices tested under Ryzen in time for this review are identified in the charts.
Sequential Read Performance
Our first test of sequential read performance uses short bursts of 128MB, issued as 128kB operations with no queuing. The test averages performance across eight bursts for a total of 1GB of data transferred from a drive containing 16GB of data. Between each burst the drive is given enough idle time to keep the overall duty cycle at 20%.
The burst sequential read performance of the Samsung 980 PRO is marginally faster than its predecessors, but the extra PCIe Gen4 bandwidth doesn't matter with a queue depth of just one. The drives using the SM2262EN controller stay on the top of this chart.
Our test of sustained sequential reads uses queue depths from 1 to 32, with the performance and power scores computed as the average of QD1, QD2 and QD4. Each queue depth is tested for up to one minute or 32GB transferred, from a drive containing 64GB of data. This test is run twice: once with the drive prepared by sequentially writing the test data, and again after the random write test has mixed things up, causing fragmentation inside the SSD that isn't visible to the OS. These two scores represent the two extremes of how the drive would perform under real-world usage, where wear leveling and modifications to some existing data will create some internal fragmentation that degrades performance, but usually not to the extent shown here.
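The two-pass structure can be sketched in fio as follows; this is a simplified illustration (assumed device path, destructive to data on the drive), not our exact preconditioning procedure:

```bash
# Pass 1: read back data that was written sequentially.
sudo fio --name=prep-seq --filename=/dev/nvme0n1 --direct=1 \
    --ioengine=libaio --rw=write --bs=128k --iodepth=32 --size=64g
sudo fio --name=seqread-clean --filename=/dev/nvme0n1 --direct=1 \
    --ioengine=libaio --rw=read --bs=128k --iodepth=4 --size=64g

# Pass 2: scramble the same span with random writes, creating internal
# fragmentation invisible to the OS, then repeat the sequential read.
sudo fio --name=prep-frag --filename=/dev/nvme0n1 --direct=1 \
    --ioengine=libaio --rw=randwrite --bs=4k --iodepth=32 --size=64g
sudo fio --name=seqread-frag --filename=/dev/nvme0n1 --direct=1 \
    --ioengine=libaio --rw=read --bs=128k --iodepth=4 --size=64g
```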
On the longer sequential read test, the 980 PRO no longer has a clear advantage over its predecessors. The 250GB 980 PRO is slightly slower than the 970 EVO Plus even on our new testbed. The 1TB 980 PRO shows slight improvement in its performance reading back data that wasn't written sequentially, but the 970 PRO and the SK hynix Gold P31 are still significantly faster for that task.
Power Efficiency in MB/s/W | Average Power in W |
The power efficiency scores for the 980 PRO on the sequential read test are a mixed bag. Overall, the scores are still good for a high-end NVMe drive, but it doesn't consistently improve over its predecessors, and when it does score better the improvement is small.
The 980 PRO's sequential read performance doesn't saturate until around QD16: rather late in the test compared to most drives, but that's because high-end PCIe Gen3 drives have been hitting the host bandwidth limit at moderate queue depths. The 1TB 980 PRO does show decent performance scaling through the lower queue depths, taking it past the PCIe Gen3 limits by QD8. This is a clear improvement over the Phison E16-based Seagate FireCuda 520, which doesn't start gaining speed until after QD4.
The 250GB 980 PRO falters midway through the sequential read test, with performance dropping at QD4 and QD8, on both of our testbeds. At QD16 and higher it's still well above the PCIe Gen3 speed limit, but at lower queue depths it isn't an improvement over the 970 EVO Plus.
The sequential read performance of the 980 PRO—with sufficiently high queue depths—goes far beyond what's possible with PCIe Gen3, and the 1TB model stands out dramatically as significantly faster than even the Phison E16 drive. The E16 looks like an extrapolation of the high side of the general power/performance curve, but the 980 PRO blows past 6GB/s with power draw that would still be reasonable at half the speed.
Sequential Write Performance
Our test of sequential write burst performance is structured identically to the sequential read burst performance test save for the direction of the data transfer. Each burst writes 128MB as 128kB operations issued at QD1, for a total of 1GB of data written to a drive containing 16GB of data.
The burst sequential write speed scores for high-end NVMe drives have been fairly boring, with a narrow spread of scores for a wide variety of drives. The PCIe Gen4 drives break out of that rut and deliver real improvement to this QD1 performance, but the Phison E16-based Seagate FireCuda 520 is well ahead of the Samsung 980 PRO on this test.
Our test of sustained sequential writes is structured identically to our sustained sequential read test, save for the direction of the data transfers. Queue depths range from 1 to 32 and each queue depth is tested for up to one minute or 32GB, followed by up to one minute of idle time for the drive to cool off and perform garbage collection. The test is confined to a 64GB span of the drive.
On the longer sequential write test that includes low to moderate queue depths, the 980 PRO and the Phison E16 drive end up roughly tied, with the 980 PRO only 1% ahead overall. The smaller 250GB 980 PRO is a bit on the slow side compared to most of the 1TB drives, but it's several times faster than the 250GB 970 EVO Plus thanks to the larger SLC cache.
Power Efficiency in MB/s/W | Average Power in W |
Since the 980 PROs are able to make good use of their high performance on this test, it's not too surprising that they post good efficiency scores for sequential writes. But even when tested on a PCIe Gen3 system the 980 PROs remain significantly more efficient than the 8-channel Gen3 drives, so the 980 PROs are also doing a good job of scaling down power consumption at lower speeds.
At QD2 the 1TB 980 PRO's sequential write speed is already well above the practical limit for PCIe Gen3, but further increases in queue depth don't bring much more performance. The 980 PRO is generally a bit faster and more consistent than the Seagate FireCuda 520 on this test. The 250GB 980 PRO doesn't see any benefit from PCIe Gen4 except at QD1, because its SLC cache write speed doesn't come close to the PCIe Gen3 limit. Unlike the random write test, the 250GB 980 PRO makes it all the way through the sequential write test without running out of cache or experiencing a performance drop.
The two 1TB PCIe Gen4 drives extend the same power/performance trend set by most of the high-end Gen3 NVMe SSDs. The 980 PRO falls toward the more efficient side of that trend while the Phison E16-based Seagate drive is more power hungry and approaches the reasonable limits for M.2 drives.
Note: All our previous testing has been on an Intel test bed. Because of the move to PCIe 4.0, we have upgraded to Ryzen. Devices tested under Ryzen in time for this review are identified in the charts.
Mixed Random Performance
Our test of mixed random reads and writes covers mixes varying from pure reads to pure writes at 10% increments. Each mix is tested for up to 1 minute or 32GB of data transferred. The test is conducted with a queue depth of 4, and is limited to a 64GB span of the drive. In between each mix, the drive is given idle time of up to one minute so that the overall duty cycle is 50%.
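The read/write mix sweep maps directly onto fio's rwmixread option; a hedged sketch under the same assumptions as the earlier examples:

```bash
# Mixed 4kB random IO at QD4 over a 64GB span, sweeping the read share
# from pure reads (100) to pure writes (0) in 10% steps.
for mix in 100 90 80 70 60 50 40 30 20 10 0; do
    sudo fio --name=mixed-r$mix --filename=/dev/nvme0n1 --direct=1 \
        --ioengine=libaio --rw=randrw --rwmixread=$mix --bs=4k \
        --iodepth=4 --size=64g --runtime=60 --time_based
    sleep 60   # idle time between mixes, keeping the duty cycle near 50%
done
```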
Since our mixed random IO test uses a moderate queue depth of 4, the PCIe Gen4 drives don't get much chance to flex their muscle. The overall scores are still generally bound by NAND flash latency, which doesn't vary too widely between current generation drives. There's also a small performance boost when running this test on our newer, faster Ryzen testbed. The Samsung 980 PRO is clearly an improvement over its predecessors, but is merely tied for first place among flash-based drives with the SK hynix Gold P31.
Power Efficiency in MB/s/W | Average Power in W |
Both capacities of the 980 PRO turn in good efficiency scores for the mixed random IO test, substantially improving on Samsung's previously mediocre standing. The 1TB 980 PRO's efficiency is second only to the SK hynix Gold P31. The 980 PROs are a bit more efficient running at PCIe Gen3 speeds than on the Gen4 platform, despite the ~10% performance boost on the faster system.
There are no real surprises in the performance profiles of the 980 PROs. Both capacities show the same general behavior as earlier Samsung drives, albeit with small improvements to performance and power consumption across the board.
Mixed Sequential Performance
Our test of mixed sequential reads and writes differs from the mixed random I/O test by performing 128kB sequential accesses rather than 4kB accesses at random locations, and the sequential test is conducted at queue depth 1. The range of mixes tested is the same, and the timing and limits on data transfers are also the same as above.
The Samsung 980 PROs take the top spots for our mixed sequential IO test, with even the 250GB 980 PRO edging out the 1TB Seagate FireCuda 520. Even when limited to PCIe Gen3, the 980s are a clear step up in performance from earlier high-end drives. The improvement for the 250GB model is the most impressive, since the 250GB 970 EVO Plus is significantly slower than most of the 1TB drives.
Power Efficiency in MB/s/W | Average Power in W |
The 980 PROs turn in more good power efficiency numbers that place them clearly ahead of everything other than the SK hynix Gold P31. And this time, the P31's efficiency lead is relatively small, at no more than about 25%.
The 980 PROs show a drastically different performance profile compared to earlier Samsung drives. The 970s tend to bottom out during the write-heavy half of the test and recover some performance toward the end. With the 980 PRO, performance in the write-heavy half doesn't drop precipitously; instead we see a steady decline that most closely resembles how the Intel Optane SSD handles this test.
Power Management Features
Real-world client storage workloads leave SSDs idle most of the time, so the active power measurements presented earlier in this review only account for a small part of what determines a drive's suitability for battery-powered use. Especially under light use, the power efficiency of an SSD is determined mostly by how well it can save power when idle.
For many NVMe SSDs, the closely related matter of thermal management can also be important. M.2 SSDs can concentrate a lot of power in a very small space. They may also be used in locations with high ambient temperatures and poor cooling, such as tucked under a GPU on a desktop motherboard, or in a poorly-ventilated notebook.
Samsung 980 PRO NVMe Power and Thermal Management Features

Controller: Samsung Elpis | Firmware: 1B2QGXA7

| NVMe Version | Feature | Status |
|---|---|---|
| 1.0 | Number of operational (active) power states | 3 |
| 1.1 | Number of non-operational (idle) power states | 2 |
| 1.1 | Autonomous Power State Transition (APST) | Supported |
| 1.2 | Warning Temperature | 82°C |
| 1.2 | Critical Temperature | 85°C |
| 1.3 | Host Controlled Thermal Management | Supported |
| 1.3 | Non-Operational Power State Permissive Mode | Not Supported |
The set of power management features supported by the 980 PRO is the same as what the 970 generation offered. The active state power levels have been tweaked and the highest power state can now reach 8.49W: definitely high for an M.2 drive, but not as problematic as the 10.73W declared by the Phison E16-based Seagate FireCuda 520. Power state transition latencies for the 980 PRO have also been adjusted slightly, but the overall picture is still a promise of very quick state changes.
Samsung 980 PRO NVMe Power States

Controller: Samsung Elpis | Firmware: 1B2QGXA7

| Power State | Maximum Power | Active/Idle | Entry Latency | Exit Latency |
|---|---|---|---|---|
| PS 0 | 8.49 W | Active | - | - |
| PS 1 | 4.48 W | Active | - | 0.2 ms |
| PS 2 | 3.18 W | Active | - | 1.0 ms |
| PS 3 | 40 mW | Idle | 2.0 ms | 1.2 ms |
| PS 4 | 5 mW | Idle | 0.5 ms | 9.5 ms |
Note that the above tables reflect only the information provided by the drive to the OS. The power and latency numbers are often very conservative estimates, but they are what the OS uses to determine which idle states to use and how long to wait before dropping to a deeper idle state.
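The tables above are assembled from the drive's identify data, which anyone can dump on Linux with nvme-cli; the device path below is an assumption:

```bash
# Print the power state descriptors (max power, entry/exit latencies)
# that the controller reports to the OS. /dev/nvme0 is assumed.
sudo nvme id-ctrl /dev/nvme0 | grep -A1 '^ps '
```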
Idle Power Measurement
SATA SSDs are tested with SATA link power management disabled to measure their active idle power draw, and with it enabled for the deeper idle power consumption score and the idle wake-up latency test. Our testbed, like any ordinary desktop system, cannot trigger the deepest DevSleep idle state.
Idle power management for NVMe SSDs is far more complicated than for SATA SSDs. NVMe SSDs can support several different idle power states, and through the Autonomous Power State Transition (APST) feature the operating system can set a drive's policy for when to drop down to a lower power state. There is typically a tradeoff in that lower-power states take longer to enter and wake up from, so the choice about what power states to use may differ for desktops and notebooks, and depending on which NVMe driver is in use. Additionally, there are multiple degrees of PCIe link power savings possible through Active State Power Management (ASPM).
We report three idle power measurements. Active idle is representative of a typical desktop, where none of the advanced PCIe link power saving features are enabled and the drive is immediately ready to process new commands. Our Desktop Idle number represents what can usually be expected from a desktop system that is configured to enable SATA link power management, PCIe ASPM and NVMe APST, but where the lowest PCIe L1.2 link power states are not available. The Laptop Idle number represents the maximum power savings possible with all the NVMe and PCIe power management features in use—usually the default for a battery-powered system but rarely achievable on a desktop even after changing BIOS and OS settings. Since we don't have a way to enable SATA DevSleep on any of our testbeds, SATA drives are omitted from the Laptop Idle charts.
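On Linux, whether those deepest states are actually reachable depends on the kernel's APST latency budget and on PCIe ASPM being enabled on the drive's link. Two quick checks; the PCIe address shown is an assumption:

```bash
# The NVMe driver will not use idle states whose exit latency exceeds
# this budget (in microseconds); 0 disables APST entirely.
cat /sys/module/nvme_core/parameters/default_ps_max_latency_us

# Check whether ASPM is active on the SSD's PCIe link.
# 01:00.0 is an assumed address; find yours with 'lspci | grep -i volatile'.
sudo lspci -vv -s 01:00.0 | grep -i aspm
```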
We haven't sorted out all the power management quirks (or, less politely: bugs) on our new Ryzen testbed, so the idle power results below are mostly from our Coffee Lake system. The PCIe Gen4 drives have been tested on both systems, but for now we are unable to use the lowest-power idle states on the Ryzen system.
Since AMD has not enabled PCIe 4 on their Renoir mobile platform and Intel's Tiger Lake isn't quite shipping yet, these scores are still fairly representative of how these Gen4-capable drives handle power management in a typical mobile setting. Once we're able to get PCIe power management fully working crash-free on our Ryzen testbed, we'll update these scores in our Bench database.
The active idle power draw from the 980 PRO unsurprisingly differs quite a bit depending on whether it's running the PCIe link at Gen3 or Gen4 speeds. At Gen3 speeds, the active idle power is decently low for an 8-channel controller and is an improvement over the 970 generation. At Gen4 speeds the active idle power is a bit on the high side of normal, but still lower than the Phison E16 and the WD Black, the latter being something of an outlier.
The desktop idle power draw for the 980 PROs is less than half what we saw with the Samsung 970 generation drives, but not quite as low as the Silicon Motion SM2262EN achieves. On our Coffee Lake system, the 980 PROs are both able to achieve single digit milliwatt idle power.
The idle wake-up times for the 980 PROs are all very quick, though waking up from the desktop idle state to Gen4 speed does seem to take longer than reestablishing a Gen3 link. Some of the previous-generation Samsung drives we tested exhibited wake-up latencies of several milliseconds, but so far the 980 PRO doesn't seem to do that and aggressively using the deepest idle states achievable won't noticeably hurt system responsiveness.
Samsung 980 Pro: Top Shelf, No Drama
With the release this month of the Samsung 980 PRO, a new round of competition in the high-end SSD market is beginning. The 980 PRO boasts higher performance than any other consumer SSD currently available, including sequential reads at 7GB/s and random reads at up to a million IOs per second. Samsung is continuing their habit of retaking the SSD performance crown, and almost making it look easy. At the same time, the 980 PRO will be priced a bit more reasonably than previous Samsung PRO models thanks to the switch to more affordable TLC NAND. But focusing only on the raw performance capabilities of the 980 PRO can distract from its true purpose, and the real impact of the 980 PRO won't be as dramatic as those top line performance numbers would suggest.
The fundamental problem facing the 980 PRO and other high-end NVMe drives is that the rest of the system can't keep up. Very few real-world consumer workloads can keep this SSD busy enough to make good use of its full performance potential. Hitting 5-7GB/s or 1M IOPS certainly sounds impressive, but that's only possible in fairly unrealistic conditions. The high sequential transfer speeds can be of some use when transferring data between the 980 PRO and RAM or an equally fast SSD, but the peak random IO performance of the drive simply does not matter to consumers today.
PCIe 4.0 Testing Wasn't Easy: Watch This Space
Our soon-to-be-retired synthetic benchmark suite is single-threaded, and the new Ryzen-based testbed highlights the places where the old Skylake CPU has been the bottleneck for random IO. I don't consider this to be a serious problem with the results we've been reporting, because real-world applications need a lot more CPU time for processing data than they do for managing IO transfers. Hype for the upcoming generation of game consoles has suggested that future video games may reach the point of needing the equivalent of an entire CPU core to manage IO, but that's only after using the equivalent of several more cores to decompress data and feed it to a powerful GPU running the kind of game engine that doesn't exist yet. Our new benchmark suite will be designed with such workloads in mind, but current consumer workloads aren't there yet and won't be for at least a few years.
This is our PCIe 3.0 Intel Testing System.
We're building something similar for AMD Ryzen PCIe 4.0
Setting aside the issue of what the 980 PRO can do in contrived circumstances, it still offers improvements over Samsung's earlier TLC SSDs, but these are incremental changes rather than revolutionary. The 980 PRO is still constrained by the latency of NAND flash memory, even though Samsung's 128L TLC is a bit faster than their 92L and 64L generations. The switch to offering much larger SLC cache sizes probably matters a lot more than the addition of PCIe Gen4 support, and the modest power efficiency improvements are overdue.
With Enough Performance, Efficiency Should Be A Target
Moving to the latest NAND and using an 8nm process for the controller helps with power efficiency, but has nowhere near the impact of SK hynix's decision to build a high-end PCIe Gen3 SSD with a four-channel controller. For most consumer workloads, the SK hynix Gold P31 is just as fast as the 980 PRO with its eight channel controller and twice the PCIe bandwidth.
Samsung's decision to use TLC NAND in the 980 PRO instead of the traditional MLC NAND for their PRO SSDs has raised some eyebrows, to say the least. Their PRO product line has long stood as one of the most premium options in the SSD market, and this change raises the question of whether the 980 PRO actually deserves that "PRO" moniker. This drive could easily have been labeled the 980 EVO instead, and it would have been a great successor to that product line.
By most measures and for most use cases, the 980 PRO is actually superior to the MLC-based 970 PRO. The addition of PCIe 4 support helps the 980 PRO deliver higher speeds than its predecessors, even though that's more forward-looking than an immediate benefit. The shortcomings relative to a hypothetical MLC-based PCIe 4 drive are also mostly hypothetical; workloads that truly require more write endurance than the TLC-based 980 PRO can provide should probably be handled by an enterprise SSD rather than any consumer/prosumer product. Even with TLC NAND, the 980 PRO offers buyers the security of knowing that the drive is more than capable of handling whatever they will throw at it, and that's reason enough for it to deserve the PRO label.
Samsung's Dilemma: What Goes Into A Mainstream 980 Evo?
But that does leave a gaping hole in Samsung's lineup where a more mainstream 980 EVO might go. Samsung probably wouldn't release a QLC-based NVMe drive using the EVO suffix while they are still trying to establish their QVO branding in the SATA SSD market. But using QLC NAND isn't the only way to make a more affordable mainstream alternative to the overkill that is the 980 PRO.
My bet is that Samsung is considering releasing another PCIe Gen3 drive, or a PCIe Gen4 drive that is significantly slower, cheaper and more power efficient. They've produced low-end client NVMe SSDs for the OEM market before, but never made a retail product out of them. Now might be the time for a successor to the PM971 and PM991 to find its way to the retail SSD market. Watch this space.