This is probably the most excited I've been about any SSD launch in quite a while. At CES this year, Crucial announced its M500 SSD - the world's first to use Micron's new 128Gbit MLC NAND die. Courtesy of the cost savings and density increase associated with this new 128Gbit NAND, the M500 would be available in a 960GB capacity, priced at $599. That works out to be around $0.62 per GB for a truly gigantic drive by today's standards. It's exciting. For the past five years I've been learning to live off of less storage than I thought I needed, but the M500 had the potential to spoil me once again.

The M500 starts out with a familiar refrain: a Marvell controller with custom firmware from Crucial/Micron and of course, Micron NAND. All of these parts get updated though, some in more interesting ways than others. The controller is now Marvell’s 88SS9187, an updated version of the 9174 used in the m4. The 9187 is a speed/feature bump over the 9174 and is also used in Plextor’s M5 Pro. I should note that this time around both the Crucial (end user) and Micron (OEM) drives will feature the same M500 branding.

One of the benefits of Marvell’s 9187 is support for DDR3 memory, which we see exercised on the M500. In its largest configuration, the M500 features 1GB of DDR3-1600. Crucial claims only 2-4MB of user data ever ends up in this DRAM; the overwhelming majority of it is used to cache the page/indirection table that maps logical block addresses to pages in NAND. Like most SSD makers, Crucial won’t talk about the structure of its mapping table, but given the size of the DRAM I think it’s safe to assume that we’re looking at a relatively flat structure that should be easy to manage (more on this later).
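To get a feel for why 1GB of DRAM hints at a flat table, here is a back-of-the-envelope sketch. The 4KB mapping granularity and 4-byte entry size are my assumptions for illustration, not figures from Crucial, and `flat_map_bytes` is a hypothetical helper.

```python
# Rough size of a flat logical-to-physical mapping table.
# ASSUMPTIONS (not from Crucial): one entry per 4KB logical block,
# 4 bytes per entry, and decimal (10^9 byte) gigabytes for capacity.


def flat_map_bytes(capacity_bytes, granularity_bytes=4096, entry_bytes=4):
    """Bytes needed for one fixed-size entry per mapped logical block."""
    entries = capacity_bytes // granularity_bytes
    return entries * entry_bytes


for capacity_gb in (120, 240, 480, 960):
    table = flat_map_bytes(capacity_gb * 10**9)
    print(f"{capacity_gb}GB drive: ~{table / 10**6:.0f} MB of mapping table")
```

Under these assumptions the 960GB drive needs a little under 1GB for its table, which would line up with the DRAM fitted to the largest model. A coarser mapping granularity would shrink the table accordingly.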

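Crucial / Micron M500 Specifications

| | 120GB | 240GB | 480GB | 960GB |
|---|---|---|---|---|
| Controller | Marvell 88SS9187 | Marvell 88SS9187 | Marvell 88SS9187 | Marvell 88SS9187 |
| NAND | Micron 20nm 2bpc MLC NAND (128Gbit die) | Micron 20nm 2bpc MLC NAND (128Gbit die) | Micron 20nm 2bpc MLC NAND (128Gbit die) | Micron 20nm 2bpc MLC NAND (128Gbit die) |
| Form Factor | 2.5" 7mm/9.5mm, mSATA, M.2 | 2.5" 7mm/9.5mm, mSATA, M.2 | 2.5" 7mm/9.5mm, mSATA, M.2 | 2.5" 7mm/9.5mm |
| Sequential Read | 500MB/s | 500MB/s | 500MB/s | 500MB/s |
| Sequential Write | 130MB/s | 250MB/s | 400MB/s | 400MB/s |
| 4KB Random Read | 62K IOPS | 72K IOPS | 80K IOPS | 80K IOPS |
| 4KB Random Write | 35K IOPS | 60K IOPS | 80K IOPS | 80K IOPS |
| Drive Lifetime | 72TB writes | 72TB writes | 72TB writes | 72TB writes |
| Warranty | 3 years | 3 years | 3 years | 3 years |

Drive lifetime is rated at 90% full with 25/75% sequential/random IO (50% 4KB, 40% 64KB, 10% 128KB).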

While the M500’s controller is nothing new, its NAND is. The M500 is the first drive to ship with the latest version of IMFT’s 20nm MLC NAND, featuring 128Gbit die. All previous NAND devices from IMFT (as well as its competitors) top out at 64Gbit (8GB) per 2-bit MLC NAND die. The move to larger die decreases the number of die/devices needed to hit each capacity point, and it also makes 1TB SSDs cost effective for the first time ever.

The cost savings come from the fact that these 128Gbit die aren’t simple doublings of last year’s 64Gbit devices; they include a few changes. The most prominent is a shift in page size from 8KB to 16KB. Larger page sizes are more desirable to implement at smaller NAND geometries, which is why you normally see these page size transitions with major shifts in process technology (e.g. 4KB to 8KB page size transition back at 25nm). The good news is that larger page sizes increase sequential throughput, but at the expense of latency. Given that NAND program times increase with smaller NAND geometries, once again the deck is stacked against manufacturers looking to increase performance as they exploit the benefits of Moore’s Law.

The other big change with the 128Gbit implementation of IMFT’s 20nm process is the inclusion of ONFI 3.0 support. There are some power savings courtesy of ONFI 3.0 (lower voltages, on-die termination), but the big news here is an increase in max interface speed. The previous ONFI interface standard (2.x) topped out at around 200MB/s, while ONFI 3.0 kicks that up to 400MB/s. Crucial’s implementation seems to be limited to around 330MB/s, but the drive isn’t anywhere close to saturating that. Remember the interface speed governs the maximum rate at which you can transfer data to/from a NAND device. Most NAND devices are capable of dual-channel operation so in the higher capacity implementations we’re talking about a maximum NAND-to-controller transfer rate of over 600MB/s. There’s more than enough headroom here.
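The headroom argument is simple arithmetic, sketched below using only the interface figures quoted above. The function name and the two-channel default are illustrative assumptions that follow the dual-channel description in the text.

```python
# Back-of-the-envelope NAND interface headroom, using the figures quoted above.
# ASSUMPTION: a NAND device drives two channels at once, as the text describes.

ONFI_2X_MBPS = 200         # approximate ONFI 2.x ceiling
ONFI_3_MBPS = 400          # ONFI 3.0 ceiling
M500_INTERFACE_MBPS = 330  # approximate rate of Crucial's implementation


def device_peak_mbps(interface_mbps, channels=2):
    """Peak NAND-to-controller transfer rate for one device."""
    return interface_mbps * channels


for label, rate in [
    ("ONFI 2.x", ONFI_2X_MBPS),
    ("M500 (ONFI 3.0 as implemented)", M500_INTERFACE_MBPS),
    ("ONFI 3.0 ceiling", ONFI_3_MBPS),
]:
    print(f"{label}: {device_peak_mbps(rate)} MB/s per dual-channel device")
```

At roughly 330MB/s per channel, a dual-channel device tops out at about 660MB/s, which is where the "over 600MB/s" figure comes from.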

Supporting the new controller, new NAND die, larger page sizes and ONFI 3.0 obviously requires new firmware, so the M500 ships with an evolution of what Crucial developed for the m4. The end result is vastly improved performance across the board. The big question is how well it compares to the rest of the world, given how much has changed since the m4 first arrived on the market.

The 20nm 128Gbit NAND: Larger Pages, Larger Blocks, Lower Performance & Cost?

Intel/Micron NAND Evolution

| | 50nm | 34nm | 25nm | 20nm (64Gbit) | 20nm (128Gbit) |
|---|---|---|---|---|---|
| Single Die Max Capacity | 16Gbit | 32Gbit | 64Gbit | 64Gbit | 128Gbit |
| Page Size | 4KB | 4KB | 8KB | 8KB | 16KB |
| Pages per Block | 128 | 128 | 256 | 256 | 512 |
| Read Page (max) | - | - | 75 µs | 100 µs | 115 µs |
| Program Page (typical) | 900 µs | 1200 µs | 1300 µs | 1300 µs | 1600 µs |
| Erase Block (typical) | - | - | 3 ms | 3 ms | 3.8 ms |
| Die Size | - | 172mm² | 167mm² | 118mm² | 202mm² |
| Gbit per mm² | - | 0.186 | 0.383 | 0.542 | 0.634 |
| Rated Program/Erase Cycles | 10000 | 5000 | 3000 | 3000 | 3000 |

There's a lot of data in the table above, but if you look closely you'll see a couple of trends. The obvious ones are increasing page and block size over time. NAND program latency has also climbed steadily over the years, while endurance has decreased. All in all, the picture looks pretty bleak. It's impressive that performance keeps going up each generation given how much the deck is stacked against it. The increase in program time gives you a preview of what we're going to see in the performance pages. Small writes will take longer. Garbage collection routines on a full drive will also take longer to run, as each block that needs to be recycled has more pages and more data to deal with. Although Crucial uses a faster controller in the M500 vs. m4, the internal housekeeping it has to do goes up tremendously as well. The M500 isn't a drive that was built in pursuit of peak performance. Instead this drive targets the mainstream.
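To put numbers on those trends, the sketch below derives block size and an idealized program rate from the table's page size, pages per block and program time. The single-plane, one-page-at-a-time model is my simplifying assumption; real drives overlap many die and planes, so treat the rates as relative indicators only.

```python
# Figures copied from the Intel/Micron NAND Evolution table.
# Each entry: (label, page size in KB, pages per block, typical program time in µs)
GENERATIONS = [
    ("50nm", 4, 128, 900),
    ("34nm", 4, 128, 1200),
    ("25nm", 8, 256, 1300),
    ("20nm 64Gbit", 8, 256, 1300),
    ("20nm 128Gbit", 16, 512, 1600),
]


def block_size_kb(page_kb, pages_per_block):
    """Amount of data held by one erase block, in KB."""
    return page_kb * pages_per_block


def program_mb_per_s(page_kb, program_us):
    """Idealized program rate when one page is written per program time.

    ASSUMPTION: ignores multi-plane operation, die interleaving and
    interface transfer time, so real drives will differ.
    """
    return page_kb * 1024 / program_us  # bytes per µs equals MB/s (10^6 bytes)


for label, page_kb, pages, t_prog in GENERATIONS:
    print(
        f"{label:>12}: block = {block_size_kb(page_kb, pages)} KB, "
        f"program = {t_prog} µs/page, "
        f"~{program_mb_per_s(page_kb, t_prog):.1f} MB/s per page stream"
    )
```

The block that garbage collection has to recycle grows from 2MB on the 64Gbit 20nm die to 8MB on the 128Gbit die, and any write of a page or less now waits on a 1600 µs program instead of 1300 µs. The larger page does move more data per program operation, which is why sequential throughput can still improve.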

Looking at the difference in density between the two 20nm NAND devices, there's nearly a 17% increase in density from moving to the larger page/block sizes. It's a remarkable improvement especially when you consider the gains are decoupled from a new process node. Ultimately this is Micron's answer to TLC for the time being. Rather than sacrificing endurance to get to lower price points, the 20nm 128Gbit 2bpc MLC NAND device at mature yields should deliver competitive pricing at higher endurance. Indeed this is the message behind Crucial's M500. The company isn't targeting Samsung's SSD 840 Pro, but rather the TLC based 840.
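The density figure can be checked directly from the die capacities and die sizes in the table. This is just the arithmetic; `gbit_per_mm2` is a helper name chosen for illustration.

```python
# Density comparison of the two 20nm die, using capacities and die sizes
# from the Intel/Micron NAND Evolution table.


def gbit_per_mm2(capacity_gbit, die_mm2):
    """Areal density of a NAND die."""
    return capacity_gbit / die_mm2


density_64 = gbit_per_mm2(64, 118)    # 20nm, 64Gbit die
density_128 = gbit_per_mm2(128, 202)  # 20nm, 128Gbit die
gain = density_128 / density_64 - 1

print(f"64Gbit die:  {density_64:.3f} Gbit/mm²")
print(f"128Gbit die: {density_128:.3f} Gbit/mm²")
print(f"Density gain: {gain * 100:.1f}%")
```

The gain works out to just under 17% on the same process node.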

Price Comparison

| | 120/128GB | 240/256GB | 480/512GB | 960GB |
|---|---|---|---|---|
| Crucial M500 | $129 ($129) | $219 ($202) | $399 ($442) | $599 ($570) |
| Intel SSD 335 | $181 | $220 | - | - |
| Samsung SSD 840 | $100 | $169 | $333 | - |
| Samsung SSD 840 Pro | $139 | $229 | $463 | - |

The reality of it all is the M500's MSRPs are closer to the 840 Pro's street prices than the 840's. MSRPs tend to run a bit high on SSDs, so I wouldn't be too surprised to see the M500 eventually settle down closer to the 840 (remember the MSRPs for the 840 and 840 Pro at 250/256GB are $199 and $269, respectively). It's definitely a different approach to driving costs down vs. going to TLC, and it's one that can't necessarily be repeated each generation, but for now the answer works. I'm not sure how meaningful the added endurance is for most client users, although you could make an interesting case for the M500 in some enterprise workloads where the TLC 840 wouldn't be an option.

 

The big news is of course the 960GB capacity point. At $599, the 960GB M500 is by far the cheapest drive available anywhere near that capacity. A quick search on Newegg reveals a $1000 Mushkin 960GB drive and a $3000 1TB OCZ Octane. At $0.62/GB, the 960GB M500 is a steal. Even the Phison based 960GB BP4 from MyDigitalSSD weighs in at $799, and OWC's Mercury Electra MAX (3Gbps SATA) is still over $1000. To put the drive's excellent price in perspective, the 960GB M500 has roughly the same MSRP as Intel's 80GB X25-M had back in 2008. That's an order of magnitude more storage capacity at the same price in five years' time. Moore's Law makes me happy.
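For a side-by-side view, the sketch below turns the prices quoted above into cost per GB. Two inputs are my assumptions: the 1TB Octane is treated as 1000GB, and the X25-M is entered at $599 because the text only says its MSRP was roughly the same as the M500's.

```python
# Cost per GB for the drives mentioned above, cheapest first.
# ASSUMPTIONS: "1TB" is treated as 1000GB, and the X25-M is entered at
# $599 because the text only says its MSRP was roughly the same.
DRIVES = [
    ("Crucial M500 960GB", 599, 960),
    ("MyDigitalSSD BP4 960GB", 799, 960),
    ("Mushkin 960GB", 1000, 960),
    ("OCZ Octane 1TB", 3000, 1000),
    ("Intel X25-M 80GB (2008)", 599, 80),
]


def price_per_gb(price_usd, capacity_gb):
    """Dollars per gigabyte of advertised capacity."""
    return price_usd / capacity_gb


for name, price, capacity in sorted(DRIVES, key=lambda d: price_per_gb(d[1], d[2])):
    print(f"{name:<24} ${price_per_gb(price, capacity):.2f}/GB")
```

The capacity ratio between the 960GB M500 and the 80GB X25-M is 12x, which is the "order of magnitude" referred to above.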


110 Comments


  • NCM - Tuesday, April 09, 2013

    TRIM support is built into OS X, but disabled by default for non-Apple drives. As others have pointed out, the freeware utility 'TRIM Enabler' easily takes care of that. The only other thing to know is that some OS X updates may reset TRIM to 'off', so it's as well to check after any update and re-enable it if necessary.

    I take care of an office full of Macs, including Mac Pros, iMacs, Minis and MacBook Pros, the majority of which have SSDs that I installed. I'm typing this on my 2010 MBP with a 512GB Plextor M3P.

    With the price of SSDs now this is a very worthwhile upgrade, and particularly one that offers a new lease on life for older computers.
  • Bkord123 - Tuesday, April 09, 2013

    All of these comments are going to make my wife mad when I buy yet another gadget! I'm not as worried now about the TRIM issue. Btw, does this site have a page that ranks hard drives? I did look and didn't see anything here.
  • jamyryals - Tuesday, April 09, 2013

    Anand has a Bench utility you can use to compare devices. Here's two popular reliable drives -
    http://www.anandtech.com/bench/Product/792?vs=743
  • glugglug - Tuesday, April 09, 2013

    With most SSDs no longer using 4KB pages, does it make sense to have 8KB and 16KB random write tests?

    Also, does application performance improve if the drives are formatted with an 8KB or 16KB cluster size?
  • Kristian Vättö - Tuesday, April 09, 2013

    Most real world IOs are 4KB.
  • glugglug - Tuesday, April 09, 2013

    Not true, even with the default 4KB cluster size the drives get formatted with. If you format with 16KB clusters, *none* of the IOs will be 4KB.
  • Kristian Vättö - Tuesday, April 09, 2013

    Based on the workloads we've traced (using default cluster size), 4KB is the most common IO size, although it obviously varies and some workloads may consist of larger IO sizes. Do you have something that backs up your statement? Would be interesting to see that.
  • glugglug - Tuesday, April 09, 2013

    According to the table in the article, for the Anandtech 2011 Heavy Workload, 28% of the IOs are 4KB, not "most".

    I am thinking that what must happen for a 4KB IO on a drive with 16KB pages is that it has to read the current contents of the 16KB page so that the 4KB being rewritten can be merged into it, then write a 16KB page, so each write really ends up being a read + write operation not just the write by itself.

    Worse, when TRIM is used, if the TRIM operation covers only 4KB of the 16KB page, the page can't really be trimmed, because the other 12KB might still be in use; the drive firmware can't know for certain, so having a cluster size match or exceed the drive's page might result in better steady state performance over time because of TRIM not losing track of partial pages.
  • Tjalve - Wednesday, April 10, 2013

    I think there is some caching involved when dealing with writes that are smaller than the page size of the NAND. I would guess that the M500 caches in DRAM. There are other vendors that use the onboard flash for caching, like SanDisk's nCache for example.
  • glugglug - Wednesday, April 10, 2013

    For some SSDs that is definitely the case. I'm pretty sure SandForce needed to do it for example, both because the compression makes the size of the flash writes unpredictable, and because if you look at the cluster sizes the chipset supports to go with various obscure controllers it's kind of nuts.

    I don't think that is the case here though, because if you multiply the marketed 4KB random write numbers by 4KB, you pretty much get exactly the sequential write speed, and write-back caching to deal with the smaller writes would result in much better sequential performance.
