    SSD endurance myths and legends
    by Zsolt Kerekes, editor – StorageSearch.com


    The forever war of flash SSD endurance has changed in character. Its original aims were simple: make endurance as good “as possible” – either to avoid burnout and catastrophic failure from the excessive writes associated with high performance and write amplification (see more about this in SSD jargon), or to provide a long service life – typically 7 years or more – when the SSDs were used in equipment in embedded markets which, due to electrical power and physical size constraints, traditionally (although not necessarily in future) depended on the reliability of solo SSDs rather than fault tolerant arrays such as RAID systems. The new, value-adjusted mission statement for endurance is instead judged by analyzing the price points and cost-benefits of various flavors of DWPD (drive writes per day).

    In short – the new business-related ambition is endurance which is “good enough” but not over-specified, having regard for the intended application roles of each new SSD.

    Sounds simple enough? Except that exactly the same endurance figures can be obtained at the whole-SSD level using different permutations of memory geometries (nanometer line widths) and different coding densities (SLC, MLC, TLC, QLC – and pSLC). That’s why SSD endurance (and its influence on price and reliability) remains a complex topic which is much studied by SSD vendors and their customers.
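    As a minimal sketch of the arithmetic behind DWPD (the TBW rating convention and the sample numbers here are illustrative assumptions, not figures from any particular product) – a DWPD figure is just total rated write volume spread over a warranty period:

        # Hypothetical sketch: DWPD implied by a TBW (terabytes written) rating.
        def dwpd(tbw_tb: float, capacity_tb: float, warranty_years: float) -> float:
            return tbw_tb / (capacity_tb * warranty_years * 365)

        # e.g. an illustrative 1 TB drive rated for 1,825 TBW over 5 years:
        print(dwpd(1825, 1.0, 5))   # -> 1.0 drive write per day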

    And you’ll see different endurance optimization techniques – leveraging nuances of knowledge about the operating environment in similar systems – depending on whether the SSDs are standard parts or custom designed for that application.

    It’s over 10 years since StorageSearch.com began to focus on the special challenges that flash wear-out posed for designers of SSDs – as the requirements for high performance SSDs pushed up against the endurance limits of flash memory and SSDs moved into enterprise acceleration roles which had previously been the exclusive domain of RAM SSDs.

    Even in its infancy, endurance management was a complicated technical subject – but looked back on from the perspective of today’s ultra-complexity, it was much easier to manage and understand.
    Commodity SLC was rated at 100,000 write cycles (which seems astonishingly high by today’s standards).
    Also the fastest flash SSDs of that era were roughly 10x slower in throughput and 100x slower in IOPS than typical products today.
    Nevertheless the risks of wear-out – even in the simpler days of SSD yore – were very real. And I’ve heard of many enterprise users who experienced these failures.

    Since then – as flash memory cell sizes have shrunk to deliver ever cheaper flash capacity (more gigabytes per square inch of chip) – the raw endurance figures in each new generation of memory have got worse.

    where did we get to with endurance? – in 2016

    Today’s commodity 2D MLC flash has raw wear-out in the 2,000 to 3,000 write cycle range. (Later – a news story in March 2016 suggested that 2D QLC (x4 nand – which stores 4 bits per cell, one more than TLC’s 3) will have endurance in the range of 500 write cycles.)

    Pioneers of 3D flash SSD design say that raw 3D nand flash endurance is better.

    How much better?

    3x to 4x better than 2D at the same line geometries – Dave Merry, founder of industrial SSD company FMJ Storage, told me in March 2014 – based on his early access characterization research.

    Part of 3D nand’s better endurance is due to more expensive substrate (insulating) materials. But another factor – explained by Samsung in 2015 – is that the different design of the charge trap (compared to the floating gate) works with a lower write pulse voltage.

    For a long time it had been thought that the future direction of endurance (with successive cell geometry shrinks) would be downwards (towards worse). This was set against ever-growing IOPS demands from SSD architects.

    The faster the SSD, the quicker it can wear out the memory.

    This is what created the pressure cooker environment for ever more devious flash controller management schemes and clever SSD architecture.

    The risk of flash wear-out in SSDs is a kind of forever war – which is never really permanently won.

    That’s why articles about flash SSD endurance remain so popular.

    But endurance was a bit more complicated than that

    All the figures you see above for “endurance” are based on classical wear leveling techniques using a single, memory-manufacturer-mandated master set of specifications for the shape, height and width of the write programming pulse.

    In 2012 we started to see endurance stretching effects from over 10 competing companies using different variations of adaptive R/W and DSP ECC techniques in their controllers.

    But in some key embedded markets (such as the industrial and military markets) those techniques were rarely if ever used in SSDs due to the added complexity, latency variability and increased power consumption required by faster SSD processors.

    So – until about the middle of 2015 – it was still generally agreed that if you used classical (non-adaptive, non-DSP) controllers then the endurance estimates you needed to take into account were still the figures supplied by the semiconductor memory makers.

    Then a new company came along – whose founders had been researching a different way of characterising flash endurance for over a decade.

    That company – NVMdurance – had productized an entirely new way of working with flash which could stretch the endurance of flash in an SSD by a significant factor, irrespective of whether the controller used classical ECC or the new DSP type of ECC.

    NVMdurance uses a multi-stage life cycle model for flash, coupled with “brute force” computational research which discovers the best magic numbers for any type of flash using a small sample of about 100 real chips – and then simulates behavior inside the SSD for millions of different predicted devices.

    The consequence is that for every type of MLC or TLC flash memory – whether it’s 2D or 3D – there now exist 2 different endurance ratings:-
    the raw native endurance – which comes from classical approaches and the text books
    the new metascale-advised, life-cycle-fitted, virtually hardened flash endurance – which you get by using NVMdurance’s magic numbers alongside their own lightweight firmware, which operates agnostically with either classical or DSP controller approaches.
    The result is that flash endurance (for all types of flash) can now be up to 10x better!

    The NVMdurance approach is in one sense evolutionary (because some companies have achieved similar effects before in their own SSDs – when they employed their rare flash engineering talent to optimize endurance parameters in conjunction with DSP in particular memory generations and product lines).

    But the NVMdurance approach is revolutionary in that the machine-based characterization approach scales automatically to produce optimum results for more flash geometries and architectures than hand-tuned methods can, and the business model makes it accessible to a much wider range of end markets for SSDs.
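    NVMdurance’s actual models and magic numbers are proprietary – but as a purely conceptual toy (every function and constant below is invented for illustration, and this is NOT their algorithm) the general shape of machine-based characterization is: fit a wear model from a small measured sample, then search candidate settings per life stage against simulated devices:

        import random

        # Invented toy wear model: gentler write pulses and stronger ECC stretch
        # life, with diminishing returns in later life-cycle stages.
        def simulated_life(settings):
            return sum(3000 * (1.5 - pulse) * (1 + 0.1 * ecc) / (1 + 0.2 * stage)
                       for stage, (pulse, ecc) in enumerate(settings))

        best_life, best_settings = 0, None
        for _ in range(100_000):                      # brute-force search
            candidate = [(random.uniform(0.5, 1.0),   # normalized pulse voltage
                          random.randint(1, 8))       # ECC strength level
                         for _stage in range(4)]      # 4 life-cycle stages
            life = simulated_life(candidate)
            if life > best_life:
                best_life, best_settings = life, candidate

        print(best_life, best_settings)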

    Here are some articles which discuss how SSD companies are dealing with these challenges.
    flash and nvm news in an SSD context
    razzle dazzling SSD cell care and retirement plans
    from SLC to XLC – flash wars in the enterprise – 2004 to 2015
    adaptive R/W flash care management IP (including DSP ECC) for SSDs

    and here’s what I said about SSD endurance before….

    SSD endurance – should you worry? – and why?

    Flash wear out still presents a challenge to designers of high IOPS flash SSDs as the intrinsic effects at the cell level get worse with each new chip generation.

    That’s in contrast to RAM SSDs – where, as long as enterprise users remember to replace their batteries periodically, the memory life depends more on elapsed time (the classic bathtub reliability curve) and heat stress than on the number of R/W cycles.

    Higher SSD capacity and faster speeds come from progressively smaller cell geometries – which we used to call shrinks. In flash memory, small size means less trapped charge holding the stored data values – and greater sensitivity to charge leakage, charge dumping and disturbance effects from the normal processes which happen around the cell vicinity during R/W, powering up, powering down etc.

    If you’re a consumer you don’t have to worry about the internals of endurance management – because most new SSDs are good enough (if they’re used in the right applications environment).

    Exceptions still do occur, however, for users in the enterprise SSD market – where I still hear stories of users who think it’s perfectly normal and economic to replace burned out Intel SSDs every 6 to 12 months – instead of buying more reliable (but more expensive) SSDs from companies like STEC.

    But if you’re a systems designer it’s useful to know that the longevity difference between “good enough” and the best endurance architecture schemes can still be 2x, 3x or 100x – even when using the same memory.

    In 2011 – new evidence started coming in from longitudinal flash SSD research done by STEC: old, heavily written MLC cells – managed by traditional endurance schemes – tend to get slower as they age, due to higher retry rates on reads – even though the blocks are still reported by SMART logs as “good” and the writes do eventually succeed on retry.

    In the same year – a paper by InnoDisk confirmed that whereas SLC and MLC memories have often had endurance populations within each chip which were mostly much better than guaranteed (something SSD makers had been telling me since 2004) – the headroom / margin of goodness in newer types of MLC is lower than in previous MLC generations. That’s why controllers which used to work well with vintage MLC need something much stronger than a tweak to deliver well behaved SSDs when co-starring with the new brat generation of naughty flash.

    That’s what started the industry trend towards designing a different type of flash management scheme – adaptive R/W – in which the goodness of cell blocks within the SSD is measured and calibrated, and then different schemes of write pulse length and different strengths of ECC codes (including DSP – digital signal processing to remove “noise”) are applied within the same SSD.

    These characteristics are re-evaluated regularly according to error rates – but also according to the age of the SSD and the write counts in the blocks. One idea behind the “age” factor is that using lower power write pulses at the start of SSD life (along with stronger codes) reduces the damage done to the flash material – which means that heavier pulses carrying more charge can be reserved for later years of use, when cell quality declines due to wear-out effects.
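    As a toy sketch of that scheduling idea (all thresholds and labels below are invented for illustration – real controllers calibrate these per block and per memory generation):

        # Invented illustration of age-staged write management.
        def write_strategy(writes_so_far: int, rated_cycles: int = 3000) -> dict:
            wear = writes_so_far / rated_cycles
            if wear < 0.3:    # early life: gentle pulses, lean on strong ECC
                return {"pulse": "low charge", "ecc": "strong / DSP-assisted"}
            if wear < 0.7:    # mid life: nominal settings
                return {"pulse": "nominal", "ecc": "nominal"}
            # late life: worn cells leak more, so spend heavier pulses now
            return {"pulse": "high charge", "ecc": "strong + recalibration"}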

    SSD Myths and Legends – “write endurance”

    This, below, is the original text of my SSD endurance article published in March 2007

    Does the fatal gene of “write endurance” built into flash SSDs prevent their deployment in intensive server acceleration applications? It certainly did as recently as 2005. What’s the risk with today’s devices?

    Flash based solid state disks would seem to be the ideal virtual storage device…

    In every other respect you can treat them in exactly the same way as a hard drive:- same interface, same software model. They even fit mechanically into the same standard hard drive slots. And in many ways they are better – significantly faster, consuming less electric power and more tolerant of ambient temperature and vibration extremes. You mostly don’t need to know about what’s inside them. They are the perfect “fit and forget” storage product.

    In the smaller form factors like 1.8″ and 2.5″ – the gap in capacity between SSDs and hard drives has disappeared. If it wasn’t for the price you’d use them – right? (The user value propositions – explaining why SSDs can be significantly cheaper to buy and own in a wide variety of applications are discussed in another article.)

    What’s wrong with this utopian vision?

    And why is it that even if you were offered a flash SSD accelerator for your server absolutely FREE you might still hesitate about installing it?

    The answer explains why the flash SSD server acceleration market still isn’t a billion dollar plus market – even 4 years after I first posed this exact same question.

    When you look in more detail at flash SSDs there is just one skinny dark stormcrow hanging around the edge of this picture which makes you feel uneasy about a technology which in other respects is acquiring an untarnished reputation. That’s the prickly issue of write endurance.

    Write Endurance: – The number of write cycles to any block of flash is limited – and once you’ve used up your quota for that block – that’s it! The disk can become unreliable.

    In the early days of flash SSDs managing this was a real headache for oems and users. The maximum number of write cycles to an address block – the endurance – was initially small (about 10,000 write cycles in 1994, rising to 100,000 in 1997). And the capacity of flash storage was small too. So the write endurance limit was more than just a theoretical consideration. In the worst case – you could destroy a flash SSD in less than a week! But in those days the SSD was being designed in by electronics engineers who knew exactly how the SSD was going to be used. If it helped solve the problem they could even rewrite the software a different way to lessen the risk.

    But when you buy an SSD for use in a notebook or server – you don’t write the software. You don’t control the data. So how do you know in advance if you’re going to hit that brick wall?

    This fear is an issue which has slowed down the adoption of flash SSDs in commercial server acceleration applications. Write endurance doesn’t affect RAM based SSDs – which have until now dominated that part of the market – mainly due to their superior speed. But the speed of flash SSDs has improved to the point where they could replace RAM based SSDs in many server acceleration slots at a much lower price – if it wasn’t for the worry about endurance.

    Write endurance has been a FUD issue for potential enterprise server users. They know it’s lurking there – but who can they trust to quantify the problem in their own language?

    Server makers didn’t want users to know about SSDs (any type – period) during 2000 to 2006 – because more SSDs meant selling fewer servers. In the 2005 edition of the SSD Buyers Guide I wrote about the problem…

    “One disadvantage, compared to RAM SSDs is that flash has an intrinsic limit on the total number of write cycles to a particular destination. The limit varies, according to manufacturer but is over millions of cycles in the most durable products. Internal controllers within the flash SSD manage this phenomenon and can reallocate physical media transparently to prolong media life. In most applications, high endurance flash SSDs can have a reliable operating life which is typically 3 times as high as that of a hard drive. But I would hesitate about installing a flash SSD as a server speedup in a university maths research department, for example, or in other applications where the ratio of data writes to data reads is unusually high.”

    In May 2006 I came to the conclusion that my earlier doubts may need to be revised.

    It was clear from reader emails and negative comments about SSDs which I saw in other publications that fear and doubt about the impact of write endurance was slowing down adoption of flash SSDs in the server acceleration market. It was also clear that most users didn’t know how to interpret the kind of data being offered by SSD oems – which was designed for an elite audience of electronics designers – and not for managers of storage systems. So I contacted all flash SSD oems with the idea of setting up a standard way of presenting endurance life expectancy data – with a proposal which I called the “SSD Half Life.” That dialog met with some enthusiasm but there wasn’t enough vendor support to take it further. The SSD oems I talked to took reliability very seriously – but didn’t want their own proprietary reliability schemes and models swamped by a general industry wide scheme.

    The way that SSD oems deal with the management of write endurance internally within their products varies – but they all share the common theme of scoring how many times each block of memory has been written, then reallocating physical blocks to logical blocks dynamically and transparently to spread the load across the whole disk. In a well designed flash SSD you would have to write the whole disk’s capacity the full endurance number of times to be in danger.

    Some manufacturers go a step further. SiliconSystems has a patented algorithm which it claims delivers a lifetime better than simplistic wear levelling. Another manufacturer, Adtron, actually reserves a percentage of spare flash blocks in the SSD – which are invisible to the host interface and don’t show up as spare storage. But internally – when blocks get close to the limit – the data is transparently switched over to the spare parts of the disk to give additional breathing space.

    The precise numbers are a proprietary secret but are based on analyzing the software from real customers’ SSD applications over many years. OEMs like these, which target high reliability applications, are also more picky about which flash chips they use – qualifying them according to the results they see from testing.
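    A minimal sketch of the two ideas just described – dynamic logical-to-physical remapping plus an invisible spare pool – might look like this (a deliberately simplified model; real flash translation layers are far more elaborate):

        # Toy model: every rewrite of a logical block is redirected to the
        # least-worn block in a hidden spare pool (over-provisioning).
        class ToyWearLeveler:
            def __init__(self, visible_blocks: int, spare_blocks: int):
                total = visible_blocks + spare_blocks
                self.map = list(range(visible_blocks))   # logical -> physical
                self.wear = [0] * total                  # erase/write counts
                self.free = list(range(visible_blocks, total))  # hidden spares

            def write(self, logical: int) -> None:
                self.free.sort(key=lambda p: self.wear[p])
                fresh = self.free.pop(0)                 # least-worn spare
                self.free.append(self.map[logical])      # recycle old block
                self.map[logical] = fresh
                self.wear[fresh] += 1

        wl = ToyWearLeveler(visible_blocks=8, spare_blocks=2)
        for _ in range(10_000):
            wl.write(0)          # hammer one logical block...
        print(max(wl.wear))      # ...yet wear is spread across all 10 blocks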

    the Flash SSD Application from Hell* – the Rogue Data Recorder

    In most real-life applications the computer does a lot more reads from disk than writes – and the duty cycle (that’s the percentage of time that the disk is being accessed at all) is low. But to estimate whether you should be worried about write endurance with today’s SSD technology I’ve chosen a worst case example – the Rogue Data Recorder.

    Real hard disk based data recorders from companies like Conduant can record data continuously in an endless loop. They are useful for a bunch of applications such as capturing pre-trigger data in seismic events, capturing unpredictable data for modelling and bugging phone calls. I managed a company in the mid 80s which pushed storage technology to its limits to get wire speed continuous recording onto disk and massive memory systems with inbuilt real-time trigger processors, embedded workstations and array processors for various types of industries and agencies. That was a good education for my day job now of cutting and pasting.

    Most of you wouldn’t set out to design a real-time data recorder – and if you are doing that – this article isn’t going to tell you anything you don’t already know. But by looking at the worst thing which could happen and estimating a confidence boundary from that – it can tell you how much you need to worry.

    The nightmare scenario for your new server acceleration flash SSD is that a piece of buggy software written by the maths department in the university or the analytics people in your marketing department is launched on a Friday afternoon just before a holiday weekend – and behaves like a data recorder continuously writing at maximum speed to your disk – and goes unnoticed.

    How long have you got before the disk is trashed?

    For this illustrative calculation I’m going to pick the following parameters:-
    Configuration:- a single flash SSD. (Using more disks in an array could increase the operating life.)
    Write endurance rating:- 2 million cycles. (The typical range today for flash SSDs is from 1 to 5 million. The technology trend has been for this to get better.

    When this article was published, in March 2007, many readers pointed out the apparent discrepancy between the endurance ratings quoted by most flash chipmakers and those quoted by high-reliability SSD makers – using the same chips.

    In many emails I explained that such endurance ratings could be sample tested and batches selected or rejected from devices which were nominally guaranteed for only 100,000 cycles.

    In such filtered batches typically 3% of blocks in a flash SSD might only last 100,000 cycles – but over 90% would last 1 million cycles. The difference was managed internally by the controller using a combination of over-provisioning and bad block management.

    Even if you don’t do incoming inspection and testing / rejection of flash chips, over 90% of memory in large arrays can have endurance which is 5x better than the minimum quoted figure.

    Since publishing this article, many oems – including Micron – have found the market demand big enough to offer “high endurance” flash as standard products.)

    AMD marketed “million cycle flash” as early as 1998.
    Sustained write speed:- 80M bytes / sec (That’s the fastest for a flash SSD available today and assumes that the data is being written in big DMA blocks.)
    capacity:- 64G bytes – that’s about an entry level size. (The bigger the capacity – the longer the operating life – in the write endurance context.)

    Today single flash SSDs are available with 160G capacity in 2.5″ form factor from Adtron and 155G in a 3.5″ form factor from BiTMICRO Networks.

    Looking ahead to Q108 – 2.5″ SSDs will be available up to 412GB from BiTMICRO. And STEC will be shipping 512GB 3.5″ SSDs.
    To get that very high speed the process will have to write big blocks (which also simplifies the calculation).

    We assume perfect wear leveling which means we need to fill the disk 2 million times to get to the write endurance limit.

    2 million (write endurance) x 64G (capacity) divided by 80M bytes / sec gives the endurance-limited life in seconds: 1.6 billion seconds.

    That’s a meaningless number – until you divide it by the seconds in an hour, the hours in a day and the days in a year to give…

    The end result is 51 years!
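    Restating the article’s own arithmetic in a few lines (decimal units, matching the rounded result):

        endurance_cycles = 2_000_000   # write endurance rating
        capacity_bytes   = 64e9        # 64 GB
        write_rate_bps   = 80e6        # 80 MB/s sustained writes

        seconds = endurance_cycles * capacity_bytes / write_rate_bps
        years   = seconds / (60 * 60 * 24 * 365)
        print(years)                   # -> about 50.7, i.e. roughly 51 years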

    But you can see how, just a handful of years ago – when write endurance was 20x less than it is today and disk capacities were smaller – the margin of safety was much thinner.

    For real-life applications refinements are needed to the model which take into account the ratio and interaction of write block size, cache operation and internal flash block size. I’ve assumed perfect cache operation – and sequential writes – because otherwise you don’t get the maximum write speed. Conversely if you aren’t writing at the maximum speed – then the disk will last longer. Other factors which would tend to make the disk last longer are that in most commercial server applications such as databases – the ratio of reads to writes is higher than 5 to 1. And as there is no wear-out or endurance limit on read operations – the implication is to increase the operating life by the read to write ratio.

    As a sanity check – I found some data from Mtron (one of the few SSD oems who quote endurance in a way that non-specialists can understand). In the data sheet for their 32G product – which incidentally has 5 million cycles write endurance – they quote the write endurance for the disk as “greater than 85 years assuming 100G / day erase/write cycles” – which involves overwriting the disk about 3 times a day.
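    A quick back-of-envelope check on those Mtron numbers (taking the datasheet figures at face value) confirms that “greater than 85 years” is a very conservative way of putting it:

        overwrites_per_day = 100 / 32          # 100 GB/day onto a 32 GB disk ~ 3x
        days = 5_000_000 / overwrites_per_day  # 5 million cycle endurance
        print(days / 365)                      # thousands of years - well over 85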

    How to interpret these numbers?

    With current technologies write endurance is not a factor you should be worrying about when deploying flash SSDs for server acceleration applications – even in a university or other analytics intensive environment.

    How about RAID systems stuffed with flash SSDs?

    The calculation above gives the worst case (shortest) operating life based on stuffing data into a single disk at the fastest possible speed. Having a faster interface coming into a box stuffed with SSDs doesn’t make the life shorter – because the data can only be striped to any individual disk at the limiting rate for that disk.

    Au contraire:- not only can an SSD RAID array offer a multiple of a single SSD’s throughput and IOPS, just as with hard disks, but depending on the array configuration the operating life can be multiplied as well – because not all the disks will operate at 100% duty cycle. That means that MTBF, and not write endurance, will be the limiting factor. And although oem published MTBF data for hard disks has been discredited recently – the MTBF data for flash SSDs has been verified for over a decade in more discriminating applications in high reliability embedded systems.

    I’ve been waiting years for storage oems to start marketing flash SSD based storage arrays – as alternatives to RAM based systems. What’s held that market back has been the looming shadow of write endurance. That myth – that flash SSDs wear out – now belongs to the past.

    …Later:- in May 2008 – in an exclusive interview with STORAGEsearch.com – AMCC 3ware confirmed it is working with leading SSD oems to develop products which will support the unique needs of the flash SSD RAID market.

    * clarifying why the Rogue Data Recorder is the Worst Case Application

    I didn’t need to explain this choice to those who design SSDs, but it’s clear from some comments I’ve seen that readers who don’t have an electronics / semiconductor background – or don’t know enough about SSD internals – have queried it.

    Why, for example, does the data recorder example stress a flash SSD more than say continuously writing to the same sector?

    The answer is that the data recorder – by writing to successive sectors – makes the best use of the inbuilt block erase/write circuits and the external (to the flash memory – but still internal to the SSD) buffer / cache. In fact it’s the only way you can get anywhere close to the headline spec data write throughput and write IOPS.

    This is because you are statistically more likely to find that writing to different address blocks finds blocks that are ready to write.

    If you write a program which keeps rewriting data to exactly the same address sector – all successive sector writes are delayed until the current erase / write cycle for that part of the flash is complete. So it actually runs at the slowest possible write speed.

    If you were patient enough to try writing a million or so times to the same logical sector – then at some point the internal wear leveling processor would have transparently reassigned it to a different physical address in flash. This is invisible to you. You think you’re still writing to the same memory – but you’re not. It’s only the logical address that stays the same. In fact you are stuffing data throughout the whole physical flash disk – while operating at the slowest possible write speed.

    It will take orders of magnitude longer to wear out the memory this way than in the rogue data recorder example. That’s because writing to flash is not the same as writing to RAM, and also because writing to a flash SSD sector is not the same as writing to a block of dumb flash memory. There are many layers of virtualization between you and the raw memory in an SSD. If you write to a dumb flash memory chip successively at the same location – then you can see a bad result quite quickly. But comparing dumb flash storage to intelligent flash SSDs is like comparing the hiss on a 33 RPM vinyl music album to that on a CD. They are quite different products – even though they can both play the same music.

    …Later:- Clarifying Flash Endurance Specifications

    I’ve added this footnote in response to some reader emails which asked about the variation in flash endurance specs quoted by different flash SSD oems.

    Like any semiconductor related spec (such as memory speed, or analog offset voltage in an op-amp, or failed memory blocks in a high density RAM chip) – there’s a spread of performance which depends on the process and may vary over time in the same wafer fab, or at the same time when chips are made in different fabs within the same company.

    A spec such as 100k or 1 million or 10 million erase-write cycles is a business decision made according to market conditions – one which gives generic semiconductor buyers a confidence level that if they buy 1 million chips, the reject rate – of those that will fail due to process tolerances – will be acceptably low. The shape of the distribution curve may not actually be gaussian – but there is a distribution curve in there which is implied by the published specs.
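    To illustrate that last point with a toy model (the distribution and its parameters are invented, chosen only to show how a published spec can sit at the low tail of a much better-performing population):

        import random
        random.seed(0)

        # Invented lognormal spread of per-block endurance, median ~440k cycles.
        population = [random.lognormvariate(13.0, 0.5) for _ in range(100_000)]

        spec = 100_000   # the published "100k cycle" rating
        reject_rate = sum(c < spec for c in population) / len(population)
        print(f"{reject_rate:.2%} of blocks fall below the published spec")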

    Due to process variations between oems (some designs will be automatically shrunk from old designs, other layout geometries may be recompiled or optimized for that particular process point) there will be vast differences between the endurance from different chipmakers.

    As the generic semiconductor flash market doesn’t place a premium on this spec – the “datasheet” published standard number will gradually improve at a slow pace (every 2-3 years) even if some oems are making chips today which are 10x better.

    If I were designing a high reliability flash SSD – I would want to get into the process details, qualify devices and order them to my own spec. Currently SSD volumes are too low to give much buying power with flash chipmakers. Therefore few SSD oems are able to buy flash chips qualified to their own specs. (This is done by batch testing samples and by negotiation with the fab where the chips are made.) Some SSD oems make their own flash chips – and while this gives them more control over the end-to-end process, it does not necessarily mean that they start with the best chips.

    See also:- is eMLC the true successor to SLC in enterprise flash SSDs? – which so-called “enterprise MLC” tastes the sweetest? And how come there are so many different and contradictory reliability claims?
