Data Recovery on Samsung 980 and 990 NVMe SSD Models & Reliability

We successfully recovered the data from a Samsung 980 Pro NVMe SSD for a local Harvard University customer, here in Cambridge, MA. The customer reported a boot failure, with the SSD having entered a read-only state of operation. Ironically, the customer was using this SSD to play Steam games, which is relevant to the following news about rampant failures on the newer Samsung Solid State Drives from 2021-2023.

Samsung SSD Data Recovery and Reliability

Breaking news:

We have learned from various blogs and forum posts that users worldwide are experiencing SSD failures on these particular models. Here are some links for reference:

https://www.tomshardware.com/news/samsung-980-pro-ssd-failures-firmware-update

https://www.techradar.com/news/samsung-990-pro-ssds-are-apparently-failing-fast-and-nobody-knows-why

https://arstechnica.com/gadgets/2023/02/pc-maker-pulls-samsung-pro-ssds-after-users-report-abnormal-health-drops/?utm_source=facebook&utm_medium=social&utm_brand=ars&utm_social-type=owned&fbclid=IwAR0l1blHGhYx3Vmlw07CwSA_TUxrzCKCXa7KU3KM4BvSHSQ4f6Iayk73pok

The following is our interpretation of the issues with those Samsung NVMe SSD models, based on our expertise as data recovery specialists:
Historically, with SSDs as with HDDs, there has been a constant need to increase capacities. Customers demand it, so storage manufacturers do their best to innovate and meet the demand.

Increasing storage capacity requires new designs and technologies that cram more data-storing cells into the same space. With this increase in density, reliability decreases.

To compensate for the reduced reliability, increasingly complex firmware designs are needed. In particular, the most important technologies are Wear Leveling, Overprovisioning and Error Correction Code (ECC).

With increased firmware complexity, there are more opportunities for the firmware code to contain bugs or break, especially since the manufacturer cannot possibly replicate every workload that users worldwide put their SSDs under.

Below is a short article with a great visual cue. Look at the CONs category, where the common denominator is endurance. It is an older article, so reviewers had not yet had a chance to critique the endurance statistics of 3D NAND and, now, V-NAND models. It takes time to gather data.

https://www.kingston.com/en/blog/pc-performance/difference-between-slc-mlc-tlc-3d-nand

With modern SSDs, internal firmware algorithms run in the background at all times with the goal of optimizing wear leveling. (A similar behavior occurs on SMR HDDs, where the drive is actively relocating data while idle.)

On SSDs, data moves around across the various chips without our awareness, to spread the wear uniformly across the flash planes. It is quite possible that on these V-NAND 980 and 990 NVMe SSD models, something in the firmware code is broken, so that wear leveling and overprovisioning fail to operate correctly. As a result, the write cycles are concentrated on only one or a couple of NAND flash chips. A huge, sudden dump of data onto the SSD then drives the write cycles to one NAND chip, and the likelihood of failure accelerates despite the high rated Terabytes Written (TBW) and Program/Erase (P/E) cycle thresholds. It is plausible that something like this could fall through the cracks, even among the best SSD software and hardware manufacturers.
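
To make the wear leveling idea above more concrete, here is a toy simulation in Python. It is only a sketch of the concept and not how Samsung's (or any vendor's) firmware actually works. The block count, the number of writes and the rated P/E limit are made-up numbers chosen for illustration, and the "broken" mode is our own assumption of what concentrated writes might look like.

```python
def simulate_writes(num_blocks, total_writes, leveling=True, hot_blocks=2):
    """Return a list of per-block erase counts after total_writes block writes."""
    erase_counts = [0] * num_blocks
    for i in range(total_writes):
        if leveling:
            # Idealized wear leveling: always write to the least-worn block.
            target = min(range(num_blocks), key=erase_counts.__getitem__)
        else:
            # Hypothetical firmware fault: writes keep landing on a few blocks.
            target = i % hot_blocks
        erase_counts[target] += 1
    return erase_counts


def worn_out(erase_counts, rated_pe_cycles):
    """Count the blocks that have exceeded their rated P/E cycles."""
    return sum(1 for count in erase_counts if count > rated_pe_cycles)


# All three values below are illustrative assumptions, not real drive specs.
RATED_PE_CYCLES = 1000
NUM_BLOCKS = 100
TOTAL_WRITES = 5000

leveled = simulate_writes(NUM_BLOCKS, TOTAL_WRITES, leveling=True)
broken = simulate_writes(NUM_BLOCKS, TOTAL_WRITES, leveling=False)

print("max erase count with leveling:   ", max(leveled))
print("max erase count without leveling:", max(broken))
print("blocks past rated P/E (leveled): ", worn_out(leveled, RATED_PE_CYCLES))
print("blocks past rated P/E (broken):  ", worn_out(broken, RATED_PE_CYCLES))
```

In this toy model the same total amount of writing is spread thinly across all blocks when leveling works, but pushes two blocks far past the assumed rating when it does not, even though the drive as a whole has written very little relative to its capacity.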

Or, of course, it could be an overlooked piece of code misreporting the health status, either in the SSD itself or in the SMART reporting tools (e.g. Samsung Magician, CrystalDiskInfo, HDDScan, Smartmontools, etc.). However, we would expect that to be too embarrassing an issue for a powerhouse company like Samsung to let fall through the cracks.
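
For readers who want to keep an eye on the health values themselves, below is a minimal Python sketch that summarizes the NVMe health log as reported by Smartmontools in JSON mode (smartctl -j -a). The field names are those we understand smartctl to emit for NVMe drives; please verify them against the output of your own version. The helper names, the device path and the sample numbers are our own illustrative assumptions, not readings from a real drive.

```python
import json
import subprocess

# Per the NVMe specification, one "data unit" is 1000 units of 512 bytes.
BYTES_PER_DATA_UNIT = 512_000


def read_smart_json(device):
    """Run smartctl in JSON mode (smartmontools 7.0 or later, usually as root)."""
    result = subprocess.run(
        ["smartctl", "-j", "-a", device],
        capture_output=True, text=True, check=False,
    )
    return json.loads(result.stdout)


def summarize_health(report):
    """Extract the health fields most relevant to the failures discussed here."""
    log = report.get("nvme_smart_health_information_log", {})
    units = log.get("data_units_written")
    terabytes = None if units is None else units * BYTES_PER_DATA_UNIT / 1e12
    return {
        "critical_warning": log.get("critical_warning"),
        "percentage_used": log.get("percentage_used"),
        "available_spare": log.get("available_spare"),
        "media_errors": log.get("media_errors"),
        "terabytes_written": terabytes,
    }


# Made-up sample data so the sketch runs without a drive attached.
sample_report = {
    "nvme_smart_health_information_log": {
        "critical_warning": 0,
        "percentage_used": 3,
        "available_spare": 100,
        "media_errors": 0,
        "data_units_written": 2_000_000,
    }
}
summary = summarize_health(sample_report)
print(summary)
```

On a real system, one would call summarize_health(read_smart_json("/dev/nvme0")), where /dev/nvme0 is an assumed device path. A "percentage_used" value that climbs much faster than "terabytes_written" would justify is the kind of mismatch described in the articles linked above.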

Time will tell.