HPE warns of impending SSD disk doom

Techies are used to worrying about the longevity of their data storage. Hard drive heads used to have a nasty habit of crashing before laptops introduced software to protect them from drops and power surges. ‘Data rot’ can damage your DVD storage, and magnetic tape can suffer as its substrates and binders degrade.

But what about the firmware, which contains the instructions for reading from and writing to the media in the first place? That’s now an issue too, thanks to HPE. It had to rush out a fix for some of its solid-state drives (SSDs) last week after it found that they were inadvertently programmed to fail.

The company released a critical firmware patch for its serial-attached SCSI (SAS) SSDs, after revealing that they would permanently fail by default after 32,768 hours of operation. That’s right: assuming they’re left on all the time, three years, 270 days, and eight hours after you first power up one of these drives, both your data and the drive itself will become unrecoverable.
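The arithmetic behind that figure is easy to check. The following is a minimal sketch in Python, assuming 365-day years (leap days are ignored) and continuous operation; the function name `breakdown` is purely illustrative.

```python
FAILURE_HOURS = 32_768  # failure threshold quoted in HPE's advisory


def breakdown(hours: int) -> tuple:
    """Split a number of hours into (years, days, hours), assuming 365-day years."""
    days, rem_hours = divmod(hours, 24)
    years, rem_days = divmod(days, 365)
    return years, rem_days, rem_hours


years, days, hours = breakdown(FAILURE_HOURS)
print(f"{years} years, {days} days, {hours} hours")  # 3 years, 270 days, 8 hours
print(f"{FAILURE_HOURS * 3600:,} seconds")           # 117,964,800 seconds
```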

The company explained the problem in an advisory, adding that an unnamed SSD vendor tipped it off about the issue. These drives crop up in a range of HPE products. If you’re an HPE ProLiant, Synergy, Apollo, JBOD D3xxx, D6xxx, D8xxx, MSA, StoreVirtual 4335, or StoreVirtual 3200 user and your drives are running a firmware version earlier than HPD8, you’re affected.

You might hope that a RAID configuration will save you. RAID implementations (other than RAID 0, which focuses on speed) store data redundantly, meaning that you can recover your data if a disk in your system goes down. However, as HPE points out in its advisory:

“SSDs which were put into service at the same time will likely fail nearly simultaneously.”

Unless you replaced some SSDs in your RAID box, they’ve probably all been operating for the same amount of time. RAID doesn’t help you if all your disks die at once.

This bug affects 20 SSD model numbers, and to date, HPE has only patched eight of them. The remaining 12 won’t get patched until the week beginning 9 December 2019. So if you bought those disks a few years ago and haven’t got around to backing them up yet, you might want to get on that.

HPE explains that you can also use its Smart Storage Administrator to calculate your total drive power-on hours and find out how close to data doomsday your drive is. The company has published a PDF telling you how to do that.
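Once you have a power-on hours figure, working out the remaining margin is simple subtraction. Here is a rough sketch in Python. The function names and the sample reading of 30,000 hours are hypothetical, and it assumes the drive stays powered on continuously from now on.

```python
from datetime import datetime, timedelta

FAILURE_HOURS = 32_768  # failure threshold quoted in HPE's advisory


def hours_remaining(power_on_hours: int) -> int:
    """Hours left before the threshold; 0 if it has already been reached."""
    return max(FAILURE_HOURS - power_on_hours, 0)


def projected_failure(power_on_hours: int, now: datetime) -> datetime:
    """Estimated failure time, assuming the drive is never powered off."""
    return now + timedelta(hours=hours_remaining(power_on_hours))


# Hypothetical reading taken from a drive; substitute your own figure.
reading = 30_000
left = hours_remaining(reading)
print(f"{left} hours left (about {left // 24} days)")
print("Projected failure:", projected_failure(reading, datetime.now()))
```

A drive that spends time powered off will reach the threshold later than this projects, so treat the date as the earliest plausible deadline rather than a precise one.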

Unfortunately, HPE didn’t include the same kind of warning that Mission: Impossible protagonist Jim Phelps got at the beginning of every episode: “This tape will self-destruct in five seconds”.

But then, 117,964,800 seconds is a little harder to scan. In any case, your mission, should you choose to accept it, is to back those records up.