Software update goes bad – International Space Station lost and then found

When it comes to installing updates, do you lead, follow, or get out of the way?

Even if you spend what feels like an eternity in change control meetings, and have everything from Plan B all the way to Plan Z just in case the A-plan fails, you’ll know that things can still go wrong.

And whether you’ve planned for hours or weeks, you’ll want a rollback plan, too.

After all, there’s still a chance that any Plan x, for x ∈ {A..Z}, might go pear-shaped.

So spare a thought for NASA flight controllers.

They lost contact with the International Space Station (ISS) for a nerve-racking three hours during a recent software update:

“Flight controllers were in the process of updating the station’s command and control software and were transitioning from the primary computer to the backup computer to complete the software load when the loss of communication occurred.”

Fortunately, controllers in Houston, Texas, were able to get in touch with the ISS via Russian ground stations as it passed over the other side of the world, and, in NASA’s own words, “instructed the crew to connect another computer to begin the process of restoring communications.”

There you have it: when on the road, or in the air, always carry a spare computer! (Or several, if you’re Steve Wozniak.)

Interestingly, five years ago we reported the unexpected news that a computer virus had infected Space Station computers.

So this isn’t the first earth-style IT crisis that the ISS has endured.

By the way, if you’re not sure which of the above-mentioned patching camps you belong to, why not take a listen to this podcast, recorded last year by Paul Ducklin and Chester Wisniewski?

In this episode, entitled Patching: Prepare, Prioritize and Proceed, the dialectic duo take a look at the challenges of keeping on top of those security updates:

(19 July 2012, duration 15’25”, size 11 MBytes)

Image of Expedition 34 Badge from the official crew page. Click through to find out about the people up there during the outage.