Recently, the International Space Station suffered an update-related glitch.
Now it's the Curiosity Rover, all alone on Mars (or so we are led to believe), suffering a memory-related outage.
The Mars Science Laboratory project reported late last week that the Curiosity Rover was switched over to its backup computer system, following what sounds like a problem of memory corruption.
Memory corruption, if undetected, can lead to insidious problems that get worse and worse over time.
Incorrect data from part X of the system may lead to miscalculations that lead to concomitant problems in part Y, and so forth, especially if the corrupted data is in a part of the operating system code or data that is critical to the safe and secure functioning of the system.
→ Novell's long-gone and unlamented ABEND, and the Windows Blue Screen of Death, are two examples of extreme response by the operating system to unexpected corruption or misbehaviour. These force the entire system to freeze deliberately, typically after dumping some diagnostic information, rather than allowing the system to keep running in an unstable condition.
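To make the idea of detecting corruption concrete, here is a minimal Python sketch. It is purely illustrative and has nothing to do with how Curiosity's flight software actually works: the names `seal`, `verify`, `flip_bit` and `CorruptionError` are made up for this example. A checksum is stored alongside the data, and the code refuses to carry on if the data no longer matches it, in the same spirit as an operating system that halts rather than running in an unstable condition.

```python
import zlib
from typing import Tuple


class CorruptionError(Exception):
    """Raised when stored data no longer matches its checksum."""


def seal(payload: bytes) -> Tuple[bytes, int]:
    # Record a CRC-32 of the data at the moment we know it is good.
    return payload, zlib.crc32(payload)


def verify(payload: bytes, expected_crc: int) -> bytes:
    # Recompute the CRC at the point of use and stop if it has changed.
    if zlib.crc32(payload) != expected_crc:
        raise CorruptionError("checksum mismatch: refusing to use data")
    return payload


def flip_bit(payload: bytes, bit_index: int) -> bytes:
    # Simulate a single-bit upset, such as one caused by a stray particle.
    damaged = bytearray(payload)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


if __name__ == "__main__":
    data, crc = seal(b"heading=127;speed=4")
    verify(data, crc)                      # passes silently
    try:
        verify(flip_bit(data, 10), crc)    # one flipped bit
    except CorruptionError as err:
        print("Detected:", err)
```

A CRC only guards against accidental damage. It does nothing to stop a deliberate attacker, who can simply recompute the checksum after tampering with the data.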
Amusingly, the Curiosity team refer to the Rover's two computer systems as the "A-side" and the "B-side," and operations have now switched to the B-side.
That's not B-side as in an old seven-inch single, where the flipside of the record was a sort-of second-rate filler track that wasn't supposed to detract from the main musical attraction on the A-side.
Curiosity's two systems are a redundant pair, so that at any time, one of them is the main computer and the other a backup capable of carrying out just the same operations.
The plan now, therefore, is to repair the A-side so that it can act as the B-side's backup.
The hardware that we fly is radiation tolerant, but there's a limit to how hardened it can be. You can still get high-energy particles that can cause the memory to be corrupted. It certainly is a possibility and that's what we're looking into.
You've probably heard the "high-energy particles" excuse from IT any number of times before, because it's geek humour for "we know it happened but we just can't say why."
This time, however, it might just be true. This might really be down to cosmic rays!
The dangers of memory corruption, of course, aren't just limited to problems with subatomic fragments of energy.
Deliberate attempts to corrupt memory can lead to buffer overflows and other exploitable vulnerabilities that can allow untrusted content, such as the contents of an innocent-looking web page, to trigger insecure activity.
It's this sort of deliberate exploit that allows cybercriminals sitting almost anywhere in the world (or space, for that matter) to perform drive-by downloads on your computer, potentially bypassing the usual warnings or security checks that kick in before some new piece of software is installed.
So the lesson to learn from this story, even if you're a programmer of mere earthly applications, is that error and consistency checking in your code is always important.
Even if you think you're in a code section in which you feel sure that all the inputs have already been checked for correctness, it's worth checking again.
Not just cosmic rays, but also international cybercriminals, may arrange things so that your assumed-to-be-good data isn't.
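As a rough illustration of "checking again", here is a short Python sketch of a function that re-validates its inputs at the point of use, even though its caller is supposed to have checked them already. The function name `read_field` and its parameters are hypothetical, invented for this example.

```python
def read_field(buffer: bytes, offset: int, length: int) -> bytes:
    """Return `length` bytes from `buffer` starting at `offset`.

    The caller may well have validated these values already.
    We check them again here, where the data is actually used.
    """
    if not isinstance(offset, int) or not isinstance(length, int):
        raise TypeError("offset and length must be integers")
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    if offset + length > len(buffer):
        # Without this check, Python's slice would quietly return
        # fewer bytes than requested instead of reporting a problem.
        raise ValueError("field extends past the end of the buffer")
    return buffer[offset:offset + length]


if __name__ == "__main__":
    packet = b"abcdef"
    print(read_field(packet, 2, 3))
    try:
        read_field(packet, 4, 5)   # claims more data than exists
    except ValueError as err:
        print("Rejected:", err)
```

In a language such as C, the equivalent missing check is exactly what turns a bad length value into a buffer overflow. In Python the failure is quieter: a slice that runs off the end simply comes back short, which is why the explicit test is still worth having.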
Image credit: NASA/JPL-Caltech