Should the "Reboot! Shut up and reboot!" theory be applied to programs?

Filed Under: Featured, Security threats

Tech-savvy website Ars Technica recently invited comments on an interesting thought about programming.

"Should programs randomly fall on their swords?"

Actually, they didn't quite put it like that - indeed, they didn't make it clear whether programs ought to exit gracefully but needlessly after a random time, or whether they ought to be asynchronously killed off on a random basis by some monitor process.

Such a monitor would be the opposite of a traditional watchdog, a process that keeps its eye on other programs and warns you when they break. This would be a process that breaks other programs, then tells you it's done so.

But they did wonder about making programs exit even if they didn't need or want to, for the greater good of the operating system as a whole.

My first reaction was, "Why not?"

There's a school of thought that says a degree of unpredictability in software, especially long-running network software, can be very handy indeed.

Don't wait, say, two seconds after a failed connection attempt so that you coincide precisely but permanently with a similar every-two-second problem in some other process. Wait two seconds plus a random interval that's different every time.
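
Here's a minimal sketch of that kind of retry-with-jitter in Python. (The function name and parameters are made up for illustration; they don't come from any particular library.)

```python
import random
import time

def connect_with_jitter(connect, base_delay=2.0, max_jitter=1.0, attempts=5):
    # 'connect' is any callable you supply that raises OSError on failure.
    # The random extra wait stops every client retrying in perfect lockstep.
    for attempt in range(attempts):
        try:
            return connect()
        except OSError:
            if attempt == attempts - 1:
                raise
            # Fixed base wait plus a random extra that's different every time.
            time.sleep(base_delay + random.uniform(0, max_jitter))
```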

Don't arrange everything so predictably in memory that if there's an exploitable bug, hackers can reliably work out where to poke their knitting needles. Mix things up a bit so an attacker has to guess, and might very well get it wrong.
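
You can watch that sort of mixing-up in action yourself. On a Unix-flavoured system with address space layout randomisation (ASLR) turned on, the quick Python sketch below prints where the C library's printf happens to have been mapped; run it a few times and, assuming ASLR hasn't been disabled, the address should be different on every run:

```python
import ctypes

# Load the C library that's already mapped into this process (Unix-like systems).
libc = ctypes.CDLL(None)

# With ASLR enabled this address changes from run to run, so an attacker
# can't simply hard-code the location of a useful function.
print(hex(ctypes.cast(libc.printf, ctypes.c_void_p).value))
```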

And, of course, in anything cryptographic, good quality randomness is vital, lest you turn a problem that should be computationally infeasible into one that is merely difficult or time-consuming.

Debian once removed code from its OpenSSL package because it looked unpredictable. It was supposed to be - it was part of the random number generator. After getting "fixed" it became so predictable that cryptographic keys that should have been unguessable could be brute-forced in seconds or minutes.
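
In code terms, the gap between "random enough for a retry delay" and "random enough for a key" looks roughly like this Python sketch, where the standard random module is a predictable pseudo-random generator and the secrets module draws from the operating system's cryptographic source:

```python
import random   # predictable PRNG - fine for retry jitter, useless for secrets
import secrets  # OS-backed cryptographic randomness - what key material needs

retry_jitter = random.uniform(0, 1.0)   # predictability doesn't matter here
session_token = secrets.token_hex(16)   # 128 bits an attacker shouldn't be able to guess
```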

Forcing programs to have a short outage every now and then is a bit like companies that require senior executives to use at least some of their annual vacation time each year in unbroken chunks.

Not only does it force the individual to take a much-needed rest, it also helps guard against corruption in the company by getting an alternative hand on the tiller every now and then.

But my second reaction was, "No way!"

Naturally, you should subject your code to randomly-generated failures as a regular and important part of testing. (You do test your software against the sort of error you might never have experienced in real life, such as "disk full," don't you?)

This is especially true for online software, which is frequently developed on a fast, reliable, state-of-the-art local area network, but deployed over slow, laggy, flaky links.
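
Simulating those failures doesn't mean filling a disk or unplugging a cable for real. Here's a sketch using Python's standard unittest.mock module; the save_report function is a hypothetical stand-in for whatever code you want to test against a "disk full" error:

```python
import errno
import unittest
from unittest import mock

def save_report(path, data):
    # Hypothetical code under test: write some data, report success or failure.
    try:
        with open(path, "w") as f:
            f.write(data)
        return True
    except OSError:
        return False  # the graceful behaviour we want to verify

class DiskFullTest(unittest.TestCase):
    def test_survives_disk_full(self):
        # Make every open() inside save_report fail as if the disk were full.
        disk_full = OSError(errno.ENOSPC, "No space left on device")
        with mock.patch("builtins.open", side_effect=disk_full):
            self.assertFalse(save_report("report.txt", "hello"))

if __name__ == "__main__":
    unittest.main()
```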

But deliberately breaking code just to make it restart, hopefully with any ills of the past behind it, could ironically make things worse.

That's a little bit like pulling your car to the side of the road every few minutes to make sure the tyres don't overheat: a useful precaution in an emergency where you know there's a tyre fault, but a pointless waste of time if there isn't.

In fact, you can argue that getting into the habit of random "corrective process termination" could actually mask the symptoms of a fault, or lead to known problems being mitigated by accident, and thus never getting proper corrective attention.

Tech support staff don't usually say "shut up and reboot" (with apologies to Dogbert) because it's scientific. They say it because it isn't scientific, but it very often works, and improves their call closure rates in the long run.

So randomly self-breaking programs sound a little bit like those rules that say things like, "You must change your password every 45 days."

When an online service tells you that, are they implying that they actually get breached fairly frequently? That if they do get breached they probably won't realise?

Actually, you should change your password if you think you need to.

And if you think you need to, you should change it then and there, rather than saying to yourself, "My next 45-day mandatory password update is coming in a while, so I'll wait until then."

11 Responses to Should the "Reboot! Shut up and reboot!" theory be applied to programs?

  1. Anonymous · 400 days ago

    If programs are killed randomly, Murphy says the next time it happens will be when you're trying to demonstrate your system stability to executive management.

  2. Frank · 400 days ago

    Two words - Chaos Monkey! Google will fill in the gaps.

    • Paul Ducklin · 400 days ago

      But is that the *right* way to do it?

      Is it good because it works, or does it work because it's good?

  3. shtiasa · 400 days ago

    One disadvantage might be that malicious individuals may use this random terminator watchdog as a smokescreen for their own activities, if they manage to get control of this 'watchdog' program. And who can guarantee that this program is invulnerable? These are the reasons I would say that this isn't a very good idea.

    • Paul Ducklin · 400 days ago

      They don't even need to attack the "antiwatchdog." The fact that things routinely terminate where you might not expect could end up being a smokescreen of its own.

      "Hey! Looks like an exploit ran on the server - maybe we've been owned!"

      "Naaah, it'll just be the Chaos Hamster kicking in. She'll be right."

  4. Martin P · 400 days ago

    "Should programs randomly fall on their swords?"

    A practice Microsoft has been pioneering for decades.

  5. Bolek · 400 days ago

    Not really about random killing: I like my HW, OS and programs being stable for extended periods (once that meant several weeks of continuous scientific computing, 100% CPU). But I usually close programs I don't need and, in particular, browser tabs. I also turn off the computer for the night when I'm not working (unless it's computing). It works better than the computer of my colleague, who frequently has tens to a hundred browser tabs open and the computer running for weeks without a restart. It makes it easier to know my way among the windows and faster to switch windows (no huge swapping). It also forces me to really read the pages I open before I turn the comp off. So I have fewer tabs open, but I read the content.

    I wonder if turning the computer off is better for security - the computer cannot get either malware or antivirus updates while off. So it is protected, but may be more vulnerable after start.

  6. MikeP_UK · 400 days ago

    Software testing is abysmal these days. We used to do both scripted and random testing, thereby introducing the 'user finger test', and found unexpected problems that were then cured.
    If software randomly dies, what happens to mission-critical actions? Imagine you are developing a training course for new software and using a word processor, graphics, spreadsheet, presentation suite, etc. If any of them fail while you are using them, the delivery timescales/deadlines go out the window. So the whole concept of software shutting down and dying for no good reason is a total non-starter. We users need properly tested software that works correctly straight out of the box and stays working all the time it's in use. And is available the next time, and the next, and the next, ....

  7. Thomas · 399 days ago

    I don't know how login codes are written, but wouldn't this be a good place for a program to fall on its sword? If a login procedure stopped abruptly and notified the entity and the registered account holder after, say, 10 attempts in 10 seconds, wouldn't that stop brute force attacks, as well as alert the parties at risk that they were under attack? If a bank, credit card company, business, or whatever was notified it had x number of attacks over a set period of time, it could (if it was serious about security) implement measures to protect itself. If an individual was notified it had x number of attacks on x number of accounts, he or she would at least know it was happening, and implement a strategy to protect herself or himself. I know I would like to have that information if it happened instead of being clueless about it.

  8. Okay, for one thing: workhorsing -- thanks a lot, guys. My computer needed 8 days for that render and you just got through making it run a power cycle for no good reason whatsoever. And, second, it could lead to billions of dollars in losses for business because of lost work that now needs to be rewritten. And -- *facepalm* -- if the reason they're using is that customers are calling tech support before power cycling their computers after encountering a problem, that just means PBCK, not that you should have every computer spontaneously reboot every half an hour: mostly because the computer is on the other side of the keyboard, but also because Microsoft's already tried that.

  9. 2072 · 392 days ago

    This sounds like an excuse for bad programming practices...

About the author

Paul Ducklin is a passionate security proselytiser. (That's like an evangelist, but more so!) He lives and breathes computer security, and would be happy for you to do so, too. Paul won the inaugural AusCERT Director's Award for Individual Excellence in Computer Security in 2009. Follow him on Twitter: @duckblog