Interrupt Race Conditions
Written by Graham Stoney
The classic bug in interrupt-driven systems is the race condition, where an interrupt occurs at a point in the code where the designer never anticipated one. This is one of the few advantages of polling: in a polling-based system, the event can only be noticed in the part of the code that polls for it.
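As one illustration of such a race, consider a 16-bit tick count that an 8-bit processor must read one byte at a time. The sketch below is a host-side simulation under stated assumptions, not code from any particular system: the names `tick_lo`, `tick_hi`, `timer_isr`, `set_ticks`, `read_ticks_racy` and `read_ticks_retry` are all hypothetical, and the `fire_between` flag stands in for a real interrupt arriving between the two byte reads.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shared state: a 16-bit tick count held as two bytes,
   as it would be on an 8-bit processor. */
volatile uint8_t tick_lo;
volatile uint8_t tick_hi;

/* Test helper: load a known value into the counter. */
void set_ticks(uint16_t value) {
    tick_lo = (uint8_t)(value & 0xFF);
    tick_hi = (uint8_t)(value >> 8);
}

/* Stand-in for a timer ISR: increments the 16-bit count. */
void timer_isr(void) {
    if (++tick_lo == 0) {
        ++tick_hi;
    }
}

/* Unsafe read. If fire_between is nonzero, the "interrupt" lands
   between the two byte reads, which is the case the designer forgot. */
uint16_t read_ticks_racy(int fire_between) {
    uint8_t lo = tick_lo;
    if (fire_between) {
        timer_isr();
    }
    uint8_t hi = tick_hi;
    return (uint16_t)((hi << 8) | lo);
}

/* One possible fix: re-read the high byte and retry if it changed.
   (Briefly disabling interrupts around the read is another option.) */
uint16_t read_ticks_retry(int fire_between) {
    uint8_t hi, lo;
    do {
        hi = tick_hi;
        lo = tick_lo;
        if (fire_between) {
            timer_isr();
            fire_between = 0; /* the simulated interrupt fires only once */
        }
    } while (hi != tick_hi);
    return (uint16_t)((hi << 8) | lo);
}

/* Host-side demonstration with the interrupt landing mid-read. */
void demo(void) {
    set_ticks(0x00FF);
    assert(read_ticks_racy(1) == 0x01FF);  /* torn: neither 0x00FF nor 0x0100 */
    set_ticks(0x00FF);
    assert(read_ticks_retry(1) == 0x0100); /* consistent post-interrupt value */
}
```

The torn value appears only when the interrupt lands in a window a few instructions wide, just as the low byte is about to roll over, which is why this kind of bug can survive a great deal of testing.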
In a system where interrupts nest, you also have to deal with worst-case stack growth, which occurs only when all possible interrupts arrive at the same time. This is likely to be so rare that it never happens during testing; instead, the device just fails, seemingly at random, in the field. So you need to prove analytically that the stack is deep enough to handle the worst case. I accomplished this in one system I worked on by writing a Perl script that parsed all the source code looking for stack pushes and pops, then exhaustively simulated every combination of interrupts occurring on every instruction in the code. Not only did this calculate an exact worst-case stack depth, it also found many race conditions I hadn't considered. Bear in mind that a successful system will endure orders of magnitude more hours of use in the field than it ever will in testing, so you can't assume that it won't fail just because you think you've done a lot of testing. Most likely you've covered only a tiny fraction of what will happen in the field.
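The stack-depth argument above can be sketched in a much simpler form than the exhaustive simulation described. The following is not the Perl script from the article; it is a minimal upper-bound calculation, assuming strict priority nesting (an ISR can be preempted only by a higher-priority ISR, so at most one ISR per priority level is on the stack at once) and assuming the per-routine stack figures are already known. The type `isr_cost`, the function `worst_case_stack` and all the numbers in the usage note are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one interrupt handler. */
struct isr_cost {
    const char *name;     /* label for reporting only */
    unsigned priority;    /* handlers at the same level cannot nest */
    size_t stack_bytes;   /* worst-case stack used by this handler */
};

/* Upper bound on stack depth: the deepest point of the main code plus,
   for each priority level, the largest handler at that level. */
size_t worst_case_stack(size_t main_bytes,
                        const struct isr_cost *isrs, size_t n) {
    assert(isrs != NULL || n == 0);
    size_t total = main_bytes;
    for (size_t i = 0; i < n; i++) {
        int is_level_max = 1;
        for (size_t j = 0; j < n; j++) {
            if (j == i || isrs[j].priority != isrs[i].priority) {
                continue;
            }
            /* Count only one handler per level: the largest, with ties
               broken in favour of the lower index. */
            if (isrs[j].stack_bytes > isrs[i].stack_bytes ||
                (isrs[j].stack_bytes == isrs[i].stack_bytes && j < i)) {
                is_level_max = 0;
                break;
            }
        }
        if (is_level_max) {
            total += isrs[i].stack_bytes;
        }
    }
    return total;
}
```

For example, with a main-code depth of 64 bytes and handlers `{"uart", 1, 32}`, `{"timer", 2, 24}` and `{"adc", 1, 48}`, the bound is 64 + 48 + 24 = 136 bytes. A bound like this is pessimistic compared with an exact instruction-by-instruction simulation, since it assumes every interrupt arrives at the deepest point of whatever it preempts, but it errs on the safe side.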