Yesterday I wrote that software engineering is one of the most mentally challenging tasks that humans perform, and this article about optimistic programming might help non-programmers understand why.
I remain astonished that so many developers continue to write code that assumes relations like1+1=2 are true. In fact, 1+1=0fe23b9, sometimes. Or -65535. Or any of innumerable other values.1+1=2 only when everything works perfectly. Do your programs work perfectly all of the time? The evidence suggests that most of us create imperfect code. Lots and lots of bugs.
Yet when writing the code, we labor under the assumption that there will be no bugs. Bugs are largely treated reactively: Chase 'em down when they appear rather than anticipate how they may arise and appropriately taking defensive action.
1+1!=2 if any of the parameters are globals and a reentrancy problem stomps on part of a value. Badly encapsulated data has the same problem. A null pointer passed to a summing function can return utterly unpredictable results.
Apparently, gauged by the code I see, none of us has ever dereferenced a null pointer.
Computer programs are literally chaotic systems: they are deterministic (i.e., they do exactly what you tell them to, every time) and tiny changes can have wide-ranging effects (change of a single bit will completely alter the behavior of the program). Not only are there an infinite number of errors that could occur, there are an infinite number of right ways to reach a goal as well. Navigating between these two infinities is hard. A program may have bugs that only appear once every thousand years, or bugs that result in very subtle output errors that are difficult to notice. Oftentimes a programmer will see an erroneous result but be unable to find the place in the code that's causing it.
One of the simpler problems to explain to a non-engineer is the halting problem. Everyone has experienced it: you're using a program and it suddenly becomes non-responsive: it crashes, goes into an infinite loop, or otherwise just stops responding to you. These are all examples of a halt, and the halting problem is the question of whether or not it's possible to determine if a particular program will ever halt. More formally:
Given a description of a program and a finite input, decide whether the program finishes running or will run forever, given that input.
Alan Turing proved mathematically that the halting problem is undecidable. That means that even if you have the source code for a program and know exactly what input will be put into it, there are some programs for which it is still impossible to determine whether or not they will halt. Typically, it's easier to determine if a program will halt than if it won't. If you run it, and it halts, then there you go! However, if you run it and it doesn't halt, you can't know that it won't halt if you just wait a little longer. How long do you have to wait? There's no way to know. The second after you stop waiting and decide that the program won't halt, it might halt. (Because actual computers have finite memory it is theoretically possible to determine whether a program will halt or not simply by enumerating every possible memory state, but practically this cannot be accomplished because there are far more memory states than there are atoms in the universe.)
So there's a very high-level description of one kind of bug that is known to be unsolvable. By being very careful, very smart, and very thorough it's possible to limit the number of bugs in a program to a level that doesn't render the program unusable, but improvements are asymptotic. The first 90% of the bugs take 10% of the time, the next 9% of the bugs take 10% of the time, the next 0.9% of the bugs take the next 10% of the time, and so on, until you've spent far more than 100% of the time alloted!
And remember: because the system is chaotic a single bug can result in catastrophic failure. Jaron Lanier has written extensively about Gordian software and why we need to move towards a non-chaotic software paradigm that is inherently fault-tolerant, but there's essentially zero progress on that front at the moment. Software bugs are here to stay.