Boeing Failures Illuminate Greater Software Challenge
“If x: do y; else: do z.” The beauty of software is in its ability to eliminate human error; a computer can be trusted to execute code accurately every single time, all at incredible speeds while occupying minimal space. But what happens when the code is so complex that the humans writing it cannot identify a bug? As confirmed in the recent tragedies involving Boeing 737-MAX aeroplanes, the computer will still perform exactly as directed.
Software has been adapted and improved over past decades and given greater trust by humanity as it’s progressed. Over time, software has been utilised to control our stock markets, mobile communications, robots in warehouses, rockets (both space and weaponry), aeroplanes and, newly, automated vehicles. Until more recently, the costs of software disasters were almost exclusively monetary. Unprecedented minor flaws in apps or robots were not ideal but almost never fatal. As we begin to trust the software in planes and cars every day with our lives, a new standard of 100 per cent accuracy is required to maintain safety.
While software has constantly improved, it has likewise increased in complexity. The original Onboard Maintenance Function programmed into Boeing 737 planes took two and a half years to develop, including more than 1700 requirements in 32,000 lines of code. Nowadays, Boeing 787 software includes around 14 million lines of code.
The underlying complexity lies within the design of coded programs. A main body is run, calling upon thousands of smaller functions as each line is parsed over. These smaller functions call on smaller functions themselves, each acting as a ‘black box’ – if you input the right arguments, the function returns an output as designed. To execute each function, you need not understand how it works, but simply trust that it does. As such, hundreds of engineers are often responsible for the final product. Not to mention all writing code in their own variation of computer language, accounting for physical and biological problems through a text editor.
Naturally this means that fixing a bug can be extremely complicated. Each individual function is tested rigorously to account for every input case, effectively eliminating human keystroke error. This bottoms-up approach ensures accuracy can be tested at every level. However, when a bug is buried within millions of lines of code, many relying on one another to work as required, isolating and editing an error is sometimes impossible without compromising other functions. As such, updates are typically appendages that override previous code to improve functionality or eliminate bugs, thereby increasing the size and complexity of the software program.
James Somers unpacks this challenge in detail in his cautionary article in The Atlantic. He quotes Nancy Leveson, a professor of aeronautics and astronautics at the Massachusetts Institute of Technology who has been studying software safety for 35 years.
“Software is different. Just by editing the text in a file somewhere, the same hunk of silicon can become an autopilot or an inventory-control system. This flexibility is software’s miracle, and its curse. Because it can be changed cheaply, software is constantly changed; and because it’s unmoored from anything physical—a program that is a thousand times more complex than another takes up the same actual space—it tends to grow without bound. ‘The problem,’ Leveson wrote in a book, ‘is that we are attempting to build systems that are beyond our ability to intellectually manage.’”
As we have seen with Boeing’s malfunction, trusting software too complex for humans to intellectually manage can now be fatal. The real problem, however, is not in the computer and not in the accuracy of the code. It is that humans often fail to identify cases or requirements that are statistically improbable, or impossible under human estimation. Unprecedented physical environments can create scenarios that software engineers were not able to perceive from behind their computer screens. When the system acting is not equipped to perform it acts as exactly as it was told to in the circumstance it believes it is in (e.g. nose-diving due to a bad sensor reading in the 737-MAX case).
While Boeing will no doubt account for the problems that caused the 737-MAX crashes, they will also inevitably have added to the complexity of the plane’s software systems. Modern, model-based coding languages have been developed to make unperceived requirements easier to identify; however, it is costly to rewrite entire software systems in a new language.
Looking forward, our next challenge will come as automated vehicles are rolled out. Drivers every day encounter scenarios on the road they never could have expected to see. With over 100 million lines of code controlling cars, these systems are even more complicated than aeroplane software. As Somers identifies, “when you’re writing code that controls a car’s throttle, for instance, what’s important is the rules about when and how and by how much to open it. But these systems have become so complicated that hardly anyone can keep them straight in their head.” Before more lives are put in the hands of software every day, underlying system and language designs need to be simplified so that engineers can be certain of our safety.The beauty of software is in its ability to eliminate human error; but what happens when the code is so complex that the humans writing it cannot identify a bug? Click To Tweet