
Boeing Failures Illuminate Greater Software Challenge

“If x: do y; else: do z.” The beauty of software is in its ability to eliminate human error; a computer can be trusted to execute code accurately every single time, all at incredible speeds while occupying minimal space. But what happens when the code is so complex that the humans writing it cannot identify a bug? As confirmed in the recent tragedies involving Boeing 737-MAX aeroplanes, the computer will still perform exactly as directed.
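Rendered literally in Python, that opening pseudocode is nothing more than the sketch below (purely illustrative, not drawn from any real system), and that literalness is the point: the same rule is applied identically on every run, whether the rule is sound or not.

```python
# A trivial, illustrative rendering of "If x: do y; else: do z".
# The point: the machine applies the rule identically on every run,
# millions of times, without fatigue or judgement.

def rule(x: bool) -> str:
    if x:
        return "do y"
    return "do z"

# Executed a million times, the result never varies for the same input...
assert all(rule(True) == "do y" for _ in range(1_000_000))

# ...which is exactly why a flawed rule is applied just as reliably as a sound one.
print(rule(False))   # do z
```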

Over the past decades, software has been steadily adapted and improved, and humanity has placed ever greater trust in it as it has progressed. It now controls our stock markets, mobile communications, warehouse robots, rockets (both space-bound and weaponry), aeroplanes and, most recently, automated vehicles. Until fairly recently, the cost of a software disaster was almost exclusively monetary: minor flaws in apps or robots were not ideal, but they were almost never fatal. As we begin to trust the software in planes and cars with our lives every day, a new standard of 100 per cent accuracy is required to maintain safety.

While software has constantly improved, it has also grown relentlessly in complexity. The original Onboard Maintenance Function programmed into Boeing 737 planes took two and a half years to develop, encompassing more than 1,700 requirements in 32,000 lines of code. Today, the software on a Boeing 787 runs to around 14 million lines of code.

The underlying complexity lies in the way programs are designed. A main routine runs, calling on thousands of smaller functions as each line is executed. Those functions call on still smaller functions, each acting as a ‘black box’: feed it the right arguments and it returns an output as designed. To use a function, you need not understand how it works internally, but simply trust that it does. As a result, hundreds of engineers are often responsible for the final product, each writing code in their own style, modelling real-world physical problems through nothing more than a text editor.
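A rough sketch of that layering, using made-up function names and calibration values rather than anything from a real avionics codebase, might look like this in Python: each layer calls the one below it as a black box, trusting its output without inspecting its internals.

```python
# A hypothetical sketch of layered 'black box' functions (illustrative names
# and calibration values only, not taken from any real avionics system).

import math

def pressure_to_pascals(raw_counts: int) -> float:
    """Lowest layer: convert a raw sensor reading to dynamic pressure (Pa)."""
    return raw_counts * 0.25          # assumed calibration for illustration

def airspeed_ms(raw_counts: int) -> float:
    """Middle layer: calls pressure_to_pascals() without knowing how it works."""
    q = pressure_to_pascals(raw_counts)
    return math.sqrt(2.0 * q / 1.225)  # standard sea-level air density

def airspeed_display(raw_counts: int) -> str:
    """Top layer: trusts every function below it to be correct."""
    return f"{airspeed_ms(raw_counts):.0f} m/s"

print(airspeed_display(2400))   # each layer is a black box to the one above
```

Multiply that pattern across thousands of functions and hundreds of engineers, and the chain of trust becomes something no single person holds in their head.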

Naturally, this means fixing a bug can be extremely complicated. Each individual function is tested rigorously against every input case it is expected to handle, effectively eliminating simple keystroke errors. This bottom-up approach means accuracy can be checked at every level. However, when a bug is buried within millions of lines of code, many of which rely on one another to work as required, isolating and editing the error is sometimes impossible without compromising other functions. Updates therefore tend to be appendages that override previous code to improve functionality or eliminate bugs, which in turn grows the size and complexity of the software.
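By way of illustration, a bottom-up test of a single, hypothetical function might look like the following (the function, its name and its limits are assumed for the example, not taken from any real codebase):

```python
# A hypothetical illustration of bottom-up testing: each small function is
# exercised against its expected input cases in isolation.

import unittest

def clamp_trim(trim_units: float) -> float:
    """Limit a trim command to an assumed range of -5 to +5 units."""
    return max(-5.0, min(5.0, trim_units))

class ClampTrimTests(unittest.TestCase):
    def test_within_range_passes_through(self):
        self.assertEqual(clamp_trim(2.5), 2.5)

    def test_excessive_command_is_limited(self):
        self.assertEqual(clamp_trim(40.0), 5.0)
        self.assertEqual(clamp_trim(-40.0), -5.0)

if __name__ == "__main__":
    unittest.main()
```

Every such function can pass its own tests and the assembled system can still fail, because the tests only cover the cases someone thought to write down.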

James Somers unpacks this challenge in detail in his cautionary article in The Atlantic. He quotes Nancy Leveson, a professor of aeronautics and astronautics at the Massachusetts Institute of Technology who has been studying software safety for 35 years.

“Software is different. Just by editing the text in a file somewhere, the same hunk of silicon can become an autopilot or an inventory-control system. This flexibility is software’s miracle, and its curse. Because it can be changed cheaply, software is constantly changed; and because it’s unmoored from anything physical—a program that is a thousand times more complex than another takes up the same actual space—it tends to grow without bound. ‘The problem,’ Leveson wrote in a book, ‘is that we are attempting to build systems that are beyond our ability to intellectually manage.’”

As we have seen with the Boeing malfunction, trusting software too complex for humans to intellectually manage can now be fatal. The real problem, however, lies neither in the computer nor in the accuracy of the code. It is that humans often fail to identify cases or requirements that seem statistically improbable, or that they simply never imagined. Unprecedented physical environments can create scenarios that software engineers could not perceive from behind their computer screens. When a system is not equipped for the situation it finds itself in, it still acts exactly as it was told to in the circumstance it believes it is in (for example, commanding a nose-down pitch because of a bad sensor reading, as in the 737-MAX case).
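To make that concrete, here is a deliberately simplified, hypothetical sketch (it is not Boeing's actual MCAS logic, and the thresholds are invented) of control code that trusts a single sensor. The code is ‘correct’ in that it does exactly what it was written to do; the missing requirement is the human omission.

```python
# A deliberately simplified, hypothetical sketch - not Boeing's actual MCAS
# logic. It shows how code that trusts a single sensor does exactly what it
# was written to do, even when the reading is nonsense.

AOA_LIMIT_DEG = 15.0   # assumed threshold for illustration

def trim_command(angle_of_attack_deg: float) -> str:
    """Decide a trim action from one angle-of-attack reading."""
    if angle_of_attack_deg > AOA_LIMIT_DEG:
        return "trim nose down"
    return "no action"

print(trim_command(4.0))    # no action - the normal case the engineers imagined
print(trim_command(74.5))   # trim nose down - a faulty reading, obeyed faithfully

# A cross-check over redundant sensors would catch the disagreement,
# but only if someone thought to require it:
def trim_command_redundant(left_deg: float, right_deg: float) -> str:
    if abs(left_deg - right_deg) > 5.0:      # assumed disagreement threshold
        return "disable automatic trim"      # fall back to the pilots
    return trim_command((left_deg + right_deg) / 2.0)
```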

While Boeing will no doubt correct the problems that caused the 737-MAX crashes, doing so will inevitably add to the complexity of the plane’s software systems. Modern, model-based coding languages have been developed to make unperceived requirements easier to identify; however, it is costly to rewrite an entire software system in a new language.

Looking forward, our next challenge will come as automated vehicles are rolled out. Drivers every day encounter scenarios on the road they never could have expected to see. With over 100 million lines of code controlling cars, these systems are even more complicated than aeroplane software. As Somers identifies, “when you’re writing code that controls a car’s throttle, for instance, what’s important is the rules about when and how and by how much to open it. But these systems have become so complicated that hardly anyone can keep them straight in their head.” Before more lives are put in the hands of software every day, underlying system and language designs need to be simplified so that engineers can be certain of our safety.
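Somers’ throttle example can be sketched in a few lines of hypothetical Python (the thresholds and parameters are invented for illustration, not any manufacturer’s logic). The rule itself is easy to state; it is the branches that accumulate around it that become hard to keep straight.

```python
# A hypothetical throttle-control fragment (illustrative thresholds only).
# The rule itself is simple; the special cases around it are what multiply.

def throttle_opening(requested_pct: float,
                     speed_kmh: float,
                     traction_ok: bool,
                     cruise_active: bool,
                     cruise_target_kmh: float) -> float:
    """Return throttle opening as a percentage (0-100)."""
    if not traction_ok:
        return 0.0                                  # cut power if wheels are slipping
    if cruise_active:
        error = cruise_target_kmh - speed_kmh
        return max(0.0, min(100.0, error * 5.0))    # crude proportional control
    return max(0.0, min(100.0, requested_pct))      # otherwise obey the pedal

# Every added condition (hill start, limp-home mode, trailer mode, ...) adds
# another branch - and another case someone has to imagine and test.
print(throttle_opening(40.0, 60.0, True, False, 0.0))    # 40.0
print(throttle_opening(40.0, 60.0, True, True, 100.0))   # 100.0
```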


Lachlan is a Research Analyst at MGIM. Lachlan joined MGIM in July 2018 after studying at the University of California, Berkeley, where he earned a Bachelor of Arts in Applied Mathematics and Computer Science.

This post was contributed by a representative of Montgomery Investment Management Pty Limited (AFSL No. 354564). The principal purpose of this post is to provide factual information and not provide financial product advice. Additionally, the information provided is not intended to provide any recommendation or opinion about any financial product. Any commentary and statements of opinion however may contain general advice only that is prepared without taking into account your personal objectives, financial circumstances or needs. Because of this, before acting on any of the information provided, you should always consider its appropriateness in light of your personal objectives, financial circumstances and needs and should consider seeking independent advice from a financial advisor if necessary before making any decisions. This post specifically excludes personal advice.

Comments

  1. Would just like to point out that, as is being discussed in IT circles around this issue, this was NOT a coding issue or a bug in complex code, but a design decision made by project or upper management around the risk controls for this software functionality.

    Most coders would balk at a single point of failure in code like this.

    Management chose to design it this way (and got approval from the FAA to use a single sensor input), so that a single sensor would be relied upon. The code worked as expected and as designed.

    We run into the same issue with cloud and IT automation. You design it to work a certain way, then that action gets done thousands or millions of times. The decision about how the action runs becomes more important because the inputs and outputs need to be impeccable. If you need a higher level of certainty, you add checks to the inputs and outputs. I imagine it is the same with your software models on equities.

    Unfortunately a “software bug” is an easy scapegoat for the problem, as most people don’t understand it and it’s an easy, faceless thing to blame.

    If it were just a “bug” it would have been resolved in a very short period of time. Unfortunately, it looks like their design decision requires a rework of how this system functions (its inputs and outputs), hopefully leading to better regulatory standards as to what level of sensor redundancy is required.

    Garbage in > Garbage Out
