2022-03-30

Hard truths about software

The global outrage about Russia's invasion of Ukraine in February 2022 has also highlighted the extent to which modern society depends on software. More specifically, on software written by other people. Much of this software comes from open source projects or makes extensive use of open source libraries. This is true also for commercial software that might do so legally or (often, I suspect) surreptitiously by simply copying code from open source projects where these projects have licence conditions that do not allow use in proprietary software.

The Open Source Initiative reports that protestware has been included in many open source projects, some of it malicious – at least one case was uncovered where the code was introduced by one developer to delete all files on devices with IP addresses in Russia or in Belarus. An updated version simply displayed an antiwar message. This piece of code is used in another project and another (as detailed by Lyran Tal at snyk.io), eventually impacting Vue.js which is a commonly used JavaScript framework for building websites.

The complexity of modern software cannot be overstated. Everyone builds applications that rely on a multitude of libraries (even the simple computational models that I write in Python include many libraries about which I know very little and these models run only in a text terminal interface) and, indeed, on an operating system. This can be seen in the size of the executable files: the latest version of Microsoft Edge is a compressed download of over 100 MB. There is no way that a single person can have an overview of the entire source code of the browser and I doubt there is a team that would fit into a single room that does.

Furthermore, software is everywhere. Since computing power became cheap many decades ago, common electronic devices no longer contain custom control hardware but rather have a general purpose computer inside with specific software to handle the connected hardware components. This is true for everything from your car to your phone, via your dishwasher and your smart lightbulb.

With commercial software the user gets the ability to sue someone if things go wrong (depending on the exclusions in the agreement, and I suspect there are cases of "not my fault" on the record). However, the sad news is that it is not only a practical but also a theoretical impossibility to fully guarantee software behaviour.

In his 1951 doctoral dissertation, Henry Gordon Rice proved that no non-trivial semantic property (like, "it never hangs") of a computer program can be universally decided by another program. That is, given a specific property that software should satisfy (e.g. not contain a backdoor) and a program for checking it, that program will always fail to give the correct answer for some of the software to which it is applied. It is however possible to verify the behaviour of some software by a formal verification proof, for example the operating system CertiKOS. Formal verification requires somewhat complicated mathematical modelling but hardware companies such as Intel have been pioneers in using it.

In telecoms, the programming language Erlang was developed at Ericsson for high reliability in the 1980s and is in current use for much of the code of WhatsApp. Substantial work has been done on the formal verification of Erlang software. Advocates of Open RAN (Radio Access Network) technology and telecommunications operators will probably have to keep in mind the hard fact that software reliability for realtime systems (like cars or networks) is orders of magnitude more important than for the likes of the Candy Crush app. Mixing hardware and software from different vendors in a critical realtime application is likely to pose more challenges in terms of verifiable reliability than some think.