Why software has bugs

As my wife walked past a meeting, the word 'bug' crept in and she asked if we were discussing a nature program.

Little did she know, we were discussing a small, fixable issue within our software system. It got me thinking, though.

So I want to take a few minutes to explain the what, the how and the why of bugs in software, and the ways we can try to prevent them.

The "What"

As described by Wikipedia:

A software bug is an error, flaw or fault in the design, development, or operation of computer software that causes it to produce an incorrect or unexpected result, or to behave in unintended ways.

For as long as software has existed, bugs have existed, and they will continue to. It's a complex issue with too many causes to spell out in this post, so let's look at some areas where we can control the flow of bugs.

Misunderstanding of requirements

At every point of the software development lifecycle (SDLC), there is room for miscommunication and misunderstanding.

Plan -> Analysis -> Design -> Implementation -> Maintenance -> ...

As we now know, a bug isn't just a "mistake" within the code; it can also be unintended or unexpected behaviour. Such bugs can creep in as early as the initial planning phase of work.

It is important to make sure the stakeholders and developers are aligned on the expected behaviour through clear, concise descriptions of functionality, generally split up into manageable chunks of work (stories).
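To make that concrete, here is a minimal, hypothetical sketch in Python. The function names and the 30-day rule are invented for illustration and are not taken from any real system. The first function does exactly what the developer understood (30 calendar days), but if the stakeholder actually meant 30 working days, the result is still a bug even though nothing crashes.

```python
from datetime import date, timedelta


def payment_due_date(invoice_date: date, days: int = 30) -> date:
    """Return the due date, counting calendar days (the developer's reading)."""
    return invoice_date + timedelta(days=days)


def payment_due_date_working_days(invoice_date: date, days: int = 30) -> date:
    """Return the due date, counting Monday to Friday only (the stakeholder's reading).

    Assumption for the example: public holidays are ignored.
    """
    current = invoice_date
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


invoice = date(2024, 1, 1)
print(payment_due_date(invoice))               # 2024-01-31
print(payment_due_date_working_days(invoice))  # 2024-02-12
```

Both functions are "correct" code. Only one of them matches what was asked for, which is why agreeing the behaviour up front matters.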

Complex systems and scaling

The more complex the system, the easier it is to introduce bugs. Integrations, whether third-party or in-house, can lead to gaps because a system split into more than one concern has boundaries where things can go wrong. Take a microservice architecture as an example: with services split into one concern each, the developer must take into account many more variables than in a monolithic system, where everything is in one place (which, of course, comes with its own cons).

Naturally, a system becomes more complex as it scales in capability and features. A simple system with a few lines of code and one clear, concise purpose can soon become complex, with libraries that bring plenty of conventions, additional functionality, and frameworks with configurable parameters. Scaling usually also means adding more contributors, which can lead to drift in how the system is developed.
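As a rough illustration of those extra variables, the sketch below compares an in-process call with the same call made across a service boundary. Everything here is hypothetical: `get_balance_local`, `get_balance_remote`, `FlakyBalanceClient` and the retry count are invented for the example, and the "network" is simulated so the snippet runs on its own.

```python
from typing import Optional


class FlakyBalanceClient:
    """Stand-in for a remote balance service; fails a set number of times first."""

    def __init__(self, failures_before_success: int) -> None:
        self.failures_left = failures_before_success

    def get_balance(self, account_id: str) -> int:
        if self.failures_left > 0:
            self.failures_left -= 1
            raise TimeoutError("balance service did not respond")
        return 100  # fixed value, in pence, for the example


def get_balance_local(account_id: str) -> int:
    """Monolith-style: a plain function call with no network in between."""
    return 100


def get_balance_remote(
    client: FlakyBalanceClient, account_id: str, attempts: int = 3
) -> Optional[int]:
    """Service-style: the caller has to decide what to do on timeouts."""
    for _ in range(attempts):
        try:
            return client.get_balance(account_id)
        except TimeoutError:
            continue  # a real system would likely back off and log here
    return None  # the caller must now handle "balance unknown"


print(get_balance_local("acc-1"))                          # 100
print(get_balance_remote(FlakyBalanceClient(2), "acc-1"))  # 100, on the third attempt
print(get_balance_remote(FlakyBalanceClient(5), "acc-1"))  # None
```

The local version has one outcome. The remote version has at least three (success, success after retries, no answer), and each one is a place where an unhandled case can become a bug.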

Lack of adequate testing

There isn't enough time here to cover how important a thorough testing strategy is, but holes or gaps in the system's tests can be a reason bugs appear. There are many areas of testing to discuss, but the methodology we use at Countingup is the 'testing trophy', which helps make sure we have confidence in our changes.

Within these types of test (unit, integration and end-to-end), we should cover specific logic, integrations between multiple functions and whole-system behaviour respectively. However, it is important to acknowledge that we can't test everything. We can use aids such as coverage reports, but at the end of the day it's important to balance the time spent on tests against their value, as we can otherwise find ourselves paralysed and unable to release features in a good timeframe. Either way, I'd always prefer quality to quantity.

The concept of shift-left testing, which could warrant its own post, is that developers need to be engaged in testing from a very early stage of the SDLC. This is a good way of identifying and preventing bugs in the code or the testing strategy: the earlier, the better.
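As a small sketch of the difference between the first two kinds of test, the Python below has one test aimed at a single piece of logic (a unit test) and one aimed at functions working together (an integration-style test). The functions `apply_discount` and `basket_total` are made up for the example and are not Countingup code. A whole-system test is left out because it needs a running system to exercise.

```python
def apply_discount(price_pence: int, percent: int) -> int:
    """Specific logic: reduce a price by a whole-number percentage."""
    return price_pence - (price_pence * percent) // 100


def basket_total(prices_pence: list[int], percent: int) -> int:
    """Combines several calls: discount each item, then sum the results."""
    return sum(apply_discount(p, percent) for p in prices_pence)


def test_apply_discount_unit() -> None:
    # Unit test: one function, one piece of logic.
    assert apply_discount(1000, 10) == 900
    assert apply_discount(1000, 0) == 1000


def test_basket_total_integration() -> None:
    # Integration-style test: functions working together.
    assert basket_total([1000, 500], 10) == 1350
    assert basket_total([], 10) == 0


test_apply_discount_unit()
test_basket_total_integration()
print("all tests passed")
```

Even here the tests only cover the cases someone thought of; a negative percentage, for instance, is untested, which is the "we can't test everything" trade-off in miniature.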

Prevention

At Countingup, we have numerous methods of identifying and mitigating bugs, as well as a solid process for handling bugs when they appear. Some of these have been covered in other blog posts, such as our automated test suite, which runs periodic end-to-end tests, and our "fail fast" philosophy, supported by the canideploy Slack command, which allows small, fast changes that are easy to revert and helps streamline production releases.

Summary

There are plenty of areas we haven't touched on but, in general, there is no perfect software, and each project will have a different tolerance for bugs. The key is to be aware of mitigation methods and to be proactive in testing the system well enough that no major bugs affect service. Humans make mistakes, and so software has mistakes. At Countingup we have taken time to ensure we proactively prevent as many bugs as possible and, when (not if) bugs occur, we react quickly and have an adequate process to handle and prioritise them.