Code that survives is good at surviving

Good code survives because it creates an environment in which programmers survive. The environment matters because software development is an ecosystem: programmers inhabit codebases, but so do other, competing inhabitants. What survives in an environment are the replicating entities, the ones that spread and grow, along with their hosts. Whatever meets these criteria, whether biological or memetic, counts as an inhabitant of the environment.

Code

Code survives, or at least configurations of code do. Sequences of code can be copied, altered, deleted, or stay. These sequences and relations are the thing that is replicated with an opportunity to change. Thus, we can call them memes in the original sense. Some of the structure survives the copying process, and we give names to that which survives to enable discussion. When we name a thing, its chances of survival go up because it becomes a bounded concept. We document protocols, not sets of ten lines of operations. We write comments for functions and procedures, not every 100 bytes of machine code. We capture the recognisable patterns and how the sequences interact to bring about a change. These named features will be carried through the copying and mutation steps.

Conversely, it’s not a great idea to assume code that survives is good for developers and clients. After all, anti-patterns exist. The attributes allowing code to survive don’t match what we normally call good or valuable code. Unfortunately for us, code that survives doesn’t equate to high-performance code. Just because code is static doesn’t mean it’s correct or even provides users with value. No, indeed, code that survives has attributes perfectly tuned to make the code survive. That is all. It’s good, but not good for us.

Beyond surviving, some code gathers momentum and picks up programmers or codebases in which to survive. Where other code stays put and lingers in only one codebase, some code drags in other libraries or ways of thinking and becomes even more robust. It brings an environment with it. It infects minds with its values. This ability to spread to new hosts makes a piece of code viral. It’s different again from a pattern in that it’s not necessarily a self-forming system or structure of relationships. But when it does spontaneously form, it maintains its idioms very strongly.

There are a few recognisable attributes that make code inviting to a host. Here are a few that stand out.

  • Solves your problem:
    • It’s easy to see the value of the code—what problems it solves.
    • You can tell it will solve a real problem you are having right now.
  • Unambiguous:
    • It is easy to understand what the code intends to do.
    • Examples show how it works with glue code.
    • It is something you can touch; you can see the source.
  • Highly rated:
    • There is a community of positive comments.
    • Products made with it show good results.
    • ‘Not getting fired for buying IBM.’
  • Safe:
    • There are no known issues with the library.
    • The community has no major outstanding concerns or caveats.
    • The license is compatible with your use case.
  • Slot-in replacement:
    • It is easy to add to an existing codebase.
    • It could even be part of the language in the near future.
    • It could be as simple as copy-pasting it in.[1]
  • Gentle start:
    • It has easy-to-read documentation or is designed so that documentation is unnecessary.
    • The API is quick to get started with, and defaults are sensible.

I am torn about mentioning the other things experienced developers care about, because not everyone considers them. Even though they make a library valuable to someone with experience, lacking them won’t immediately make it less likely to go viral. The inexperienced outnumber the experienced. It’s a quality of the environment, and therefore part of the situation we find ourselves in.

  • Deep on inspection:
    • A small API with extra depth and options hidden under the surface shows care about someone needing to refine their use later in development.
    • A shallow API with everything completely hidden will become a burden later on in a project, so it will repel experienced programmers.
  • Clear error messages or good logging features:
    • Anything that shows the library developer knows terrible things happen and understands people are going to want to figure out what they were and why.
    • Scripting languages that are perfect in every way but don’t have debug output or breakpointing built-in become dangerous time-sinks.
  • Awareness or compatibility with more platforms:
    • It’s not only that the code will be compatible when there is a shift to a different platform but also that a multi-platform library is less likely to have structural issues limiting its usage to single development environments.
  • Lots of examples of usage:
    • Documentation is one thing, but a suite of tests and examples
      • shows you how things are done, and
      • proves things are working.
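
To make two of these attributes concrete, here is a minimal sketch of an error message that says what went wrong, paired with an example that doubles as a test. The function parse_port and its message wording are hypothetical, invented purely for illustration; they are not taken from any real library.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical library function: parse a TCP port number from text.
// Every failure names the function, echoes the input, and states the rule broken.
int parse_port(const std::string& text) {
    std::size_t used = 0;
    int value = 0;
    try {
        value = std::stoi(text, &used);
    } catch (const std::invalid_argument&) {
        throw std::invalid_argument("parse_port: '" + text + "' is not a number");
    } catch (const std::out_of_range&) {
        throw std::out_of_range("parse_port: " + text + " is outside 1..65535");
    }
    if (used != text.size()) {
        throw std::invalid_argument("parse_port: trailing characters in '" + text + "'");
    }
    if (value < 1 || value > 65535) {
        throw std::out_of_range("parse_port: " + text + " is outside 1..65535");
    }
    return value;
}

// An example that doubles as a test: it shows how the function is called
// and proves that both the happy path and the failure path behave as described.
void example_parse_port() {
    assert(parse_port("8080") == 8080);
    try {
        parse_port("99999");
        assert(false && "expected std::out_of_range");
    } catch (const std::out_of_range& e) {
        assert(std::string(e.what()) == "parse_port: 99999 is outside 1..65535");
    }
}
```

A reader who has never seen the library can copy example_parse_port as a starting point, and the maintainer finds out the moment the documented behaviour drifts.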

Vestigial organs

Much like our appendix or tailbone, codebases can carry code that no longer serves a purpose. As long as it isn’t hurting anyone and remains out of sight, it can quietly take up compilation time and hard drive space without being disturbed. We carry around code that does no work for several reasons. Sometimes, it’s physical, such as when the code inhabits the same files or folders as other code we actively use. Sometimes, it’s historic code for which there is little reason to inspect how or whether it works. Any old code we didn’t delete when it ceased being necessary will remain for months or years.

Getting rid of unused code can be difficult. If you leave unit tests out of the run, code coverage tools can show what is actually used in your codebase, because unit tests exercise code whether or not anything else calls it. You might instead use higher-level tests, such as integration or end-to-end tests, to deduce what code isn’t really required. But most developers don’t think to do this kind of proactive work to discover unnecessary code. In some cases, it’s impossible because your project might be a library for others, and you cannot run their code to review usage.

There are cases of faulty code being carried along as safe and trustworthy. When a library has proven, through positive evidence, to be valuable and safe, vestigial faulty code can hide in less-travelled places. An otherwise safe library can be rendered a security risk when a previously untested route through the code becomes a new hot spot. High code coverage can help here, but nothing will protect you against incorrectly written tests or tests proving the code matches an incorrect design.

We take code with us from place to place for many reasons, but the two that crop up the most are these.

  • Old code is comfortable. A worse but familiar API is better than a new one for those not fed up with it.
  • Cleaning up before bringing it over is work. This means untested but existing code is preferable to new or unknown code with many tests.

You can see the danger in this. We trust the old code we have dragged from project to project, but when we finally start to use it, we find bugs and missing features. Individual bugs don’t feel like a significant cost. But together, an old codebase can contain hundreds or thousands of traps for an unwary developer.

Paradigms and mental models

Paradigms leak into how programmers think about the codebase and spread without impediment. When code pushes against an entrenched paradigm, it can be rejected during code review as not aligned. Familiarity is very comforting to many programmers. Unfamiliar code doesn’t have to be wrong for us to reject it. It’s a clash where we equate foreign-to-our-thinking with being technically wrong.

We should define wrong code as producing incorrect behaviour for given inputs. But when a reviewer claims code is only correct if it follows style and paradigm, then our code isn’t correct, even if it would add value to the customer. So, we must understand the environmental aspect again. Correct code, as in functionally correct, is only correct regarding the spec, even with comments explaining the reason why the change was made. Code exists in the environment but also creates one in which programmers will make further changes.

If the new, technically better, unaligned code makes it into the codebase, it creates tension. Someone may come along and clean it up back to the original paradigm, possibly breaking it, even if it was objectively better left untouched. I have seen programmers revert valuable optimisations because they did not see the value in the change or preferred the paradigm-aligned code over fast or flexible code. The spec is important, but you must also be aware of the environment in which the code resides.

Idioms and processes

Traditions spread too. This is less about a paradigm than about a habit or a technique. Idioms can stick around beyond their value until they become detrimental, and then it can take a while for people to break the habit of reintroducing them.

Getting C programmers to program C++ idiomatically takes a lot of work. Helping them migrate away from raw pointers for objects and moving to shared pointers and value types has been difficult. Trying to get modern C++ programmers to stop using for loops and prefer using algorithms for performance and correctness has been difficult due to the infectious nature of raw loops in code.
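
As a rough illustration of the idioms in question, the sketch below puts a raw index loop next to the equivalent standard algorithm, and shows ownership expressed through a value type and a shared pointer instead of a raw pointer. The Enemy type and the function names are hypothetical, chosen only for this example, and the sketch makes no claim about performance.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical value type: copied and stored directly, no manual new/delete.
struct Enemy {
    std::string name;
    int health;
};

// The habit carried over from C: a raw loop where the reader must
// work out from the body what is being computed.
int count_alive_raw(const std::vector<Enemy>& enemies) {
    int alive = 0;
    for (std::size_t i = 0; i < enemies.size(); ++i) {
        if (enemies[i].health > 0) {
            ++alive;
        }
    }
    return alive;
}

// The same computation with a named algorithm: the intent (count the
// elements matching a predicate) is stated by the call itself.
int count_alive(const std::vector<Enemy>& enemies) {
    return static_cast<int>(std::count_if(
        enemies.begin(), enemies.end(),
        [](const Enemy& e) { return e.health > 0; }));
}

// Shared ownership stated in the return type instead of a raw Enemy*
// whose lifetime the caller would have to guess.
std::shared_ptr<Enemy> make_boss() {
    return std::make_shared<Enemy>(Enemy{"boss", 100});
}
```

Both counting functions return the same result for the same input. What the algorithm version changes is how much the reader has to reconstruct, which is also why the raw loop keeps spreading: it is what the programmer already knows how to type.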

Politics

If some code was hard to write, we naturally feel it has value. The feeling can be pretty strong. Deletion of defective code becomes politically tricky. No matter how well your source control works, some people don’t want to see working code removed from the main codebase. This is an anti-pattern because unused code has no value, and any code in a codebase limits what code you can add and how you can change the code. Delete the dead code.

Complex code also hangs around longer because people fear the sunk cost. Even if the complicated code isn’t used, the fear of writing it again is too intense for some to bear. People fear the cost of rewriting complicated code but for some reason don’t fear maintenance or knock-on costs as much as they should. Bad code exploits this as a survival trait. The more difficult it looks to write, the more likely it will get to stay around. Not only that, but it breeds too. A programmer surrounded by complicated-looking code is inclined to add more complicated code. The opposite may also be true. In line with the broken windows theory, simple, clean, good code reinforces good practices by its presence, but any dirty code lowers the bar for what is acceptable.

The most complicated, unused, slightly wrong, poorly performing, untested code can hang around for a long time if it was difficult to develop, especially if the original author is still part of the organisation and even more so if they are now a senior staff member. It’s not uncommon to find architects inventing or at least naturally including a pet project they worked on when they were a senior developer. That which is familiar is simple to them. Trying to prise an implementation detail away from an architect can be a dangerous political game. So long as the code strokes the ego of the most senior staff members, it will survive.

[1] Single-header C++ libraries are often thought of fondly for this reason.