I’ve written an ebook about Programming Language Concepts Show me!
I thought I’d write a little post about what I’ve learned over the years regarding some characteristics of clean code. “Clean” here refers to properties that help code be:
- bug-free, or rather “less attractive to bugs”
I think these are the three major aspects developers should ultimately strive for when writing code. The questions that arise from this new (or not so new) awareness are “How do I get there?”, “What kind of tools can I use to achieve this?” and “What properties do I have to focus on to get there?”.
There are several more or less abstract concepts, general properties and paradigms that usually help you to create code that is “better” or “cleaner”, i.e. embodies the features mentioned above. I’ll leave out the obvious rules like “keep your functions as short as possible” and instead focus on rules that are somewhat fuzzier, not directly language dependent and thus probably more difficult to realize:
- Keep your code concise. In my opinion this is more important than type safety (see below).
- Avoid scattering side effects all over your code. Put the side effects you need into clearly separated parts of your code. Make sure you understand what side effects are.
- Don’t mix functionalities. Separate knowledge. For example, try separating algorithms from the knowledge of what a specific data structure in your code looks like.
- Also make your code modular on a higher level. Keep related functions together and separate them from functions that deal with a different subject.
I will now go a bit more into the details of each of these items.
Keeping your code concise
Isn’t this the same as “Keep your functions short”? I think it isn’t, because it relates not only to the length of functions but to your code as a whole. It means you want code that is as expressive as possible and that ideally avoids boilerplate altogether. You should strive to write code that embodies your business logic. Whenever you have to write code that…
- …does error/sanity checks or handling in any form…
- …shovels (i.e. converts) data from one data structure to another…
- …repeats itself (e.g. having to mention the type declarations more than once like in Java)…
…then you should develop a feeling like “Well, this is nothing productive that I’m doing here – is it really necessary? Why am I not working on the real business logic instead?”. I state this so explicitly because I think it is a major trap: it is easy to feel productive when writing boilerplate code, or code that just converts one data structure into another, because you can type and type all the time. But think about it for a moment: you are writing things that should be implicit. It’s 2015, so you should not have to spend much time on the more or less dull task of shoveling data from data structure A into data structure B. You should be able to use your time to focus on the real business logic – the things that make your software individual and valuable.
Choice of programming language
Unfortunately I have to dive a bit into the choice of the programming language here. I think whether you actually are able to keep your code concise depends in fact a lot on the language you use. Just a few examples:
- In C you have to do a lot of error and sanity checks, as otherwise you not only introduce fatal bugs into your software (the program crashes entirely) but also security vulnerabilities (buffer overflows etc.). Since the language doesn’t do these checks on its own, you have to do them explicitly in your code. This is usually something you want to avoid, unless you are writing really low-level code such as a hardware driver.
Modern approaches like pattern matching and destructuring in today’s programming languages remove much of the need for explicit error and sanity checks in places where you should be dealing with business logic instead. You will probably still have to handle the error on another level (depending on how important it is to you) in case pattern matching or destructuring is attempted on data that doesn’t match.
- In Java there is, as of this writing, no type inference for local variable declarations, meaning you have to write type declarations over and over again, not just in one place. You could argue that this adds clarity to the code, but I think it also worsens readability.
Scala code, for example, looks much cleaner to me: it features type inference, which results in shorter code.
- In most object oriented languages you invest a notable amount of time in declaring classes of all kinds, be it database entities, RPC request or response objects or just simple containers for structured data. On top of that there are the interfaces you often need to declare. This is a lot of overhead, especially in statically typed object oriented languages.
You can of course argue that this adds to type safety.
However, I don’t feel type safety is worth all this overhead – at least not for every data structure you use in your code. Why? Well, I think in most cases your code benefits more from being concise than from being 100% type safe. Even though all these class and interface declarations are just simple code that might as well have been auto-generated, it is still code that needs to be maintained for as long as it is there. When a change is needed, adjusting such a data structure might not be much of a hassle, but devs will often look for a (maybe less clean) way around touching that class definition – especially if it is central and critical to large parts of the application. That’s because it feels scary to touch something that might break a lot. I know it’s irrational and only a local optimization, but it happens.
Besides all these general concepts, some programming languages can express the same program in less code than others, i.e. they are more expressive.
To me, conciseness is a very important factor. You’d be surprised at how effectively you can implement changes to programs or create new features when you focus on keeping your program concise and avoiding boilerplate code.
Avoid side effects all over the place
You should always be aware of when you are introducing side effects into your code, and ask yourself whether you really want them right there. Taking a step back: in the broad sense used here, a side effect occurs whenever you access (read or write) anything outside of a function from within that function. That includes for example:
- any I/O, disk, database or network access from within a function or method
- accessing global variables (that you should avoid anyway) from within any function or method
- accessing object instance or class variables from within an object method
- accessing the free variables from within a closure
I’d consider the first two variations much worse than the last two, since you usually use the last two deliberately and not just by being careless. Another reason for the last two being “less bad” than the first two is that the access to object variables and free variables from an object method or closure respectively is still very local in nature, while accessing global variables or doing I/O is not.
Side effects often make unit testing a pain. When doing I/O for example, you will have to inject some mock object into the object under test. Avoiding side effects also means that you won’t have to make use of mocking when writing your unit tests.
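A minimal sketch of what this can look like in Python: the computation lives in a function without side effects, and the I/O is kept in a thin wrapper around it. The names `parse_total` and `read_total` and the one-number-per-line file format are assumptions made up for this example.

```python
def parse_total(lines):
    """No side effects: the result depends only on the argument."""
    return sum(int(line) for line in lines if line.strip())


def read_total(path):
    """Thin wrapper that performs the I/O and delegates the actual work."""
    with open(path, encoding="utf-8") as handle:
        return parse_total(handle)


# A unit test for the logic needs neither a file nor a mock object.
assert parse_total(["1\n", "2\n", "\n", "39\n"]) == 42
```

Only `read_total` touches the file system, and it is small enough that there is little left in it to get wrong.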
Side effects tie a function or method to its surroundings and environment, which makes your code less reusable. In the case of an object method or closure this might not be as bad as for other side effects, since you would probably want to reuse the entire class or closure anyway, including the variables that the code accesses. Accessing a global variable or an I/O stream, on the other hand, means that your code relies on these entities existing. So when the time comes that you want to reuse your code, you won’t be able to do so easily.
Also keep the following in mind: side effects are contagious. If function A is free of side effects and function B has side effects, then inserting a call to B into A turns A into a function with side effects as well.
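The following small Python sketch shows this effect. The module-level list `audit_log` and all function names are hypothetical and exist only for this illustration.

```python
audit_log = []  # module-level state, hypothetical, for illustration only


def record(message):
    """Has a side effect: it modifies state outside of the function."""
    audit_log.append(message)


def area(width, height):
    """Free of side effects."""
    return width * height


def area_recorded(width, height):
    """Same computation, but calling record() makes this function
    depend on (and change) audit_log as well."""
    record(f"area({width}, {height})")
    return width * height


assert area(2, 3) == 6 and audit_log == []
assert area_recorded(2, 3) == 6 and audit_log == ["area(2, 3)"]
```

Both functions return the same value, but only `area` can be called anywhere without thinking about what else it changes.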
Once more a note regarding programming languages: there are languages that more or less force you to separate your “side effect afflicted” code from your code that is free of side effects (e.g. Haskell), but that may at times turn out to be very restrictive when you have good reasons to make an exception to the rule. So it is best to make this separation a programming habit instead of relying on a language’s rules.
Don’t mix functionalities
Build your code to separate knowledge about different aspects of your program and data. The best example we’ve probably all come across is the implementation of a sorting algorithm. There are many different sorting algorithms with different properties, and they can be quite complex to implement. A sorting algorithm needs to be able to compare two pieces of data to each other to determine which one of them is considered “greater” or “less”. It would be unwise to implement a sorting algorithm that is tied to comparing a certain data type, as you would only be able to use it on that very type and not on others. Instead of implementing that sorting algorithm many times for many data types, it is certainly wiser to separate comparison knowledge from sorting knowledge.
How can you do that? In the object oriented world there are interfaces that represent, for example, a Comparable entity. In the functional world there are, well, functions that can be passed into the sorting algorithm, which applies them to two data instances to find out which one is greater or less.
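The functional variant can be sketched in a few lines of Python. This is a deliberately simple insertion sort written for illustration; the names `insertion_sort` and `less_than` are assumptions of this example, not an existing API.

```python
def insertion_sort(items, less_than):
    """Return a new sorted list. The algorithm knows how to sort, while
    less_than(a, b) carries all knowledge about comparing two items."""
    result = []
    for item in items:
        position = len(result)
        # Walk left past every element that the new item must precede.
        while position > 0 and less_than(item, result[position - 1]):
            position -= 1
        result.insert(position, item)
    return result


# The same algorithm, reused with two unrelated kinds of data.
numbers = insertion_sort([3, 1, 2], lambda a, b: a < b)
people = insertion_sort(
    [("Ann", 41), ("Bob", 29)],
    lambda a, b: a[1] < b[1],  # compare (name, age) tuples by age
)
assert numbers == [1, 2, 3]
assert people == [("Bob", 29), ("Ann", 41)]
```

In everyday Python you would not write the sort yourself: the built-in `sorted` follows the same idea by accepting a `key` function, and `functools.cmp_to_key` adapts a comparison function to it.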
Sounds easy and obvious, but it may not always be that easy to realize. It often takes some hard thinking to become aware that you are currently mixing multiple entities or subjects, and that separating them might break the problem down into pieces that result in more reusable, more elegant and more concise code than before. So think about your algorithms and be aware of the entities involved.
Make your code modular on a higher level
When you have several functions that deal with the same kind of data, it usually makes sense to group them together to form a “module” or “namespace”. This in turn makes these functions as a whole more reusable than they would be otherwise.
It’s not always clear which topics to group functions by, as there might be more than one topic that you could group them around. Unfortunately, these topics tend to be orthogonal to each other, meaning that grouping functions around one topic rules out grouping them around the other. There is no generic answer to this, as it always depends a lot on the details. Try to think about the usefulness of the module as a whole: does grouping the functions one way result in a module that is far more useful than grouping them the other way? A relevant term here is, again, reusability (i.e. the module’s independence from its surroundings).
By the way, having more than one obvious way to group functions into a module might also mean that your functions are not yet really responsible for only one thing. It makes sense to ask yourself that when you reach a point at which you are not sure what to call the module and which functions to put into it.
This article does of course not represent an unarguable truth, nor does it claim to be complete. There are many more aspects to all this, and there are surely some that other people assess differently. I have just tried to shed some light on my thoughts and on some things that I think I’ve learned over the years. Maybe some people will find one idea or another in this text that helps them get things straight.
Comments are welcome.