Do One Thing and Do it Well

For as long as I can remember, I have been using Unix-like systems, Linux in particular. In fact, I only encountered Windows when I got to UCT in 2009. It served me well because I was able to do everything I needed to do and more. But I always found myself drawn back to Linux. However, this is not about Linux vs Windows. Frankly, I find that topic boring now. As an experienced engineer, I now use the tool that is best for the job at hand.

I mentioned that because I want to talk about one of the most important lessons in software that I learned in the Unix/Linux world. Of course this lesson applies to all software. It is just one of the pillars of the strong Linux ecosystem. It is the lesson that your software should do one thing and do it well.

I want to unpack this lesson, or rule if you will, because it confuses a lot of people in the industry. You hear people saying things like "Software cannot be useful unless it is versatile". To me, offering that sentence as a response to "Do one thing and do it well" is a sign of misunderstanding the rule. I am not sure how other people interpret this great rule, so I want to share my understanding and use of it.

What is 'one thing'?

In a world that is increasingly being run and controlled by software, what counts as 'one thing'? I have found that good rules are those that leave a lot to interpretation, and this is one of the good ones. 'One thing' is what you decide to be one thing. Obviously, this means you can get this right or you can get it horribly wrong. Knowing boundaries in software is an art, not a science. Just ask anyone building micro-services.

I will not even attempt to formalise what one thing is. What I can say is that when building software, prefer small self-contained building blocks over do-it-all giant blocks. Look at your software at every level (package, module, function, etc.) and ask yourself what it does. If your answer contains more than one 'and', take a good look at what you have done and see if it can be done better.
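To make the 'and' test a little more concrete, here is a minimal Python sketch. The names (parse_line, is_error, count_errors) and the log format are made up purely for illustration. Each function can be described in one short sentence without an 'and', and the last one only composes the other two.

```python
from typing import Iterable


def parse_line(line: str) -> dict:
    """Split a hypothetical 'LEVEL message' log line into its parts."""
    level, _, message = line.partition(" ")
    return {"level": level.upper(), "message": message.strip()}


def is_error(record: dict) -> bool:
    """Report whether a parsed record is an error."""
    return record["level"] == "ERROR"


def count_errors(lines: Iterable[str]) -> int:
    """Count error lines by composing the two blocks above."""
    return sum(1 for line in lines if is_error(parse_line(line)))


sample = ["info started", "error disk full", "ERROR timeout"]
print(count_errors(sample))  # prints 2
```

The alternative would be a single function that parses, filters and counts all at once. It would work, but changing the log format would mean touching the counting logic too.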

What is the point?

Why am I even writing about this? Well, I use a lot of open source software in my job. Open source software is one of the most amazing things about us as humans. People write incredible software and then give it to the world for free to use for their own projects. I am incredibly proud to be a contributor to some open source projects.

I just have a few problems with open source, specifically libraries, chief of which is the "I can do it too" syndrome. I don't actually know if such a term exists. If it doesn't, you can credit me for coining it.

A lot of open source libraries start from humble beginnings. They have a sharp focus and are dedicated to solving one problem and just that one problem. They are truly within the bounds of doing one thing and doing it well. This causes an increase in their popularity. The authors bask in the limelight and enjoy it a little too much. Then a new library rises for solving a 'similar-but-slightly-different' or just related problem. The new library seems shinier and the authors of the older library start to envy the authors of the new library. Next thing you know, the older library can now solve both problems.

A big sign of this happening is the addition of an 'and' to the description or README of an older library.

And then?

This is a big problem in industry because it causes a lot of pain for other software developers. People misunderstand and misinterpret documentation and shoot themselves in the foot by using the wrong tool for the job. I am going to start name dropping soon.

I have always noticed this 'I can do it too' happening in open source libraries and tools (for lack of a better name?). Recently, I noticed it in Kafka. If you don't know what Kafka is, do yourself a favour and read about it. It is an amazing streaming tool originally built at LinkedIn. It is known for throughput numbers that blow everything comparable out of the water. I won't quote any numbers here though.

Kafka started as a streaming platform and it is brilliant at that. Then at some point, I am not sure why, it started to become an everything message bus. Now, calm down, I know my terminology might not be accurate here. Let me explain what I mean. Kafka started to add features that make it usable for queueing. Big whoop you think? The problem I have with that is that it is very clear that it was not designed for that.

I also notice this in projects that start off as tools that can do all related tasks at the same time. An example of this is marshmallow. If you open the documentation, it claims to do 'simple object serialisation'. However, when you actually use it you find that it is more useful for validation (which it doesn't actually mention in its description) than serialisation.

Okay, so?

The projects I mentioned above are very popular and very useful. They are running in some of the most useful software around the world. I think authors of software that becomes popular have a responsibility to guide other software developers by writing accurate documentation: a description of what the software was designed for and what it does. They should strive to keep their software focused on one task and on doing it well. They should allow other libraries to enjoy the limelight for doing something that their software could do as well.

My engineering background, which is slightly different to the science background of most computer science software developers, has taught me to ask myself "should I" rather than "can I". These projects seem to have asked "can I" at one point and there is always a chance you can. Then you end up doing more than one thing. This bloats and complicates not only your library or tool but the software of the systems using your library as well. It makes designing software difficult as people start to use Kafka for things that RabbitMQ does better.

Suggestions?

I encountered an awesome project by Aphyr which tests distributed software tools and the claims they make in their documentation. It is called Jepsen. After each test he makes suggestions to the authors to either improve the documentation by removing claims that he has found to be false or to fix the bugs that make those claims false.

I think it would benefit the software community if someone much smarter than I would start something similar for the most popular open source libraries.

Echo

Doing one thing and doing it well is important in the design of any system. It is good not only for the reason I mentioned above but also for ease of maintenance. Being able to easily replace the validation layer of your web application without disturbing the serialisation/deserialisation layer is very important for good software design.
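Here is a rough sketch of what that separation could look like in plain Python. It is not how marshmallow or any particular framework does it. The names (validate_user, serialise, handle_request) and the user fields are assumptions made up for the example. The point is only that the validator is passed in, so it can be replaced without touching serialisation.

```python
import json
from typing import Callable

# A validator takes a dict and returns a list of error strings.
Validator = Callable[[dict], list]


def validate_user(data: dict) -> list:
    """Return a list of error strings; an empty list means the data is valid."""
    errors = []
    if not data.get("name"):
        errors.append("name is required")
    if "@" not in data.get("email", ""):
        errors.append("email looks invalid")
    return errors


def serialise(data: dict) -> str:
    """Turn a dict into a JSON string. Knows nothing about validation."""
    return json.dumps(data, sort_keys=True)


def handle_request(data: dict, validate: Validator = validate_user) -> str:
    """Glue layer: validate first, then serialise the data or the errors."""
    errors = validate(data)
    if errors:
        return serialise({"errors": errors})
    return serialise(data)


print(handle_request({"name": "Thandi", "email": "t@example.com"}))
print(handle_request({"name": ""}))
```

Swapping the validation layer is then a one-argument change, for example handle_request(data, validate=my_stricter_validator), where my_stricter_validator is whatever replacement you write. The serialise function never has to know.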

At every layer of your software, ask yourself what it does. Modify it until you can describe what it does without having an 'and' in the sentence or, in the worst case, with just one 'and'.

You can hit me up on Twitter to discuss.

Ciao for now!