(Personal) Guidelines for Software Projects

A few guidelines for non-trivial, large projects you actually care about and want to maintain for more than a month or so.

1. Typespecs

Learn to use them. If you are writing a large, complex project in a language that doesn’t support this or have tooling for it then use a different language. Yes, it actually saves so much heartache that it is important enough to switch.

Why? Because for-real type checking can tell you, without the futility or religious interference of unit testing, whether or not your program is valid. A valid program is not necessarily a correct program, but an invalid program is necessarily an incorrect one. (Also, it is worth keeping in mind that classes are not types. There is a subtle, and critical, difference.)

2. Property testing, not unit testing

Don’t simply write a few “unit tests” and assume things work. They don’t. As Rich Hickey (the creator of Clojure) so aptly put it: “What is the one thing that is true about all bugs found in the wild? Every one of them passed all the tests!” It can be useful to engage in regression testing, but regression testing is a subset of integration testing and even crosses over with user testing (the ultimate of all) and project documentation and history management.

When you write code, it has bugs.

  • Some are syntactic: You forgot some ant poop somewhere (things like: : ; . ,), failed to close a brace or paren, or misspelled something.
  • Some are structural: You passed in a foo type but the function is defined as accepting bar (statistically this is the greatest category of compilable, invisible errors — reference point 1 above).
  • Some are scheduling and timing: You have races and deadlocks all over the place and never knew it because they don’t usually get triggered and are super complex to work out in your head.
  • Some are semantic: The program does precisely what you told it to do, but you told it to do the wrong thing (the most frequent place where protocol failures creep in).

You write every one of these kinds of bugs into your programs every time you write a non-trivial program. I can’t just tell you to knock it off and tighten your shot group because I do the same stuff because it is impossible to avoid! If you write all these stupid bugs into your programs, what do you think lurks in your hand-written test code? MORE BUGS!

So what do?

In the same way that we can write a type specification for a function (declare its domain and codomain, basically) we can also write a specification for the function’s valid inputs, and outputs and the expected rules the output should follow (its range and image, basically). This defines the properties of the function.

Neat-O. But what would we do with such a specification? Property declarations are like me explaining to you what a function does, but not how it manages to do it. To test whether our implementation of the function does the expected thing and lacks corner cases, however, we can use a property-based testing system to generate tests for us on the fly and run them to check whether the expected properties of the function hold true. Not only that, smart property based testing systems not only find bugs (values that are defined as valid but produce invalid results that violate the property specification) but can quite often home in on specific broken cases and give you a good indication what sorts of values are problematic. That is to say, a property-based testing engine equipped with good property definitions can locate the corner cases for you.

Why wouldn’t we do this by hand? Because typically unit tests cover a handful of most-common cases with their expected values and that’s about it. Property based testing is much less merciful and also much less prone to error because a property based tester will generate an endless stream of tests according to the provided properties and run them for as much CPU time as you’re willing to give for testing. You are never going to write millions of different test cases for your code. A property based testing engine will do precisely that if you give it the CPU time to do so. Compared to how testing is done in most projects this is like having nuclear power in the age of wooden stoves.

This is magical.

3. DO USER TESTING

When you release something that has worked for you so far, that’s about as much confidence as you should put in an alpha release. “Works for me!” are the bold last words of many an abandonned project.

Don’t be That Guy. Don’t release That Project as a final. Be clear its a beta or even alpha, and development is an ongoing thing, forever. Manage expectations, your users (paying or community) will reward you for being honest.

When you release a project understand that this is your beta period, even if you’re on a relatively mature version. In a sense all significant features go through their own little beta phase. This is true in part because you’ve no clue if power users are going to find a way to break it (they will) or if it will be instantly appreciated and adopted by the userbase (random gamble there). Whatever you think is important or intuitive might have never even occurred to them.

Power users are going to push the button the wrong way and don’t know how to deal. That’s actually a good thing if you maintain a relationship with your users, because you’re basically getting directions straight from the affected party about how to make your program better. This is important whether you’re doing community open source for some sweet Ego Points, or trying to feed the kids at your soul-crushing job.

No amount of unit testing (which we’ve already sort of debunked — write typespecs, don’t blindly churn out unit tests) or property testing (which is vastly superior to unit testing, but misses a lot of side-effecty issues, which are often the central purpose of your program) can catch everything. No amount of integration testing will uncover everything that is wrong with your program. None of these tests will tell you whether your program sucks to use and is reviled by users. But user testing will.

4. Don’t be afraid to change stuff

You have a version control system for your code. You use git. Or something. It doesn’t matter, though, because you have something that does version control for you and creating a new branch is painless. (Unless you’re not using a version control system… then you really need to start. You don’t have to submit to the dark cabal of Ruby hipsters that controls github, but you should at least be using git locally.)

If you have an idea try it out. It is probably a great idea in spirit but won’t be so great in reality until you’ve shaken a bit of the stupid, self-indulgent fantasy out of it. You can’t do that without exploring the idea in actual implementation and that sort of exploration requires hacking up your pristine project a bit until you discover exactly why, in mechanical terms, the Universe hates your idea. Once you know exactly why the Universe hates you and your ideas you can adjust your plan to accommodate the whims of the math gods, tame the vagaries of digital magika, and tap out the proper incantations in much less time than you could had you just held endless meetings about it.

Break stuff. Remember the Cardinal Rule of Hacking:

“If you understand what you’re doing, you’re not learning anything.”
– Some guy (who was not actually Abraham Lincoln)

Sometimes the best sign of progress is a change in the error messages you are getting.

Simplicity follows complexity. Until you write a godawful fugly version of your solution you don’t really understand the problem. If you don’t fully grok the problem how can you ever hope to come up with a solution? Only after you have encountered all the little gotchas that made the code ugly in the first place are you ready to rewrite that steaming pile of (working) poo into an elegant solution that is almost guaranteed to have fewer bugs if for no other reason than increased transparency and better organization of the code.

(But note that you could stick with the ugly version for a bit in a pinch — so not all is lost. Getting something working at all is better than having a bunch of great ideas that don’t exist in reality.)

5. Don’t be afraid of new languages

At this point in my life I’ve written code in about 30 or 40 languages. I don’t know the exact number. I have written a lot of code and gained intimacy with about 10 of those. That’s a lot of languages by some standards and not many at all by others. It is enough, though, that I have come to realize that most languages are minor syntactic variations on a couple of basic paradigms, and really none of that crap matters too much.

It’s all shitty. All languages suck. Some suck a little less than others. Try to find one from the handful that sucks dramatically less than others in a specific domain, then get comfortable with it as a go-to tool for that domain. But remember that it is just a tool. Jackhammers are tools, but I don’t see anyone building houses with them.

When you hop on to a new project that someone is already working on you’re going to have to pretty much adhere to the rules of their house, and that means dealing with whatever annoying language they wrote their awesome project in.

Want to hack on Freenode‘s core implementation? Better not mind dealing with network code and file operations in Java (eek!). And what if you don’t even know Java or it has been years since you saw it last and everything is different now? This is the concern that should worry you the least of all.

If you squint a little projects basically are languages. They have their own semantics (the project libs, its functions, it type specification, its class definitions, its decision tables, its… whatever its got that is relevant). They have their own sort of syntax. In fact, every very large project I’ve ever worked on tended to actually follow Greenspun’s Tenth Rule and if it was a concurrent system (so common today) they even tend to follow Virding’s First Rule. (That becomes less of a joke and more of a law of nature the longer you do this and the more you know about both lisp and OTP.)

What does this mean? It means that learning the language a program is written in is the easy part. Learning the libs of that language tend to take about twice as long as learning the language itself. Learning the internals of a large project, however, tend to take about ten times longer than that. So where is the real cost in effort here? It isn’t in the adoption of a new language. It is in the adoption of a new project because every project is a tarbaby.

6. JUST OPEN YOUR EDITOR YOU PROCRASTINATING SACK OF POO!

Getting started is the hardest part of writing anything, whether prose, code, or poetry is sitting down and typing out something.

How to tackle the procrastination problem? Easier said than done: OPEN YOUR EDITOR

3 to 5 letters is all you need: `vim` or `emacs` and away you go!

Once you’re fully in the Matrix, write a function or spec or something. It doesn’t matter what you try to do: it will be wrong. And then you’ll have been wrong, but not exhausted yet. And suddenly you’ll realize that you are the one being wrong on the internet today and that situation just cannot stand. So you’ll start fixing it. And tinkering on it. And before you know it you’ll actually have some something productive, the curse of social media will be temporarily suspended, and you’ll finally stop feeling so crap about yourself (for a few minutes, anyway).

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.