Your tests don’t tell you what you think they do

Yesterday I wrote a tiny JSON encoder/decoder in Erlang. While the Erlang community wasn’t in dire need of yet another JSON parser, the ones I saw around do things just a tiny bit differently than I want them to, and writing a module against RFC 8259 isn’t particularly hard or time-consuming.

Someone commented on (gasp!) the lack of tests in that module. They were right. I just needed the module to do two things, the code is boring, and I didn’t write tests. I’m such a rebel! Or a villain! Or… perhaps I’m just someone who values my time.

Maybe you’re thinking I’m one of those coding cowboys who goes hog wild on unsafe code! No. I’m not. Nothing could be further from the truth. What I have learned over the last 30 years of fiddling about with software is that hand-written tests are mostly a waste of time.

Here’s what happens:

  1. You write a new thingy.
  2. You throw all the common cases at it in the shell. It seems to work. Great!
  3. Being a prudent coder you basically translate the things you thought to throw at it in the shell into tests.
  4. You hook it up to an actual project you’re using somewhere — and it breaks!
  5. You fix the broken bits, and maybe add a test for whatever you fixed.
  6. Then other people start using it in their projects and stuff breaks quite a lot more ZOMG AHHH!

Where in here did your hand-written tests help out? If you write tests to define the bounds of the problem before you actually write your functions, then tests might help out quite a lot because they deepen your understanding of the problem before you really tackle it head-on. Writing tests before code isn’t particularly helpful if you already thoroughly understand the problem and just need something to work, though.

When I wrote ZJ yesterday I needed it to work in the cases that I care about — and it did, right away. So I was happy. This morning, however, someone else decided to drop ZJ into their project and give it a go — and immediately ran into a problem! ZJ v0.1.0 returns an error if it finds trailing commas in JSON arrays or objects! Oh noes!

Wait… trailing commas aren’t legal in JSON. So what’s the deal? Would tests have discovered this problem? Of course not, because hand-written tests would have been bounded by the limits of my imagination and my imagination was hijacked by an RFC all day yesterday. But the real world isn’t an RFC, and if you’ve ever dealt with JSON in the wild that you’re not generating you’ll know that all sorts of heinous and malformed crap is clogging the intertubes, and most of it sports trailing commas.
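
To see what "not legal" looks like in practice, here is a minimal sketch that uses Python's standard json module as a stand-in for a strict, RFC-following parser. It is not ZJ, and `accepts` is just a helper name made up for this illustration.

```python
import json


def accepts(text: str) -> bool:
    """Return True if Python's standard json module parses `text`.

    `accepts` is a hypothetical helper for this illustration only.
    """
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Well-formed JSON parses; the same data with a trailing comma does not.
print(accepts('[1, 2, 3]'))    # True
print(accepts('[1, 2, 3,]'))   # False
print(accepts('{"a": 1,}'))    # False
```

A parser that behaves this way is doing exactly what the RFC asks, which is why a test suite derived from the RFC would have passed while the user's real-world input still failed.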

My point here isn’t that testing is bad or always a waste of time; my point is that hand-written tests are themselves prone to the exact same problems as the code being tested: you wrote them, so they carry flaws of implementation, design and scope, just like the rest of your project.

“So when is testing good?” you might ask. As mentioned earlier, the point where you are trying to model the problem in your mind for the first time, before you’ve written any handling code, is a great time to write tests, for no other reason than that they help you understand the problem. But that’s about as far as I go with hand-writing tests.

The three types of testing I like are:

  • type checks
  • machine generated (property testing)
  • real-world (user testing)

A good type checker like Dialyzer (or especially GHC’s type system, but that’s Haskell) can tell you a lot about your code in very short order. It isn’t unusual at all to have sections of code that are written to do things that are literally impossible, but that you wouldn’t know about until much later because, due simply to lack of imagination, hand-written tests quite often would never have executed that code, or not in a way that would reveal the structural error.
Typespecs: USE THEM

Good property testing systems like PropEr and QuickCheck generate and run as many tests as you give them time to (really, it is just constrained by time and computing resources), and once they discover a breakage they can shrink the failing input down to a much smaller failing case, which very often indicates the root cause pretty quickly. It is amazing. If you ever experience this you’ll never want to hand-write tests again.
Property Testing: USE IT
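
For a rough idea of what the machine is doing, here is a hand-rolled sketch of a round-trip property using only Python's standard library. It is not PropEr or QuickCheck: the generator `random_json` and the runner `check_round_trip` are names invented for this example, the code under test is Python's json module rather than ZJ, and there is no shrinking, so a real tool would report far tidier counterexamples.

```python
import json
import random


def random_json(rng, depth=0):
    """Build a random JSON-compatible value (hypothetical generator)."""
    kinds = ["null", "bool", "int", "str"]
    if depth < 3:  # cap nesting so generated values stay small
        kinds += ["list", "dict"]
    kind = rng.choice(kinds)
    if kind == "null":
        return None
    if kind == "bool":
        return rng.choice([True, False])
    if kind == "int":
        return rng.randint(-10**6, 10**6)
    if kind == "str":
        # Include characters that an encoder has to escape or handle carefully.
        alphabet = 'ab"\\\n é{},[]'
        return "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 8)))
    if kind == "list":
        return [random_json(rng, depth + 1) for _ in range(rng.randint(0, 4))]
    return {f"k{i}": random_json(rng, depth + 1) for i in range(rng.randint(0, 4))}


def check_round_trip(runs=1000, seed=0):
    """Property: decode(encode(x)) == x.

    Returns the first counterexample found, or None if every run passed.
    """
    rng = random.Random(seed)
    for _ in range(runs):
        value = random_json(rng)
        if json.loads(json.dumps(value)) != value:
            return value
    return None


print(check_round_trip())
```

The one property stands in for a thousand hand-written cases, and the number of cases is limited only by how long you let it run. What it still cannot do is tell you about inputs outside the generator's idea of valid JSON, such as trailing commas.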

What about user testing? It is simply necessary. You’ll never dream up the insane stuff to try that users will, and neither will a property-based test generation system. Your test and development environment will often bear little resemblance to your users’ environments (a few weirdos out there still use Windows!), the things you might think to store in your system will rarely look anything like the sort of stuff they will wind up storing in it (you were thinking text, they were thinking video), and the frequency of operation that you assumed might look realistic will almost never be anywhere close to the mark (your one-off utility program that you assumed would run in isolation initiated by a user command in ~/bin/ may become the core part of a massively parallelized service script executed every minute by a cron job running as root).

Ultimately, hand-written tests tend to reveal a lot more about the author of the tests than the status of the software being tested.

4 thoughts on “Your tests don’t tell you what you think they do”

  1. I don’t know…
    I agree about type specs. I’d like to use property-based testing more at my work. Communication with users is an essential skill.
    However, for me, unit tests also serve the purpose for future me and show my intent to users.
    When someone showed a trailing comma not working, I could write a test “rejects trailing commas” and show it to them: “you see, that is by design – find a different JSON parser”.
    If I decide to support it, I’ll write a test “supports trailing commas” and when I am refactoring in the future, I am alerted if I break it. Since it is not even part of RFC, I am more likely to forget about it and break backward compatibility without the test.
    I do understand that unit tests don’t solve all problems, especially not those you mentioned in the article. However, they are so quick and easy to write that in 99.9% of cases the value of having them outweighs the time commitment.
    It sucks, though, that users called you out on not writing them. If they want tests, they might write them and make a PR instead of shaming the library author.

    1. Thanks for dropping by, Tomasz.

      Good point, and I agree with you — especially the case where hand-written tests are essentially acting as documentation that can enforce itself. That sort of thing is much more useful than tricking yourself into thinking everything works as expected just because some coverage tool told you that you have 100% coverage.

      My thoughts on testing are a bit mixed, really — the religion of testing is clearly rotten, and I wrote this post to push back against it. On the other hand there are clearly very useful cases where a hand-written test is extremely valuable across the life of a project.

      Whenever there is some gap between a standard’s specification and the real world (which is obviously the case with the web!) you wind up having programs that must be specified somewhere outside the official standards, but keeping track of (or even discovering) the differences between the standards and the real-world program spec is a huge burden. In this case self-enforcing notifiers are really what those tests are: they aren’t compliance tests or even functionality tests, they are documentation that can get in your face and alert you that you forgot some exceptional case you intended to cover that isn’t in any spec anywhere (so the guy writing the property-based tests might not even be aware of it!).

      Hm… The sentiment in the post stands, but there are clearly many angles. I’ll revisit this subject eventually in a bit more depth.

  2. I agree totally. Dogmatic insistence on testing is one of the many annoyances of modern software engineering culture. Like all dogmas, those who advocate it are usually light on nuance or details. When is testing worthwhile and when is it not worthwhile? (E.g., is it worth writing tests for certain problems where the tests take 10x longer than the original code?) How do you test certain things? What kind of tests are most valuable? What are the principles of good testing, analogous to the principles of software design?

    I have struggled to find this information online, though I found many, many blog posts extolling the virtues of testing and the unprofessionalism of software devs who don’t maintain 100% test coverage.

    TDD is even worse. It works for certain problems and not at all for others. It is also very dependent on one’s own thinking style and existing knowledge. However, TDD advocates not only dogmatically assert the superiority of TDD, they also constantly conflate testing in general with TDD. That is, if you don’t use TDD, you must not be writing tests. If you don’t write tests, your code will be unreliable, therefore you should follow TDD. If you don’t follow TDD, you are therefore an unprofessional developer (in 2018 web dev logic).

    I enjoy your blog a lot by the way. Your mercenary posts were very interesting. I’ve travelled a lot (though doing nothing more stressful than managing outsourced devs), and it’s interesting to read your perspective.

    1. It would be an interesting and useful thing to develop guidelines for when to commit to or refrain from chasing coverage. Along with that it should be possible to identify how much testing might actually matter — the criticality of the code in question.

      In a work project it always should be a business decision (well, it always is a business decision, whether it was made consciously or not) because time is the most inflexible thing, and you’re always paying for developer time anyway. In an unpaid project it can be left up to preference, and on some unpaid projects (especially FOSS ones that are likely to be publicly read or referenced quite a bit) I would assume that even rather unimportant parts of the code might need tests if for no other reason than to clearly explain what the code is supposed to do.

      The most deleterious issue with ideology is the last bit you pointed out: that little cults tend to manifest around certain ideas and then, for some reason, whatever the central tenet of the cult is becomes a moralism. OOP was that way for a while: “Oh, that’s not OOP, so it must be bad”, and that’s “bad” in a moral sense, not a technical one. “You should shun this code/coder/org/platform because they don’t adhere to ___” defines piety/heresy, not competence/incompetence, and this is really stupid to me.

      As is so common in software, once you’ve been around a while you tend to realize the answer to any broad, simple question that lacks context is almost always “It depends”.
