Posts Tagged ‘Dialyzer’

Your tests don’t tell you what you think they do

Tuesday, June 26th, 2018

Yesterday I wrote a tiny JSON encoder/decoder in Erlang. While the Erlang community wasn’t in dire need of yet another JSON parser, the ones I saw around do things just a tiny bit differently than I want them to, and writing a module against RFC 8259 isn’t particularly hard or time-consuming.

Someone commented on (gasp!) the lack of tests in that module. They were right. I just needed the module to do two things, the code is boring, and I didn’t write tests. I’m such a rebel! Or a villain! Or… perhaps I’m just someone who values my time.

Maybe you’re thinking I’m one of those coding cowboys who goes hog wild on unsafe code! No. I’m not. Nothing could be further from the truth. What I have learned over the last 30 years of fiddling about with software is that hand-written tests are mostly a waste of time.

Here’s what happens:

  1. You write a new thingy.
  2. You throw all the common cases at it in the shell. It seems to work. Great!
  3. Being a prudent coder you basically translate the things you thought to throw at it in the shell into tests.
  4. You hook it up to an actual project you’re using somewhere — and it breaks!
  5. You fix the broken bits, and maybe add a test for whatever you fixed.
  6. Then other people start using it in their projects and stuff breaks quite a lot more ZOMG AHHH!

Where in here did your hand-written tests help out? If you write tests to define the bounds of the problem before you actually write your functions, then tests can help quite a lot, because they deepen your understanding of the problem before you really tackle it head-on. Writing tests before code isn’t particularly helpful if you already thoroughly understand the problem and just need something to work, though.

When I wrote ZJ yesterday I needed it to work in the cases that I care about — and it did, right away. So I was happy. This morning, however, someone else decided to drop ZJ into their project and give it a go — and immediately ran into a problem! ZJ v0.1.0 returns an error if it finds trailing commas in JSON arrays or objects! Oh noes!

Wait… trailing commas aren’t legal in JSON. So what’s the deal? Would tests have discovered this problem? Of course not, because hand-written tests would have been bounded by the limits of my imagination and my imagination was hijacked by an RFC all day yesterday. But the real world isn’t an RFC, and if you’ve ever dealt with JSON in the wild that you’re not generating you’ll know that all sorts of heinous and malformed crap is clogging the intertubes, and most of it sports trailing commas.
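Here’s a minimal, self-contained sketch of how a recursive-descent parser can tolerate trailing commas (these are not ZJ’s actual internals; every name here is illustrative): after consuming a comma, a closing bracket is simply allowed to end the array.

-module(trailing_comma).
-export([ints/1]).

% Parse a JSON-style array of non-negative integers, tolerating a
% trailing comma before the closing bracket.
-spec ints(binary()) -> [integer()].
ints(<<$[, Rest/binary>>) ->
    elements(skip_ws(Rest), []).

% A closing bracket ends the array even directly after a comma,
% which is what makes <<"[1, 2, 3,]">> acceptable.
elements(<<$], _/binary>>, Acc) ->
    lists:reverse(Acc);
elements(Bin, Acc) ->
    {Int, Rest} = int(Bin, 0),
    case skip_ws(Rest) of
        <<$,, Rest2/binary>> -> elements(skip_ws(Rest2), [Int | Acc]);
        <<$], _/binary>>     -> lists:reverse([Int | Acc])
    end.

% Accumulate decimal digits into an integer.
int(<<D, Rest/binary>>, N) when D >= $0, D =< $9 ->
    int(Rest, N * 10 + (D - $0));
int(Bin, N) ->
    {N, Bin}.

skip_ws(<<C, Rest/binary>>) when C =:= $\s; C =:= $\t; C =:= $\n; C =:= $\r ->
    skip_ws(Rest);
skip_ws(Bin) ->
    Bin.

In the shell, trailing_comma:ints(<<"[1, 2, 3,]">>) returns [1,2,3] where a strict RFC 8259 parser would return an error.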

My point here isn’t that testing is bad or always a waste of time; my point is that hand-written tests are themselves prone to the exact same problems as the code being tested: you wrote them, so they carry flaws of implementation, design, and scope, just like the rest of your project.

“So when is testing good?” you might ask. As mentioned earlier, the time when you are trying to model the problem in your mind for the first time, before you’ve written any handling code, is a great time to write tests, for no other reason than that they help you understand the problem. But that’s about as far as I go with hand-writing tests.

The three types of testing I like are:

  • type checks
  • machine generated (property testing)
  • real-world (user testing)

A good type checker like Dialyzer (or, even more so, ghc’s type system, but that’s Haskell) can tell you a lot about your code in very short order. It isn’t unusual at all to have sections of code written to do things that are literally impossible, but that you wouldn’t find out about until much later because, due simply to lack of imagination, hand-written tests would often never have executed that code, or not in a way that would reveal the structural error.
Typespecs: USE THEM
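As a contrived sketch of the kind of impossibility I mean: the never clause below can never match, because N rem 2 always yields an integer. A hand-written test would almost certainly never execute that clause, but Dialyzer flags it immediately, with a warning along the lines of “The pattern 'never' can never match the type integer()”.

-module(impossible).
-export([is_even/1]).

-spec is_even(integer()) -> boolean().
is_even(N) ->
    case N rem 2 of
        0     -> true;
        never -> exit(impossible);  % structurally dead: N rem 2 is always an integer
        _     -> false
    end.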

Good property-testing systems like PropEr and QuickCheck generate and run as many tests as you give them time for (really, they are constrained only by time and computing resources), and once they discover a breakage they can shrink the failing input to pinpoint the exact failing case, which very often indicates the root cause pretty quickly. It is amazing. If you ever experience this you’ll never want to hand-write tests again.
Property Testing: USE IT
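For flavor, here is a minimal sketch of what such a property looks like in PropEr. The module my_json and its encode/1 and decode/1 functions are stand-ins for whatever codec is under test (ZJ’s real API may differ); the round-trip property itself is the interesting part.

-module(prop_my_json).
-include_lib("proper/include/proper.hrl").

% Encoding a term and then decoding the result should give back the
% original term.
prop_roundtrip() ->
    ?FORALL(Term, json_term(),
            begin
                {ok, Decoded} = my_json:decode(my_json:encode(Term)),
                Decoded =:= Term
            end).

% Generator for a small subset of JSON-representable terms:
% integers, booleans, and (nested) arrays of them.
json_term() ->
    ?SIZED(Size, json_term(Size)).

json_term(0) ->
    oneof([integer(), boolean()]);
json_term(Size) ->
    oneof([integer(), boolean(), list(json_term(Size div 2))]).

Running proper:quickcheck(prop_roundtrip()) then hammers the codec with generated terms and shrinks any counterexample it finds down to a minimal failing case.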

What about user testing? It is simply necessary. You’ll never dream up the insane stuff to try that users will, and neither will a property-based test generation system. Your test and development environment will often bear little resemblance to your users’ environments (a few weirdos out there still use Windows!). The things you might think to store in your system will rarely look anything like the sort of stuff they wind up storing in it (you were thinking text, they were thinking video). And the frequency of operation you assumed was realistic will almost never be anywhere close to the mark (your one-off utility program that you assumed would run in isolation, initiated by a user command in ~/bin/, may become the core part of a massively parallelized service script executed every minute by a cron job running as root).
Your Users: COMMUNICATE WITH THEM

Ultimately, hand-written tests tend to reveal a lot more about the author of the tests than the status of the software being tested.

Pure Declarations in Erlang

Monday, December 21st, 2015

Over the last year or so I’ve gone back and forth in my mind and in discussions with other Erlangers about type systems in Erlang, or rather, about its lack of one and the way Dialyzer acts as our bandaid in this area. Types are useful enough that we need Dialyzer, but the pursuit of functional puritanism gets insane enough that it’s simply not worth it in a language intended for real-world production use, especially in the messy, massively concurrent, let-it-crash, side-effecty, message-centric world of Erlang.

But… types and pure functions are still really useful, and setting a goal of making as much of a program as possible into provable, bounded, typed, pure functions tends to result in code that is easy to understand, test, and maintain. So there is obviously some tension here.

What I would like to do is add a way for the compiler (or Dialyzer, but I would prefer this be a compiler check, tbh) to know which functions are pure and which are not. The way I would do this is by using a different “arrow”, in particular the Prolog-style declaration indicator: :-

[Edit after further discussion…] What I would like to do is add a directive that Dialyzer can interpret according to a simple purity rule. Adding this to Dialyzer makes more sense than putting it in the compiler — Dialyzer is already concerned with checking; the compiler is already concerned with compiling.

The directive would be -pure(Name/Arity) (a complement to -spec). The rule would be very simple: only guard-permissible BIFs and other pure functions are legal from within the body of a pure function. This is basically just an extension of the current guard rule (actually, I wonder why this version isn’t already the guard rule… other than the fact that unless something like this is implemented the compiler itself wouldn’t have any way of checking for purity, so currently it must blindly accept a handful of BIFs known to be pure and nothing else).

For example, here is a pure function in Erlang, but neither the compiler nor Dialyzer can currently know this:

-spec increment(integer()) -> integer().
increment(A) ->
    A + 1.

Here is the same function declared to be pure:

-pure(increment/1).
-spec increment(integer()) -> integer().
increment(A) ->
    A + 1.

Pretty simple change.
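For contrast, here is a sketch of a declaration the checker would have to reject under that rule (hypothetical, of course, since -pure does not exist today): io:format/2 is neither guard-permissible nor pure, so it would be illegal in a pure body.

-pure(log_increment/1).
-spec log_increment(integer()) -> integer().
log_increment(A) ->
    io:format("incrementing ~p~n", [A]),  % side effect: illegal in a pure function
    A + 1.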

“ZOMG! The whole standard library!” And yes, this is true — the whole thing is out. Except that the most important bits of it (the data structures like lists, dict, maps, etc.) could be easily converted to pure functions with little more than adding a single line to each definition.

Any pure function could be strongly typed, and Dialyzer could adhere to strong types instead of looser “success types” in these cases. Some code that is currently written to take an input from a side-effecty function, pass it through a chain of non-returning and possibly side-effecty functions as a way to process or act on the value, and ultimately call some side-effecty final output function would instead change to a form where the side effects are limited to a single function that does both the input and output, with all the processing in between done in pure functions.

This makes code inherently more testable. In the first case any test of the code is essentially an integration test, since really knowing how things will work requires following at least one step into side effects (and very often we litter our code with side effects without a second thought, something prayer-style monadisms assist greatly with). In the second case, though, the majority of the program is pure and independently testable, with no passthrough chain of values that has to be checked. I would argue that in many cases such passthrough is either totally unnecessary, or, when it really is beneficial, that passing through in functions is not as useful as passing through in processes — that is to say, when transformational passthrough is desired it is easier to reason about an Erlang program as a series of signal transformations over a message stream than as a chain of arbitrarily side-effecty function calls that collectively make a recursive tail-call (and that’s a whole different ball of wax, totally orthogonal to the issue of functional purity).
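Here is a sketch of that shape, using the hypothetical -pure directive from above (today it would compile as nothing more than an inert wild attribute; all names here are illustrative): the side effects live in a single function, and everything between input and output is pure.

-pure(parse/1).
-pure(transform/1).
-pure(render/1).

% The only side-effecty function: read the input, run the pure
% pipeline, write the output.
run(InPath, OutPath) ->
    {ok, Raw} = file:read_file(InPath),
    Out = render(transform(parse(Raw))),
    ok = file:write_file(OutPath, Out).

% Pure, independently testable steps (trivial placeholder bodies).
parse(Raw)       -> binary:split(Raw, <<"\n">>, [global]).
transform(Lines) -> lists:reverse(Lines).
render(Lines)    -> lists:join(<<"\n">>, Lines).

Each of parse/1, transform/1 and render/1 can then be tested (or property-tested) in isolation, without faking any I/O.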

Consider what we can know about a basic receive loop:

loop(State) ->
  receive
    {process, Data} ->
        {ok, NewState} = do_process(Data, State),
        loop(NewState);
    {send_state, From} ->
        From ! State,
        loop(State);
    halt ->
        exit(normal);
    Message ->
        ok = log(unexpected, Message),
        loop(State)
  end.

-pure(do_process/2).
-spec do_process(term(), #state{}) -> {ok, #state{}} | {error, term()}.
do_process(Data, State) ->
    % Do purely functional stuff to produce a Result.
    Result.

-spec log(category(), term()) -> ok.
log(Cat, Data) ->
    % Do side-effecty stuff
    ok.

We can see exactly what cases result in another iteration and which don’t. Compare that with this:

loop(State) ->
  receive
    {process, Data}     -> do_process(Data, State);
    {send_state, Asker} -> tell(Asker, State);
    quit                -> exit(normal);
    Message             -> handle_unexpected(Message, State)
  end.

do_process(Data, State) ->
    % Do stuff.
    % Mutually recursive tail call; no return type.
    loop(NewState).

tell(Asker, State) ->
    % Do stuff; another tail call...
    loop(State).

handle_unexpected(Message, State) ->
    ok = log(unexpected, Message),
    % Do whatever else; end with tail call to loop/1...
    loop(NewState).

I like the way the code lines up visually in the last version of loop/1, sure, but I can’t know nearly as much about it as a process. Both styles are common, but the former lends itself to readability and testing while the latter is a real mixed bag. Pure function declarations would keep us conscious of what we are doing and nudge us toward the definite-return form of code, where our pure functions and our side-effecty ones are clearly separated. Of course, anyone could also continue to write Erlang any old way they used to — this would just be one more tool to assist with breaking complexity down and adding some compile-time checking in large systems.

I would love to see this sort of thing happen within Erlang eventually, but I am pretty certain it’s the sort of change that won’t happen if I don’t roll up my sleeves and do it myself. We’ve got bigger fish to fry, in my opinion (and I’ve certainly got higher priorities personally right now!), but perhaps someday…