Pure Declarations in Erlang

Over the last year or so I’ve gone back and forth, in my own mind and in discussions with other Erlangers, about type systems in Erlang; or rather, about Erlang’s lack of one and the way Dialyzer acts as our bandaid in this area. Types are useful enough that we need Dialyzer, but the pursuit of functional puritanism gets insane enough that it’s simply not worth it in a language intended for real-world production use, especially in the messy, massively concurrent, let-it-crash, side-effecty, message-centric world of Erlang.

But… types and pure functions are still really useful, and setting a goal of making as much of a program as possible into provable, bounded, typed, pure functions tends to result in code that is easy to understand, test and maintain. So there is obviously some tension here.

What I would like to do is add a semantic that lets the compiler (or Dialyzer, though I would prefer this be a compiler check, tbh) know which functions are pure and which are not. The way I would do this is by using a different “arrow”, in particular the Prolog-style declaration indicator: :-

[Edit after further discussion…] What I would like to do is add a directive that Dialyzer can interpret according to a simple purity rule. Adding this to Dialyzer makes more sense than putting it in the compiler: Dialyzer is already concerned with checking; the compiler is already concerned with compiling.

The directive would be -pure(Name/Arity) (a complement to -spec). The rule would be very simple: only guard-permissible BIFs and other pure functions are legal from within the body of a pure function. This is basically just an extension of the current guard rule (actually, I wonder why this version isn’t already the guard rule… other than the fact that, unless something like this is implemented, the compiler itself has no way of checking for purity, so currently it must blindly accept a handful of BIFs known to be pure and nothing else).

For example, here is a pure function in Erlang, but neither the compiler nor Dialyzer can currently know this:

-spec increment(integer()) -> integer().
increment(A) ->
    A + 1.

Here is the same function declared to be pure:

-pure(increment/1).
-spec increment(integer()) -> integer().
increment(A) ->
    A + 1.

Pretty simple change.
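
To make the rule concrete, here is a minimal sketch of how such a check might look. It is not how Dialyzer would actually implement it. It assumes the calls made by each function body have already been extracted into a map; the module name purity_sketch, the function violations/2 and the shape of that call table are all made up for illustration. The only real library calls are the erl_internal predicates, which report whether a name and arity is a guard BIF or a guard-legal operator.

```erlang
-module(purity_sketch).
-export([violations/2]).

%% Declared :: [{Name, Arity}]
%%   Functions listed in (hypothetical) -pure directives.
%% Calls :: #{{Name, Arity} => [{Name, Arity}]}
%%   Assumed, pre-extracted table of the calls each function body makes.
%% Returns every {Caller, Callee} pair that breaks the purity rule.
violations(Declared, Calls) ->
    [{Caller, Callee}
     || Caller <- Declared,
        Callee <- maps:get(Caller, Calls, []),
        not allowed(Callee, Declared)].

%% A callee is allowed if it is itself declared pure, or if it is
%% something the compiler already accepts inside a guard.
allowed({Name, Arity} = Callee, Declared) ->
    lists:member(Callee, Declared)
        orelse erl_internal:guard_bif(Name, Arity)
        orelse erl_internal:arith_op(Name, Arity)
        orelse erl_internal:comp_op(Name, Arity)
        orelse erl_internal:bool_op(Name, Arity).
```

With increment/1 declared pure and a call table saying its body only uses '+'/2, violations/2 returns an empty list. A declared-pure function whose body calls an undeclared log/2 would be reported as a {Caller, Callee} pair.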

“ZOMG! The whole standard library!” And yes, this is true: the whole thing is out. Except that the most important bits of it (the data structures like lists, dict, maps, etc.) could be easily converted to pure functions with little more than adding a single -pure line to each definition.

Any pure function could be strongly typed, and Dialyzer could adhere to strong types instead of looser “success types” in these cases. Some code that is currently written to take an input from a side-effecty function, pass it through a chain of non-returning and possibly side-effecty functions as a way to process or act on the value, and ultimately call some side-effecty final output function would instead change to a form where the side effects are limited to a single function that does both the input and the output, and all the processing in between would be done in pure functions.
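
A minimal sketch of that second form, under some assumptions: the module shell_core_sketch and every function in it are invented for illustration, and the helpers are pure only in the everyday sense. Under the proposed rule, string:trim/1, string:to_integer/1 and integer_to_list/1 would themselves need to be declared pure before process/1 could be.

```erlang
-module(shell_core_sketch).
-export([run/0, process/1]).

%% The only side-effecty function: it does both the input and the output.
run() ->
    case io:get_line("number> ") of
        Line when is_list(Line) ->
            io:format("~s~n", [process(Line)]);
        _EofOrError ->
            ok
    end.

%% Everything below depends only on its arguments.
process(Line) ->
    render(double(parse(Line))).

%% Accept the line only if it is exactly one integer.
parse(Line) ->
    case string:to_integer(string:trim(Line)) of
        {N, []} -> {ok, N};
        _       -> {error, bad_input}
    end.

double({ok, N}) -> {ok, N * 2};
double(Error)   -> Error.

render({ok, N})            -> integer_to_list(N);
render({error, bad_input}) -> "not an integer".
```

Everything interesting can then be exercised by calling process/1 with plain strings; run/0 is the only piece that needs a terminal.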

This makes code inherently more testable. In the first case any test of the code is essentially an integration test, because really knowing how things will work requires knowing at least one step into side effects (and very often we litter our code with side effects without a second thought, something prayer-style monadisms assist greatly with). In the second case, though, the majority of the program is pure and independently testable, with no passthrough chain of values that has to be checked. I would argue that in many cases such passthrough is either totally unnecessary or, when it really is beneficial, passing through in functions is not as useful as passing through in processes. That is to say, when transformational passthrough is desired it is easier to reason about an Erlang program as a series of signal transformations over a message stream than as a chain of arbitrarily side-effecty function calls that collectively make a recursive tail call (and that’s a whole different ball of wax, totally orthogonal to the issue of functional purity).
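
As a rough illustration of “passing through in processes”, here is a sketch in which each stage is a tiny process that applies one pure fun to every value it receives and forwards the result. The module name stage_sketch, the {value, V} and stop messages, and demo/0 are all assumptions made up for this example, not an established protocol.

```erlang
-module(stage_sketch).
-export([start/2, demo/0]).

%% Spawn a stage that applies the pure fun F to each incoming value
%% and forwards the result to the process Next.
start(F, Next) when is_function(F, 1), is_pid(Next) ->
    spawn(fun() -> stage(F, Next) end).

stage(F, Next) ->
    receive
        {value, V} ->
            Next ! {value, F(V)},
            stage(F, Next);
        stop ->
            %% Pass the shutdown signal down the line and finish.
            Next ! stop,
            ok
    end.

%% Build Inc -> Double -> self(), push one value through, then shut down.
demo() ->
    Double = start(fun(X) -> X * 2 end, self()),
    Inc = start(fun(X) -> X + 1 end, Double),
    Inc ! {value, 1},
    Inc ! stop,
    Result = receive {value, V} -> V after 1000 -> timeout end,
    %% Drain the forwarded stop so it does not linger in our mailbox.
    receive stop -> ok after 1000 -> ok end,
    Result.
```

The transformations themselves stay pure and testable in isolation; only the sending and receiving around them is side-effecty.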

Consider what we can know about a basic receive loop:

loop(State) ->
  receive
    {process, Data} ->
        {ok, NewState} = do_process(Data, State),
        loop(NewState);
    {send_state, From} ->
        From ! State,
        loop(State);
    halt ->
        exit(normal);
    Unexpected ->
        ok = log(unexpected, Unexpected),
        loop(State)
  end.

-pure(do_process/2).
-spec do_process(term(), #state{}) -> {ok, #state{}} | {error, term()}.
do_process(Data, State) ->
    % Do purely functional stuff
    Result.

-spec log(category(), term()) -> ok.
log(Cat, Data) ->
    % Do side-effecty stuff
    ok.

We can see exactly what cases result in another iteration and which don’t. Compare that with this:

loop(State) ->
  receive
    {process, Data}     -> do_process(Data, State);
    {send_state, Asker} -> tell(Asker, State);
    quit                -> exit(normal);
    Message             -> handle_unexpected(Message, State)
  end.

do_process(Data, State) ->
    % Do stuff.
    % Mutually recursive tail call; no return type.
    loop(NewState).

tell(Asker, State) ->
    % Do stuff; another tail call...
    loop(State).

handle_unexpected(Message, State) ->
    ok = log(unexpected, Message),
    % Do whatever else; end with tail call to loop/1...
    loop(NewState).

I like the way the code lines up visually in the last version of loop/1, sure, but I can’t know nearly as much about it as a process. Both styles are common, but the former lends itself to readability and testing while the latter is a real mixed bag. Pure functions would keep us conscious of what we are doing and push us toward the definite-return form of code, where our pure functions and our side-effecty ones are clearly separated. Of course, anyone could also continue to write Erlang any old way they used to; this would just be one more tool to assist with breaking complexity down and adding some static checking in large systems.

I would love to see this sort of thing happen within Erlang eventually, but I am pretty certain that it’s the sort of change that won’t happen if I don’t roll up my sleeves and do it myself. We’ve got bigger fish to fry, in my opinion (and I’ve certainly got higher priorities personally right now!), but perhaps someday…
