Posts Tagged ‘pure functions’

Tiny strings-as-strings JSON in portable Erlang

Monday, June 25th, 2018

There are several JSON libs for Erlang at this point, and as there is no correct mapping between JSON types and Erlang types, all make different tradeoffs that either work or don’t for your project. Beyond that, various interface and implementation differences exist due to the tradeoffs inherent in manipulating elements of the Black Tongue known as lolscript:

  • Accept values to encode as magic tagged tuples so you can specify exactly what you want VS being ambiguous
  • Never allow “naked” values (everything must be in a list/array or a map or a [whatever]) VS “hanging” values
  • Treat all strings ever as binaries because “strings are big” VS treating all strings (and binaries) as strings because strings are easy to manipulate (io_lists…)
  • Decode JSON “objects” as proplists VS decode JSON objects to dicts or maps VS add an “options” argument to the decode function
  • Encode and decode values various ways based on optional switches VS “sane defaults” (aka “works for me”)
  • Achieve lolspeed via NIFs and only work on *nix VS maintain portability via pure Erlang
  • etc.

No combination is correct for every situation, hence the proliferation of libraries. Proliferation aside, something as simple as what RFC-8259 describes shouldn’t require a 20k LoC dependency to manage, at least not in Erlang, of all languages.
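To make the decode-target and string-representation tradeoffs above concrete, here is one JSON object as the different Erlang terms a decoder might hand back (the variable names are only for illustration):

% The JSON object {"name": "Mia", "level": 3} as three possible decode targets:
AsProplist    = [{"name", "Mia"}, {"level", 3}],
AsMapStrings  = #{"name" => "Mia", "level" => 3},
AsMapBinaries = #{<<"name">> => <<"Mia">>, <<"level">> => 3}.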

The general strings-as-strings + portability tradeoffs were made by mochiweb years ago, with mochijson2 being the go-to JSON parser for lots of projects. Now that “tuple calls” have finally been retired after years of obsolescence and deprecation, mochijson2 is giving up the ghost as well (as it was based on tuple calls). As a replacement that makes mostly the same tradeoffs but is arguably simpler, I wrote a single-module JSON encoder/decoder lib. It treats all strings as strings, is written in pure Erlang, and is utterly boring in how simple the code is. Nothing magical to see. At all. So don’t get excited.

If you need to read things in and read things out, in JSON, and don’t really care about lolspeed but want to understand what is happening, then ZJ is for you: ZJ project @ gitlab
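A rough sketch of the intended usage, with zj:encode/1 and zj:decode/1 as the entry points; the exact return shapes noted in the comments are my assumptions rather than guarantees, so check the project README:

% Encoding an Erlang term yields a JSON string as a plain Erlang list, not a binary.
JSON = zj:encode(#{"name" => "Mia", "level" => 3}),
% Decoding goes the other way: JSON objects come back as maps and JSON strings
% come back as Erlang strings. The precise wrapper around the result (a bare
% term, {ok, Term}, etc.) is assumed here -- see the README for the real shape.
Decoded = zj:decode(JSON).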

Note that if you have roughly the same requirements but you want to make the strings-as-binaries tradeoff then JSX is the lib for you.

 

Pure Declarations in Erlang

Monday, December 21st, 2015

Over the last year or so I’ve gone back and forth, in my own head and in discussions with other Erlangers, about type systems in Erlang; or rather, about Erlang’s lack of one and the way Dialyzer acts as our band-aid in this area. Types are useful enough that we need Dialyzer, but the pursuit of functional puritanism gets insane enough that it’s simply not worth it in a language intended for real-world production use, especially in the messy, massively concurrent, let-it-crash, side-effecty, message-centric world of Erlang.

But… types and pure functions are still really useful, and setting a goal of making as much of a program as possible into provable, bounded, typed, pure functions tends to result in code that is easy to understand, test, and maintain. So there is obviously some tension here.

What I would like to do is add a semantic that the compiler (or Dialyzer, but would prefer this be a compiler check, tbh) be aware of what functions are pure and which are not. The way I would do this is by using a different “arrow”, in particular the Prolog-style declaration indicator: :-

[Edit after further discussion…] What I would like to do is add a directive that Dialyzer can interpret according to a simple purity rule. Adding this to Dialyzer makes more sense than putting it in the compiler — Dialyzer is already concerned with checking; the compiler is already concerned with compiling.

The directive would be -pure(Name/Arity) (a complement to -spec). The rule would be very simple: only guard-permissible BIFs and other pure functions are legal from within the body of a pure function. This is basically just an extension of the current guard rule (actually, I wonder why this version isn’t already the guard rule… other than the fact that unless something like this is implemented the compiler itself wouldn’t have any way of checking for purity, so currently it must blindly accept a handful of BIFs known to be pure and nothing else).

For example, here is a pure function in Erlang, but neither the compiler nor Dialyzer can currently know this:

-spec increment(integer()) -> integer().
increment(A) ->
    A + 1.

Here is the same function declared to be pure:

-pure(increment/1).
-spec increment(integer()) -> integer().
increment(A) ->
    A + 1.

Pretty simple change.
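For contrast, here is a sketch of a definition the proposed check would reject: io:format/2 performs I/O, so it is neither a guard-permissible BIF nor a pure function, and declaring its caller pure would be an error under the rule above (the function name here is just made up for illustration).

-pure(increment_logged/1).
-spec increment_logged(integer()) -> integer().
increment_logged(A) ->
    ok = io:format("incrementing ~p~n", [A]),    % side effect: rejected in a pure body
    A + 1.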

“ZOMG! The whole standard library!” And yes, this is true — the whole thing is out. Except that the most important bits of it (the data structures like lists, dict, maps, etc.) could be easily converted to pure functions with little more than adding a single -pure line per definition (or changing -> to :- under the original arrow idea).
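As a sketch of what that conversion might look like (illustrative only, not actual OTP source), a lists-style definition would simply gain its declarations:

-pure(reverse/1).
-spec reverse(list()) -> list().
reverse(List) -> reverse(List, []).

-pure(reverse/2).
-spec reverse(list(), list()) -> list().
reverse([H|T], Acc) -> reverse(T, [H|Acc]);
reverse([], Acc)    -> Acc.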

Any pure function could be strongly typed, and Dialyzer could adhere to strong types instead of looser “success types” in these cases. Some code that is currently written to take an input from a side-effecty function, pass it through a chain of non-returning and possibly side-effecty functions as a way to process or act on the value, and then ultimately call some side-effecty final output function would instead change to a form where the side effects are limited to a single function that does both the input and output, and all the processing in between is done in pure functions.
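A small sketch of that shape, with made-up function names: the side effects live in one function at the edge, and the processing in between is pure (under the proposal, the library calls inside the pure function would of course need their own -pure declarations):

run(Path) ->
    {ok, Bin} = file:read_file(Path),       % side-effecty input
    Report = summarize(Bin),                % pure processing in between
    io:format("~ts~n", [Report]).           % side-effecty output

-pure(summarize/1).
-spec summarize(binary()) -> string().
summarize(Bin) ->
    Lines = string:split(unicode:characters_to_list(Bin), "\n", all),
    lists:flatten(io_lib:format("~p lines, ~p bytes", [length(Lines), byte_size(Bin)])).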

This makes code inherently more testable. In the first case any test of the code is essentially an integration test — really knowing how things will work requires knowing at least one step into the side effects (and very often we litter our code with side-effects without a second thought, something prayer-style monadisms assist greatly with). In the second case, though, the majority of the program is pure and independently testable, with no passthrough chain of values that has to be checked. I would argue that in many cases such passthrough is either totally unnecessary, or, when it really is beneficial, passing through in functions is not as useful as passing through in processes — that is to say, when transformational passthrough is desired it is easier to reason about an Erlang program as a series of signal transformations over a message stream than as a chain of arbitrarily side-effecty function calls that collectively make a recursive tail-call (and that’s a whole different ball of wax, totally orthogonal to the issue of functional purity).

Consider what we can know about a basic receive loop:

loop(State) ->
  receive
    {process, Data} ->
        {ok, NewState} = do_process(Data, State),
        loop(NewState);
    {send_state, From} ->
        From ! State,
        loop(State);
    halt ->
        exit(normal);
    Message ->
        ok = log(unexpected, Message),
        loop(State)
  end.

-pure(do_process/2).
-spec do_process(term(), #state{}) -> {ok, #state{}} | {error, term()}.
do_process(Data, State) ->
    % Do purely functional stuff, then return {ok, NewState} or {error, Reason}
    Result.

-spec log(category(), term()) -> ok.
log(Cat, Data) ->
    % Do side-effecty stuff
    ok.

We can see exactly what cases result in another iteration and which don’t. Compare that with this:

loop(State) ->
  receive
    {process, Data}     -> do_process(Data, State);
    {send_state, Asker} -> tell(Asker, State);
    quit                -> exit(normal);
    Message             -> handle_unexpected(Message, State)
  end.

do_process(Data, State) ->
    % Do stuff.
    % Mutually recursive tail call; no return type.
    loop(NewState).

tell(Asker, State) ->
    % Do stuff; another tail call...
    loop(State).

handle_unexpected(Message, State) ->
    ok = log(unexpected, Message),
    % Do whatever else; end with tail call to loop/1...
    loop(NewState).

I like the way the code lines up visually in the last version of loop/1, sure, but I can’t know nearly as much about it as a process. Both styles are common, but the former lends itself to readability and testing while the latter is a real mixed bag. Pure function declarations would keep us conscious of what we are doing and nudge us toward the definite-return form of code, where our pure functions and our side-effecty ones are clearly separated. Of course, anyone could also continue to write Erlang any old way they used to — this would just be one more tool to assist with breaking complexity down and adding some compile-time checking in large systems.
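As a sketch of the kind of direct test that purity invites (EUnit-style, reusing the increment/1 example from earlier and assuming it is exported), no mocks or process scaffolding are needed:

-include_lib("eunit/include/eunit.hrl").

increment_test_() ->
    [?_assertEqual(1,  increment(0)),
     ?_assertEqual(0,  increment(-1)),
     ?_assertEqual(10, increment(9))].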

I would love to see this sort of thing happen within Erlang eventually, but I am pretty certain it’s the sort of change that won’t happen unless I roll up my sleeves and do it myself. We’ve got bigger fish to fry, in my opinion (and I’ve certainly got higher priorities personally right now!), but perhaps someday…