Posts Tagged ‘erlang’

zUUID: An Example Erlang/OTP Project

Tuesday, February 2nd, 2016

I was talking with a friend of mine yesterday about how UUID v2 seems to have evaporated. We looked into things further and found it's not actually included in RFC 4122! One thing led to another and I wound up writing an example project that is yet another UUID generator/utility in Erlang — but this time it actually has duplicate v1 and v2 detection/correction, and implements UUID version 2 values as closely as I can to what little definition of them exists.

As there are already plenty of UUID projects around I focused on making this one as readable as I possibly could — to include exported documentation, in-source documentation, obvious variable names, full typespecs, my silly little “pure” notation, blatantly obvious bitstring syntax, and the obligatory github presence.

Hopefully some folks newish to Erlang will come along and explain to me what confuses them about that code, the process of writing it, the documentation conventions, etc. so that I can become a better literate programmer. Of course, since the last thing the world needs is another UUID implementation I suppose I would have had better luck with something at least peripherally related to the web. (>.<)

Erlang R18.0 Docs

Thursday, January 21st, 2016

Language sites get knocked offline from time to time, so in addition to the Guile 2.0 manual I also keep a mirror of the Erlang R18 docs. I’ve never needed to link to them from here, but today has been a rough day around the internet for some reason, so here we are. Depending on how long they stay down I might add the latest revision as well.

Pure Declarations in Erlang

Monday, December 21st, 2015

Over the last year or so I’ve gone back and forth, in my mind and in discussions with other Erlangers, about type systems in Erlang; or rather, about its lack of one and the way Dialyzer acts as our band-aid in this area. Types are useful enough that we need Dialyzer, but the pursuit of functional puritanism gets insane enough that it’s simply not worth it in a language intended for real-world production use, especially in the messy, massively concurrent, let-it-crash, side-effecty, message-centric world of Erlang.

But… types and pure functions are still really useful and setting a goal of making as much of a program as possible into provable, bounded, typed, pure functions tends to result in easy to understand, test and maintain code. So there is obviously some stress here.

What I would like to do is add a semantic by which the compiler (or Dialyzer, though I would prefer this be a compiler check, tbh) knows which functions are pure and which are not. The way I would do this is by using a different “arrow”, in particular the Prolog-style declaration indicator: :-

[Edit after further discussion…] What I would like to do is add a directive that Dialyzer can interpret according to a simple purity rule. Adding this to Dialyzer makes more sense than putting it in the compiler — Dialyzer is already concerned with checking; the compiler is already concerned with compiling.

The directive would be -pure(Name/Arity) (a complement to -spec). The rule would be very simple: only guard-permissible BIFs and other pure functions are legal from within the body of a pure function. This is basically just an extension of the current guard rule (actually, I wonder why this version isn’t already the guard rule… other than the fact that unless something like this is implemented the compiler itself wouldn’t have any way of checking for purity, so currently it must blindly accept a handful of BIFs known to be pure and nothing else).

For example, here is a pure function in Erlang, but neither the compiler nor Dialyzer can currently know this:

-spec increment(integer()) -> integer().
increment(A) ->
    A + 1.

Here is the same function declared to be pure:

-pure(increment/1).
-spec increment(integer()) -> integer().
increment(A) ->
    A + 1.

Pretty simple change.

“ZOMG! The whole standard library!” And yes, this is true — the whole thing is out. Except that the most important bits of it (the data structures like lists, dict, maps, etc.) could be easily converted to pure functions with little more than changing -> to :- or adding a single -pure line to each definition.
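For example, here is what a lists-style map function could look like under the directive form of the proposal (a sketch only; no such directive exists today):

%% Hypothetical: only the -pure line is new.
-pure(map/2).
-spec map(fun((A) -> B), [A]) -> [B].
map(_, []) -> [];
map(F, [H | T]) -> [F(H) | map(F, T)].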

Any pure function could be strongly typed, and Dialyzer could adhere to strong types instead of looser “success types” in these cases. Some code is currently written to take an input from a side-effecty function, pass it through a chain of non-returning and possibly side-effecty functions as a way to process or act on the value, and ultimately call some side-effecty final output function. That code would instead change to a form where the side effects are limited to a single function that does both the input and output, and all the processing in between would be done in pure functions.
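A minimal sketch of that second shape, with hypothetical names (gen_tcp stands in here for any side-effecty boundary):

%% All side effects live in one function; everything it calls is pure
%% and could carry a -pure declaration under this proposal.
handle(Socket) ->
    {ok, Raw} = gen_tcp:recv(Socket, 0),      % side effect: input
    ok = gen_tcp:send(Socket, process(Raw)).  % side effect: output

%% The purely functional processing chain in between.
process(Raw)    -> render(compute(parse(Raw))).
parse(Raw)      -> binary:split(Raw, <<",">>, [global]).
compute(Fields) -> [byte_size(F) || F <- Fields].
render(Sizes)   -> io_lib:format("~w~n", [Sizes]).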

This makes code inherently more testable. In the first case any test of the code is essentially an integration test — really knowing how things will work requires knowing at least one step into side effects (and very often we litter our code with side-effects without a second thought, something prayer-style monadisms assist greatly with). In the second case, though, the majority of the program is pure and independently testable, with no passthrough chain of values that has to be checked. I would argue that in many cases such passthrough is either totally unnecessary, or that when it really is beneficial, passing through in functions is not as useful as passing through in processes — that is to say, when transformational passthrough is desired it is easier to reason about an Erlang program as a series of signal transformations over a message stream than as a chain of arbitrarily side-effecty function calls that collectively make a recursive tail-call (and that’s a whole different ball of wax, totally orthogonal to the issue of functional purity).

Consider what we can know about a basic receive loop:

loop(State) ->
  receive
    {process, Data} ->
        {ok, NewState} = do_process(Data, State),
        loop(NewState);
    {send_state, From} ->
        From ! State,
        loop(State);
    halt ->
        exit(normal);
    Unexpected ->
        ok = log(unexpected, Unexpected),
        loop(State)
  end.

-spec do_process(term(), #state{}) -> {ok, #state{}} | {error, term()}.
do_process(Data, State) :-
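    %% (The :- is the hypothetical "pure" arrow from the original proposal
    %% above; under the directive version this would be a normal -> plus a
    %% -pure(do_process/2) declaration.)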
    % Do purely functional stuff
    Result.

-spec log(category(), term()) -> ok.
log(Cat, Data) ->
    % Do side-effecty stuff
    ok.

We can see exactly what cases result in another iteration and which don’t. Compare that with this:

loop(State) ->
  receive
    {process, Data}     -> do_process(Data, State);
    {send_state, Asker} -> tell(Asker, State);
    quit                -> exit(normal);
    Message             -> handle_unexpected(Message, State)
  end.

do_process(Data, State) ->
    % Do stuff.
    % Mutually recursive tail call; no return type.
    loop(NewState).

tell(Asker, State) ->
    % Do stuff; another tail call...
    loop(State).

handle_unexpected(Message, State) ->
    ok = log(unexpected, Message),
    % Do whatever else; end with tail call to loop/1...
    loop(NewState).

I like the way the code lines up visually in the last version of loop/1, sure, but I can’t know nearly as much about it as a process. Both styles are common, but the former lends itself to readability and testing while the latter is a real mixed bag. Pure functions would keep us conscious of what we are doing and commit us to the definite-return form of code, where our pure functions and our side-effecty ones are clearly separated. Of course, anyone could also continue to write Erlang any old way they used to — this would just be one more tool to assist with breaking complexity down and adding some compile-time checking in large systems.

I would love to see this sort of thing happen within Erlang eventually, but I am pretty certain that it’s the sort of change that won’t happen if I don’t roll up my sleeves and do it myself. We’ve got bigger fish to fry, in my opinion (and I’ve certainly got higher priorities personally right now!), but perhaps someday…

Iterators? We Don’t NEED No Stinking Iterators!

Tuesday, September 29th, 2015

Every so often a request for “implementation of iterators for maps” (or over hashes/dicts or some other K-V data structure) appears on the mailing list of a functional language. I’ve spent years making heavy use of iterators in imperative languages, and the way they fit into Python is really great. For Python. I totally understand where some of these folks are coming from; they just don’t realize that functional languages are not where they came from.

So… “Is this post the result of some actual event”? Yeah, you got me. It is. On the erlang-questions mailing list someone asked “Are maps ever going to get an iterator?” Again.

Erlang is definitely not Kansas, but it worries me a bit when people think either that it is or (more dangerously) that it should be, and then try to influence the maintainers to make it that way (and then the powers-that-be get in a panic over “market share” and do horrible things to the language…).

There is no faster way to paint a functional language into a corner than to try making it occasionally imperative. Conversely, consider the syntactic corner C++ and Java have painted themselves into by trying to include functional features as after-thoughts where they really didn’t belong.

(I know, I know, death-by-kitchen-sink is a proud C++ tradition. It is becoming one for Java. Even though I hate Java, there is no sense in making it worse by cluttering its syntax and littering it with gotchas and newbie-unfriendly readability landmines in the interest of providing features few Java coders understand the point of, especially when the whole concept of state management in a bondage-and-discipline OOP language like Java is to keep everything in structs with legs (not anonymous closures over state that is temporarily in scope…). The lack of such problems was previously one of the main points that favored Java over C++… well, that and actual encapsulation. Hopefully Rust and D can resist this temptation.)

This frustrates me. It is almost as if instead of picking a tool that matches a given job, people learn one tool and then try over time to make a super-powered Swiss Army knife of it. This never turns out well. The result is more Frankenstein’s Monster than Swiss Army knife and in the best case it winds up being hard to learn, confusing to document and crap at everything.

What’s worse, people assume that the first tool they learned well is the standard by which everything else should be judged (if so, then why are they learning anything else?). It follows, then, that if a newly studied LangX does not have a feature of previously used LangY then it must be introduced because it is “missing”. (I do admit, though, to wishing other languages had pattern matching in function heads… but I don’t bring this up on mailing lists as if it’s a “missing feature”; I do, however, cackle insanely when overloading is compared with matching.)

Let’s say we did include iterators for maps in Erlang — whatever an “iterator” is supposed to mean in a lists-are-conses functional language. What would that enable?

-spec foreach(fun(), map()) -> ok.

That sort of looks pointless. It’s exactly the same as lists:foreach(Fun, maps:to_list(Map)), or maybe lists:foreach(Fun, maps:values(Map)). Without knowing whether we’re trying to build a new map based on the old one or get some side effect out of Fun, it’s hard to know what the point is.
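Assuming the side-effect interpretation, what already exists is a one-liner (a trivial sketch):

%% Side-effecty iteration over a map, no special machinery required.
Print = fun({K, V}) -> io:format("~p: ~p~n", [K, V]) end,
ok = lists:foreach(Print, maps:to_list(#{a => 1, b => 2})),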

Maybe:

-spec map(fun(), OldMap :: map()) -> {ok, NewMap :: map()}.

But… wait, isn’t that just maps:map/2 all over again?

I think I know where this is going, though. These people really wish maps were ordered dictionaries, because they want keys to be ordered. So they want something like this:

-spec side_effects_in_order_dammit(fun(), map()) -> ok.
side_effects_in_order_dammit(F, M) ->
    Ordered = [{K, maps:get(K, M)} || K <- lists:sort(maps:keys(M))],
    ok = lists:foreach(F, Ordered).

But wait, what order should the keys be in, anyway?

This is a slow, steady march to insanity. “Give me iterators” begets “Let’s have ordered maps” begets “Let’s have ordered iterators for maps” and so on, and eventually you wind up with most of the Book of Genesis in the Devil’s Bible of Previously Decent Functional Languages. All the while, totally forgetting that these things already exist in another form. There are more data structures than just maps for a reason.

This just gets ridiculous, and it isn’t even what hashes are about to begin with.

Horrible, Drunk Code (but it works and demonstrates a point)

Thursday, September 3rd, 2015

Over on StackOverflow otopolsky was asking about how to make an Erlang program that could read selected lines in a huge file, offset from the bottom, without exploding in memory (too hard).

I mentioned the standard bit about front-loading and caching the work of discovering the linebreak locations, the fact that “a huge text file” nearly always means “a set of really huge log files” and that in this case tokenizing the semantics of the log file within a database is a Good Thing, etc. (my actual answer is here).

He clearly knew most of this, and was hoping that there was some shortcut already created. Well, I don’t know that there is, but it bothered me that his initial stab at following my advice about amortization of linebreak discovery resulted in an attempt to read a 400MB text file in and run a global match over it, and that this just ate up all his memory and made his machine puke. Granted, my initial snippet was a naive implementation that didn’t take size into account at all, but…

400MB? Eating ALL your memory? NO WAY. Something must be done… A call to action!

The main problem is I’m already a bit jeezled up because my wife broke out some 泡盛 earlier (good business day today)… so any demo code I produce will be, ahem, a little less than massive-public-display worthy (not that the 6 or 7 guys on the planet who actually browse Erlang questions on SO would care, or don’t already know who I am). So… I’m posting here:

-module(goofy).
-export([linebreaks/1]).

linebreaks(File) ->
    {ok, FD} = file:open(File, [raw, read_ahead, binary]),
    Step = 1000,
    Count = 1,
    Loc = 1,
    Indexes = [],
    Read = file:read(FD, Step),
    {ok, Result} = index(FD, Read, Count, Loc, Step, Indexes),
    ok = file:close(FD),
    [{1, Loc} | Result].

index(FD, {ok, Data}, Count, Loc, Step, Indexes) ->
    NewLines = binary:matches(Data, <<$\n>>),
    {NewCount, Found} = indexify(NewLines, Loc, Count, []),
    Read = file:read(FD, Step),
    index(FD, Read, NewCount, Loc + Step, Step, [Found | Indexes]);
index(_, eof, _, _, _, Indexes) ->
    {ok, lists:reverse(lists:flatten(Indexes))};
index(_, {error, Reason}, _, _, _, Indexes) ->
    {error, Reason, Indexes}.

indexify([], _, Count, Indexed) ->
    {Count, Indexed};
indexify([{Pos, Len} | Rest], Offset, Count, Indexed) ->
    NewCount = Count + 1,
    indexify(Rest, Offset, NewCount, [{NewCount, Pos + Len + Offset} | Indexed]).

As ugly as that is, it runs in constant space, and the index produced on a 7,247,560-line, 614,754,920-byte file takes a bit of space itself (a few dozen MB for the 7,247,560-element list…) and temporarily requires a bit more during part of the return operation (a very sudden, brief spike in memory usage right at the end as it returns). But it works, and returns what we were looking for in a way that won’t kill your computer. And… it only takes 14 seconds or so on the totally crappy laptop I’m using right now (an old dual-core Celeron).

Much better than what otopolsky ran into when his computer ran for 10 minutes before it started swapping after eating all his memory!

Results:

ceverett@changa:~/foo/hugelogs$ ls -l
合計 660408
-rw-rw-r-- 1 ceverett ceverett       928  9月  3 01:31 goofy.erl
-rw-rw-r-- 1 ceverett ceverett  61475492  9月  2 23:17 huge.log
-rw-rw-r-- 1 ceverett ceverett 614754920  9月  2 23:19 huger.log
ceverett@changa:~/foo/hugelogs$ erl
Erlang/OTP 18 [erts-7.0] [source] [64-bit] [smp:2:2] [async-threads:10] [kernel-poll:false]

Running $HOME/.erlang.
Eshell V7.0  (abort with ^G)
1> c(goofy).
{ok,goofy}
2> {HugeTime, HugeIndex} = timer:tc(goofy, linebreaks, ["huge.log"]).
{1404370,
 [{1,1},
  {2,119},
  {3,220},
  {4,...},
  {...}|...]}
3> {HugerTime, HugerIndex} = timer:tc(goofy, linebreaks, ["huger.log"]).
{14245673,
 [{1,1},
  {2,119},
  {3,220},
  {4,...},
  {...}|...]}
4> HugerTime / 1000000.
14.245673
5> HugeTime / 1000000.
1.40437
6> lists:last(HugeIndex).
{724757,61475493}
7> lists:last(HugerIndex).
{7247561,614754921}

Rather untidy code up there, but I figure it is readable enough that otopolsky can get some ideas from it and move on.
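For what it’s worth, here is one way such an index might actually get used (a hypothetical helper, not part of the module above). The index positions are 1-based while file:pread/3 offsets are 0-based, and the final index entry points one byte past the last line, so the length of line N is always the difference between entries N+1 and N:

%% Read line N of File using the index from goofy:linebreaks/1,
%% touching only the requested bytes. The returned binary includes
%% the trailing newline.
read_line(File, Index, N) ->
    {N, Start} = lists:keyfind(N, 1, Index),
    {_, Next}  = lists:keyfind(N + 1, 1, Index),
    {ok, FD}   = file:open(File, [raw, binary]),
    {ok, Line} = file:pread(FD, Start - 1, Next - Start),
    ok = file:close(FD),
    Line.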

Erlang: Writing Terms to a File for file:consult/1

Sunday, March 8th, 2015

I notice that there are a few little helper functions I seem to always wind up writing given different contexts. In Erlang one of these is an inverse function for file:consult/1, which I have to write any time I use a text file to store config data*.

Very simply:

write_terms(Filename, List) ->
    Format = fun(Term) -> io_lib:format("~tp.~n", [Term]) end,
    Text = lists:map(Format, List),
    file:write_file(Filename, Text).

[Note that this *should* return the atom 'ok' — and if you want to assert that and crash on failure, you want to do ok = write_terms(Filename, List) in your code.]

This writes each term in the list terminated by a period to the text file, which causes file:consult/1 to return the same list back (in order — though this detail usually does not matter, because most conf files are used as proplists and are keysearched anyway).
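A quick round trip (a sketch; the file name and settings are made up) shows the inverse relationship, using a match on the already-bound Settings as the assertion:

Settings = [{port, 5181}, {name, "chat"}],
ok = write_terms("demo.conf", Settings),
{ok, Settings} = file:consult("demo.conf"),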

An annoyance with most APIs is a lack of inverse functions where they could easily be written. Even if the original authors of the library don’t conceive of a use for an inverse of some particular function, whenever there is an opportunity for one, leaving it out just makes an API feel incomplete (and don’t get me started on “web APIs”… ugh). This is just one case of that. Why does Erlang have a file:consult/1 but not a file:write_terms/2 (or “file:deconsult/2” or whatever)? I don’t know. But this bugs me in most libs in most languages — and this is the way I usually deal with this particular situation in Erlang.

[* term_to_binary/1 ←→ binary_to_term/1 is not an acceptable solution for config data!]

Erlang: Maps, Comprehensions and Side-effecty Iteration

Saturday, February 28th, 2015

In Erlang it is fairly common to want to perform a side-effecty operation over a list of values, not because you want to collect an aggregate (fold), actually map the input list to the output list (map), or build a new list in some way (list comprehension) but because you just want the side-effects of the operation.

The typical idiom (especially for broadcasting messages to a list of processes) is to use a list comprehension for this, or sometimes lists:map/2:

%% Some procedural side-effect, like spawning list of processes or grabbing
%% a list of resources:
[side_effecty_procedure(X) || X <- ListOfThings],

%% Or, for broadcasting:
[Pid ! SomeMessage || Pid <- ListOfPids],

%% And some old farts still do this:
lists:map(fun side_effecty_procedure/1, ListOfThings),

%% Or even this (gasp!) which is actually made for this sort of thing:
lists:foreach(fun side_effecty_procedure/1, ListOfThings),
%% but lacks half of the semantics I describe below, so this part
%% of the function namespace is already taken... (q.q)

That works just fine, and is so common that list comprehensions have been optimized to handle this specific situation in a way that avoids creating a return list value if it is clearly not going to be assigned to anything. I remember thinking this was sort of ugly, or at least sort of hackish before I got accustomed to the idiom, though. “Hackish” in the sense that this is actually a syntax intended for the construction of lists and only incidentally a useful way to write a fast side-effect operation over a list of values, and “ugly” in the sense that it is one of the few places in Erlang you can’t force an assertion to check the outcome of a side-effecty operation.

For example, there is no equivalent to the assertive ok = some_procedure() idiom, or even the slightly more complex variation used in some other situations:

case foo(Bar) of
    ok    -> Continue();
    Other -> Other
end,

A compromise could be to write an smap/2 function defined something like this:

smap(F, List) ->
    smap(F, 1, List).

smap(_, _, []) -> ok;
smap(F, N, [H|T]) ->
    case F(H) of
        ok              -> smap(F, N + 1, T);
        {error, Reason} -> {error, {N, Reason}}
    end.

But this now has the problem of requiring that whatever is passed as F/1 have a return type of ok | {error, Reason}, which is unrealistic without forcing a lot of folks to wrap existing side-effecty functions in something that coerces the return type to match. Though that might not be bad practice ultimately, it’s still perhaps more trouble than it’s worth.
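The wrapping itself would be trivial, just tedious to demand everywhere; something like this (a sketch, reusing side_effecty_procedure/1 from above):

%% Coercing an arbitrary side-effecty call into the ok | {error, Reason}
%% contract smap/2 expects.
Coerced =
    fun(X) ->
        try side_effecty_procedure(X) of
            _ -> ok
        catch
            _:Reason -> {error, Reason}
        end
    end,
smap(Coerced, ListOfThings),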

It isn’t like most Erlang code is written in a way where side-effecty iteration over a list is likely to fail, and if it actually does fail the error data from a crash will contain whatever was passed to the side-effecty function that failed. But this still just doesn’t quite sit right with me — that leaves the prospect of list iteration in the interest of achieving a series of side effects as at least a little bit risky to do in the error kernel (or “crash kernel”) of an application.

On the other hand, the specific case of broadcasting to a list of processes would certainly be nice to handle exactly the same way sending to a single process works:

%% To one process
Pid ! SomeMessage,
%% To a list of processes
Pids ! SomeMessage,

Which seems particularly obvious syntax, considering that the return of the ! form of send/2 is the message itself, meaning that the following two would be equivalent if the input to the ! form of send/2 also accepted lists:

%% The way it works now
Pid1 ! Pid2 ! SomeMessage,
%% Another way I wish it worked
[Pid1, Pid2] ! SomeMessage,
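Until something like that exists, the closest equivalent is, ironically, a function wrapped around the very comprehension idiom described above (a sketch that preserves the return-the-message semantic of send/2):

%% Broadcast a message to a list of pids, returning the message
%% the way ! does.
bcast(Pids, Message) ->
    _ = [Pid ! Message || Pid <- Pids],
    Message.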

In any case, this is clearly not the sort of language wart that gets much attention, and it’s never been a handicap for me. It just seems a bit hackish and ugly to essentially overload the semantics of list comprehensions, or (ab)use existing list operations like maps and folds, to achieve the side effect of iterating over a list instead of having a function like smap/2 that is explicitly designed for side-effecty iteration.

OpenSSL RSA DER public key PKCS#1 OID header madness

Thursday, February 26th, 2015

(Wow, what an utterly unappealing post title…)

I have run into an annoying issue with the DER output of RSA public keys in OpenSSL. The basic problem is that OpenSSL adds an OID header to its ASN.1 DER output, but other tools are not expecting this (be it the iOS keystore, a keystore in Java, or Erlang’s public_key module).

I first noticed this when experiencing decode failures in Erlang trying to use the keys output by an openssl command sequence like:

openssl genpkey \
    -algorithm rsa \
    -out $keyfile \
    -outform DER \
    -pkeyopt rsa_keygen_bits:8192

openssl rsa \
    -inform DER \
    -in $keyfile \
    -outform DER \
    -pubout \
    -out $pubfile

Erlang’s public_key:der_decode('RSAPrivateKey', KeyBin) would give me the correct #'RSAPrivateKey' record but public_key:der_decode('RSAPublicKey', PubBin) would choke and give me an asn1 parse error (ML thread I posted about this). Of course, OpenSSL is expecting the OID header, so it works fine there, just not anywhere else.

Most folks probably don’t notice this, though, because the primary use case for most folks is either to use OpenSSL directly to generate and then use keys, use a tool that calls OpenSSL through a shell to do the same, or use a low-level tool to generate the keys and then use the same library to use the keys. Heaven forbid you try to use an OpenSSL-generated public key in DER format somewhere other than OpenSSL, though! (Another reason this sort of thing usually doesn’t cause problems is that almost nobody fools around with crypto stuff in their tools to begin with, leaving this sort of thing as an issue far to the outside of mainstream hackerdom…)

Of course, the solution could be to chop off the header, which happens to be 24 bytes long:

1> {ok, OpenSSLBin} = file:read_file("rsa3.pub.der.openssl").
{ok,<<48,130,4,34,48,13,6,9,42,134,72,134,247,13,1,1,1,5,
      0,3,130,4,15,0,48,130,4,...>>}
2> {ok, ErlangBin} = file:read_file("rsa3.pub.der.erlang").
{ok,<<48,130,4,10,2,130,4,1,0,202,167,130,153,242,77,196,
      252,167,142,159,17,13,69,148,41,161,50,...>>}
3> <<_:24/binary, ChoppedBin/binary>> = OpenSSLBin.
<<48,130,4,34,48,13,6,9,42,134,72,134,247,13,1,1,1,5,0,3,
  130,4,15,0,48,130,4,10,2,...>>
4> ChoppedBin = ErlangBin.
<<48,130,4,10,2,130,4,1,0,202,167,130,153,242,77,196,252,
  167,142,159,17,13,69,148,41,161,50,44,138,...>>

But… that’s pretty darn arbitrary. It turns out the header is always:

<<48,130,4,34,48,13,6,9,42,134,72,134,247,13,1,1,1,5,0,3,130,4,15,0>>

I could match on that, but it still feels weird because that particular binary string just doesn’t mean anything to me, so matching on it is still the same hack. I could, of course, go around that by writing just the public key as PEM, loading it, encoding it to DER, and saving that as the DER file from within Erlang (thus creating a same-tool-to-same-tool situation). But that’s goofy too: PEM is just a base64-encoded DER binary wrapped in a text header/footer! It’s completely arbitrary that PEM should work but DER shouldn’t!

-module(keygen).
-export([start/1]).

start([Prefix]) ->
    PemFile = string:concat(Prefix, ".pub.pem"),
    KeyFile = string:concat(Prefix, ".key.der"),
    PubFile = string:concat(Prefix, ".pub.der"),
    {ok, PemBin} = file:read_file(PemFile),
    [PemData] = public_key:pem_decode(PemBin),
    Pub = public_key:pem_entry_decode(PemData),
    PubDer = public_key:der_encode('RSAPublicKey', Pub),
    ok = file:write_file(PubFile, PubDer),
    io:format("Wrote private key to: ~ts.~n", [KeyFile]),
    io:format("Wrote public key to:  ~ts.~n", [PubFile]),
    case check(KeyFile, PubFile) of
        true  ->
            ok = file:delete(PemFile),
            io:format("~ts and ~ts agree~n", [KeyFile, PubFile]),
            init:stop();
        false ->
            io:format("Something has gone wrong.~n"),
            init:stop(1)
    end.

check(KeyFile, PubFile) ->
    {ok, KeyBin} = file:read_file(KeyFile),
    {ok, PubBin} = file:read_file(PubFile),
    Key = public_key:der_decode('RSAPrivateKey', KeyBin),
    Pub = public_key:der_decode('RSAPublicKey', PubBin),
    TestMessage = <<"Some test data to sign.">>,
    Signature = public_key:sign(TestMessage, sha512, Key),
    public_key:verify(TestMessage, sha512, Signature, Pub).

This is so silly it’s maddening — and apparently folks from the iOS, PHP and Java worlds have essentially built themselves hacks to handle DER keys that deal directly with this.

I finally found a way that is both semantically meaningful (well, somewhat) and is sure to either generate a proper PKCS#1 DER public RSA key or fail, telling me that it’s looking at badly-formed ASN.1 (or binary trash): the magical command “openssl asn1parse -strparse [offset]”.

A key generation script therefore looks something like this:

#! /bin/bash

prefix=${1:?"Cannot proceed without a file prefix."}

keyfile="${prefix}.key.der"
pubfile="${prefix}.pub.der"

# Generate an 8192-bit RSA private key
openssl genpkey \
    -algorithm rsa \
    -out $keyfile \
    -outform DER \
    -pkeyopt rsa_keygen_bits:8192

# OpenSSL's PKCS#1 ASN.1 output adds a 24-byte header that
# other tools (Erlang, iOS, Java, etc.) choke on, so we clip
# the OID header off with asn1parse.
openssl rsa \
    -inform DER \
    -in $keyfile \
    -outform DER \
    -pubout \
| openssl asn1parse \
    -strparse 24 \
    -inform DER \
    -out $pubfile

The output on stdout will look something like:

$ openssl asn1parse -strparse 24 -inform DER -in rsa3.pub.der.openssl 
    0:d=0  hl=4 l=1034 cons: SEQUENCE          
    4:d=1  hl=4 l=1025 prim: INTEGER           :CAA78299F24DC4FCA78E9F110D459429A1322C8AF0BF558B7E49B21A30D5EFDF0FFF512C15E5292CD85209EBB21861235250ABA3FBD48F0830CC3F355DDCE5BA467EA03F58F91E12985E54336309D8F954525802949FA68A5A742B4933A5F26F148B912CF60D5A140C52C2E72D1431FA2D40A3E3BAFA8C166AF13EBAD774E7FEB17CAE94EB12BE97F3CABD668833691D2A2FB3001BF333725E65081F091E46597028C3D50ABAFAF96FA2DF16E6AEE2AE4B8EBF4360FBF84C1312001A10056AF135504E02247BD16C5A03C5258139A76D3186753A98B7F8E7022E76F830863536FA58F08219C798229A23D0DDB1A0B57F1A4CCA40FE2E5EF67731CA094B03CA1ADF3115D14D155859E4D8CD44F043A2481168AA7E95CCC599A739FF0BB037969C584555BB42021BDB65C0B7EF4C6640CCE89A860D93FD843FE77974607F194206462E15B141A54781A5867557D8A611D648F546B72501E5EAC13A4E01E8715DFE0B8211656CFA67F956BC93EA7FB70D4C4BCFEC764128EAE8314F15F2473FCF4C6EEA29299D0B13C2CCC11A73BF58209A056CB44985C81504013529C75B6563CED3CC27988C70733080B01F78726A3ABBCD8765538D515D2282EF79F4818A807A9173CC751E46BDB930E87F8DA64C6ABDD8127900D94A20EE87F435716F4871463854AD0B7443243BDA0682F999D066EA213952F296089DA4577B7B313D536185E7C37E68F87D43A53B37F1E0FB7969760F31C291FDB99F63F6010E966EC418A10EFE7D4EBD4B494693965412F0E78150150B61A4FF84AB26355FC607697B86BFB014E6450793D6BF5EA0565C63161D670CD64E99FFD9AE9CCF90C6F77949A137E562E08787D870604825EF955B3A26F078CBA5E889F9AFEEF48FE7F36A51D537E9ADF0746EA81438A91DF088C6C048622AC372AEEBD11A2427DFCBC2FE163FD62BE3DCBDC4961ECD685CBAD55898B22E07A8C7E3FB4AEA8AC212F9CA6D6FE522EC788FEDD08E403BD70D1ABEDE03B760057920B27B70CBDCB3C1977A7B21B5CD5AF3B7D57A0B2003864C4A56686592CBFC1BBAF746E35937A3587664A965B806C06C94EC78119A8F3BB18BCFCEB1E5193C0EFD025A86AD6C2AE81551475A14B81D5A153E993612E1D7AED44AB38E9A4915D4A9B3A3B4B2DF2C05C394FC65B46E3983EA0F951B04B7558099DB04E73F550D08A453020EFD7E1A4F2383C90E8F92A2A44CC03A5E0B4F024656741464D86E89EA66526A11193435EB51AED58438AA73A4A74ABF3CDE1DE645D55A09E33F3408AD340890A72D2C6E4958669C54B87B7C146611BFD6594519CDC9D9D76DF4D5F9D391F4A8D851FF4DAE5207585C0A544A817801868AA5106869B995C66FB45D3C27EE27C078084F58476A7286E9EE5C1EF76A6CF17F8CDD829EFBB0C5CED4AD3D1C2F101AB90DC68CEA93F7B891B217
 1033:d=1  hl=2 l=   3 prim: INTEGER           :010001

And if we screw it up and don’t actually align with the ASN.1 header properly things explode:

$ openssl asn1parse -strparse 32 -inform DER -in rsa3.pub.der.openssl 
Error parsing structure
140488074528416:error:0D07207B:asn1 encoding routines:ASN1_get_object:header too long:asn1_lib.c:150:
140488074528416:error:0D068066:asn1 encoding routines:ASN1_CHECK_TLEN:bad object header:tasn_dec.c:1306:
140488074528416:error:0D06C03A:asn1 encoding routines:ASN1_D2I_EX_PRIMITIVE:nested asn1 error:tasn_dec.c:814:

Now my real question is… why isn’t any of this documented? This was a particularly annoying issue to work my way around, has obviously affected others, and yet is obscure enough that almost no mention of it can be found in documentation anywhere. GHAH!

Why OTP? Why “pure” and not “raw” Erlang?

Wednesday, November 12th, 2014

I’ve been working on my little instructional project for the last few days and today finally got around to putting a very minimal, but working, chat system into the ErlMUD “scaffolding” code. (The commit with original comment is here. Note the date. By the time this post makes its way into Google things will probably be a lot different.)

I commented the commit on GitHub, but felt it was significant enough to reproduce here (lightly edited and linked). The state of the “raw Erlang” ErlMUD codebase as of this commit is significant because it clearly demonstrates the need for many Erlang community conventions, and even more significantly why OTP was written in the first place. Not only does it demonstrate the need for them, the non-trivial nature of the problem being handled has incidentally given rise to some very clear patterns which are already recognizable as proto-OTP usage patterns (without the important detail of having written any behaviors just yet). Here is the commit comment:

Originally chanman had been written to monitor, but not link to or trap exits of, channel processes [example]. At first glance this appears acceptable; after all, the chanman doesn’t have any need to restart channels since they are supposed to die when they hit zero participants, and upon death the participant count winds up being zero.

But this assumes that the chanman itself will never die. This is always a faulty assumption. As a user it might be mildly inconvenient to suddenly be kicked from all channels, but it isn’t unusual for chat services to hiccup and it is easy to re-join whatever died. Resource exhaustion and an inconsistent channel registry are worse. If orphaned channels are left lying about, the output of \list can never match reality, and identically named ones can be created in ways that don’t make sense. Even a trivial chat service with a tiny codebase like this can wind up with system partitions and inconsistent states (oh no!).

All channels crashing with the chanman might suck a little, but letting the server get to a corrupted state is unrecoverable without a restart. That requires taking the game and everything else down with it just because the chat service had a hiccup. This is totally unacceptable. Here we have one of the most important examples of why supervision trees matter: they create a direct chain of command, and enforce a no-orphan policy by annihilation. Notice that I have been writing “managers” not “supervisors” so far. This is to force me to (re)discover the utility of separating the concepts of process supervision and resource management (they are not the same thing, as we will see later).
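For the curious, the no-orphan mechanic being described reduces to something like this (a schematic sketch, not the actual ErlMUD code):

%% chanman links to each channel it creates; if chanman dies, every
%% channel dies with it, so the registry can never refer to orphans.
start_channel(Name) ->
    spawn_link(fun() -> channel(Name, []) end).

channel(Name, Participants) ->
    receive
        {join, Pid} ->
            channel(Name, [Pid | Participants]);
        {leave, Pid} ->
            case lists:delete(Pid, Participants) of
                [] -> exit(normal);  % zero participants: die on purpose
                Ps -> channel(Name, Ps)
            end
    end.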

Now that most of the “scaffolding” bits have been written in raw Erlang it is a good time to sit back and check out just how much repetitive code has been popping up all over the place. The repetitions aren’t resulting from some mandatory framework or environment boilerplate — I’m deliberately making an effort to write really “low level” Erlang, so low that there are no system- or framework-imposed patterns — they are resulting from the basic, natural fact that service workers form constellations of similarly defined processes, and supervision trees provide one of the only known ways to guarantee fallback to a known state throughout the entire system without resorting to global restarts.

Another very important thing to notice is how inconsistent my off-the-cuff implementation of several of these patterns has been. Sometimes a loop has a single State variable that wraps the state of a service, sometimes bits are split out, sometimes it was one way to begin with and switched a few commits ago (especially once the argument list grew long enough to annoy me when typing). Some code_change/N functions have flipped back and forth along with this, and that required hand tweaking code that really could have been easier had every loop accepted a single wrapped State (or at least some standard structure that didn’t change every time I added something to the main loop without messing with code_change). Some places I start with a monitor and wind up with a link or vice versa, etc.

While the proper selection of OTP elements is more an art than a science in many cases, having commonly used components of a known utility already grouped together avoids the need for all this dancing about in code to figure out just what I want to do. I suppose the most damning point about all this is that none of the code I’ve been flip-flopping on has been essential to the actual problem I’m trying to solve. I didn’t set out to write a bunch of monitor or link or registry management code. The only message handler I care about is the one that sends a chat message to the right people. Very little of my code has been about solving that particular problem, and instead I consumed a few hours thinking through how I want the system to support itself, and spent very little time actually dealing with the problem I wanted to treat. Of course, writing this sort of thing without the help of any external libraries in any other language or environment I can think of would have been much more difficult, but the commit history today is a very strong case for making an effort to extract the common patterns used and isolate them from the actual problem solving bits.

The final thing to note is something I commented on a few commits ago, which is just how confusing tracing message passing can be when not using module interface functions. The send and receive locations are distant in the code, so checking for where things are sent from and where they are going to is a bit of a trick in the more complex cases (and fortunately none of this has been particularly complex, or I probably would have needed to write interface functions just to get anything done). One of the best things about using interface functions is the ability to glance at them for type information while working on other modules, use tools like Dialyzer (which we won’t get into until we get into “pure Erlang” in v0.2), and easily grep or let Emacs or an IDE find calling sites for you. This is nearly impossible with pure ad hoc messaging. Ad hoc messaging is fine when writing a stub or two to test a concept, but anything beyond that starts getting very hard to keep track of, because the locations significant to the message protocol are both scattered about the code (seemingly at random) and can’t be defined by any typing tools.

I think this code proves three things:

  • Raw Erlang is amazingly quick for hacking things together that are more difficult to get right in other languages, even when writing the “robust” bits and scaffolding without the assistance of external libraries or applications. I wrote a robust chat system this afternoon that can be “hot” updated, from scratch, all by hand, with no framework code — that’s sort of amazing. Doing it sucked more than it needed to since I deliberately avoided adhering to most coding standards, but it was still possible and relatively quick. I wouldn’t want to have to maintain this two months from now, though — and that’s the real sticking point if you want to write production code.
  • Code convention recommendations from folks like Joe Armstrong (who actually does a good bit of by-hand, pure Erlang server writing — but is usually rather specific about how he does it) and standard utility sets like OTP exist for an obvious reason. Just look at the mess I’ve created!
  • Deployment clearly requires a better solution than this. We won’t touch on this issue for a while yet, but seriously, how in the hell would you automate deployment of a scattering of files like this?

Finally working on an intermediate Erlang instructional

Saturday, October 25th, 2014

After contemplating the state of the tiny constellation of Erlang resources available, both in-depth instructionals and books, I’ve finally decided that what is most lacking is an intermediate resource. So I’ve started working on one. The goal is to demonstrate a non-trivial system (a MUD, in this case) as it would be written in raw, beginner-style Erlang and show how a project can evolve to a for-real OTP system by providing complete project examples and code commentary throughout the intermediate stages of evolution.

Hopefully this will force me to cover many of the giant, undocumented gaps Erlangers tend to encounter when trying to make the transition from Erlang beginner to OTP master, and I’ll undoubtedly learn a lot in the process myself. That last bit about self education is, of course, my own selfish motive for doing this. I have yet to find an aggregate body of Erlang best practices or collection of “common things people do by hand in Erlang and the idiomatic way to handle them in OTP”, and I really wish there was such a thing. I especially wish there were such a thing based on a single, non-trivial project instead of toy examples so that learners could get a grasp on why OTP matters by reading a real project instead of trying to assemble a mental model of the grander sculpture by examining bits of dust cast off by the chisel.

I chose a MUD as an example for ease of understanding, non-triviality (too many example systems are unrealistically small), freedom from graphical distractions, and the striking number of parallels between many MUD subsystems and real-world business, game, and social software. Hopefully work and real life don’t interfere to the point that I have to stop this mid-way through!