Posts Tagged ‘erlang’

Silly: Hextexting via the command line…

Tuesday, August 14th, 2018

A silly thread on Twitter came to my attention today that stirred some late 1980s/1990s phreak/hax0r nostalgia in me. So, of course, I did what any geek would do: I wrote a one-off utility script for it. Have fun confusing your parents, kids.

#! /usr/bin/env escript

-mode(compile).

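main([Command | Input]) ->
    ok = io:setopts([{encoding, unicode}]),
    Output = convert(Command, Input),
    io:format("~ts~n", [Output]);
main([]) ->
    % No arguments at all: print the usage text instead of crashing.
    io:format("~ts~n", [convert(help, [])]).

% "t": text to hex. Each character becomes its code point in base 16.
convert("t", Input) ->
    String = string:join(Input, " "),
    string:join(lists:map(fun(C) -> integer_to_list(C, 16) end, String), " ");
% "h": hex to text. Each argument is parsed as one base-16 code point.
convert("h", Input) ->
    lists:map(fun(C) -> list_to_integer(C, 16) end, Input);
convert(_, _) ->
    "hextext usage: hextext t|h [text]".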

(Also, look up rot13 — ’twas all the rage 30 years ago, and still makes an appearance as a facilitator of hidden easter eggs in some games. A lot of “garbled alien/monster/otherling speech” text is rot13.)
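
For the curious, rot13 fits in a handful of lines. The following is only a minimal sketch (the module name rot13 and the function encode/1 are made up for illustration and do not come from any library). It rotates ASCII letters by 13 places and leaves every other character alone, so applying it twice gives back the original text.

```erlang
-module(rot13).
-export([encode/1]).

%% Rotate every ASCII letter in String by 13 places.
encode(String) ->
    [rotate(C) || C <- String].

%% Lowercase and uppercase letters wrap around within their own range.
rotate(C) when C >= $a, C =< $z -> $a + (C - $a + 13) rem 26;
rotate(C) when C >= $A, C =< $Z -> $A + (C - $A + 13) rem 26;
%% Anything that is not an ASCII letter passes through unchanged.
rotate(C) -> C.
```

Calling rot13:encode("Hello, World!") gives "Uryyb, Jbeyq!", and encoding that result again returns the original string.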

Erlang: Getting Started Without Melting

Wednesday, July 11th, 2018

There are two things that might be meant when someone references “Erlang”: the language, and the environment (the EVM/BEAM and OTP). The first one, the language part, is actually super simple and quick to learn. The much larger, deeper part is learning what the BEAM does and how OTP makes your programs better.

It is clear that without an understanding of Erlang we’re not going to get very far in terms of understanding OTP and won’t be skilled enough to reliably interact with the runtime through a shell. So let’s forget about the runtime and OTP for a bit and just aim at the lowest, most common beginners’ task in coding: writing a script that tells me “Hello, World!” and shows whatever arguments I pass to it from the command line:

#! /usr/bin/env escript

% Example of an escript
-mode(compile).

main(Args) ->
    ok = io:setopts([{encoding, unicode}]),
    ok = io:format("Hello, world!~n"),
    io:format("I received the args: ~tp~n", [Args]).

Let’s save that in a file called e_script, run the command chmod +x e_script to make it executable, and take a look at how this works:

ceverett@takoyaki:~$ ./e_script foo bar
Hello, world!
I received the args: ["foo","bar"]
ceverett@takoyaki:~$

Cool! So it actually works. I can see a few things already:

  1. I need to know how to call some things from the standard library to make stuff work, like io:format/2
  2. io:setopts([{encoding, unicode}]) seems to make it OK to print UTF-8 characters to the terminal in a script
  3. An escript starts execution with a traditional main/1 function call

Some questions I might have include how or why we use the = for both assignment and assertion in Erlang, what the mantra “crash fast” really means, what keywords are reserved, and other issues which are covered in the Reference Manual (which is surprisingly small and quick to read and reference).
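
As a taste of the first of those questions, here is a minimal sketch of = doing both jobs. The module name match_demo and the helper lookup/1 are hypothetical, invented only for this illustration. When the left side contains an unbound variable, = binds it; when everything on the left is already known, = asserts that both sides match and crashes with a badmatch error if they do not.

```erlang
-module(match_demo).
-export([run/0]).

%% Hypothetical helper that succeeds for one key and fails for the rest.
lookup(known) -> {ok, 2};
lookup(_)     -> {error, not_found}.

run() ->
    X = 1,                      % X is unbound here, so = binds it to 1
    X = 1,                      % X is bound now, so = asserts it matches 1
    {ok, Y} = lookup(known),    % asserts the 'ok' tag and binds Y to 2
    Outcome =
        try
            {ok, _} = lookup(unknown)   % tag is 'error': badmatch crash
        catch
            %% Caught only to show the crash; normally you would let it crash.
            error:{badmatch, Value} -> {crashed_on, Value}
        end,
    {X, Y, Outcome}.
```

Calling match_demo:run() returns {1,2,{crashed_on,{error,not_found}}}.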

An issue some newcomers encounter is that navigating an unfamiliar set of documentation can be hard. Here are the most important links you will need to know to get familiar and do useful things with the sequential language:

This is a short list, but these are the most common links you’ll want to know how to find. It is also easy to pull up any given module by doing a search for “erlang [module name]” on any search engine. (Really, any of them.)

In the rare case that erlang.org is having a hard time I maintain a mirror of the docs for various Erlang release versions here as well: http://zxq9.com/erlang/

Start messing with sequential Erlang. Don’t worry about being fancy and massively concurrent or maximizing parallelization or whatever — just mess around at first and get a feel for the language using escript. It is a lot of fun and makes getting into the more fully encompassing instructional material much more comfortable.

Erlang: R21 doc mirror

Wednesday, June 27th, 2018

Erlang doc mirror for R21 is now up.
(For those times when erlang.org takes a nap…)

http://zxq9.com/erlang/docs/reg/21.0/

Erlang: ZJ docs

Wednesday, June 27th, 2018

Docs for the ZJ Erlang JSON encoder/decoder are now available here:

http://zxq9.com/projects/zj/docs/

The binary_encode/1 function will probably be live tomorrow, along with a proper v1.0 release.

Your tests don’t tell you what you think they do

Tuesday, June 26th, 2018

Yesterday I wrote a tiny JSON encoder/decoder in Erlang. While the Erlang community wasn’t in dire need of yet another JSON parser, the ones I saw around do things just a tiny bit differently than I want them to and writing a module against RFC-8259 isn’t particularly hard or time consuming.

Someone commented on (gasp!) the lack of tests in that module. They were right. I just needed the module to do two things, the code is boring, and I didn’t write tests. I’m such a rebel! Or a villain! Or… perhaps I’m just someone who values my time.

Maybe you’re thinking I’m one of those coding cowboys who goes hog wild on unsafe code! No. I’m not. Nothing could be further from the truth. What I have learned over the last 30 years of fiddling about with software is that hand-written tests are mostly a waste of time.

Here’s what happens:

  1. You write a new thingy.
  2. You throw all the common cases at it in the shell. It seems to work. Great!
  3. Being a prudent coder you basically translate the things you thought to throw at it in the shell into tests.
  4. You hook it up to an actual project you’re using somewhere — and it breaks!
  5. You fix the broken bits, and maybe add a test for whatever you fixed.
  6. Then other people start using it in their projects and stuff breaks quite a lot more ZOMG AHHH!

Where in here did your hand-written tests help out? If you write tests to define the bounds of the problem before you actually write your functions then tests might help out quite a lot because they deepen your understanding of the problem before you really tackle it head-on. Writing tests before code isn’t particularly helpful if you already thoroughly understand the problem and just need something to work, though.

When I wrote ZJ yesterday I needed it to work in the cases that I care about — and it did, right away. So I was happy. This morning, however, someone else decided to drop ZJ into their project and give it a go — and immediately ran into a problem! ZJ v0.1.0 returns an error if it finds trailing commas in JSON arrays or objects! Oh noes!

Wait… trailing commas aren’t legal in JSON. So what’s the deal? Would tests have discovered this problem? Of course not, because hand-written tests would have been bounded by the limits of my imagination and my imagination was hijacked by an RFC all day yesterday. But the real world isn’t an RFC, and if you’ve ever dealt with JSON in the wild that you’re not generating you’ll know that all sorts of heinous and malformed crap is clogging the intertubes, and most of it sports trailing commas.

My point here isn’t that testing is bad or always a waste of time, my point is that hand-written tests are themselves prone to the exact same problems the code being tested is: you wrote them so they carry flaws of implementation, design and scope, just like the rest of your project.

“So when is testing good?” you might ask. As mentioned earlier, those cases where you are trying to model the problem in your mind for the first time, before you’ve written any handling code, is a great time to write tests for no other reason than they help you understand the problem. But that’s about as far as I go with hand-writing tests.

The three types of testing I like are:

  • type checks
  • machine generated (property testing)
  • real-world (user testing)

A good type checker like Dialyzer (or especially ghc’s type system, but that’s Haskell) can tell you a lot about your code in very short order. It isn’t unusual at all to have sections of code that are written to do things that are literally impossible, but that you wouldn’t know about until much later. Due simply to lack of imagination, hand-written tests quite often would never have executed that code, or not in a way that would reveal the structural error.
Typespecs: USE THEM
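
For anyone who has not seen one, a typespec is just a -spec attribute sitting above a function. The sketch below is a made-up example (the module spec_demo and the function safe_div/2 are hypothetical); the spec states what goes in and what can come out, and that is the kind of information Dialyzer can check callers against.

```erlang
-module(spec_demo).
-export([safe_div/2]).

%% The spec documents the contract: two numbers in, a tagged result out.
-spec safe_div(number(), number()) -> {ok, float()} | {error, divide_by_zero}.
safe_div(_, B) when B == 0 ->
    %% == treats 0 and 0.0 as equal, so both are rejected here.
    {error, divide_by_zero};
safe_div(A, B) ->
    %% The / operator always produces a float in Erlang.
    {ok, A / B}.
```

A caller that matches on something the spec says can never be returned, for example {ok, Atom} when is_atom(Atom), is the sort of impossible code a type checker can point out before any test runs.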

Good property testing systems like PropEr and QuickCheck generate and run as many tests as you give them time to (really, it is just constrained by time and computing resources), and once they discover breakages can actually fuzz the problem out to pinpoint the exact failing cases and very often indicate the root cause pretty quickly. It is amazing. If you ever experience this you’ll never want to hand write tests again.
Property Testing: USE IT
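
To make the idea concrete without pulling in a dependency, here is a toy stand-in written against only the standard library. To be clear, this is not PropEr’s or QuickCheck’s API; the module toy_prop and all of its functions are invented for illustration, and it does none of the shrinking that makes the real tools so good at pinpointing failures. It only shows the core loop: generate an input, check a property, repeat, and report the first input that breaks the property.

```erlang
-module(toy_prop).
-export([check/3, random_int_list/0, prop_reverse/1]).

%% Run Property against N inputs produced by Generator.
%% Returns ok, or {failed, Input} for the first input that breaks it.
check(_Generator, _Property, 0) ->
    ok;
check(Generator, Property, N) when N > 0 ->
    Input = Generator(),
    case Property(Input) of
        true  -> check(Generator, Property, N - 1);
        false -> {failed, Input}
    end.

%% Generator: a list of 0 to 20 integers, each between -1000 and 1000.
random_int_list() ->
    Length = rand:uniform(21) - 1,
    [rand:uniform(2001) - 1001 || _ <- lists:seq(1, Length)].

%% Property: reversing a list twice gives back the original list.
prop_reverse(List) ->
    lists:reverse(lists:reverse(List)) =:= List.
```

A run looks like toy_prop:check(fun toy_prop:random_int_list/0, fun toy_prop:prop_reverse/1, 1000). The machine thinks up the thousand cases, not you.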

What about user testing? It is simply necessary. You’ll never dream up the insane stuff to try that users will, and neither will a property-based test generation system. Your test and development environment will often bear little resemblance to your users’ environments (a few weirdos out there still use Windows!), the things you might think to store in your system will rarely look anything like the sort of stuff they will wind up storing in it (you were thinking text, they were thinking video), and the frequency of operation that you assumed might look realistic will almost never be anywhere close to the mark (your one-off utility program that you assumed would run in isolation initiated by a user command in ~/bin/ may become the core part of a massively parallelized service script executed every minute by a cron job running as root).
Your Users: COMMUNICATE WITH THEM

Ultimately, hand-written tests tend to reveal a lot more about the author of the tests than the status of the software being tested.

Tiny strings-as-strings JSON in portable Erlang

Monday, June 25th, 2018

There are several JSON libs for Erlang at this point, and as there is no correct mapping between JSON types and Erlang types, all make different tradeoffs that either work or don’t for your project. Beyond that, various interface and implementation differences exist due to the tradeoffs inherent in manipulating elements of the Black Tongue known as lolscript:

  • Accept values to encode as magic tagged tuples so you can specify exactly what you want VS being ambiguous
  • Never allow “naked” values (everything must be in a list/array or a map or a [whatever]) VS “hanging” values
  • Treat all strings ever as binaries because “strings are big” VS treating all strings (and binaries) as strings because strings are easy to manipulate (io_lists…)
  • Decode JSON “objects” as proplists VS decode JSON objects to dicts or maps VS add an “options” argument to the decode function
  • Encode and decode values various ways based on optional switches VS “sane defaults” (aka “works for me”)
  • Achieve lolspeed via NIFs and only work on *nix VS maintain portability via pure Erlang
  • etc.

No combination is correct for every situation, hence the proliferation of libraries. In addition to proliferation, something as simple as what is described by RFC-8259 shouldn’t require a 20k LoC dependency to manage, at least not in Erlang of all languages.
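
One concrete source of the mismatch is worth seeing in code. This is an illustration only, not a description of how any particular library behaves, and the module name string_vs_list is made up. An Erlang string is a list of integers, so a decoder or encoder cannot tell the text "abc" apart from the array [97, 98, 99] by looking at the term alone, while a binary is a distinct type.

```erlang
-module(string_vs_list).
-export([demo/0]).

demo() ->
    String = "abc",
    Ints   = [97, 98, 99],
    Binary = <<"abc">>,
    #{%% A string literal is nothing more than a list of code points.
      same_term      => String =:= Ints,
      string_is_list => is_list(String),
      %% A binary is its own type, which is why some libs prefer it.
      binary_is_list => is_list(Binary),
      %% The best anyone can do with a list is guess from its contents.
      looks_like_text => io_lib:printable_unicode_list(Ints)}.
```

Here same_term, string_is_list and looks_like_text all come back true while binary_is_list is false, which is the root of the strings-as-strings versus strings-as-binaries tradeoff in the list above.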

The general strings-as-strings + portability tradeoffs were made by mochiweb years ago, with mochijson2 being the go-to JSON parser for lots of projects. Now that “tuple calls” have finally been retired after years of obsolescence and deprecation, mochijson2 is finally giving up the ghost as well (as it was based on tuple calls). As a replacement that makes mostly the same tradeoffs but is arguably simpler, I wrote a single-module JSON encoder/decoder lib. It treats all strings as strings, is in pure Erlang, and is utterly boring in how simple the code is. Nothing magical to see. At all. So don’t get excited.

If you need to read things in and write things out, in JSON, and don’t really care about lolspeed but want to understand what is happening, then ZJ is for you: ZJ project @ gitlab

Note that if you have roughly the same requirements but you want to make the strings-as-binaries tradeoff then JSX is the lib for you.

Erlang: Eventually Things Will Change

Wednesday, May 30th, 2018

I finally got a few days to really dedicate to the whole Zomp/ZX thing and wrote some docs.

If you actually click this link soon you’ll see an incomplete pile of poo, but it is a firm enough batch of poo that I can show it now, and you can get a very basic idea what this system is supposed to do:

Zomp/ZX docs

Some pages are missing and things are still a bit self-conflicted. The problem is that until you really use a system like this a bit it is hard to know what the actual requirements need to be. So that’s been a long internal journey.

If my luck holds I’ll have something useful out in short order, though. Here’s to keeping fingers crossed and creating useful on-ramps for new programmers in desperate need of easy-to-use power tools. While we can all only hope the gods will help them when it comes to tackling their actual human-relevant problems, the environment in which they render their solutions should not be actively hostile.

Erlang: R20.3 doc mirror

Sunday, April 15th, 2018

The main Erlang website has been super snappy for the last several months so I had slacked off on mirroring the documentation. Today it seems there are a few problems (DDOS-type symptoms, but I have no idea what is going on) so I’ve gone ahead and mirrored the R20 docs.

I also updated the “Erlang Stuff” page, though that page is going to get a few more changes once the Erlang tooling suite I’m working on is out, as tinder, firekit, flint and zx are now all incorporated into the new (much better) thing… but more on that thing later once I’m done.

Confounding Beginner Question: What is an Erlang Atom and Why is it Useful?

Thursday, February 1st, 2018

Like other Erlangers, I tend to take the atom data type for granted. Coming from another language, however, you might be puzzled at why we have all these little strings that aren’t really strings.

The common definition you’ll hear most frequently is something like:

An atom is a label. Its only meaning is itself.

Well, that’s true, but that also sounds a bit useless to someone coming from Python or R or JavaScript or whatever. So let’s break that down: what is a “label” useful for in programs?

  • Variable names are labels.
  • Function names are labels.
  • Module names are labels.
  • The strings you use as keys in a key/value data structure are labels.
  • The enums and label macros you might use in C for semantically significant internal values are almost exactly like atoms.

OK, so we use labels all the time. Why don’t any of those other languages have atoms, though? Let’s examine those last two items for a moment for a hint why.

In Python strings are objects, and while building them is expensive, the hash of a string is computed once and then cached on the object. This means looking up a string of arbitrary length as a dictionary key is extremely cheap, because the lookup is mostly reduced to comparing large integers, with a full character-by-character comparison needed only when the hashes match. This is not true in, say, C or Erlang or Lisp unless you build your own data structure to carry around the pre-hashed data. In Python it is simple enough to say:

if 'foo' in some_dict:
  # stuff
else:
  # other stuff

In C, however, string comparison is a bit of a hassle and dealing with string data in a cross-platform environment at all can be super annoying depending on the age of the systems you might be interacting with or running/building your code on. In Erlang the syntax of string comparison is super simple, but the overhead is not pre-paid like in Python. So what is the alternative?

We use integer values to represent keys that are semantically meaningful to the program at the time it is written. But integers are hard to remember, so instead of having magic numbers floating all around the place we typically have semantically significant integer values aliased from a text label as a macro. This is helpful so that I don’t have to remember the meaning of code like

if (condition == 42) launch_missiles();
if (condition == 86) eat_kittens();

Instead I can write code like:

#define UNDER_ATTACK    42
#define VILE_UNDERBEAST 86

if (condition == UNDER_ATTACK)    launch_missiles();
if (condition == VILE_UNDERBEAST) eat_kittens();

It is extremely common in programs to have variables or arguments like condition in the above example. It doesn’t matter whether your language has matching (like Erlang, Rust, logic languages, etc.) or uses explicit conditionals like the fake C example above: there will always be a huge number of micro datatypes that carry great semantic significance within your program and only within your program. It is as useful to be able to label these enumerated values in a way that the human coders can understand and remember as it is for the computer to be able to compare them as simple integers instead of going to the trouble of string comparison every time your code needs to make a decision (because string comparison entails an arbitrarily long sequence of integer comparisons every single time you compare two strings).

In C we use those macros like above (well, not always; C actually does have super convenient enums that work a lot like atoms, but didn’t when I started using it as a kid in the stone age). In Erlang we just use an atom right there in place. You don’t need a declaration or definition anywhere, the runtime just keeps track of these things for you.
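
For comparison, here is roughly what the fake C example looks like with atoms. This is a minimal sketch; the module name alerts and the returned atoms are invented for illustration. Note that nothing is declared up front: the atoms under_attack and vile_underbeast simply appear where they are used.

```erlang
-module(alerts).
-export([respond/1]).

%% Each clause matches on an atom directly; no #define, no enum.
respond(under_attack)    -> launch_missiles;
respond(vile_underbeast) -> eat_kittens;
%% Any other condition falls through to a default.
respond(_)               -> do_nothing.
```

Calling alerts:respond(under_attack) returns launch_missiles.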

Underneath the hood Erlang maintains a running table of atom label values and translates them to integer values on the way into the system and on the way out of the system. The integer each atom actually resolves to is totally unimportant to you, so Erlang abstracts that detail away, but leaves the machine comparing integer values instead of doing full-string comparisons all over the place.
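
You can poke at that table from the outside a little. The sketch below uses a made-up module name, atom_demo, and shows that an atom built from text at runtime is the very same value as the literal, and that the runtime can report how many atoms it currently knows about via erlang:system_info(atom_count).

```erlang
-module(atom_demo).
-export([demo/0]).

demo() ->
    %% Text coming into the system is translated to the atom once...
    FromText = list_to_atom("under_attack"),
    %% ...and from then on it is the same value as the literal.
    Literal = under_attack,
    %% The size of the atom table; the exact number varies by system.
    Count = erlang:system_info(atom_count),
    {FromText =:= Literal, atom_to_list(Literal), is_integer(Count)}.
```

Calling atom_demo:demo() returns {true,"under_attack",true}. One caveat: the table is finite and atoms are not garbage collected, so turning arbitrary outside input into atoms is usually avoided.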

“But Erlang maps don’t do string comparisons on keys!” you might say.

And indeed, you would be right. Because map keys might be any arbitrary value each key is hashed on the way in, and every time keys are compared the comparing term is hashed the same way, so the end comparison is super fast, but we have to hash the input value first for it to mean anything. With atoms, though, we have a shortcut, because we already know they are both unambiguous integer values throughout the system, and this is a slight win over having to hash first before comparing keys.

In other situations where the comparison values cannot be hashed ahead of time, like function-head matching, however, atoms are a huge win over string comparisons:

-module(atoms).
-export([foo/1, bar/1]).

foo("Some string value that I don't really recall") ->
    {ok, 1};
foo("Some string value that I don't really care about") ->
    {ok, 2};
foo("Where is my cheeseburger?") ->
    {ok, 3};
foo(_) ->
    {error, wonky_input}.

bar(dont_recall) ->
    {ok, 1};
bar(dont_care) ->
    {ok, 2};
bar(cheeseburger) ->
    {ok, 3};
bar(_) ->
    {error, wonky_input}.

I’ve slowed the clockspeed of the system so that we can notice any difference here in microseconds.

1> timer:tc(fun() -> atoms:foo("Some string value that I don't really care about.") end).
{16,{error,wonky_input}}
2> timer:tc(fun() -> atoms:foo("Where is my cheeseburger?") end).
{13,{ok,3}}
3> timer:tc(fun() -> atoms:foo("arglebargle") end).
{12,{error,wonky_input}}
4> timer:tc(fun() -> atoms:bar(dont_care) end).
{9,{ok,2}}
5> timer:tc(fun() -> atoms:bar(cheeseburger) end).
{10,{ok,3}}
6> timer:tc(fun() -> atoms:bar(arglebargle) end).
{10,{error,wonky_input}}

See what happened? The long string that varies only at the tail end from two options in the function head takes 16 microseconds to compare and return a value. The string that differs at the head is evaluated as a bad match for the first two options the moment the very first character is compared. The total mismatch is our fastest return because that string never need be traversed even a single time to know that it doesn’t match any of the available definitions of foo/1. With atoms, however, we see a pretty constant speed of comparison. That speed would not change at all even if the atoms were a hundred characters long in text, because underneath they are all just integer values.
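
Single timer:tc/1 calls are noisy, so if you want to repeat this experiment on your own machine it helps to average over many runs. The helper below is a hypothetical sketch (the module name bench and the function avg_us/2 are invented here, not part of the standard library); it times N calls in one go and divides.

```erlang
-module(bench).
-export([avg_us/2]).

%% Average wall-clock microseconds per call of Fun over N calls.
avg_us(Fun, N) when is_function(Fun, 0), N > 0 ->
    {TotalMicros, ok} = timer:tc(fun() -> repeat(Fun, N) end),
    TotalMicros / N.

%% Call Fun N times, discarding each result.
repeat(_Fun, 0) -> ok;
repeat(Fun, N)  -> _ = Fun(), repeat(Fun, N - 1).
```

Assuming the atoms module above is compiled, bench:avg_us(fun() -> atoms:bar(dont_care) end, 100000) gives a steadier figure to compare against the foo/1 calls. The absolute numbers will differ from the ones shown here.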

Now take a look back and consider the return values defined for foo/1 and bar/1. They don’t return naked values, they return pairs where the first member is an atom. This is a pretty common technique in Erlang, either when writing a library intended for 3rd party use or when defining functions that have side-effecty operations that might fail (here we have pure functions, but I’m just using this as an example). Remember, the equal sign in Erlang is both an assignment operator and an assertion operator, so when calling a function that nests its return values you have the freedom to decide whether to crash the current process on an unexpected value or to handle the “error” (in which case for your program it becomes an expected condition and not an exception).

blah(Condition) ->
    {ok, Value} = foo(Condition),
    do_stuff(Value).

The code above will crash if the tuple {error, wonky_input} is returned, because the expected atom ‘ok’ does not match the actually returned atom ‘error’.

bleh(Condition) ->
    case foo(Condition) of
        {ok, Value}          -> do_stuff(Value);
        {error, wonky_input} -> get_new_condition()
    end.

The code above now does not crash on that error return value and instead moves on to get another condition to try out, because the error tuple matches one of the case conditions that is defined as a return value. All this can happen really fast because atom comparisons are really integer comparisons, and that means we save a ton of processor time (and space) by avoiding string/list or binary comparisons all over the place.

In addition to atoms being a much nicer and dramatically more flexible version of global enumerated types that let us write code in a more natural style that uses normal-language labels for program semantics, it turns out that function and module names are also atoms. This is a really nice feature in itself, because it allows us to write highly dynamic code with a lot less confusion about what types both sides of a call need to be as well as making the code easier to read. I can even implement my own version of apply/3:

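% Args is a list of arguments; only arities 0 through 2 are handled here.
my_apply(Module, Function, [])     -> Module:Function();
my_apply(Module, Function, [A])    -> Module:Function(A);
my_apply(Module, Function, [A, B]) -> Module:Function(A, B).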

Of course, there is a whole pile of reasons why you will never want to actually write a function like this in a real program, but that’s the sort of power we have without doing any type casting magic, introspection, or on-the-fly modification of our program, references or memory space.
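
To see a dynamic call work end to end, here is a minimal sketch with a made-up module name, dyn_call. The module and function are ordinary atoms held in variables. Note the difference between the two forms: Module:Function(...) takes its arguments in place, while the built-in apply/3 takes them as a list, one element per argument.

```erlang
-module(dyn_call).
-export([demo/0]).

demo() ->
    %% Plain atoms; they could just as well arrive as function arguments.
    Module   = lists,
    Function = reverse,
    %% Dynamic call with the argument written in place.
    Direct   = Module:Function([1, 2, 3]),
    %% apply/3 takes a list of arguments, so the list is wrapped once more.
    ViaApply = apply(Module, Function, [[1, 2, 3]]),
    {Direct, ViaApply}.
```

Calling dyn_call:demo() returns {[3,2,1],[3,2,1]}.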

Once you get used to using atoms and matching you’ll really start to miss them in other languages and wonder how you ever got along without them. Now run off and start writing some code to practice thinking with atoms. They will become natural to you before the day is out.

Zomp/zx: Yet Another Repository System

Tuesday, December 12th, 2017

I’ve been working on a from-source repo system for Erlang on and off for the last few months, contributing time to it pretty much whenever real-life is not interfering. I’m getting close to making a release. Now that my main data bits are worked out, the rest isn’t all that hard. I need to figure out what I want to say in an announcement.

The problem is that I’m really horrible at announcements and this system does things in a pretty different way to other repository systems out there, so I’m not sure what things are going to be important about it to users (worth putting into an announcement) and what things are going to be important to only me because I’m the one who wrote it (and am therefore obsessed with its externally inconsequential internals). What is internally interesting about a project is almost never what is externally interesting about it. Marketing; QED. So I need to sort that out, and writing sometimes helps me sort that kind of thing out.

I’m making this deliberately half-baked, disorganized, over-long post public because Joe Armstrong gave me some food for thought the other day. I had written him my thoughts on a subject posted to a mailing list but sent the message in private. I made my message to him off-list for two reasons: first, I wasn’t comfortable with my way of expressing the idea just yet; and second, I am busy with real-life stuff and side projects, including the repo system, and don’t want to get sucked into online chatter that might amount to nothing more than bikeshedding. (I’m a world-class bikeshedder!) Joe wrote me back asking why I made the reply private, I told him my reasons, and he made me change my mind. He hopes that more people will publish their ideas all the time, good or bad, fully baked or still soggy, because the only way we can ever find any other interesting ideas these days is by searching for them, usually in text, on the net somewhere. It isn’t like we can’t go back and revise, but whether or not we do go back and clean up our literary messes, the availability of core ideas and exposure of thought processes are more important than polish. He’s been on a big drive to make sure that he posts most of his thoughts to public mailing lists or blogs so that his ideas get at least indexed and archived. On reflection I agree with him.

So here I am, trying to publicly organize my thoughts on my repository system.

I should start with the goals of the system.

This system is intended to smooth over a few points of pain experienced when trying to get a new Erlang project off the ground, and in particular avert the path of pain peculiar to Erlang newcomers when they encounter the “how to set up a project” problem. Erlang’s tooling is great but a bit crufty (deeply featured, but confusing to interface with) and not at all what the kool kids expect these days. And anyway I’m really just trying to scratch my own itch here.

At the moment we have two de facto standards for publishing Erlang systems: erlang.mk and Rebar. I like both of these, especially erlang.mk, but they do one thing that annoys me and never seems to quite fit my need: they build Erlang releases.

Erlang releases are great. They cut all the cruft of a release out and pack everything needed to actually run a system into a single blob of digits that you can move, in a single shot, to a new target system — including the Erlang runtime itself. Awesome! Self-contained deployment and it never misses. This has been an Erlang feature since before people even realized that they needed repeatable deployment infrastructure outside of the classic “let’s build a monolithic, static binary executable” approach. (Erlang is perpetually ahead of its time, even by today’s standards. I look at the poor kids stubbing their toes with Docker and language du jour and just shake my head — though part of that is because many shops are using Docker to solve concurrency issues that they haven’t even become cognizant of, thinking that they are experiencing “scaling” problems but missing the point entirely.)

Erlang releases are awesome when the deployment target is an embedded system, but not so awesome if the target is a full-blown operating system, VM, container, or virtual environment fully stocked with gobs of memory and storage and flush with system utilities and resources. Erlang releases sort of kitchen-sink the deployment itself. What if you want to run several different Erlang programs, all delivered as releases, all depending on the same library? You’ve got tons of copies of that library. Which is OK, but still sort of weird, because you also have tons of copies of the runtime (among other things). Each release is self-contained and lean, but in aggregate this is a bit odd.

Erlang releases make sense when you’re deploying to a phone switch or a sensor device in the middle of nowhere and the runtime is basically acting as its own operating system. Erlang releases are, in that context, analogous to putting a Gentoo stage 3 binary image on a system to leapfrog most of the toolchain process. Very cool when you’re in that situation, but a bit tinker-tacky when you’re just trying to run, say, a client program written in Erlang or test a web front-end for something that uses YAWS or Cowboy.

So that’s the siloed-kitchen-sink issue. The other issue is that newcomers are perpetually confused about releases. This makes teaching elementary Erlang hard. In my view we should really focus on escript for beginner code: just let the new guy run something out of a single file the way he is used to doing when learning a new language instead of showing him pages of really slick code, then some interpreter stuff, and then leaping straight from that to a complex and advanced packaging setup necessarily tailored for conducting embedded deployments to slim hardware devices. Seriously. WTF. Escripts give beginners all the power of Erlang necessary for exploring the more interesting bits of code and refactoring needed to learn sequential Erlang with the major advantage of being able to interface with the system the same way programmers from other environments are used to dealing with language runtimes like Bash, AWK, Python, Ruby, Perl, etc.

But what about that gap between scripts and full-blown production deployments for embedded hardware?

Erlang has… nothing.

That’s right! There is no agreed-upon way to deploy or even run Erlang code in the same manner a Python coder would expect to execute a Python program. There is no virtualenv type system, there is no standard answer to the question “if I’m in the project directory and type ./do_thingy it will just work, right?” The answer is always “Well, it depends…” and what actually winds up happening is that people either roll a whole release just to crank a trivial amount of code up or (quite often) implement an ad hoc way to get the same effect in a lighter-weight way. (erlang.mk shines here, actually.)

Erlang does provide a number of ways to make a system run locally from source or .beam files (and actually has quite reasonable built-in resources for this), but nothing has been built around these tools that also deals with external dependencies, argument passing in a standard way, or any of the other little things you really need if you want to claim a complete solution. Hence all the ad hoc solutions that “work on my machine” but certainly aren’t something you expect your users to use (not with broad success, anyway).
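
As an example of those built-in resources, the standard compile and code modules are enough to go from a source file to a running function. The sketch below is exactly the kind of ad hoc “works on my machine” helper described above, not a proposal and not how zx does it; the module name run_source and the function run/3 are invented for illustration. It compiles a file in memory, loads the result, and calls a function in it, and it handles none of the hard parts (dependencies, argument conventions, caching).

```erlang
-module(run_source).
-export([run/3]).

%% Compile the source file at Source in memory, load the resulting
%% module, then call Function in it with the argument list Args.
run(Source, Function, Args) ->
    %% With the 'binary' option no .beam file is written to disk.
    {ok, Module, Binary} = compile:file(Source, [binary, return_errors]),
    %% Make the freshly compiled code available to the running system.
    {module, Module} = code:load_binary(Module, Source, Binary),
    apply(Module, Function, Args).
```

Given a file hello_mod.erl that exports greet/1, run_source:run("hello_mod.erl", greet, ["joe"]) compiles, loads and calls it in one step. Any compile error crashes the caller with a badmatch, which is fine for a toy and not fine for a tool you hand to users.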

This wouldn’t be such a big problem if it weren’t for the fact that not having any standard way to “just run a program” also means that there really isn’t any standard way to deal with client side code in Erlang. This is a big annoyance for me because much of what I do is client-side code. In Erlang.

In fact, it totally boggles my mind that client-side Erlang isn’t more common, especially considering that AMD is already fielding zillion-core processors for desktops, yet most languages are fundamentally single-threaded. That doesn’t mean you can’t do concurrency and parallelism in other languages, but most problems are not parallel in nature to begin with (parallel problems are easy to write solutions to in any language) while most real-world problems are concurrent. But concurrent systems are hard to write in almost every language. Concurrent problems are the bulk of the interesting problems we’re still not very good at solving with computers. AMD is moving to make much more interesting concurrent processing tools available on the client side (which means Intel will soon start pouring its gajillions worth of blood diamond money into a similar effort), but most languages and environments have no good way to make use of that on the client side. (Do you see why I hear Lady Fortune knocking?)

Browsers? Oh yeah. That’s a great plan. Have you noticed that most sites slowly move toward the “Single Page App” design over time (read as: the web sucks, so now we write full-but-crippled client-programs and deliver them over the web), invest heavily in do-sneaky-things-without-telling-you JavaScript and try to hog every core your system has if you allow it the slightest permission to do so? No. In the age of bitcoin miners embedded in nearly every ad this is not the direction I think we should be envisioning things going.

I want to take better advantage of the cores users have available, and that doesn’t necessarily mean making more efficient use of every cycle so much as making scheduling across processes more efficient, reducing latency throughout the system overall. That’s something users care about quite a lot. This is the problem Erlang has already solved in a way no other runtime out there has. So I want to capitalize on it.

And yet, there is still no standardish way of dealing with code from source, running it locally, declaring or resolving dependencies, or even launching a client-side program at all.

So… how am I approaching it?

I have a project called “zomp” which is a repository system. It is a distributed repository system, so not everything has to be held in one place. Code in the zomp universe is held in little semantic silos called “realms”. Each realm can have whatever packages the owner (sysop) wants it to have. Each realm must have one server node somewhere that is its “prime”: the node in charge of that realm. That node is where system operator tasks for that realm take place, where packagers and maintainers submit code for inclusion, where the package index is built, and where the canonical copy of everything is stored. Other nodes configured to see that realm connect to the prime node, receive a copy of the current indexes, are tested for availability, and are then published as available resources for querying indexes or downloading packages.

When too many subordinate nodes connect to a prime, the prime redirects each new node to a subordinate. When a subordinate gets “full” of subordinates itself, it picks one of its own subordinates for new redirects, and so on, so each realm winds up forming a resource tree of mirror nodes that connect back to the realm prime by a single path. A single node might be prime for several realms, or different nodes may act as prime for different realms, and any node can be configured to become a part of any number of realm trees.
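To make the redirect behavior a little more concrete, here is a toy model of how such a tree might fill up. Everything in it is an assumption made for illustration: the module name, the `{Name, Children}` tree shape, the fixed `Max` limit on direct subordinates, and especially the rule for choosing which subordinate receives a redirect (here, the one with the smallest subtree). It is not how zomp is actually implemented.

```erlang
-module(tree_sketch).
-export([attach/3]).

%% A tree node is {Name, Children}. Max is the (hypothetical) limit on
%% direct subordinates per node and must be at least 1.

%% Room available: the new node connects directly.
attach(New, {Name, Children}, Max) when length(Children) < Max ->
    {Name, Children ++ [{New, []}]};
%% Full: redirect the new node to the subordinate with the smallest subtree.
attach(New, {Name, Children}, Max) ->
    [{_Size, Pick} | _] = lists:sort([{size_of(C), C} || C <- Children]),
    Redirect = fun(C) when C =:= Pick -> attach(New, C, Max);
                  (C)                 -> C
               end,
    {Name, lists:map(Redirect, Children)}.

%% Count a node plus everything below it.
size_of({_Name, Children}) ->
    1 + lists:sum([size_of(C) || C <- Children]).
```

Starting from `{prime, []}` with a limit of 2, attaching `a`, `b`, `c` and `d` in order leaves `a` and `b` directly under the prime, with `c` under `a` and `d` under `b`, so every mirror still has exactly one path back to the prime.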

That’s the high-level code division.

The zomp constellation is interfaced with via the “zx” program (short for “zomp explorer”, or “zomp exchanger”, or “Zomp eXtreem!”, or homage to the Sinclair ZX-81, or whatever else might lend itself to the letters “zx” that you might want to make up; I actually forget what it originally stood for, but it is remarkably convenient to type so it’s staying that way).

zx is configured to have visibility on zomp realms the same way a zomp node is (in fact, they use the same configuration files and it isn’t weird to temporarily host a zomp node on your desktop the same way you might host a torrent node for a while — the only extra effort is that you do have to open a port, zomp doesn’t (yet) do hole punching magic).

You can tell zx to run a program using the highly counter-intuitive command:

zx run Realm-ProgramName[-Version]

It breaks the program name down into:

  • Realm (optional, defaulting to the main realm of public FOSS packages called “otpr”)
  • Name (necessary — sort of the whole point)
  • Version (which is optional and can also be partial: “1.0.3” vs just “1.0” or “1”, defaulting to the latest in a series or latest overall)

With those components it then contacts any zomp node it knows provides the needed realm, resolves the latest version number of the requested program, downloads and unpacks it, checks and downloads any missing dependencies, builds the program, and launches it. (And if it doesn’t know any active mirrors it asks the prime node and is seeded with known mirror nodes in addition to getting its query answered.)
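A rough sketch of the name breakdown and the “latest in a series” resolution might look like the following. This is not zx’s actual code: the module and function names are made up, versions are modeled as lists of integers, and the rule for telling an omitted realm apart from an omitted version (try to read the second part as a version) is my assumption for the sake of the example.

```erlang
-module(zx_id_sketch).
-export([parse/1, resolve/2]).

%% Break "Realm-Name[-Version]" or "Name[-Version]" into {Realm, Name, Version}.
%% Version is a list of 0 to 3 integers; [] means "latest overall".
parse(String) ->
    case string:lexemes(String, "-") of
        [Name] ->
            {"otpr", Name, []};
        [A, B] ->
            %% Either "Name-Version" or "Realm-Name".
            case version(B) of
                {ok, V} -> {"otpr", A, V};
                error   -> {A, B, []}
            end;
        [Realm, Name, VersionString] ->
            {ok, V} = version(VersionString),
            {Realm, Name, V}
    end.

%% Accept "1", "1.0" or "1.0.3"; anything else is not a version.
version(String) ->
    try [list_to_integer(P) || P <- string:lexemes(String, ".")] of
        V when length(V) >= 1, length(V) =< 3 -> {ok, V};
        _ -> error
    catch
        error:badarg -> error
    end.

%% Pick the highest available version that starts with the partial version.
%% Available is a list of complete versions such as [1,0,3].
resolve(Partial, Available) ->
    case [V || V <- Available, lists:prefix(Partial, V)] of
        []      -> error;
        Matches -> {ok, lists:max(Matches)}
    end.
```

With this, `parse("foo-bar")` gives `{"foo", "bar", []}`, and resolving the partial version `[1]` against `[[1,0,3], [1,2,0], [2,0,0]]` gives `{ok, [1,2,0]}`, the latest in the 1.x series.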

The packages are kept in a local cache stored at the user level, not the system level (sort of like how browsers keep their JS and page caches) — though if you want to daemonize zomp and run it as a permanent service (if you run a realm prime, for example) then you would want to create an unprivileged system user specifically for the purpose. If you specify a fully-qualified “realm-name-version” for execution and the packages already exist and are built, zx just launches the code directly (which is the majority case, so no delay there — fast startup).

All zomp nodes carry a complete index of their configured realms and can answer queries with very little overhead, but only the prime node has a copy of all the packages for that realm.

Zomp realms are write-only. There is no facility for removing a package from a realm entirely, only for upgrading the versions of packages whenever necessary. (Removal is, of course, possible, but requires manual intervention by the sysop.)

When a zx client or zomp node asks an upstream node for a package and the upstream node does not have a copy, it will query its own upstream until the request reaches a node that does have a copy. Once found, a “found” notice goes back down to the client telling it how many hops away the package is, and new “hops away” notices are sent as the package is passed downstream toward the original requestor (avoiding timeouts and allowing the user to get some feedback about what is going on). The package is cached at each node along the way, so subsequent requests for that same package will be handled immediately without any more relay downloading.
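The relay-and-cache behavior can be modeled as a pure function over a chain of nodes. This is only a simulation to show the idea: the real thing involves network messages between nodes, and the module name, the `{NodeName, CachedPackages}` representation and the convention of counting hops from the first node asked are all assumptions of mine.

```erlang
-module(relay_sketch).
-export([fetch/2]).

%% Chain is a list of {NodeName, CachedPackages}, with the node the client
%% asks first at the head and the realm prime at the end.
%% Returns {ok, Hops, UpdatedChain} or {error, not_found}.
fetch(Package, Chain) ->
    fetch(Package, Chain, 0, []).

fetch(_Package, [], _Hops, _Passed) ->
    {error, not_found};
fetch(Package, [{_Name, Cache} = Node | Upstream], Hops, Passed) ->
    case lists:member(Package, Cache) of
        true ->
            %% Found: every node the request passed through keeps a copy
            %% as the package travels back down toward the requestor.
            Filled = [{N, [Package | C]} || {N, C} <- lists:reverse(Passed)],
            {ok, Hops, Filled ++ [Node | Upstream]};
        false ->
            %% Not here: ask the next node upstream.
            fetch(Package, Upstream, Hops + 1, [Node | Passed])
    end.
```

Asking a chain of `leaf`, `mid` and `prime` where only the prime holds the package reports it as 2 hops away and leaves a copy on all three nodes. Asking again with the updated chain finds it at 0 hops.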

Because the tree of nodes is expected to be relatively ephemeral and in a constant state of flux, the tendency is for package stores on mirror nodes to be populated by only the latest, most popular packages. This prevents the annoying problem with old realms having gobs of packages that nobody uses but mirror hosts being burdened with maintaining them all anyway.

But why not just keep the latest of everything and ditch old packages?

Ever heard of “version shear”? Yeah. Me too. It sucks. That’s why.

There are no “up to” or “greater than” or “abstract version 3” type dependency declarations in zomp package metadata. As a package maintainer you must explicitly declare the complete version of each dependency in your system. In the case of diamond-shaped dependencies (where two packages in your system depend on slightly different versions of the same package) the burden is on the packagers to declare a version that works for a given release of that package. There are no dependency trees for this reason. If your package depends on X, and X depends on Y and Z then your package must be defined as depending on X, Y and Z — and fully specify the versions involved.

Semver is strictly enforced, by the way. That is, all release numbers are “Major.Minor.Patch”. And that’s it. No more, no less. This is one of the primary criteria for inclusion into a public realm and central to the way both zx and zomp interpret package semantics. If an upstream project has some other numbering scheme the packager will need to create a semver standard of his own. And actually, this turns out to not be very hard in practice. There is one weird side-effect of full, static dependency version declarations and semver: updating dependencies results in incrementing your package’s patch number, so even if you don’t change anything in a program for a long time, a program with many dependencies under heavy development may wind up on version 2.3.257 without much change other than the {deps, PackageIDs}. line in the package meta file.
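As an illustration of how strict “Major.Minor.Patch and that’s it” is, here is a small sketch of a checker plus the patch bump that a dependency update causes. The module and function names are hypothetical and this is not the code zx or zomp actually uses.

```erlang
-module(semver_sketch).
-export([parse/1, bump_patch/1, format/1]).

%% Accept exactly three non-negative integers separated by dots.
parse(String) ->
    try [list_to_integer(P) || P <- string:split(String, ".", all)] of
        [Major, Minor, Patch] when Major >= 0, Minor >= 0, Patch >= 0 ->
            {ok, {Major, Minor, Patch}};
        _ ->
            {error, bad_version}
    catch
        error:badarg -> {error, bad_version}
    end.

%% Updating a dependency counts as a patch-level change.
bump_patch({Major, Minor, Patch}) ->
    {Major, Minor, Patch + 1}.

%% Turn the tuple back into "Major.Minor.Patch".
format({Major, Minor, Patch}) ->
    lists:flatten(io_lib:format("~w.~w.~w", [Major, Minor, Patch])).
```

So `parse("2.3.256")` is accepted, while `parse("1.0")` and `parse("1.0.0-rc1")` are both rejected, and bumping `{2,3,256}` formats as `"2.3.257"`.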

zx helps make you aware of these situations, so solving them has not been particularly difficult in practice.

Why do things this way?

The “static dependencies forever and ever, amen” decision is a tradeoff between the important feature of fully repeatable builds Erlang releases are famous for (to the point of bug-compatibility between deployment sites, which is critical in production) and the flexibility users and developers have come to expect from source repository systems like pip, pypi, CPAN, etc. Because each realm is write-only there is no danger that a package will be superseded and disappear. The way trickle-down caching works for mirror zomp nodes does not unduly burden the subordinate realm mirrors, and the local caching behavior of zx itself at launch time tends to make all of this mostly delay-free for zx clients and still gives them the option to always run “latest available version” if they want.

And on the note of “latest version”…

Client-side programs are not expected to be run too terribly long at a time. People shut desktop programs down, restart computers, update their kernels, etc. So even if a client program runs a long time (on the order of web, email, IRC, certain games, crypto wallets/miners, torrent nodes, Freenode, Tor, etc) it will still have a chance to restart every few days or weeks to check for a new version (if invoked in a way that omits the version number so that it always queries the latest version).

But what about long-running server-side type programs? When zx starts, a script checks the initial environment and then starts the Erlang runtime with zx as its target application, passing it the package ID of the desired program to run and its arguments as arguments. That last sentence was odd. An example is helpful:

zx run foo-bar arg1 arg2 arg3

This invokes the launching script (a Bash script on Linux, BSD and OSX, a batch file on Windows, so actually the command is zx.bash or zx.cmd) with the arguments run foo-bar arg1 arg2 arg3. zx receives the instruction “run” and then breaks “foo-bar” into {Realm, Name} = {"foo", "bar"}. Everything after that is passed in as strings which wind up being the input arguments to the program being run: “foo-bar”.

zx registers a process called zx_daemon which remains resident in the runtime and waits for a subscription request or zomp query. Any Erlang program written with the intention of being used with zx can send a message to zx_daemon and ask it to maintain a connection to the program’s parent realm and enroll for update notifications. If the target program itself is the subject of a realm index update then it will get a message letting it know what has changed. The program can respond any way the author wants to such a notification.
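The shape of that interaction is ordinary Erlang message passing. The sketch below uses a stand-in daemon process so that it runs on its own. The message formats (`{subscribe, Pid, Package}`, `{publish, Package, Version}`, `{update, Package, Version}`) and every function name are invented for illustration, so do not take them as the real zx_daemon protocol.

```erlang
-module(update_sketch).
-export([start_daemon/0, subscribe/2, publish/3, demo/0]).

%% Stand-in for zx_daemon: remembers who subscribed to which package and
%% forwards index updates to them.
start_daemon() ->
    spawn(fun() -> daemon_loop([]) end).

daemon_loop(Subs) ->
    receive
        {subscribe, Pid, Package} ->
            daemon_loop([{Package, Pid} | Subs]);
        {publish, Package, Version} ->
            Notify = fun({P, Pid}) when P =:= Package ->
                             Pid ! {update, Package, Version};
                        (_) ->
                             ok
                     end,
            lists:foreach(Notify, Subs),
            daemon_loop(Subs)
    end.

%% Called by the program that wants update notifications about Package.
subscribe(Daemon, Package) ->
    Daemon ! {subscribe, self(), Package},
    ok.

%% Simulates a realm index update arriving at the daemon.
publish(Daemon, Package, Version) ->
    Daemon ! {publish, Package, Version},
    ok.

%% Subscribe, trigger an update, and react to the notification.
demo() ->
    Daemon = start_daemon(),
    ok = subscribe(Daemon, {"foo", "bar"}),
    ok = publish(Daemon, {"foo", "bar"}, {1, 0, 4}),
    Result = receive
                 {update, Package, Version} -> {notified, Package, Version}
             after 1000 ->
                 timeout
             end,
    exit(Daemon, shutdown),
    Result.
```

In a real program the `receive` clause would live in whatever process owns the decision, such as a `handle_info/2` clause in a gen_server, and would do something useful like show a notification or schedule a restart.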

In this way it is possible to write a client-side or server-side application that can enroll to become aware of updates to itself without any extra infrastructure and a minimal amount of code. In some programs I’ve used this to cause a pop up notification to appear to desktop users so they know that a new version has become available and they should restart the program (the way Firefox does on Windows). It could also be used to initiate a restart on its own, or whatever else you might come up with.

There are several benefits to developers of using this system as well.

As a developer I can start a new project by doing zx init app [Realm-Name] or zx init lib [Realm-Name] in an existing project root directory and a zomp.meta file will be generated for it, or a new project template directory will be created (populated with a functioning sample skeleton project). I can do zx dialyze and zx will make sure a generally relevant PLT exists or is built (if not up to date) and used to check the typespecs of the project and its dependencies. zx create package [Path] will create a zomp package, sign it, and populate the metadata for it. zomp keygen will generate the kind of keys necessary to interact with a zomp server. zomp submit PackageFilePath will submit a package for review.

And so on. It is a lot easier to do most things now, and that’s the main point.

(There are commands for reviewing, approving, or rejecting package submissions, adding packagers and maintainers to package projects, adding dependencies to projects, X.Y.Z version incrementing, etc. as well.)

This is about 90% of the way I want it to be, but that means about 90% of the effort remains (pessimistically assuming the 90/10 rule, because life sucks and nobody cares). Most of that is probably going to be finagling some network lunacy, but a lot of the effort is going to be in putting polish to it.

Zomp/zx is based on a similar project I wrote for use within Tsuriai a few years ago that has much sparser features but does basically the same thing: eases packaging and repeatable deployment from source to client systems. I would never release that version publicly because it has a lot of “works for me!” level functionality, but very little polish and requires manually diddling quite a few settings files in error-prone ways (which is fine because it was just us diddling them).

My intention here is to Cadillac this out a bit so that newcomers can slide into the new language and just focus on that language after learning a minimum of tooling commands or environmental details. I think zx init app foo-bar and zx runlocal are a low enough bar for entry.