Zomp/zx: Yet Another Repository System

I’ve been working on a from-source repo system for Erlang on and off for the last few months, contributing time to it pretty much whenever real life is not interfering. I’m getting close to making a release. Now that my main data bits are worked out, the rest isn’t all that hard. I need to figure out what I want to say in an announcement.

The problem is that I’m really horrible at announcements and this system does things in a pretty different way to other repository systems out there, so I’m not sure what things are going to be important about it to users (worth putting into an announcement) and what things are going to be important to only me because I’m the one who wrote it (and am therefore obsessed with its externally inconsequential internals). What is internally interesting about a project is almost never what is externally interesting about it. Marketing; QED. So I need to sort that out, and writing sometimes helps me sort that kind of thing out.

I’m making this deliberately half-baked, disorganized, over-long post public because Joe Armstrong gave me some food for thought the other day. I had written him my thoughts on a subject posted to a mailing list but sent the message in private. I made my message to him off-list for two reasons: first, I wasn’t comfortable with my way of expressing the idea just yet; and second, I am busy with real-life stuff and side projects, including the repo system, and don’t want to get sucked into online chatter that might amount to nothing more than bikeshedding. (I’m a world-class bikeshedder!) Joe wrote me back asking why I made the reply private, I told him my reasons, and he made me change my mind. He hopes that more people will publish their ideas all the time, good or bad, fully baked or still soggy, because the only way we can ever find interesting ideas these days is by searching for them, usually in text, on the net somewhere. It isn’t like we can’t go back and revise, but whether or not we do go back and clean up our literary messes, the availability of core ideas and exposure of thought processes are more important than polish. He’s been on a big drive to make sure that he posts most of his thoughts to public mailing lists or blogs so that his ideas get at least indexed and archived. On reflection I agree with him.

So here I am, trying to publicly organize my thoughts on my repository system.

I should start with the goals of the system.

This system is intended to smooth over a few points of pain experienced when trying to get a new Erlang project off the ground, and in particular avert the path of pain peculiar to Erlang newcomers when they encounter the “how to set up a project” problem. Erlang’s tooling is great but a bit crufty (deeply featured, but confusing to interface with) and not at all what the kool kids expect these days. And anyway I’m really just trying to scratch my own itch here.

At the moment we have two de facto standards for publishing Erlang systems: erlang.mk and Rebar. I like both of these, especially erlang.mk, but they do one thing that annoys me and never seems to quite fit my need: they build Erlang releases.

Erlang releases are great. They cut all the cruft of a release out and pack everything needed to actually run a system into a single blob of digits that you can move, in a single shot, to a new target system — including the Erlang runtime itself. Awesome! Self-contained deployment and it never misses. This has been an Erlang feature since before people even realized that they needed repeatable deployment infrastructure outside of the classic “let’s build a monolithic, static binary executable” approach. (Erlang is perpetually ahead of its time, even by today’s standards. I look at the poor kids stubbing their toes with Docker and language du jour and just shake my head — though part of that is because many shops are using Docker to solve concurrency issues that they haven’t even become cognizant of, thinking that they are experiencing “scaling” problems but missing the point entirely.)

Erlang releases are awesome when the deployment target is an embedded system, but not so awesome if the target is a full-blown operating system, VM, container, or virtual environment fully stocked with gobs of memory and storage and flush with system utilities and resources. Erlang releases sort of kitchen-sink the deployment itself. What if you want to run several different Erlang programs, all delivered as releases, all depending on the same library? You’ve got tons of copies of that library. Which is OK, but still sort of weird, because you also have tons of copies of the runtime (among other things). Each release is self-contained and lean, but in aggregate this is a bit odd.

Erlang releases make sense when you’re deploying to a phone switch or a sensor device in the middle of nowhere and the runtime is basically acting as its own operating system. Erlang releases are, in that context, analogous to putting a Gentoo stage 3 binary image on a system to leapfrog most of the toolchain process. Very cool when you’re in that situation, but a bit tinker-tacky when you’re just trying to run, say, a client program written in Erlang or test a web front-end for something that uses YAWS or Cowboy.

So that’s the siloed-kitchen-sink issue. The other issue is that newcomers are perpetually confused about releases. This makes teaching elementary Erlang hard. In my view we should really focus on escript for beginner code: just let the new guy run something out of a single file the way he is used to doing when learning a new language instead of showing him pages of really slick code, then some interpreter stuff, and then leaping straight from that to a complex and advanced packaging setup necessarily tailored for conducting embedded deployments to slim hardware devices. Seriously. WTF. Escripts give beginners all the power of Erlang necessary for exploring the more interesting bits of code and refactoring needed to learn sequential Erlang, with the major advantage of being able to interface with the system the same way programmers from other environments are used to dealing with language runtimes like Bash, AWK, Python, Ruby, Perl, etc.

But what about that gap between scripts and full-blown production deployments for embedded hardware?

Erlang has… nothing.

That’s right! There is no agreed-upon way to deploy or even run Erlang code in the same manner a Python coder would expect to execute a Python program. There is no virtualenv type system, there is no standard answer to the question “if I’m in the project directory and type ./do_thingy it will just work, right?” The answer is always “Well, it depends…” and what actually winds up happening is that people either roll a whole release just to crank a trivial amount of code up or (quite often) implement an ad hoc way to get the same effect in a lighter-weight way. (erlang.mk shines here, actually.)

Erlang does provide a number of ways to make a system run locally from source or .beam files (and actually has quite reasonable built-in resources for this), but nothing has been built around these tools that also deals with external dependencies, argument passing in a standard way, or any of the other little things you really need if you want to claim a complete solution. Hence all the ad hoc solutions that “work on my machine” but certainly aren’t something you expect your users to use (not with broad success, anyway).

This wouldn’t be such a big problem if it weren’t for the fact that not having any standard way to “just run a program” also means that there really isn’t any standard way to deal with client side code in Erlang. This is a big annoyance for me because much of what I do is client-side code. In Erlang.

In fact, it totally boggles my mind that client-side Erlang isn’t more common, especially considering that AMD is already fielding zillion-core processors for desktops, yet most languages are fundamentally single-threaded. That doesn’t mean you can’t do concurrency and parallelism in other languages, but most problems are not parallel in nature to begin with (parallel problems are easy to write solutions to in any language) while most real-world problems are concurrent. But concurrent systems are hard to write in almost every language. Concurrent problems are the bulk of the interesting problems we’re still not very good at solving with computers. AMD is moving to make much more interesting concurrent processing tools available on the client side (which means Intel will soon start pouring its gajillions worth of blood diamond money into a similar effort), but most languages and environments have no good way to make use of that on the client side. (Do you see why I hear Lady Fortune knocking?)

Browsers? Oh yeah. That’s a great plan. Have you noticed that most sites slowly move toward the “Single Page App” design over time (read as: the web sucks, so now we write full-but-crippled client-programs and deliver them over the web), invest heavily in do-sneaky-things-without-telling-you JavaScript and try to hog every core your system has if you allow it the slightest permission to do so? No. In the age of bitcoin miners embedded in nearly every ad this is not the direction I think we should be envisioning things going.

I want to take better advantage of the cores users have available, and that doesn’t necessarily mean make more efficient use of every cycle as much as it means to make scheduling across processes more efficient to reduce latency throughout the system overall. That’s something users care about quite a lot. This is the problem Erlang has already solved in a way no other runtime out there has. So I want to capitalize on it.

And yet, there is still no standardish way of dealing with code from source, running it locally, declaring or resolving dependencies, or even launching a client-side program at all.

So… how am I approaching it?

I have a project called “zomp” which is a repository system. It is a distributed repository system, so not everything has to be held in one place. Code in the zomp universe is held in little semantic silos called “realms”. Each realm can have whatever packages the owner (sysop) wants it to have. Each realm must have one server node somewhere that is its “prime”: the node in charge of that realm. That node is where system operator tasks for that realm take place, where packagers and maintainers submit code for inclusion, where the package index is built, and where the canonical copy of everything is stored. Other nodes configured to see that realm connect to the prime node, receive a copy of the current indexes, are tested for availability, and are published as available resources for querying indexes or downloading packages.

When too many subordinate nodes connect to a prime, the prime will redirect a new node to a subordinate. When a subordinate gets “full” of subordinates itself, it picks a subordinate for new redirects, and so on, so each realm winds up forming a resource tree of mirror nodes that connect back to the realm prime by a single path. A single node might be prime for several realms, or other nodes may act as prime for different realms, and any node can be configured to become a part of any number of realm trees.
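To make the redirect behavior concrete, here is a minimal sketch in Python (zomp itself is Erlang, and this is not its code). The class name `Node`, the `capacity` limit, and the rule of redirecting to the subordinate with the smallest subtree are all assumptions made for illustration; the text above only says that a full node picks a subordinate.

```python
class Node:
    """One zomp node in a realm's mirror tree (illustrative only)."""

    def __init__(self, name, capacity=2):
        self.name = name
        self.capacity = capacity  # assumed limit on direct subordinates
        self.parent = None
        self.children = []

    def size(self):
        # Number of nodes in the subtree rooted here, this node included.
        return 1 + sum(child.size() for child in self.children)

    def attach(self, newcomer):
        """Accept the newcomer or redirect it downward; return its new parent."""
        if len(self.children) < self.capacity:
            newcomer.parent = self
            self.children.append(newcomer)
            return self
        # Full: redirect to a subordinate. Choosing the smallest subtree
        # is an assumption, not something the real system is known to do.
        target = min(self.children, key=Node.size)
        return target.attach(newcomer)

    def path_to_prime(self):
        # Follow parent links upward; every node has exactly one such path.
        path, node = [], self
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path
```

Attaching four mirrors to a prime with room for two leaves the third and fourth hanging off the first two, each with exactly one path back to the prime.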

That’s the high-level code division.

The zomp constellation is interfaced with via the “zx” program (short for “zomp explorer”, or “zomp exchanger”, or “Zomp eXtreem!”, or homage to the Sinclair ZX-81, or whatever else might lend itself to the letters “zx” that you might want to make up; I actually forget what it originally stood for, but it is remarkably convenient to type so it’s staying that way).

zx is configured to have visibility on zomp realms the same way a zomp node is (in fact, they use the same configuration files, and it isn’t weird to temporarily host a zomp node on your desktop the same way you might host a torrent node for a while; the only extra effort is that you do have to open a port, because zomp doesn’t (yet) do hole punching magic).

You can tell zx to run a program using the highly counter-intuitive command:

zx run Realm-ProgramName[-Version]

It breaks the program name down into:

  • Realm (optional, defaulting to the main realm of public FOSS packages called “otpr”)
  • Name (necessary — sort of the whole point)
  • Version (which is optional and can also be partial: “1.0.3” vs just “1.0” or “1”, defaulting to the latest in a series or latest overall)
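The breakdown above can be sketched as a small parser. This is Python for illustration only, not zx’s actual Erlang code; the function name `parse_package_id` is hypothetical, and the sketch assumes that realm and package names contain no hyphens and that only the last component can be a version.

```python
import re

DEFAULT_REALM = "otpr"  # the default realm named in the text

# One to three dot-separated integers: "1", "1.0" or "1.0.3".
VERSION = re.compile(r"\d+(\.\d+){0,2}")


def parse_package_id(text):
    """Split 'Realm-Name-Version' into (realm, name, version).

    version is a tuple of ints (possibly partial) or None when omitted.
    Assumes names never contain '-' (an assumption of this sketch).
    """
    parts = text.split("-")
    version = None
    if len(parts) > 1 and VERSION.fullmatch(parts[-1]):
        version = tuple(int(n) for n in parts.pop().split("."))
    if len(parts) == 1:
        realm, name = DEFAULT_REALM, parts[0]
    elif len(parts) == 2:
        realm, name = parts
    else:
        raise ValueError("too many components in %r" % text)
    if not realm or not name:
        raise ValueError("empty realm or name in %r" % text)
    return realm, name, version
```

For example, `parse_package_id("foo-bar-1.0")` gives the realm `"foo"`, the name `"bar"` and the partial version `(1, 0)`, while a bare `"bar"` falls back to the default realm.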

With those components it then contacts any zomp node it knows provides the needed realm, resolves the latest version number of the requested program, downloads and unpacks it, checks and downloads any missing dependencies, builds the program, and launches it. (And if it doesn’t know any active mirrors it asks the prime node and is seeded with known mirror nodes in addition to getting its query answered.)
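The “resolve the latest version” step is easy to get subtly wrong if versions are compared as strings, so here is a rough sketch of the idea in Python (again, not the real zx code). The function name `resolve_version` and the representation of versions as integer tuples are assumptions for the example.

```python
def resolve_version(available, wanted=None):
    """Pick the newest version in `available` that matches a partial version.

    available: iterable of (major, minor, patch) tuples from a realm index.
    wanted:    None, (major,), (major, minor) or a full (major, minor, patch).
    Returns the matching tuple, or None when nothing matches.
    """
    prefix = tuple(wanted or ())
    matches = [v for v in available if v[:len(prefix)] == prefix]
    # Tuples of ints compare numerically, so (1, 0, 10) beats (1, 0, 3).
    return max(matches) if matches else None
```

With an index holding 1.0.3, 1.0.10, 1.1.0 and 2.0.0, asking for “1.0” resolves to 1.0.10, asking for “1” resolves to 1.1.0, and omitting the version resolves to 2.0.0.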

The packages are kept in a local cache stored at the user level, not the system level (sort of like how browsers keep their JS and page caches) — though if you want to daemonize zomp and run it as a permanent service (if you run a realm prime, for example) then you would want to create an unprivileged system user specifically for the purpose. If you specify a fully-qualified “realm-name-version” for execution and the packages already exist and are built, zx just launches the code directly (which is the majority case, so no delay there — fast startup).

All zomp nodes carry a complete index of their configured realms and can answer queries with very little overhead, but only the prime node has a copy of all the packages for that realm.

Zomp realms are write-only, in the sense of append-only. There is no facility for removing a package from a realm entirely, only for adding newer versions of packages whenever necessary. (Removal is, of course, possible, but requires manual intervention by the sysop.)

When a zx client or zomp node asks an upstream node for a package and the upstream node does not have a copy, it will query its own upstream until the request reaches a node that does have a copy. Once the package is found, a “found” notice goes back down to the client telling it how many hops away the package is, and new “hops away” notices are sent as the package is passed downstream toward the original requestor (avoiding timeouts and allowing the user to get some feedback about what is going on). The package is cached at each node along the way, so subsequent requests for that same package will be handled immediately without any more relay downloading.
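The relay behavior just described can be sketched in a few lines of Python. This is a toy model under loud assumptions: the class name `Mirror`, the `notify` callback, and the shape of the notices are invented for the example, and the real system does this with network messages between Erlang nodes rather than method calls.

```python
class Mirror:
    """A node holding a package cache and a link to its upstream (toy model)."""

    def __init__(self, name, upstream=None, packages=None):
        self.name = name
        self.upstream = upstream
        self.cache = dict(packages or {})

    def fetch(self, package_id, notify):
        # Walk upstream until some node has the package.
        chain, node = [], self
        while node is not None and package_id not in node.cache:
            chain.append(node)
            node = node.upstream
        if node is None:
            raise KeyError(package_id)  # even the prime lacks it
        data = node.cache[package_id]
        notify(("found", len(chain)))
        # Pass the package back downstream, caching at every node on the way
        # and reporting how many hops remain to the original requestor.
        for hops_left, mirror in zip(range(len(chain) - 1, -1, -1), reversed(chain)):
            mirror.cache[package_id] = data
            notify(("hops away", hops_left))
        return data
```

A request through two empty mirrors reports the package as found two hops away, then one hop away, then zero. A second request for the same package is answered from the local cache with no relay at all.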

Because the tree of nodes is expected to be relatively ephemeral and in a constant state of flux, the tendency is for package stores on mirror nodes to be populated by only the latest, most popular packages. This prevents the annoying problem with old realms having gobs of packages that nobody uses but mirror hosts being burdened with maintaining them all anyway.

But why not just keep the latest of everything and ditch old packages?

Ever heard of “version shear”? Yeah. Me too. It sucks. That’s why.

There are no “up to” or “greater than” or “abstract version 3” type dependency declarations in zomp package metadata. As a package maintainer you must explicitly declare the complete version of each dependency in your system. In the case of diamond-shaped dependencies (where two packages in your system depend on slightly different versions of the same package) the burden is on the packagers to declare a version that works for a given release of that package. There are no dependency trees for this reason. If your package depends on X, and X depends on Y and Z then your package must be defined as depending on X, Y and Z — and fully specify the versions involved.
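As a rough illustration of what “no dependency trees” implies, the following Python sketch checks that a package’s declared dependency list is flat and complete. It is not how zx does it; the function name `check_flat`, the index layout (a dict from full package ID to its declared dependency list), and the ID format are assumptions for the example.

```python
def check_flat(package_id, index):
    """Check a flat, fully versioned dependency declaration.

    index maps a full package ID such as 'otpr-x-1.2.0' to the list of
    package IDs it declares. Returns (missing, conflicts): indirect
    dependencies that were not declared, and packages that would be
    pulled in at more than one version.
    """
    declared = set(index[package_id])
    seen, stack = set(), list(declared)
    while stack:  # walk everything reachable from the declared list
        dep = stack.pop()
        if dep in seen:
            continue
        seen.add(dep)
        stack.extend(index.get(dep, ()))
    missing = sorted(seen - declared)
    versions = {}
    for dep in seen:
        name, version = dep.rsplit("-", 1)  # 'otpr-x', '1.2.0'
        versions.setdefault(name, set()).add(version)
    conflicts = {n: sorted(v) for n, v in versions.items() if len(v) > 1}
    return missing, conflicts
```

If a package declares only X while X itself declares Y and Z, the check reports Y and Z as missing. In the diamond case it also reports the package that appears at two versions, which is the point where the packager has to pick one that works.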

Semver is strictly enforced, by the way. That is, all release numbers are “Major.Minor.Patch”. And that’s it. No more, no less. This is one of the primary criteria for inclusion into a public realm and central to the way both zx and zomp interpret package semantics. If an upstream project has some other numbering scheme the packager will need to create a semver standard of his own. And actually, this turns out to not be very hard in practice. There is one weird side-effect of full, static dependency version declarations and semver: updating dependencies results in incrementing your package’s patch number, so even if you don’t change anything in a program for a long time, a program with many dependencies under heavy development may wind up on version 2.3.257 without much change other than the {deps, PackageIDs}. line in the package meta file.
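A strict “Major.Minor.Patch and nothing else” rule is simple to express. The sketch below is Python for illustration, not the validation zomp actually performs; the names `parse_semver` and `bump_patch` are hypothetical, and rejecting leading zeros follows the public semver specification rather than anything stated here.

```python
import re

# Exactly three dot-separated non-negative integers, no leading zeros,
# no pre-release or build suffixes.
SEMVER = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def parse_semver(text):
    """Return (major, minor, patch) or raise ValueError."""
    match = SEMVER.fullmatch(text)
    if match is None:
        raise ValueError("not a Major.Minor.Patch version: %r" % text)
    return tuple(int(part) for part in match.groups())


def bump_patch(version):
    """The version a package gets when only its dependency list changes."""
    major, minor, patch = version
    return (major, minor, patch + 1)
```

Under this rule “2.3.257” parses, while “1.0” and “1.0.0-rc1” are rejected, and a dependency-only update of 2.3.256 becomes 2.3.257.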

zx helps make you aware of these situations, so solving them has not been particularly difficult in practice.

Why do things this way?

The “static dependencies forever and ever, amen” decision is a tradeoff between the important feature of fully repeatable builds Erlang releases are famous for (to the point of bug-compatibility between deployment sites, which is critical in production) and the flexibility users and developers have come to expect from source repository systems like pip, PyPI, CPAN, etc. Because each realm is write-only there is no danger that a package will be superseded and disappear. The way trickle-down caching works for mirror zomp nodes does not unduly burden the subordinate realm mirrors, and the local caching behavior of zx itself at launch time tends to make all of this mostly delay-free for zx clients and still gives them the option to always run “latest available version” if they want.

And on the note of “latest version”…

Client-side programs are not expected to be run too terribly long at a time. People shut desktop programs down, restart computers, update their kernels, etc. So even if a client program runs a long time (on the order of web, email, IRC, certain games, crypto wallets/miners, torrent nodes, Freenode, Tor, etc) it will still have a chance to restart every few days or weeks to check for a new version (if invoked in a way that omits the version number so that it always queries the latest version).

But what about long-running server-side type programs? When zx starts, a script checks the initial environment and then starts the Erlang runtime with zx as its target application, passing it the package ID of the desired program to run along with that program’s arguments. That last sentence was odd. An example is helpful:

zx run foo-bar arg1 arg2 arg3

zx invokes the launching script (a Bash script on Linux, BSD and OSX, a batch file on Windows, so actually the command is zx.bash or zx.cmd) with the arguments run foo-bar arg1 arg2 arg3. zx receives the instruction “run” and then breaks “foo-bar” into {Realm, Name} = {"foo", "bar"}. Everything after that is passed in as strings which wind up being the input arguments to the program being run: “foo-bar”.

zx registers a process called zx_daemon which remains resident in the runtime and waits for a subscription request or zomp query. Any Erlang program written with the intention of being used with zx can send a message to zx_daemon and ask it to maintain a connection to the program’s parent realm and enroll for update notifications. If the target program itself is the subject of a realm index update then it will get a message letting it know what has changed. The program can respond any way the author wants to such a notification.

In this way it is possible to write a client-side or server-side application that can enroll to become aware of updates to itself without any extra infrastructure and a minimal amount of code. In some programs I’ve used this to cause a pop-up notification to appear to desktop users so they know that a new version has become available and they should restart the program (the way Firefox does on Windows). It could also be used to initiate a restart on its own, or whatever else you might come up with.
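The subscription idea boils down to a small publish/subscribe pattern. The sketch below is a Python stand-in, not the zx_daemon interface: in the real system a program sends an Erlang message to the registered zx_daemon process and later receives a message back, whereas here the class `UpdateDaemon` and its methods `subscribe` and `index_updated` are hypothetical names that use plain callbacks.

```python
from collections import defaultdict


class UpdateDaemon:
    """Toy stand-in for a resident process that relays realm index updates."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, realm, name, callback):
        # A program enrolls for notifications about one package in one realm.
        self._subscribers[(realm, name)].append(callback)

    def index_updated(self, realm, name, version):
        # Called when the realm index changes; only the affected
        # subscribers hear about it. Returns how many were notified.
        callbacks = self._subscribers.get((realm, name), [])
        for callback in callbacks:
            callback(realm, name, version)
        return len(callbacks)
```

What the program does in its callback is up to the author: show a pop-up, schedule a restart, or ignore the notice entirely.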

There are several benefits to developers of using this system as well.

As a developer I can start a new project by doing zx init app [Realm-Name] or zx init lib [Realm-Name] in an existing project root directory and a zomp.meta file will be generated for it, or a new project template directory will be created (populated with a functioning sample skeleton project). I can do zx dialyze and zx will make sure a generally relevant PLT exists or is built (if not up to date) and used to check the typespecs of the project and its dependencies. zx create package [Path] will create a zomp package, sign it, and populate the metadata for it. zomp keygen will generate the kind of keys necessary to interact with a zomp server. zomp submit PackageFilePath will submit a package for review.

And so on. It is a lot easier to do most things now, and that’s the main point.

(There are commands for reviewing, approving, or rejecting package submissions, adding packagers and maintainers to package projects, adding dependencies to projects, X.Y.Z version incrementing, etc. as well.)

This is about 90% of the way I want it to be, but that means about 90% of the effort remains (pessimistically assuming the 90/10 rule, because life sucks and nobody cares). Most of that is probably going to be finagling some network lunacy, but a lot of the effort is going to be in putting polish to it.

Zomp/zx is based on a similar project I wrote for use within Tsuriai a few years ago that has much sparser features but does basically the same thing: eases packaging and repeatable deployment from source to client systems. I would never release that version publicly because it has a lot of “works for me!” level functionality, but very little polish and requires manually diddling quite a few settings files in error-prone ways (which is fine because it was just us diddling them).

My intention here is to Cadillac this out a bit so that newcomers can slide into the new language and just focus on that language after learning a minimum of tooling commands or environmental details. I think zx init app foo-bar and zx runlocal are a low enough bar for entry.

The Great Blockchain Race

There is a big hustle going on right now over blockchain-based systems, most notably digital cryptocurrencies. It is as if the public just became aware of the word “blockchain”, saw that Bitcoin posted some crazy value gains, and decided “Oh? It went up? That means it is going to be a safe bet that it will go up forever!” and just hopped in with both feet.

Despite blockchain’s inherent scalability problems…

Despite the totally insane energy cost behind every single transaction going forward…

This has, of course, attracted the attention of The Sneakies. The Sneakies are people who realize that running a confidence game on a single person is moderately difficult, but running one on a large population that doesn’t really have the time or interest to dig into the details is quite easy — especially if you have a piece of cake in one hand, and even easier if they are panicked about something at the same time. Fear and hope are a powerful combination when aligned.

Since about 2014 an interesting proliferation of digital currencies (most being cryptocurrencies, but some even being created by banking consortia — har har!) has occurred. Some try to attract attention by spreading FUD about Bitcoin (not that the things they say about Bitcoin are inaccurate, but the same criticisms usually apply to the newly proposed currency as well), some try to attract attention using a “proof-of-work” system analogous to the original Bitcoin algorithm (“Get in now on the ground-floor!”), some try to leverage pre-existing FUD about Trump or the Euro or whatever. Most use a subtle combination and target a specific demographic (Antifa sympathizers, Randite Objectivist libertarians, Neo-Commies, Neo-Nazis, retirees and other “near-deads”, veterans, even Neo-Pagans).

Catching a trend? This is how trends that become confidence scams start to look.

Are cryptocurrencies the future of lightweight value exchange? Yeah, probably something like that. But we already have something more concrete backed by violence: actual currencies that can be electronically divided, transferred and calculated at a much lower energy cost.

So what will happen? The early miners are punching out now — because while the run has been great and Bitcoin & co will be worth more than $0 even after the market correction, nobody knows when the correction will come. Full disclosure: I’m holding some Bitcoin. Mostly stuff I mined a few years ago. The value is sort of preposterous at the moment. Will I cash in? Maybe — but who knows what sort of pain that might cause me with tax services? It might not even be worth it unless I’m prepared to be shady about things.

But the scammers are starting to cash in, and it won’t be too much longer before one of two things happens:

  1. Scary but predictable: The Bitcoin “whales” cash in and the market collapses, causing a race to the bottom (like a short-call on everyone who has been betting against the Yen, Dollar, Pound and gold)
  2. Crap your pants scary and unpredictable: A quantum breakthrough or algorithmic development makes the entire blockchain transparent and manipulable — POOF!

I’m not saying these sorts of efforts are a bad idea, just that they are unrefined and this is unexplored territory.

Also, as a parting thought… Every piece of software used for running crypto wallets, miners, etc. right now is rushed into production with little to no validation or security testing whatsoever. Maybe that isn’t the best way to safeguard something many non-techies are hoping to be The Next Big Thing. Many of these platforms require Oracle’s Java, for example, and cannot even run on IBM’s JVM or the OpenJDK. Maybe that’s also not a good plan. That’s like having all your eggs in one big basket inside another basket of baskets. Whoops.

18 U.S. Code § 793 – Gathering, transmitting or losing defense information

Quite a few high-profile instances of leaks, breaches, infractions, cracks and “extreme carelessness in the handling of” classified information have been in the news over the last few years, and while folks like to talk a lot of fluff about whether this or that instance was truly vile or truly virtuous, I’ve never actually seen anyone reference the underlying rules regarding defense information.

So here it is: 18 U.S. Code § 793

Cornell Law has the text posted here as well.

(Personal) Guidelines for Software Projects

A few guidelines for non-trivial, large projects you actually care about and want to maintain for more than a month or so.

1. Typespecs

Learn to use them. If you are writing a large, complex project in a language that doesn’t support this or have tooling for it then use a different language. Yes, it actually saves so much heartache that it is important enough to switch.

Why? Because for-real type checking can tell you, without the futility or religious interference of unit testing, whether or not your program is valid. A valid program is not necessarily a correct program, but an invalid program is necessarily an incorrect one. (Also, it is worth keeping in mind that classes are not types. There is a subtle, and critical, difference.)

2. Property testing, not unit testing

Don’t simply write a few “unit tests” and assume things work. They don’t. As Rich Hickey (the creator of Clojure) so aptly put it: “What is the one thing that is true about all bugs found in the wild? Every one of them passed all the tests!” It can be useful to engage in regression testing, but regression testing is a subset of integration testing and even crosses over with user testing (the ultimate of all) and project documentation and history management.

When you write code, it has bugs.

  • Some are syntactic: You forgot some ant poop somewhere (things like: : ; . ,), failed to close a brace or paren, or misspelled something.
  • Some are structural: You passed in a foo type but the function is defined as accepting bar (statistically this is the greatest category of compilable, invisible errors — reference point 1 above).
  • Some are scheduling and timing: You have races and deadlocks all over the place and never knew it because they don’t usually get triggered and are super complex to work out in your head.
  • Some are semantic: The program does precisely what you told it to do, but you told it to do the wrong thing (the most frequent place where protocol failures creep in).

You write every one of these kinds of bugs into your programs every time you write a non-trivial program. I can’t just tell you to knock it off and tighten your shot group, because I do the same stuff. It is impossible to avoid! If you write all these stupid bugs into your programs, what do you think lurks in your hand-written test code? MORE BUGS!

So what do?

In the same way that we can write a type specification for a function (declare its domain and codomain, basically) we can also write a specification for the function’s valid inputs and outputs, and the expected rules the output should follow (its range and image, basically). This defines the properties of the function.

Neat-O. But what would we do with such a specification? Property declarations are like me explaining to you what a function does, but not how it manages to do it. To test whether our implementation of the function does the expected thing and lacks corner cases, however, we can use a property-based testing system to generate tests for us on the fly and run them to check whether the expected properties of the function hold true. Smart property-based testing systems not only find bugs (values that are defined as valid but produce invalid results that violate the property specification) but can quite often home in on specific broken cases and give you a good indication of what sorts of values are problematic. That is to say, a property-based testing engine equipped with good property definitions can locate the corner cases for you.

Why wouldn’t we do this by hand? Because typically unit tests cover a handful of most-common cases with their expected values and that’s about it. Property based testing is much less merciful and also much less prone to error because a property based tester will generate an endless stream of tests according to the provided properties and run them for as much CPU time as you’re willing to give for testing. You are never going to write millions of different test cases for your code. A property based testing engine will do precisely that if you give it the CPU time to do so. Compared to how testing is done in most projects this is like having nuclear power in the age of wooden stoves.
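To show the shape of the idea without pulling in a real tool, here is a tiny hand-rolled property checker in Python. It is a sketch, not a substitute for an actual property-based testing engine: the helper names (`sort_property`, `shrink`, `check`) and the deliberately broken `buggy_sort` are invented for the example, and the generator only produces short lists of small integers.

```python
import random
from collections import Counter


def sort_property(sort, xs):
    """Property: the output is ordered and is a permutation of the input."""
    out = sort(list(xs))
    ordered = all(a <= b for a, b in zip(out, out[1:]))
    return ordered and Counter(out) == Counter(xs)


def shrink(prop, case):
    """Drop elements one at a time for as long as the case keeps failing."""
    shrunk = True
    while shrunk:
        shrunk = False
        for i in range(len(case)):
            smaller = case[:i] + case[i + 1:]
            if not prop(smaller):
                case, shrunk = smaller, True
                break
    return case


def check(prop, runs=1000, seed=0):
    """Generate random cases; return a shrunk counterexample or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        case = [rng.randint(-5, 5) for _ in range(rng.randint(0, 8))]
        if not prop(case):
            return shrink(prop, case)
    return None


def buggy_sort(xs):
    return sorted(set(xs))  # bug: silently drops duplicates
```

Running `check(lambda xs: sort_property(sorted, xs))` finds nothing, while the same check against `buggy_sort` comes back with a two-element list containing the same value twice. A handful of hand-picked unit tests with distinct values would have passed that function without complaint.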

This is magical.

3. DO USER TESTING

When you release something that has worked for you so far, that’s about as much confidence as you should put in an alpha release. “Works for me!” are the bold last words of many an abandoned project.

Don’t be That Guy. Don’t release That Project as a final. Be clear it’s a beta or even alpha, and that development is an ongoing thing, forever. Manage expectations; your users (paying or community) will reward you for being honest.

When you release a project understand that this is your beta period, even if you’re on a relatively mature version. In a sense all significant features go through their own little beta phase. This is true in part because you’ve no clue if power users are going to find a way to break it (they will) or if it will be instantly appreciated and adopted by the userbase (random gamble there). Whatever you think is important or intuitive might have never even occurred to them.

Power users are going to push the button the wrong way and not know how to deal. That’s actually a good thing if you maintain a relationship with your users, because you’re basically getting directions straight from the affected party about how to make your program better. This is important whether you’re doing community open source for some sweet Ego Points, or trying to feed the kids at your soul-crushing job.

No amount of unit testing (which we’ve already sort of debunked — write typespecs, don’t blindly churn out unit tests) or property testing (which is vastly superior to unit testing, but misses a lot of side-effecty issues, which are often the central purpose of your program) can catch everything. No amount of integration testing will uncover everything that is wrong with your program. None of these tests will tell you whether your program sucks to use and is reviled by users. But user testing will.

4. Don’t be afraid to change stuff

You have a version control system for your code. You use git. Or something. It doesn’t matter, though, because you have something that does version control for you and creating a new branch is painless. (Unless you’re not using a version control system… then you really need to start. You don’t have to submit to the dark cabal of Ruby hipsters that controls github, but you should at least be using git locally.)

If you have an idea try it out. It is probably a great idea in spirit but won’t be so great in reality until you’ve shaken a bit of the stupid, self-indulgent fantasy out of it. You can’t do that without exploring the idea in actual implementation and that sort of exploration requires hacking up your pristine project a bit until you discover exactly why, in mechanical terms, the Universe hates your idea. Once you know exactly why the Universe hates you and your ideas you can adjust your plan to accommodate the whims of the math gods, tame the vagaries of digital magika, and tap out the proper incantations in much less time than you could had you just held endless meetings about it.

Break stuff. Remember the Cardinal Rule of Hacking:

“If you understand what you’re doing, you’re not learning anything.”
– Some guy (who was not actually Abraham Lincoln)

Sometimes the best sign of progress is a change in the error messages you are getting.

Simplicity follows complexity. Until you write a godawful fugly version of your solution you don’t really understand the problem. If you don’t fully grok the problem how can you ever hope to come up with a solution? Only after you have encountered all the little gotchas that made the code ugly in the first place are you ready to rewrite that steaming pile of (working) poo into an elegant solution that is almost guaranteed to have fewer bugs if for no other reason than increased transparency and better organization of the code.

(But note that you could stick with the ugly version for a bit in a pinch — so not all is lost. Getting something working at all is better than having a bunch of great ideas that don’t exist in reality.)

5. Don’t be afraid of new languages

At this point in my life I’ve written code in about 30 or 40 languages. I don’t know the exact number. I have written a lot of code and gained intimacy with about 10 of those. That’s a lot of languages by some standards and not many at all by others. It is enough, though, that I have come to realize that most languages are minor syntactic variations on a couple of basic paradigms, and really none of that crap matters too much.

It’s all shitty. All languages suck. Some suck a little less than others. Try to find one from the handful that sucks dramatically less than others in a specific domain, then get comfortable with it as a go-to tool for that domain. But remember that it is just a tool. Jackhammers are tools, but I don’t see anyone building houses with them.

When you hop on to a new project that someone is already working on you’re going to have to pretty much adhere to the rules of their house, and that means dealing with whatever annoying language they wrote their awesome project in.

Want to hack on Freenode’s core implementation? Better not mind dealing with network code and file operations in Java (eek!). And what if you don’t even know Java or it has been years since you saw it last and everything is different now? This is the concern that should worry you the least of all.

If you squint a little, projects basically are languages. They have their own semantics (the project libs, its functions, its type specification, its class definitions, its decision tables, its… whatever it’s got that is relevant). They have their own sort of syntax. In fact, every very large project I’ve ever worked on tended to actually follow Greenspun’s Tenth Rule, and if it was a concurrent system (so common today) it even tended to follow Virding’s First Rule. (That becomes less of a joke and more of a law of nature the longer you do this and the more you know about both lisp and OTP.)

What does this mean? It means that learning the language a program is written in is the easy part. Learning the libs of that language tends to take about twice as long as learning the language itself. Learning the internals of a large project, however, tends to take about ten times longer than that. So where is the real cost in effort here? It isn’t in the adoption of a new language. It is in the adoption of a new project because every project is a tarbaby.

6. JUST OPEN YOUR EDITOR YOU PROCRASTINATING SACK OF POO!

Getting started is the hardest part. The hardest part of writing anything, whether prose, code, or poetry, is sitting down and typing out something.

How to tackle the procrastination problem? Easier said than done: OPEN YOUR EDITOR

3 to 5 letters is all you need: `vim` or `emacs` and away you go!

Once you’re fully in the Matrix, write a function or spec or something. It doesn’t matter what you try to do: it will be wrong. And then you’ll have been wrong, but not exhausted yet. And suddenly you’ll realize that you are the one being wrong on the internet today and that situation just cannot stand. So you’ll start fixing it. And tinkering on it. And before you know it you’ll actually have done something productive, the curse of social media will be temporarily suspended, and you’ll finally stop feeling so crap about yourself (for a few minutes, anyway).

Erlang: Naive Matrix Multiplication

Someone asked what was surely a homework question today on StackOverflow about matrix multiplication in Erlang. I set out to answer him in as simple a way as possible, but wound up writing a naive matrix generation and multiplication module.

The code to the module might be of interest to new Erlangers, as it both adheres to the style of zuuid and includes many examples of using a combination of list operations and explicit recursion to cut clutter and make the meaning of otherwise complex operations clear.

Here is the code:

%%% @doc
%%% A naive matrix generation, rotation and multiplication module.
%%% It doesn't concern itself with much checking, so input dimensions must be known
%%% prior to calling any of these functions lest you receive some weird results back,
%%% as most of these functions do not crash on input that goes against the rules of
%%% matrix multiplication.
%%%
%%% All functions crash on obviously bad values.
%%% @end 

-module(naive_matrix).
-export([random/2, random/3, rotate/1, multiply/2]).

-type matrix() :: [[number()]].


-spec random(Size, MaxValue) -> Matrix
    when Size     :: pos_integer(),
         MaxValue :: pos_integer(),
         Matrix   :: matrix().
%% @doc
%% Generate a square matrix of dimensions {Size, Size} populated with random
%% integer values inclusive of 1..MaxValue.

random(Size, MaxValue) when Size > 0, MaxValue > 0 ->
    random(Size, Size, MaxValue).


-spec random(X, Y, MaxValue) -> Matrix
    when X        :: pos_integer(),
         Y        :: pos_integer(),
         MaxValue :: pos_integer(),
         Matrix   :: matrix().
%% @doc
%% Generate a matrix of dimensions {X, Y} populated with random integer values
%% inclusive 1..MaxValue.

random(X, Y, MaxValue) when X > 0, Y > 0, MaxValue > 0 ->
    Columns = lists:duplicate(X, []),
    Populate = fun(Col) -> row(Y, MaxValue, Col) end,
    lists:map(Populate, Columns).


-spec row(Size, MaxValue, Acc) -> NewAcc
    when Size     :: non_neg_integer(),
         MaxValue :: pos_integer(),
         Acc      :: [pos_integer()],
         NewAcc   :: [pos_integer()].
%% @private
%% Generate a single row of random integers.

row(0, _, Acc) ->
    Acc;
row(Size, MaxValue, Acc) ->
    row(Size - 1, MaxValue, [rand:uniform(MaxValue) | Acc]).


-spec rotate(matrix()) -> matrix().
%% @doc
%% Takes a matrix of {X, Y} size and rotates it left, returning a matrix of {Y, X} size.

rotate(Matrix) ->
    rotate(Matrix, [], [], []).


-spec rotate(Matrix, Rem, Current, Acc) -> Rotated
    when Matrix  :: matrix(),
         Rem     :: [[number()]],
         Current :: [number()],
         Acc     :: matrix(),
         Rotated :: matrix().
%% @private
%% Iterates doubly over a matrix, packing the diminished remainder into Rem and
%% packing the current row into Current. This is naive, in that it assumes an
%% even matrix of dimensions {X, Y}, and will return one of dimensions {Y, X}
%% based on the length of the first row, regardless whether the input was actually
%% even.

rotate([[] | _], [], [], Acc) ->
    Acc;
rotate([], Rem, Current, Acc) ->
    NewRem = lists:reverse(Rem),
    NewCurrent = lists:reverse(Current),
    rotate(NewRem, [], [], [NewCurrent | Acc]);
rotate([[V | Vs] | Rows], Rem, Current, Acc) ->
    rotate(Rows, [Vs | Rem], [V | Current], Acc).


-spec multiply(ValueA, ValueB) -> Product
    when ValueA  :: number() | matrix(),
         ValueB  :: number() | matrix(),
         Product :: number() | matrix().
%% @doc
%% Accept any legal combination of scalar and matrix values to be multiplied.
%% The correct operation will be chosen based on input values.

multiply(A, B) when is_number(A), is_number(B) ->
    A * B;
multiply(A, B) when is_number(A), is_list(B) ->
    multiply_scalar(A, B);
multiply(A, B) when is_list(A), is_number(B) ->
    multiply_scalar(B, A);
multiply(A, B) when is_list(A), is_list(B) ->
    multiply_matrix(A, B).


-spec multiply_scalar(A, B) -> Product
    when A       :: number(),
         B       :: matrix(),
         Product :: matrix().
%% @private
%% Simple scalar multiplication of a matrix.

multiply_scalar(A, B) ->
    multiply_scalar(A, B, []).


-spec multiply_scalar(A, B, Acc) -> Product
    when A       :: number(),
         B       :: matrix(),
         Acc     :: matrix(),
         Product :: matrix().
%% @private
%% Scalar multiplication is implemented here as an explicit recursion over
%% a list of lists, each element of which is subjected to a map operation.

multiply_scalar(A, [B | Bs], Acc) ->
    Row = lists:map(fun(N) -> A * N end, B),
    multiply_scalar(A, Bs, [Row | Acc]);
multiply_scalar(_, [], Acc) ->
    lists:reverse(Acc).


-spec multiply_matrix(A, B) -> Product
    when A       :: matrix(),
         B       :: matrix(),
         Product :: matrix().
%% @doc
%% Multiply two matrices together according to the matrix multiplication rules.
%% This function does not check that the inputs are actually proper (regular)
%% matrices, but does check that the input row/column lengths are compatible.

multiply_matrix(A = [R | _], B) when length(R) == length(B) ->
    multiply_matrix(A, rotate(B), []).


-spec multiply_matrix(A, B, Acc) -> Product
    when A       :: matrix(),
         B       :: matrix(),
         Acc     :: matrix(),
         Product :: matrix().
%% @private
%% Iterate a row multiplication operation of each row of A over matrix B until
%% A is exhausted.

multiply_matrix([A | As], B, Acc) ->
    Prod = multiply_row(A, B, []),
    multiply_matrix(As, B, [Prod | Acc]);
multiply_matrix([], _, Acc) ->
    lists:reverse(Acc).


-spec multiply_row(Row, B, Acc) -> Product
    when Row     :: [number()],
         B       :: matrix(),
         Acc     :: [number()],
         Product :: [number()].
%% @private
%% Multiply each row of matrix B by the input Row, returning the list of resulting sums.

multiply_row(Row, [B | Bs], Acc) ->
    ZipProd = lists:zipwith(fun(X, Y) -> X * Y end, Row, B),
    Sum = lists:sum(ZipProd),
    multiply_row(Row, Bs, [Sum | Acc]);
multiply_row(_, [], Acc) ->
    Acc.

Hopefully reading that on a blog won’t drive anyone too nuts. I’ll probably include an expanded version of that (or something related) in a convenience library eventually. Unless I forget. Meh.
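
For comparison, the same reorient-then-zip idea can be squeezed into a few list comprehensions. This is only a sketch, not a replacement for the module above: matrix_sketch and its function names are made up for illustration, it uses a plain transpose rather than a rotation, and it does even less input checking.

```erlang
-module(matrix_sketch).
-export([transpose/1, multiply/2]).

%% Turn the columns of a regular matrix into rows.
transpose([]) ->
    [];
transpose([[] | _]) ->
    [];
transpose(Rows) ->
    Heads = [H || [H | _] <- Rows],
    Tails = [T || [_ | T] <- Rows],
    [Heads | transpose(Tails)].

%% Each cell of the product is the dot product of a row of A and a column of B.
multiply(A, B) ->
    Columns = transpose(B),
    [[dot(Row, Col) || Col <- Columns] || Row <- A].

%% Sum of pairwise products; lists:zipwith/3 crashes if the lengths differ.
dot(Xs, Ys) ->
    lists:sum(lists:zipwith(fun(X, Y) -> X * Y end, Xs, Ys)).
```

With this, matrix_sketch:multiply([[1,2],[3,4]], [[5,6],[7,8]]) returns [[19,22],[43,50]]. The explicit recursion in the longer module makes each step easier to follow, which is the point of it.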

Web Designers: Stop making SPAs for inherently web 1.0 style sites

It is 2017. What’s with the drive to make everything an SPA whether it needs to be or not? This is getting a little ridiculous. I’m going to ramble on below a bit because I’ve got a hankering to do so — pay this no mind.

All around the web I see sites that are best represented as a collection of inter-linked documents, and all around the web I see many of those being changed into single-page applications (SPAs). Even more stupid is when the SPA in question was built by some naive dope who included a little bit of almost every JS framework in existence, including a random selection from the thousands of obsolete and dead ones.

What is the goal? What’s the deal? Do web authors today not know how the web was actually intended to work originally? That document publication is actually its reason for existence in the first place and that “web applications” are a new thing that is a backhack to an incomplete standard that only sorta-kinda-works?

Granted, the reason it only sorta-kinda-works is due mostly to the problems inherent in the fact that only a single language is allowed in scripts… which is ridiculous. Was nobody paying attention to the Guile2 approach all those years? The only lesson learned from the Java applet and Flash experience seems to have been that “it sucks to force users to install runtimes as plugins”. Ugh.

Anyway, back to web applications…

I get it. For the moment we don’t have a solid distinction between “a document browser” and “an application browser” so we are stuck with this insufficient worst-of-both-worlds nether region of “applications that masquerade as documents”. And that drives anyone nuts who has given this much thought.

Not that a lot of people have considered the difference deeply. I imagine that is probably because very few new coders today have ever written more than a line or two of code intended to run natively on a user’s local system. Nearly everyone has written thousands of lines of code intended to run natively on server-side systems, but even that is getting wonky because many youngsters today don’t know how to deploy without using Docker yet lack the faintest inkling as to what problems Docker actually is intended to solve and wind up bypassing better solutions when they exist.

Tools shine when they are used in a focused way, performing the job for which they were intended. The web is the same way. Yes, it is a big jumble of crap. So let’s just leave that there. Networks are a big jumble of crap, too, and so are human societies, so we’ve adopted dirty ways of dealing with the dirt. The jumbly pile of shit that is the web is one of our ways of dealing with that. Everything times out. Everything is sent in text. Protocols are bloated and redundant. There isn’t even a proper definition of what “valid” HTML and XML and JSON and whatever else is in most cases. It’s all racing toward a singularity where everything is uniformly stupid. But… whatever, it sort of kind of still works, and humans just barely work themselves, so that’s par for the course.

The original web was designed to function as an insecure document publication system where documents could be interlinked. We realized that we could include more interesting stuff by expanding the definition of “document” to include more than just text, and quite recently with HTML5 the way in which documents can be written is only a few orders of magnitude behind, say, LaTeX, in its ability to arrange things on the screen (that feature lag is not entirely the fault of the HTML5 authors).

This gives a lot of freedom to website authors — perhaps too much.

If a website is a set of news articles or academic papers (or even tweets) then you really don’t need a SPA, you need a more traditional sort of “web site”. It can be dressed up all pretty with shiny things sprinkled around, of course, but we don’t want a SPA that mysteriously changes state in ways that leave users unable to bookmark things or easily send one another links to specific resources (something Twitter got right despite some initial confusion over how to frame their content).

If a website is actually just a delivery front end for a graphical RPG, well, obviously the game part of the site is probably best designed as a SPA, but the rest of the site (the forums, armory, character pages, bestiary, fan wiki, manual, guild rankings, lore pages, etc.) is absolutely best presented outside of that SPA as an actual website.

See the difference?

The game example is actually quite useful to contemplate for a variety of reasons. I’ll probably come back and cut this post down to just that part. Either that or eventually come back and rewrite the first bits to more accurately convey the humor with which I, as a graybeard resident in cyberspace for about 30 years now, view the state of the web today.

Whatever you do, dear reader, have fun coding, and remember: Don’t outsmart yourself!

Las Vegas shooting prediction: Most casualties were not due to gunshot wounds

Looking over the data for large stampedes and crowd crush events at concerts and sporting events, and comparing this to what I know personally from a career spent mostly handling various weapons in a tactical environment, I expect that we will discover fairly soon that the vast majority of casualties during the Las Vegas shooting — both injuries and fatalities — were actually due to stampede, and not anything to do with gunshot wounds at all.

Of course, in the confusion this issue has become politicized to an absolutely ridiculous degree by various anti-gun factions, and much of the US and European media is loath to report anything other than anti-gun statistics for the moment, so we are seeing language tailored to evoke images of hundreds of people with actual gunshot wounds and zero people with stampede injuries.

For example: “Shooter in Las Vegas [blah blah blah] over 500 wounded.” This makes the reader or listener immediately envision 500 people actually wounded, as in due to violent trauma — and deliberate violent trauma at that. Which in this case would be exclusively due to gunshot wounds. But we have never seen a breakdown of causes of bodily harm by type, and this data will take a while to assemble.

By the time we do see these stats most people will not really be interested because immigration in Europe or stubborn people in Madrid/Barcelona or NFL SJW activity or whatever else will steal the spotlight and public attention before then. In other words, people will be distracted with another issue-of-the-day by then and forget that the new factoids they see relate to a previous event they felt very strongly about at the time it occurred.

Watch for this one.

Asian Governments Making Social Moves Together

I expect Asian governments to manifest a low-key but characteristically firm and absolute (and often official) position against Islam. Actually, I don’t expect it, I’m watching it happen and just now recognizing a fairly uniform trend. Something is going on in Asia with regard to this, and I don’t know quite what it is, but there is no doubt that doors are closing all across Asia for Muslims in general.

I think the timing is not a coincidence: the nature of Islamic threats is changing, becoming more diffuse, and taking on a different character just as a new generation of indoctrination is beginning across the West and Asia.

  • Myanmar has found something much more compelling than mere domestic political expediency to engage in its current operations (ISIS returners, as are turning up in Malaysia, Indonesia and the Philippines, is one possibility).
  • China has begun confiscating the Koran and categorized it as a book containing extremist political sentiment.
  • Thailand is readying a firm move against the southern Muslim rebels — and at the same time ISIS returners are very effectively influencing the young generation throughout the old Pattani region.
  • Saudis and other donors are standing up madrasas throughout Malaysia and Indonesia, and the Malaysian government is unable to stop the trend, while higher-ups in Putrajaya are strangely blind to the problem even as they complain about it.
  • The Philippines is obviously on a “you’re with us or against us” path politically and socially. And a certain portion of the younger Muslim generation today is much more willing to take that as a challenge instead of an offer to pledge fealty (or at least negotiate terms).
  • Japanese are, at least anecdotally, becoming increasingly uneasy with the idea of accepting any Muslims, even as guest workers. The striking thing there is that ten years ago (well after 9/11) the topic of religion would never have been mentioned when discussing this issue socially, but now it is brought up. This change over the last year or two coincides with the first mosque in Kyoto trying to promote itself via online ads and Japanese demonstrating an instant and strong aversion to the very concept of proselytization. They are now in “wait and see” mode socially, watching to see how things turn out in Europe.
  • South Koreans seem to be on the same page as the Japanese — the attitude toward Islam having soured considerably over the last five years or so. Once again, this is anecdotal, but the subject has come up more than once, and many South Koreans keep up with news of attacks in France, Sweden and the UK.
  • Indonesia is seeing the rise of extra-judicial Islamic enforcement gangs.
  • Malaysia is seeing a similar rise in extra-judicial Islamic enforcement gangs, but the effect is somewhat muted by considerable repression by the special police and more active engagement with the group leaders.
  • Returners, returners, returners. ISIS veterans are flooding into various parts of Asia, fresh off a tour in Syria, North Africa, Iraq or Afghanistan, and keeping in touch with one another. Of course, nobody feels comfortable with that. Unlike in Europe, though, well-known jihadis are not left to their own devices and most go missing somewhere in transit, but it is clear and evident that many are still returning and building new lines of communication and influence locally.

Any one of these issues, from official government actions to simple social reactions, would be grounds for certain groups to rally large responses — Islamic groups as well as Western-based political groups with strong anti-Asian nationalist agendas (something I’ve always found very odd). But the only thing making the news is Myanmar right now, and that’s a pretty hopeless fight to try to pick in terms of political pressure. Myanmar is about as pliable as North Korea as long as China is on their side, and China is indeed on their side with regard to this detail.

I do not see a future where Asian governments will feel compelled to do anything other than increase their resistance to an increased domestic Muslim presence. I fully expect that religious questions will be incorporated on visa applications to places like China eventually (not that repression of religion is anything new there).

I have no idea how any of this is going to turn out, but I find this trend notable and the timing troubling. I don’t know exactly what is triggering this much activity just now (why not a decade ago?), but something is clearly going on. It could be the outcome of some government assessments, or simply a change in the domestic social outlook, or both — but something is going on with this. And, of course, it is impossible to say “they are wrong”. It is just what they are doing and I’m just pointing it out.

Erlang: Silly way to see if your shell supports VT100 commands

There are a few cases where it can be useful to use VT100 terminal commands in shell interaction scripts to draw frames, progressbars, menu lines, position the cursor, clear the screen, colorize text, etc.
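
As a rough illustration of what such commands look like from Erlang, here is a minimal sketch using a handful of common ANSI/VT100-style escape sequences. The module name vt100_sketch and its functions are hypothetical and only cover clearing the screen, positioning the cursor and three foreground colors.

```erlang
-module(vt100_sketch).
-export([clear/0, move_to/2, color/2, demo/0]).

%% ESC [ 2 J erases the display; ESC [ H moves the cursor to the top left.
clear() ->
    "\e[2J\e[H".

%% ESC [ Row ; Col H positions the cursor (both values are 1-based).
move_to(Row, Col) when is_integer(Row), is_integer(Col), Row > 0, Col > 0 ->
    lists:flatten(io_lib:format("\e[~w;~wH", [Row, Col])).

%% ESC [ 3x m selects a foreground color; ESC [ 0 m resets all attributes.
color(red, Text)   -> ["\e[31m", Text, "\e[0m"];
color(green, Text) -> ["\e[32m", Text, "\e[0m"];
color(blue, Text)  -> ["\e[34m", Text, "\e[0m"].

%% Clear the screen and print a green message at row 3, column 10.
demo() ->
    io:put_chars([clear(), move_to(3, 10), color(green, "VT100 works"), "\n"]).
```

If the terminal understands these sequences, vt100_sketch:demo() clears the screen and prints a green message a few lines down. If it does not, you get the raw escape characters printed as garbage instead.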

I actually have a small library of utilities like this I might eventually release, but it’s a pretty niche need.

Anyway, within that niche need, here is a really silly way to see if the terminal you run your shell in supports VT100 commands. (If you’re on Linux using pretty much any prepackaged terminal then your terminal supports VT100 commands, but that is not always so true on Windows, depending on how you are accessing your shell.) Paste the following into your shell:

Z =
  fun() ->
    {ok, S} = gen_tcp:connect("towel.blinkenlights.nl", 23, []),
    ok = gen_tcp:send(S, "\r\n"),
    Q =
      fun R() ->
        receive
          {tcp, S, B} ->
            ok = io:format("~ts", [B]),
            R();
          {tcp_closed, S} ->
            done
          after 60000 ->
            ok = gen_tcp:close(S),
            timeout
        end
      end,
    Q()
  end.

And then do Z().

(I remember seeing this first years ago and had forgotten it was even a thing! Sysop excuses is still live at port 666 as well, btw…)

Erlang: Converting text strings to Erlang terms

We all love file:consult/1 and are familiar now with its inverse function. And of course everyone knows how comfortable it is to use the BIFs term_to_binary/1 and binary_to_term/1,2 to communicate over the network between nodes and even among other networked thingies written in other programming languages using BERT-RPC.

But we still have a gap.

There is not a very well known way to convert a text string that represents Erlang terms directly into a list of actual Erlang terms without writing to a file first and then calling file:consult/1. Most of the time you will never have this problem. But when you do encounter this problem it can be mighty annoying to figure out the steps to convert the string or binary to internal Erlang terms (to the point that I sometimes see people actually write to a temporary file just so they can then call file:consult/1 and then delete the file).

So, let’s take a look:

scan_binary(Bin) ->
    TermString = binary_to_list(Bin),
    scan_string(TermString).

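scan_string(TermString) ->
    %% break_terms/2 splits on every period, so floats and strings that contain
    %% periods are not handled. It also accumulates in reverse, hence the reversal.
    {_, Reversed} = lists:foldl(fun break_terms/2, {"", []}, TermString),
    Strings = lists:reverse(Reversed),
    Tokens = [T || {ok, T, _} <- lists:map(fun erl_scan:string/1, Strings)],
    AbsForms = [A || {ok, A} <- lists:map(fun erl_parse:parse_exprs/1, Tokens)],
    [V || {value, V, _} <- lists:map(fun eval_terms/1, AbsForms)].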

break_terms($., {String, Lines}) ->
    Line = lists:reverse([$. | String]),
    {"", [Line | Lines]};
break_terms(Char, {String, Lines}) ->
    {[Char | String], Lines}.

eval_terms(Abstract) ->
    erl_eval:exprs(Abstract, erl_eval:new_bindings()).

You’ll notice that I did not simply use string:lexemes(TermString, [$.]) (the successor to the now obsolete string:tokens/2) to break the original into discrete strings. That is because each string requires a period at the end or else erl_parse:parse_exprs/1 will reject the tokens scanned from it. It is dramatically more efficient to run through the string a single time, breaking at the periods and adding them back, than to traverse it once to break it into segments and then traverse every resulting string again just to add a period at the end (which also means an extra traversal of the list of strings to make the adjustments!).
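
To see the scan, parse and evaluate stages in isolation, here is a small sketch that handles exactly one period-terminated string. The module name term_sketch and both function names are made up for illustration. The second function uses erl_parse:parse_term/1, which accepts only literal terms and skips evaluation entirely.

```erlang
-module(term_sketch).
-export([eval_one/1, parse_one/1]).

%% Scan, parse and evaluate a single period-terminated expression string.
%% Any failing stage crashes with a badmatch, so wrap the call if needed.
eval_one(String) ->
    {ok, Tokens, _End} = erl_scan:string(String),
    {ok, AbsForms} = erl_parse:parse_exprs(Tokens),
    {value, Value, _Bindings} = erl_eval:exprs(AbsForms, erl_eval:new_bindings()),
    Value.

%% Scan and parse a single literal term without evaluating anything.
%% Returns {ok, Term} or an error tuple from erl_parse.
parse_one(String) ->
    {ok, Tokens, _End} = erl_scan:string(String),
    erl_parse:parse_term(Tokens).
```

For example, term_sketch:eval_one("{foo, [1, 2, 3]}.") returns {foo,[1,2,3]}, and term_sketch:parse_one("{foo, 2.5}.") returns {ok,{foo,2.5}}, because the tokenizer itself copes with the period inside the float. Avoiding evaluation is worth considering whenever the input comes from somewhere you do not fully trust.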

Everything that happens in scan_string/1 can, of course, crash if there is anything wrong in the input. If used as-is it should probably be run inside of a try..catch clause (and you should almost never, ever be using try..catch in Erlang to begin with, but this is one of the very few cases it is probably a good idea to). That could be accomplished by wrapping it in a non-insane function such as:

-spec maybe_scan(String) -> Outcome
    when String  :: string(),
         Outcome :: {ok, [term()]}
                  | {error, Reason :: term()}.

maybe_scan(String) ->
    try
        Terms = scan_string(String),
        {ok, Terms}
    catch
        error:Reason -> {error, Reason}
    end.

You’ll notice that I have a specific scan_binary/1 and a scan_string/1 also. I haven’t played around with this enough yet to feel comfortable throwing a full-blown io_list() at this, so my assumption is that you’re either reading data in from a file and will have a binary to start with, or would have a string that arrives or is constructed somewhere internally and know that you should flatten it yourself before calling scan_string/1 or maybe_scan/1.

How did I arrive at this?

The larger problem I have had to solve just now is unpacking and reading in configuration data from a large number of tar archives that I receive over the wire. While I could unpack them to disk, then read the file I want with file:consult/1, it is dramatically faster to unpack only the file I wanted from the archive in memory (as the archive itself has never been written to disk anyway), and that leaves me with a binary string of the file contents, but nothing on which I can call file:consult/1. Dhoh!

My solution to that problem was the above. This function has done its work now and I don’t need it anymore, but it strikes me as not such a crazy situation for other programmers to run into at some point so I’m leaving this here for my future self. I’ll probably include this function in a future version of a convenience library, and at that point I will either refactor it to break down all the possible error returns in a proper way (crash reports from within list operations inside list comprehensions can be mysterious), or decide that the details of an error from, say, erl_scan would be more confusing than it’s worth and instead provide a more generic return from some interface like maybe_scan/1.