Why OTP? Why “pure” and not “raw” Erlang?

I’ve been working on my little instructional project for the last few days and today finally got around to putting a very minimal, but working, chat system into the ErlMUD “scaffolding” code. (The commit with original comment is here. Note the date. By the time this post makes its way into Google things will probably be a lot different.)

I commented the commit on GitHub, but felt it was significant enough to reproduce here (lightly edited and linked). The state of the “raw Erlang” ErlMUD codebase as of this commit is significant because it clearly demonstrates the need for many Erlang community conventions, and even more significantly why OTP was written in the first place. Not only does it demonstrate the need for them, the non-trivial nature of the problem being handled has incidentally given rise to some very clear patterns which are already recognizable as proto-OTP usage patterns (without the important detail of having written any behaviors just yet). Here is the commit comment:

Originally chanman had been written to monitor, but not link or trap exits of channel processes [example]. At first glance this appears acceptable, after all the chanman doesn’t have any need to restart channels since they are supposed to die when they hit zero participants, and upon death the participant count winds up being zero.

But this assumes that the chanman itself will never die. This is always a faulty assumption. As a user it might be mildly inconvenient to suddenly be kicked from all channels, but it isn’t unusual for chat services to hiccup and it is easy to re-join whatever died. Resource exhaustion and an inconsistent channel registry is worse. If orphaned channels are left lying about the output of \list can never match reality, and identically named ones can be created in ways that don’t make sense. Even a trivial chat service with a tiny codebase like this can wind up with system partitions and inconsistent states (oh no!).

All channels crashing with the chanman might suck a little, but letting the server get to a corrupted state is unrecoverable without a restart. That requires taking the game and everything else down with it just because the chat service had a hiccup. This is totally unacceptable. Here we have one of the most important examples of why supervision trees matter: they create a direct chain of command, and enforce a no-orphan policy by annihilation. Notice that I have been writing “managers” not “supervisors” so far. This is to force me to (re)discover the utility of separating the concepts of process supervision and resource management (they are not the same thing, as we will see later).

Now that most of the “scaffolding” bits have been written in raw Erlang it is a good time to sit back and check out just how much repetitive code has been popping up all over the place. The repetitions aren’t resulting from some mandatory framework or environment boilerplate — I’m deliberately making an effort to write really “low level” Erlang, so low that there are no system or framework imposed patterns — they are resulting from the basic, natural fact that service workers form constellations of similarly defined processes and supervision trees provide one of the only known ways to guarantee fallback to a known state throughout the entire system without resorting to global restarts.

Another very important thing to notice is how inconsistent my off-the-cuff implementation of several of these patterns has been. Sometimes a loop has a single State variable that wraps the state of a service, sometimes bits are split out, sometimes it was one way to begin with and switched a few commits ago (especially once the argument list grew long enough to annoy me when typing). Some code_change/N functions have flipped back and forth along with this, and that required hand tweaking code that really could have been easier had every loop accepted a single wrapped State (or at least some standard structure that didn’t change every time I added something to the main loop without messing with code_change). Some places I start with a monitor and wind up with a link or vice versa, etc.

While the proper selection of OTP elements is more an art than a science in many cases, having commonly used components of a known utility already grouped together avoids the need for all this dancing about in code to figure out just what I want to do. I suppose the most damning point about all this is that none of the code I’ve been flip-flopping on has been essential to the actual problem I’m trying to solve. I didn’t set out to write a bunch of monitor or link or registry management code. The only message handler I care about is the one that sends a chat message to the right people. Very little of my code has been about solving that particular problem, and instead I consumed a few hours thinking through how I want the system to support itself, and spent very little time actually dealing with the problem I wanted to treat. Of course, writing this sort of thing without the help of any external libraries in any other language or environment I can think of would have been much more difficult, but the commit history today is a very strong case for making an effort to extract the common patterns used and isolate them from the actual problem solving bits.

The final thing to note is something I commented on a few commits ago, which is just how confusing tracing message passage can be when not using module interface functions. The send and receive locations are distant in the code, so checking for where things are sent from and where they are going to is a bit of a trick in the more complex cases (and fortunately none of this has been particularly complex, or I probably would have needed to write interface functions just to get anything done). One of the best things about using interface functions is the ability to glance at them for type information while working on other modules, use tools like Dialyzer (which we won’t get into we get into “pure Erlang” in v0.2), and easily grep or let Emacs or an IDE find calling sites for you. This is nearly impossible with pure ad hoc messaging. Ad hoc messaging is fine when writing a stub or two to test a concept, but anything beyond that starts getting very hard to keep track of, because the locations significant to the message protocol are both scattered about the code (seemingly at random) and can’t be defined by any typing tools.

I think this code proves three things:

  • Raw Erlang is amazingly quick for hacking things together that are more difficult to get right in other languages, even when writing the “robust” bits and scaffolding without the assistance of external libraries or applications. I wrote a robust chat system this afternoon that can be “hot” updated, from scratch, all by hand, with no framework code — that’s sort of amazing. But doing it sucked more than it needed to since I deliberately avoided adhering to most coding standards, but it was still possible and relatively quick. I wouldn’t want to have to maintain this two months from now, though — and that’s the real sticking point if you want to write production code.
  • Code convention recommendations from folks like Joe Armstrong (who actually does a good bit of by-hand, pure Erlang server writing — but is usually rather specific about how he does it), and standard set utilities like OTP exists for an obvious reason. Just look at the mess I’ve created!
  • Deployment clearly requires a better solution than this. We won’t touch on this issue for a while yet, but seriously, how in the hell would you automate deployment of a scattering of files like this?

3 thoughts on “Why OTP? Why “pure” and not “raw” Erlang?

  1. Found this and have been reading what you’ve written on ErlMUD. I was crushed when I found that it was an unfinished work (More the text than the actual implementation). I think what you’ve started is and would be the missing link for many fledgling Erlang converts such as myself. I.e. the next step after learning the syntax of the language. Existing books give very trivial examples and don’t really cover the development of project as you work here started. I encourage you to get back to this as I for one would be first in line to purchase the resulting book (assuming that was the original goal).

    1. Oh no! Sorry about that! You are correct. I have not finished ErlMUD, which is a real shame. The trouble there is a lack of time to commit to it (unpaid work) as opposed to any lack of interest. I would really like to finish it and get a few little MUDs going that we could all play around in (I’m still a huge fan of WoTMUD, just with very little time to actually play).

      I do have a few other smallish projects that are complete and have been written in the interest of providing a bit of a vaulting pole for people to be able to leap from the “I can do some things in escripts” to writing a full blown system that does useful work.

      “Erltris”, an Erlang implementation of Tetris.
      https://zxq9.com/archives/1882

      A telnet chat server from scratch
      https://zxq9.com/archives/1872

      There are a few more videos and discussions on my (not very exciting) video channels on YouTube and Rumble.
      YouTube (currently the most mature): https://www.youtube.com/@zxq9419
      Rumble channel I’ll be doing programming stuff on: https://rumble.com/c/c-4190498
      Rumble channel I post personal stuff on (but not all that often): https://rumble.com/c/zxq9

      Feel free to mine my GitLab for ideas as well: https://gitlab.com/zxq9

      Stay in touch and if you run into a weird spot and don’t know what to do, feel free to send me a message or email. I’m pretty active.

      1. Thank you for your reply. Much appreciated. Working through your Erltris videos now (on part 2). Excellent content – not to much into the weeds but giving some of the thought process behind what is being done. Looking forward to getting through all of them and actually going to start over with the first and try to implement as I go through it.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.