ErlMUD Commentary

It's all scaffolding

The game we want to write is one thing, but just defining the game isn't enough. It must be written, debugged, checked for impossible situations (especially type mismatches), tested, built, distributed, executed, initialized by the host, given necessary resources, contacted by users, and about a hundred other things. None of those things are the game, though. The details of how we do those things are incidental to the fact we are writing a game in Erlang and we hope to be able to maintain it through an upgrade lifecycle (this isn't an exercise in abandonware!).

The essential nature of what we are doing is writing a game platform; incidentally, it must be written in some language and run on some platform. This remains true regardless of what language or platform we choose. Incidentally, we chose Erlang/OTP. The language is Erlang and the platform is a suite of facilities OTP provides atop the Erlang VM.

All that incidental stuff is central to what Erlang/OTP exists to help you do (especially the OTP part), but none of it is inherently game-related. This is all just scaffolding, and there are about as many ways to do it as there are programs. We're going to do things the naïve, raw Erlang way first and gradually migrate to the OTP way. The raw Erlang way is initially very simple and easy, and fast to hack experimental ideas with. It is also a ton of fun; something about writing processes that exist in an abstract universe of messages and isolated slices of time is just exciting. Very cool.

The problem we will encounter later is that handling the bazillion annoying things one encounters in production becomes tedious in raw Erlang. We would wind up basically re-writing OTP if we stuck with raw Erlang to the end (but it would probably turn out less good, with no support team behind it, and not understood by the majority of Erlangers who expect OTP idioms). There is a natural transition point where the programmatic complexity of handling the details of production issues in raw Erlang surpasses the mental overhead of learning something like OTP that manages those things for you. Even before that, though, there is a transition point in code complexity where common practices that define "pure" Erlang (as distinct from ad hoc "raw" Erlang) become necessary to adopt or reinvent, lest Complexity Bear wrestle us to the ground and tap us out.

The previous two chapters discussed the initial design in terms of the game and its elements, but not in terms of an Erlang program and system components. In this chapter we will discuss the pieces necessary to interact with users and environment (things outside the Erlang world), implement a chat system (things within the Erlang world), and get our system started as a very simple Erlang program (get the Erlang world to go in the first place). Of all the parts of the code, the ones here are likely to undergo the most genuine structural change as ErlMUD matures, particularly the details of source organization and project structure.

Network interface

MUDs traditionally work over telnet and do not require any special client-side software. We will follow suit in ErlMUD, at least for a while. Using TCP sockets in Erlang is so simple that it is tempting to break with tradition and ditch telnet, develop a new TCP protocol and write our own client-server suite. But we're not going to do that, not right now. Telnet is easy, and easy is a good place to start.

I'm pretty sure I can get the main MUD game elements to work without much fuss. I'm also pretty sure that if I mess around a bit I can write a telnet server module (should be a lot easier than it was in C and Pascal!). Writing the game bits and a telnet module is already a significant amount of work, and both are things I've never tried in Erlang. I don't want to add development of a new network protocol, a whole new application (the client), a client installer or package(s), and a distribution method to the TODO list until we have something worth connecting to. We're not even into the code yet. Remember, YAGNI.

In the next chapter we will look at some basic network code that implements a question-response service over a telnet connection. Then we'll figure out a way to make it fit within a very basic server. For now we need to recognize that the networking code cannot go into the main program loop and it can't go inside the controllers, either. (You might think "duh!" but it's amazing how often such structures occur in the wild.) That means we need to put it someplace on its own and think about it as a subordinate component in the overall system.

So how should the network module work? It must listen for and accept connections over TCP. To handle several connections concurrently it must spawn a connection handler per connection and go back to waiting for new connections. The handlers should talk to controllers after doing some initialization or authentication stuff ("controller" as in last chapter's mob controllers). That's a pretty straightforward concept, abstract enough that it represents the way we would want pretty much any TCP network interface to work.

It is up to us to establish what the controller and connection will say to one another and use that as a template for any future connection-controller communication. If we step outside telnet someday all we have to do is make sure the shiny new Foo connection handler speaks the same protocol to the controller and we'll be fine.

We will need the following networking bits: a telnet listener, a connection handler, and a connection-controller protocol.

Not so bad. We will also have to figure out a way to plug this into the system without tying ourselves to telnet forever, but we can put that worry to sleep for a while by wrapping it in a warm, fuzzy function and singing it the abstraction lullaby.

Telnet listener

The listener should be pretty simple. Just like a TCP listener in any other environment, it should grab a socket on a designated TCP port and listen (the telnet default is 23, but many MUDs run on custom ports). Once a new connection is made the listener should spawn another process to handle the connection and let it take control of whatever the new session's port is. Then it should loop back to listening and wait for another connection. That's it.
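The accept-spawn-loop cycle described above can be sketched with gen_tcp. This is only a rough illustration under my own assumptions, not the listener ErlMUD ends up with: the module name telnet_listener_sketch, the start/1 function, the go message and the welcome text are all made up for the example, and the handler does nothing but greet and hang up.

```erlang
-module(telnet_listener_sketch).
-export([start/1]).

%% Start a listener on Port (0 asks the OS for any free port).
%% Returns {ok, ListenerPid, ActualPort} once the socket is listening.
start(Port) ->
    Parent = self(),
    Pid = spawn(fun() -> listen(Parent, Port) end),
    receive
        {Pid, listening, ActualPort} -> {ok, Pid, ActualPort}
    after 5000 ->
        {error, timeout}
    end.

listen(Parent, Port) ->
    Options = [binary, {active, false}, {reuseaddr, true}, {packet, line}],
    {ok, Listen} = gen_tcp:listen(Port, Options),
    {ok, ActualPort} = inet:port(Listen),
    Parent ! {self(), listening, ActualPort},
    accept_loop(Listen).

%% Wait for a client, hand the new socket to a fresh handler process,
%% then go straight back to waiting.
accept_loop(Listen) ->
    {ok, Socket} = gen_tcp:accept(Listen),
    Handler = spawn(fun() -> handle(Socket) end),
    ok = gen_tcp:controlling_process(Socket, Handler),
    Handler ! go,
    accept_loop(Listen).

%% Placeholder handler: wait until we own the socket, greet, hang up.
handle(Socket) ->
    receive go -> ok end,
    ok = gen_tcp:send(Socket, <<"Welcome to ErlMUD\r\n">>),
    gen_tcp:close(Socket).
```

Calling telnet_listener_sketch:start(0) from the shell and pointing a telnet client at the returned port should show the greeting. The handler waits for go so that it does not touch the socket before the listener has handed over ownership.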

If you've never written network code before, this is the essence of what TCP servers do, with all the gritty details in the middle cut out. If you're interested to see what goes on down below in C, Beej's Guide to Network Programming is a decent primer for application and server developers. (Kernel and hardware networking code can get pretty insane, though.)

Connection handler

The connection handler will be slightly more complicated than the listener. When the listener accepts a new client it will get a new socket for that connection, spawn a connection handler and pass it the socket. Once the connection handler has been spawned it should initiate whatever the authentication procedure is, send the initial welcome/login message to the client, and then wait for data.

When the handler receives a message that represents TCP data it will put the data into an aggregate collection, check whether it marks the end of a complete transmission and loop back to wait for more. If the data received indicates the end of a complete transmission (not the end of the session) the handler will have to assemble the data the client sent from the aggregate and then process it before going back to check for more. If it receives an indication the TCP connection has ended it should close the port and kill itself.

Telnet defines a way of identifying when a transmission is complete and TCP defines a way of telling us a connection has ended. Lucky for us, the Erlang standard library has a gen_tcp module that makes dealing with the TCP parts really easy. It is up to us to make sure we write the telnet bits so they behave the way telnet clients expect. Fortunately telnet is also pretty simple.
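The aggregate-and-check step can be written as a small pure function. This is a minimal sketch that assumes a transmission ends with CR LF (the telnet end-of-line convention) and ignores telnet option negotiation entirely; the module name line_buffer_sketch and the function feed/2 are my own inventions for illustration.

```erlang
-module(line_buffer_sketch).
-export([feed/2]).

%% feed(Buffer, Data) -> {CompleteLines, NewBuffer}
%% Buffer is whatever was left over from earlier TCP messages,
%% Data is the payload of the TCP message that just arrived.
feed(Buffer, Data) ->
    split(<<Buffer/binary, Data/binary>>, []).

%% Peel off one CR LF terminated line at a time; whatever is left
%% with no terminator yet becomes the new buffer.
split(Bin, Acc) ->
    case binary:split(Bin, <<"\r\n">>) of
        [Line, Rest] -> split(Rest, [Line | Acc]);
        [Partial]    -> {lists:reverse(Acc), Partial}
    end.
```

A handler loop would keep the returned buffer in its state, interpret each complete line in order, and call feed/2 again when the next chunk of data arrives.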

I hand-waved the "process the data" step above. This is where it can get a bit complicated. We have to decide if the handler will pass each raw transmission's payload off to another process for interpretation (parsing, tokenizing, figuring out the meaning of whatever the user typed), or call interpretation functions directly and send messages to the controller process. I'm inclined to call interpretation functions from within the handler directly, because we want to block the connection (TCP data will queue, so this is no big deal) if processing takes some time. Blocking like that also prevents the incoming messages from getting out of phase with whatever is being processed, and that's a good thing. I'll probably split the telnet code bits from the MUD-specific bits (and wind up with independent MUD-connection and telnet libraries) but if the details are properly hidden behind the "network interface" abstraction I won't take an inordinate amount of time worrying about that early on.

Connection-Controller protocol

This can be easy and generic or hard and overloaded with rich features. I'm inclined to make it easy. Because we are doing two-way communication with the client we will need the connection handler to listen for outgoing TCP data in addition to incoming. In essence, the handler is the translation piece for the controller that represents the actual user session from the perspective of the server. Because we will always be receiving complete telnet messages from clients and always sending complete responses back, it is OK to have the initiation of a receive or a send message block the other direction until the transmission is complete. This should be easy because we have two levels of message data buffering working for us. The networking subsystem will maintain a queue of incoming TCP messages we can receive when we are ready, and the Erlang runtime maintains a similar queue of incoming Erlang messages for the handler process. When we receive outgoing data we can translate its payload in a single step and send it, and when we receive incoming data we can collect the entire message and then block until it is interpreted and sent to the controller process.

This means we will be primarily concerned with just two message types: incoming data and outgoing data. Because we want the controller and the handler to act as a single unit it is probably best to link them (remember, this means if one dies the other does as well). It is probably OK to make their communication asynchronous: they do not expect meaningful responses from one another, so there is no need to block either when sending a message. The only obvious purpose of making the protocol between them synchronous would be to send acknowledgements of receipt, but this is not necessary since the Erlang runtime already guarantees delivery for us so long as the target process is alive.
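Here is one way the linked, asynchronous handler/controller pair might look. It is a sketch under heavy assumptions: the message shapes {incoming, Line} and {outgoing, Text} are placeholders I made up, the Client pid stands in for a real socket (a real handler would call gen_tcp:send/2 instead), and the "controller" just echoes.

```erlang
-module(conn_link_sketch).
-export([start/1]).

%% start(Client) spawns a handler; Client is a stand-in for the socket.
start(Client) ->
    spawn(fun() -> handler_init(Client) end).

%% The handler spawns its controller with a link, so the two live
%% and die together.
handler_init(Client) ->
    Handler = self(),
    Controller = spawn_link(fun() -> controller(Handler) end),
    handler(Client, Controller).

handler(Client, Controller) ->
    receive
        {incoming, Line} ->            % would arrive from the socket
            Controller ! {incoming, Line},
            handler(Client, Controller);
        {outgoing, Text} ->            % sent by the controller
            Client ! {sent, Text},     % stand-in for gen_tcp:send/2
            handler(Client, Controller);
        closed ->
            exit(connection_closed)    % the link takes the controller down too
    end.

%% Toy controller: fire-and-forget replies, no acknowledgements.
controller(Handler) ->
    receive
        {incoming, Line} ->
            Handler ! {outgoing, <<"You said: ", Line/binary>>},
            controller(Handler)
    end.
```

Neither side waits for a reply after sending, which is the asynchronous arrangement discussed above. Because the handler exits with a reason other than normal, the link propagates the exit to the controller.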

Chat system

So far this is the only user-facing element that is not arbitrated by locations. Chat is central to the purpose of MUDs so we can't dodge thinking about this early on.

Each location is sort of its own little chat channel, since all verbal and emote activity is visible to other occupants. Locations are interconnected and channels are not, though, which means a channel manager would be easier to write and actually more necessary than the location manager. We might want to implement something like a "shout" feature, where any really loud action (like an actual shout) can be heard by all mobs within a zone or maybe N-locations away. What is similar and what is different between chatting on global channels, talking in a room, and shouting across a zone?

Channels? Locations? Zones? Races? Factions? Hmmm. Let's break the underlying ideas down and see if we can identify some similarities; if we can there is a good chance we can prevent ourselves from duplicating a ton of code.

We know that locations maintain a roster of mobs. Channels would have to do a similar thing, but maybe with controllers instead of mobs (and we'll leave this undetermined for now). In a sense, a location is a chat channel with a radically advanced set of extended features, but the basics of receiving an action that should be broadcast and performing the broadcast are the same. We know that zones would have to maintain an inventory of the locations from which they are composed. Assuming that races/homelands are a sub-element of factions (sort of like racial affiliation with a faction in World of Warcraft) then faction-wide channels might be an aggregation of racial channels. Obviously locations and zones have an impact on gameplay and chat doesn't, but the communication responsibilities of both are strikingly similar. In fact, the only real difference between the two so far is that gameplay messages definitely go to mobs, and chat might only go to controllers (which indicates there may be more to the concept of controllers than we've realized yet).

We'll probably never fit communication and world simulation duties into the same module (or if we did, it wouldn't be a module I'd want to try reading), so I'll abandon that idea. It is quite likely, however, that with so many of the same procedures occurring we will find a way to abstract the basic pattern of communication channel operations and roster maintenance, so that we can create a library of common functions that all these modules call. This is the opportunity we need to stay alert to as we write location, chat, zone and channel code.

Note that the above reads "stay alert to" instead of "commit to right now". We don't know what location or chat code looks like. It's almost certain that the first time around we'll do something wrong or silly when it comes to communication handling. This is one of those cases where we know we'll probably write the code once, get it to work, and then realize something we didn't notice before and rewrite the code better. We may as well do that in the course of writing the location and chat code instead of writing the same (bad) code twice, then fixing it twice in two places, and then extracting it into a library. I prefer to write code once in one module to get it to work, then write it in another module and see if I learn anything by comparing the two versions. Then I collapse the base idea within those modules to make it look the same, and then extract it, knowing for certain now that it is safe to do so.

What about private messages, like "whisper" or "tell" features? It's easy enough to implement a "tell" that makes one mob send a tell directly to another mob, bypassing anything else. We can even say that each mob maintains its own "ignore" list and avoid having to write a central message nanny. I like this idea because it is simple, but we will have to decide if it's better to have tells be a gameplay feature (handled by mobs and passed to controllers) or a system feature (handled by controllers directly). And that brings up the issue of targeting.

We've never established what the global identity of a mob is. In a location I might target the first of a group of similar mobs by using a generic alias, but what if I want to target the third one in the group? This implies that there is an ordering to mobs in a location, another idea we've never dealt with. This is an issue I was hoping to avoid until we dealt with direct interaction (combat, trading, etc.) but if private messaging occurs at the mob level then it is clear we can't sidestep it. So instead I think we'll separate the idea of "tell" and the idea of "whisper". A tell will be a direct chat message between controllers, and a whisper will be a direct game action between two mobs in the same location. This disambiguates the concept and allows us to be clear about which idea we mean, and it also allows us to continue to defer the gameplay concept until later.

This leaves us with the issue of routing. In Erlang routing is pretty simple: send a message to some process and the runtime figures it out for you. But we want a chat system that has channels. Above we realized that chat channels are rather similar to locations in several ways. So we'll make channels their own processes, have them maintain a roster of participating controllers, and have the channel receive messages from participants and reflect them to every controller on the roster. Most everyone is familiar with basic IRC commands, so our system can mimic the behavior of basic ones like /join and /quit without getting too crazy.

Many MUD systems have a factional chat system that is considered game related. If we add a layer of chat such as this it could work the same way, and even be based on the same code, but would have to work through mobs (and hence be sending messages not just to player controllers of a faction, but to AI controllers as well). I'm not going to mess with this yet, because I think it's too gameplay involved and complex to spend any time on right now. But I don't want to forget the idea, so I'm documenting it here; later on it may inspire a cool feature or two.

So that's the plan for now: first get a simple, global broadcast type chat working, and diversify from there based on whatever features we decide to implement later on. At first, anyway, with zero or very few players we won't really need to do much with channels, so we can keep in mind that we want to implement channels later on, but skip doing anything with that idea for now.

It looks like we need channels, a channel manager, a channel-controller protocol, and chat commands.

Channels

Very similar to locations, but with a more limited scope of operation. Where locations have game actions to arbitrate, channels may have chat commands to arbitrate, depending on where we decide to put the governing logic for chat system actions. Channels still maintain a roster, validate receipt of a message against membership in the roster, and the roster is the broadcast list for received messages.
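The roster behavior just described (keep a roster, check senders against it, broadcast to it) fits in a very small process. This is only a sketch: the module name channel_sketch and the message shapes {join, Pid}, {leave, Pid}, {say, From, Text} and {chat, Name, {Type, Text}} are assumptions of mine, not a protocol we have settled on.

```erlang
-module(channel_sketch).
-export([start/1]).

%% A channel is just a process holding a name and a roster of pids.
start(Name) ->
    spawn(fun() -> loop(Name, []) end).

loop(Name, Roster) ->
    receive
        {join, Pid} ->
            %% usort removes duplicates, so a repeated join is harmless.
            loop(Name, lists:usort([Pid | Roster]));
        {leave, Pid} ->
            %% Deleting a pid that is not on the roster is a no-op.
            loop(Name, lists:delete(Pid, Roster));
        {say, From, Text} ->
            case lists:member(From, Roster) of
                true ->
                    Chat = {chat, Name, {message, Text}},
                    lists:foreach(fun(P) -> P ! Chat end, Roster);
                false ->
                    From ! {chat, Name, {info, <<"You are not on this channel.">>}}
            end,
            loop(Name, Roster);
        stop ->
            ok
    end.
```

The roster doubles as the membership check and the broadcast list, which is the same basic shape a location would have before any game logic is added.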

We haven't really addressed the issue of how to deal with commands or what commands to receive, but we can worry about that when we discuss the channel-controller protocol and chat commands later. What we need to know about channels right now is that they are a sort of isolated, gameplay independent, non-topographical set of locations within which players can communicate.

Channel Manager

Something has to spawn channels, keep track of the live ones, and perform cleanup actions if one dies. This is similar to the location manager, but simpler and more necessary. It would be possible for locations to manage their peers because they are connected by their way definitions. Channels are more independent and won't be aware that they have siblings, so we have a clear need for a supervisor above them. Also, channels don't need to be linked to anything else, which radically changes the recovery conditions. For now we can just assume that if a person wants to talk on a particular channel and it goes away that they will rejoin it.

When a user wants to create a channel, the channel manager will create it and maintain or create a way to get its pid. If it dies a global message will go out letting users know that channel died. This will probably benefit from some refinement later, but that's the basic idea we're going for.
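A raw Erlang manager along these lines might look like the following. Treat it as a sketch built on assumptions: the module name chanman_sketch, the open/2 and lookup/2 functions and the request tuples are invented for the example, the channel itself is a do-nothing placeholder, and the global "channel died" notice is left out.

```erlang
-module(chanman_sketch).
-export([start/0, open/2, lookup/2]).

start() ->
    spawn(fun() -> loop(#{}) end).

%% Create the channel if needed; either way reply with {ok, Pid}.
open(Man, Name) -> call(Man, {open, Name}).

%% Reply with {ok, Pid} or error.
lookup(Man, Name) -> call(Man, {lookup, Name}).

call(Man, Request) ->
    Ref = make_ref(),
    Man ! {self(), Ref, Request},
    receive {Ref, Reply} -> Reply after 5000 -> {error, timeout} end.

%% Channels is a map of Name => Pid for the live channels.
loop(Channels) ->
    receive
        {From, Ref, {open, Name}} ->
            case maps:find(Name, Channels) of
                {ok, Pid} ->
                    From ! {Ref, {ok, Pid}},
                    loop(Channels);
                error ->
                    %% Monitor rather than link: a dead channel should
                    %% be cleaned up, not take the manager with it.
                    {Pid, _Mon} = spawn_monitor(fun() -> channel(Name) end),
                    From ! {Ref, {ok, Pid}},
                    loop(maps:put(Name, Pid, Channels))
            end;
        {From, Ref, {lookup, Name}} ->
            From ! {Ref, maps:find(Name, Channels)},
            loop(Channels);
        {'DOWN', _Mon, process, Pid, _Reason} ->
            Live = maps:filter(fun(_Name, P) -> P =/= Pid end, Channels),
            loop(Live)
    end.

%% Placeholder channel that only knows how to stop.
channel(Name) ->
    receive
        stop -> ok;
        _Any -> channel(Name)
    end.
```

The 'DOWN' clause is where a broadcast to users would go once we decide who should hear about a dead channel.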

Channel-Controller Protocol

At the moment it is obvious that there are three categories of messages that need to go from controllers to channels: commands, monitor 'EXIT', and chat messages. The other direction there are also three: channel information, monitor 'EXIT', and chat messages. These could all be wrapped up in a single type, and we could dispense with having controllers monitor channels (and at first we probably will), but if we want channel system messages and channel chat messages to be displayed differently (colors or "*" prefixes or something) then it's easier to make the formatting a reaction to a semantic tag on the message tuple we receive than to match every incoming message to see if the beginning matches "Channel [foo] says: " or whatever. That's just silly. Much easier if we are listening for {chat, Channel, {Type, Message}} or something similar instead.

I'm not going to specify the messages here; the above is just an impulsive example to illustrate the sort of thing we want to do instead of string parsing or regex matching. The important thing is that we know there are three basic categories, and that it's OK to make them asynchronous. The tricky thing about sending commands asynchronously is to remember to make it OK to receive a command late or the same one multiple times. For example, we shouldn't add two entries of the same controller into the channel roster if it's already there when a second "join" message is received, nor should a channel crash if it receives a "leave" command from a controller that isn't on the list.
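To make the "react to a tag, don't parse a string" point concrete, here is a tiny formatting function that picks a display style by pattern matching on the tuple. The tags message and info, the output formats and the module name chat_format_sketch are purely illustrative assumptions, in the same spirit as the impulsive example above.

```erlang
-module(chat_format_sketch).
-export([format/1]).

%% Choose the display style from the semantic tag on the tuple.
%% Ordinary chat gets a bracketed channel prefix...
format({chat, Channel, {message, Text}}) ->
    iolist_to_binary([<<"[#">>, Channel, <<"] ">>, Text]);
%% ...and channel system notices get a leading "*".
format({chat, Channel, {info, Text}}) ->
    iolist_to_binary([<<"* #">>, Channel, <<": ">>, Text]).
```

Adding a new display style later means adding a clause, not another string match.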

Chat Commands

Commands are going to be ridiculously simple for now: join, leave, and send. To make this as familiar as possible I'm going to try to make a leading slash indicate chat commands, and a leading hash indicate a channel target. So "/join #foo" puts me in that channel, "/leave #foo" removes me from it, and "#foo blah blah blah" sends the message "blah blah blah" to #foo for broadcast to the other participants. I might change to something different later, but this is pretty easy to remember and is fairly IRC-like. I'm not going to worry about all the other crap that comes with a serious chat system, like ban systems, moderator hierarchies, voice conditions, lurker status, or whatever else. That can come later.
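Binary pattern matching makes this syntax cheap to recognize. The following is a minimal sketch, assuming input arrives as one complete line with the line ending already stripped; the module name chat_cmd_sketch and the return values {join, Name}, {leave, Name}, {send, Name, Text}, {error, bad_chat_command} and not_chat are my own placeholders.

```erlang
-module(chat_cmd_sketch).
-export([parse/1]).

%% "/join #foo" and "/leave #foo"
parse(<<"/join #", Name/binary>>) when Name =/= <<>> ->
    {join, Name};
parse(<<"/leave #", Name/binary>>) when Name =/= <<>> ->
    {leave, Name};
%% "#foo blah blah blah": everything after the first space is the text.
parse(<<"#", Rest/binary>>) ->
    case binary:split(Rest, <<" ">>) of
        [Name, Text] when Name =/= <<>>, Text =/= <<>> ->
            {send, Name, Text};
        _ ->
            {error, bad_chat_command}
    end;
%% Any other slash command is a chat command we do not know.
parse(<<"/", _/binary>>) ->
    {error, bad_chat_command};
%% Everything else is not chat at all (probably a game command).
parse(_Other) ->
    not_chat.
```

For example, chat_cmd_sketch:parse(<<"#foo blah blah blah">>) yields {send, <<"foo">>, <<"blah blah blah">>}, while a plain game command such as <<"look">> falls through to not_chat.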

The only thing I don't like about the chat system this way is that doing "#alongchannelname" is a pain, and this will encourage meaninglessly short channel names. Handles, names and descriptions being separate from one another alleviate this a bit, as does a user-definable command alias system, but it still has the potential to be awkward. But for now I don't care. We're dealing with telnet and that means working within the constraints of a genuine line-oriented interface.

This discussion of chat commands and channel-controller protocols and whatnot has me wondering if it might be better to separate the concept of chat control from mob control, because we still have the question of where to put the incoming string parsing (currently a part of what the connection handler does), etc. For now I'm going to leave things as they are, but later on some of this might get split up a bit. Mob control is not inherently related to the chat system. I don't want controllers to get insanely complicated, at least not without understanding why. So I'm going to tuck this idea in the back of my mind for now and move on with a simple implementation based on the discussion above, knowing I might break things up later on.


Users require some way of identifying themselves to the game server so it can locate their character(s).

Or not. We might want to design a game world where every time you log in you randomly inhabit a pre-existing mob and engage in whatever is going on nearby on your own. This is actually a pretty interesting idea, but one I'll write down and set to the side. Right now we're writing an adventure MUD of the typical form, so for now I'm going to use the old-fashioned (and not very secure, but easy to implement) username/password authentication for account login.

What are users logging in to? Do players have accounts and characters, or is every character an account? Here we find ourselves back at the issue of identity again, but from the direction of account management instead of chat and gameplay targeting. Obviously we have to do something about this. So what is an account? It is clearly different from a mob's identity, but is it different from a controller's identity? In most systems accounts are just rows in a user table or file somewhere (consider /etc/passwd), and a token representing authentication is granted to a connection and the combination of that connection and the token represent the user from the perspective of the system for the duration of a session. Our system could work a similar way, but instead of the connection representing the user, the controller spawned by the session connection would. Account records would still be entries in a user table or file somewhere.

It is important to take a moment and consider the semantic distinction that has just been introduced: controllers are no longer just the embodiment of a mob's volition, they have come to represent the presence and identity of a user in the system as well. In a sense this also means that AIs are a category of user, and if we add the right bits to them they could participate in non-game activities like chat. For now I'll leave open the question of whether this is good or not, but recognition of the changed nature of controllers is critical to our understanding of the system we are building.

So how do we create an account? I like the free feeling old MUDs give by making characters painless to create and totally independent of one another with no account system above this. On the other hand, players today are pretty used to the World of Warcraft account model where one account owns several characters. The problem there, of course, is that if your account gets compromised all your characters are compromised at once. It's a player-side version of putting all one's eggs in one basket, which is good to avoid. I don't see an obvious happy middle ground here. If we wrap the account authentication bits behind a function it can be possible to create one account to one mob/character for the time being, and implement multiple character/account relationships later on without hurting anything. In either case, we will have to store account data and unique character mob data in a persistent way.

That's three things to implement: a storage system for persistent character data (whatever the system remembers of your character when you "rent" or logout), an account storage system for tracking account/character relationships, and an authentication routine that can assign a controller/connection a verified identity.


A system needs a way to get started, a main() sort of function that we know will get called and kick off execution of all the other bits. We need an erlmud module definition. Some folks write the top module in as much detail as they can first, and then work from there down to the details. Some folks work on the tiny components first, assembling their code from little blocks as they go. I mostly do the top-down way, but don't spend much time on detail early on. Instead I usually write a foo:start/0 function that barely suffices to spawn a unique process, and then write a small part of the tiny bits below and revise from there as I go. Where I know I'll start other components I add calls to stub functions that drop me reminders like printing "thingy supervisor starting" or whatever, but basically blow past that stuff and fill it in whenever I start needing it.
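A skeleton in that style might look like this. It is a sketch of the approach, not the real erlmud module: the module name erlmud_start_sketch, the registered name erlmud_sketch and the stub functions are placeholders I made up, and the check-then-register step is deliberately naive.

```erlang
-module(erlmud_start_sketch).
-export([start/0]).

%% Spawn the top-level process once and register it under a
%% (hypothetical) name so a second start/0 can tell it is running.
start() ->
    case whereis(erlmud_sketch) of
        undefined ->
            Pid = spawn(fun() -> init() end),
            true = register(erlmud_sketch, Pid),
            {ok, Pid};
        Pid ->
            {error, {already_started, Pid}}
    end.

%% Stubs print reminders until there is something real to start.
init() ->
    io:format("~p: erlmud sketch starting~n", [self()]),
    ok = start_chat(),
    ok = start_network(),
    loop().

start_chat() ->
    io:format("channel manager starting (stub)~n"),
    ok.

start_network() ->
    io:format("telnet listener starting (stub)~n"),
    ok.

loop() ->
    receive
        stop ->
            io:format("erlmud sketch stopping~n"),
            ok;
        Other ->
            io:format("unexpected message: ~p~n", [Other]),
            loop()
    end.
```

Calling erlmud_start_sketch:start() in the shell prints the stub reminders straight away, which is the "something that talks back" effect; sending stop to the returned pid shuts it down again.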

I've decided that building a little skeleton of a project like this prevents me wasting time on too much detail that I don't have any hope of understanding just yet. It also allows me to be lazy at the high level and geek out at whatever level of the problem interests me first, so long as I am diligent enough to fill in the execution path between start and whatever I'm working on. Perhaps most importantly, though, it motivates me because within a few minutes of starting to type I've got something that talks back to me, even if the talk isn't very useful at first.

This is one place that it can be good to start trying to iron out the verbiage of your project very early on. Write your initialization bits in a way that deals with your major system components at the level of abstraction you want the rest of the world to. Starting this early on forces pushing details to other modules, particularly in the case of whatever data models are used to start up the system. The last thing we want is for whoever configures the system and might actually deal with some top-level component to have to know that object state is represented as a tuple of form X underneath. Yuk.

I'm going to leave this one wide open for now, and only implement a direct code path that spins up the application and phrase things in the initialization procedure in a way I think is natural. It will probably be wrong, or at least not look much like whatever the production version of this module will look like, but that's OK for now. Whatever code winds up here will do what I need it to for now, which is give me a place to begin execution.

Parting Thoughts

At this point we have a pretty solid idea what the different system components are and what they should do, or at least we have a better idea than when we stated the problem "Write a MUD." two chapters ago. The only way to tell if our architecture makes sense is to try sketching it out in code. In the next section we'll start implementing very simple versions of each of these pieces in an order I think is reasonable, and add code a bit at a time until we have a very basic MUD engine in place. Keep in mind we're not aiming at a full implementation of any particular style of gameplay yet; if we get this groundwork part right the gameplay will be easier to focus on because we will be able to (mostly) assume the world works as we expect.

It might seem strange to leave unaddressed the question of whether hordes of processes (potentially tens or even hundreds of thousands) sitting around in memory incessantly messaging each other will be a problem later on or not. With Erlang I've found it best to blithely assume that the "everything is a process" concept will work out fine (unless there is some really obvious reason it won't), and that tittering over performance concerns before any hard data is available is a waste of time. Actually, it's worse than that: it invites all sorts of ridiculous structural "optimization" ideas at the proto-architecture stage of design (the part where you're musing out the window, thinking, without having done so much as type git init). Our system lacks both its first user and, well, a system. Consider how silly it is to start postulating system loads in this situation!

Early assumptions made before any code has been written tend to doom the resulting project with a cancerous affliction of radically un-idiomatic counter-Erlang antipatterns. It is relatively easy to switch message routes around and replace the representation of components within a complex system so long as the code uses functional abstraction properly and embraces the per-process partition of state. It is usually not as easy to move the execution point of large chunks of interdependent logic from a single monolithic process to a pool of workers, though. We'd almost never admit this out loud, but part of the difficulty is overcoming our emotional investment in a tricky part of code that we found a clever solution to (your ego motivates you to work, but it also damns you to dine alone with dusty old clever skeletons). Until we can generate some hard performance numbers we have to trust that Erlang's concept of per-process encapsulation and strict message passing will work in our favor and go with the flow.