Monthly Archives: March 2016

The most basic Erlang service ⇒ worker pattern

There has been some talk about identifying “Erlang design patterns” or “functional design patterns”. The reason this sort of talk rarely gets very far is because generally speaking “design patterns” is a phrase that means “things you have to do all the time that your language provides both no primitives to represent, and no easy way to write a library function behind which to hide an abstract implementation”. OOP itself, being an entire paradigm built around a special syntax for writing dispatching closures, tends to lack a lot of primitives we want to represent today and has a litany of design patterns.

NOTE: This is a discussion of a very basic Erlang implementation pattern, and being very basic it also points out a few places new Erlangers get hung up, like the context in which a specific call is — because that’s just not obvious if you’re not already familiar with concurrency at the level Erlang does it. If you’re already a wizard, this article probably isn’t for you.

But what about Erlang? Why have so few design patterns (almost none?) emerged here?

The main reason is that what would have been design patterns in Erlang have mostly become either functional abstractions or OTP (here “OTP” refers to the framework shipped with Erlang). This is about as far as the need for patterns has needed to go in the most general case. (Please note that it very often is possible to write a framework that implements a pattern, though it is very difficult to make such frameworks completely generic.)

But there is one thing the ole’ Outlaw Techno Psychobitch doesn’t do for us that quite a few of us do have a common need for but we have to discover for ourselves: how to create a very basic arrangement of service processes, supervisors, and workers that spawn workers according to some ongoing global state or node configuration. (Figuring this out is almost like a rite of passage for Erlangers — and often even experienced Erlangers have never distilled this down to a pattern, even if many projects do eventually evolve into something structured similarly.)

The case I will describe below involves two things:

  • There is some service you want to create that is represented by a named process that manages it and acts as its sole interface. Higher-level code in the system doesn’t want to call low-level code to get things done, the service should know how to manage itself.
  • There is some configurable state that is relevant to the service as a whole, should be remembered, and you should not be forced to pass in as arguments every time you call for this work to be done.

For example, let’s say we have an artificial world written in Erlang. Let’s say its a game world. Let’s say mob management is abstracted behind a single mob manager service interface. You want to spawn a bunch of monster mobs according to rules such as blahlblahblah… (Who cares? The game system should know the details, right?) So that’s our task: spawning mobs. We need to spawn a bunch of monster mob controller processes, and they (of course) need to be supervised, but we shouldn’t have to know all the details to be able to tell the system to create a mob.

The bestiary is really basic config data that shouldn’t have to be passed in every time you call for a new monster to be spawned. Maybe you want to back up further and not even want to have to specify the type of monster — perhaps the game system itself should know what the correct spawn/live percentages are for different types of mobs. Maybe it also knows the best way to deal with positioning to create a playable density, deal with position conflicts, zone conflicts, leveling or phasing influences, and other things. Like I said already: “Who cares?”

Wait, what am I really talking about here? I’m talking about sane defaults, really. Sane defaults that should rule the default case, and in Erlang that generally means some sane options that are comfortably curried away in the lowest-arity calls to whatever the service functions are.  But from whence come these sane defaults? The service state, of course.

So now that we have our scenario in mind, how does this sort of thing tend to work out? As three logical components:

  • The service interface and state keeper, let’s call it a “manager” (typically shortened to “man”)
  • The spawning supervisor (typically shortened to “sup”)
  • The spawned thingies (not shortened at all because it is what it is)

How does that typically look in Erlang? Like three modules in this imaginary-but-typical case:

  • game_mob_man.erl
  • game_mob_sup.erl
  • game_mob.erl

The game_mob_man module represents the Erlang version of a singleton, or at least something very similar in nature: a registered process. So we have a definite point of contact for all requests to create mobs: calling game_mob_man:spawn_mob/0,1,... which is defined as

spawn_mob() ->
    spawn_mob(sane_default()).

spawn_mob(Options) ->
    gen_server:cast(?MODULE, {beget_mob, Options}).

 

Internally there is the detail of the typical

handle_cast({beget_mob, Options}, State) ->
    ok = beget_mob(Options, State),
    {noreply, State};
%...

and of course, since you should never be putting a bunch of logic or side-effecty stuff in directly in your handle_* function clauses beget_mob/2 is where the work actually occurs. Of course, since we are talking about common patterns, I should point out that there are not always good linguistic parallels like “spawn” ⇒ “beget” so a very common thing to see is some_verb/N becomes a message {verb_name, Data} becomes a call to an implementation do_some_verb(Data, State):

spawn_mob(Options) ->
    gen_server:cast(?MODULE, {spawn_mob, Options}).

%...

handle_cast({spawn_mob, Options}, State) ->
    ok = do_spawn_mob(Options, State),
    {noreply, State};

% ...

do_spawn_mob(Options, State = #s{stuff = Stuff}) ->
    % Actually do work in the `do_*` functions down here

The important thing to note above is that this is the kind of registered module that is registered under its own name, which is why the call to gen_server:cast/2 is using ?MODULE as the address (and not self(), because remember, interface functions are executed in the context of the caller, not the process defined by the module).

Also, are the some_verb/N{some_verb, Data}do_some_verb/N names sort of redundant? Yes, indeed they are. But they are totally unambiguous, inherently easy to grep -n and most importantly, give us breaks in the chain of function calls necessary to implement abstractions like managed messaging and supervision that underlies OTP magic like the gen_server itself. So don’t begrudge the names, its just a convention. Learn the convention so that you write less annoyingly mysterious code; your future self will thank you.

So what does that have to do with spawning workers and all that? Inside do_spawn_mob/N we are going to call another registered process, game_mob_sup. Why not just call game_mob_sup directly? For two reasons:

  1. Defining spawn_mob/N within the supervisor still requires acquisition of world configuration and current game state, and supervisors do not hold that kind of state, so you don’t want data retrieval tasks or evaluation logic to be defined there. Any calls to a supervisor’s public functions are being called in the context of the caller, not the supervisor itself anyway. Don’t forget this. Calling the manger first gives the manager a chance to wrap its call to the supervisor in state and pass the message along — quite natural.
  2. game_mob_sup is just a supervisor, it is not the mob service itself. It can’t be. OTP already dictates what it is, and its role is limited to being a supervisor (and in this particular case of dynamic workers, a simple_one_for_one supervisor at that).

So how does game_mob_sup look inside? Something very close to this:

-module(game_mob_sup).
-behavior(supervisor).

%%% Interface
spawn_mob(Conf) ->
    supervisor:start_child(?MODULE, [Conf]).

%%% Startup
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    RestartStrategy = {simple_one_for_one, 5, 60},
    Mob = {game_mob,
           {game_mob, start_link, []},
           temporary,
           brutal_kill,
           worker,
           [game_mob]},
    Children = [Mob],
    {ok, {RestartStrategy, Children}}.

(Is it really necessary to define these things as variables in init/1? No. Is it really necessary to break the tuple assigned to Mob vertically into lines and align everything all pretty like that? No. Of course not. But it is pretty darn common and therefore very easy to catch all the pieces with your eyes when you first glance at the module. Its about readability, not being uber l33t and reducing a line count nobody is even aware of that isn’t even relevant to the compiled code.)

See what’s going on in there? Almost nothing. That’s what. The interesting part to note is that very little config data is going into the supervisor at all, with the exception of how supervision is set to work. These are mobs: if they crash they shouldn’t come back to life, better to leave them dead and signal whatever keeps account of them so it can decide what to do (the game_mob_man, for example, which would probably be monitoring these). Setting them as permanent workers can easily (and hilariously) result in a phenomenon called “highly available mini bosses” — where a crash in the “at death cleanup” routine or the mistake of having the mob’s process retire with an exit status other than 'normal' causes it to just keep coming back to life right there, in its initial configuration (i.e. full health, full weapons, full mana, etc.).

But what stands above this? Who supervises the supervisor?

Generally speaking, a component like mob monsters would be a part of a larger concept of world objects, so whatever the world object “service” concept is would sit above mobs, and mobs would be one component of world entities in general.

To sum up, here is a craptastic diagram:

Yes, my games involve wildlife and blonde nurses.

Yes, my games involve wildlife and blonde nurses.

The diagram above shows solid lines for spawn_link, and dashed lines to indicate the direction of requests for things like spawn_link. The diagram does not show anything else. So monitors, messages, etc. are all just not there. Imagine them. Or don’t. That’s not the point of this post.

“But wait, I see what you did there… you made a bigger diagram and cut a bunch of stuff out!”

Yep. I did that. I made an even huger, much crappier, more inaccurate diagram because I wasn’t sure at first where I wanted to fit this into my imaginary game system.

And then I got carried away and diagrammed a lot more of the supervision tree.

And then I though “Meh, screw it, I’ll just push this up to a rough imagining of what it might look like pushed all the way back to the SuperSup”.

Here is the result of that digression:

It wouldn't look exactly like this, so use your imagination.

It wouldn’t look exactly like this, so use your imagination.

ALL. THAT. SUPERVISION.

Yep. All that. Right there. That’s why its called a “supervision tree” instead of a “supervision list”. Any place in there you don’t have a dependency between parts, a thing can crash all by itself and not bring down the system. Consider this: the entire game can fail and chat will still work, users will still be logged in, etc. Not nearly as big a deal to restart just that one part. But what about ItemReg? Well, if that fails, we should probably squash the entire item system (I’ve got guns, but no bullets! or whatever) because game items are critical data. Are they really critical data? No. But they become critical because gamers are much more willing to accept a server interruption than they are losing items and having bad item data stored.

And with that, I’m out! Hopefully I was able to express a tiny little bit about one way supervision can be coupled with workers in the context of an ongoing, configured service that lives within a larger Erlang system and requires on-the-fly spawning of supervised workers.

(Before any of you smarties that have been around a while and point out how I glossed over a few things, or how spawning a million items as processes might not be the best idea… I know. That’s not the point of this post, and the “right approach” is entirely context dependent anyway. But constructive criticism is, as always, most welcome.)