[Part 2 of a short series on messaging systems. (Part 1)]
Having implemented messaging systems of various sizes and scopes in all sorts of environments, I’ve come up with a few guidelines for myself:
- If messaging is not the core service, make it an orthogonal network service.
- If possible make the messages ephemeral.
- If the messages must persist use the lightest storage solution possible and store as little as possible.
- Accept that huge message traffic will mean partitioning, partitions will be eventually consistent, and this is OK.
- You don’t need full text search.
- If you really do need full text search then use a DB that is built for this; it’s a major time-sink to hack it in later and get it right.
- If the messages are threaded annotations over other existing relational data, swallow your pride and consult your DBA.
- VERSION. Your. DATA. And. PROTOCOLS.
- If anything about messages over existing data records feels hacky, awkward, or like it might put pressure on the existing DB, separate message storage and accept that some data integrity may be delayed or lost from time to time.
- Messaging is likely more important to your users than you (or they) think it is.
- The messages themselves are likely less important to your users than you (or they) think they are.
- If you can skip a feature, DO.
- You don’t need [AdvancedMessagingProtocol] (aka XMPP).
- If you *really* need XMPP it will be painfully obvious (and even if you do, chances are you don’t need the extensions).
- [insert more things about avoiding feature creep]
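The partitioning guideline above can be sketched in a few lines. This is only an illustration, assuming messages are partitioned by channel; the function name `partition_for` and the choice of SHA-256 are hypothetical, not something the guidelines prescribe.

```python
import hashlib


def partition_for(channel_id, partition_count):
    """Map a channel to a partition index in [0, partition_count).

    A stable digest is used instead of the built-in hash(), which is
    randomized per process for strings and so would route the same
    channel to different partitions on different nodes.
    """
    digest = hashlib.sha256(channel_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count
```

Every message for a given channel lands on the same partition, so ordering within a channel stays simple while anything that spans channels is left to catch up eventually.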
Adding messaging to an existing system can get a little messy if you’re not really sure why you are doing it. If you know that your users really do have a need for messaging within your system, but you don’t know what features or type of messaging it should be, then think carefully about how you would want to use it as a user yourself.
Do users work together within your system to accomplish immediate goals? A concurrent multiuser system (multiplayer game, concurrent design tool, pair programming environment, etc.) benefits most from an ephemeral, instant chat system that centers around the immediate task. When the task is over the messages become meaningless and should be allowed to decay. It may be nice to give users an easy way to export or save significant messages, but that is really a client-side issue and is orthogonal to how the messaging system itself works.
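A minimal sketch of what “allowed to decay” could look like, assuming an in-memory channel with a fixed time-to-live. The class name `EphemeralChannel` and its methods are hypothetical, and a real system would also need delivery to connected clients.

```python
import time
from collections import deque


class EphemeralChannel:
    """In-memory chat channel whose messages expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock        # injectable so decay can be tested
        self._messages = deque()  # (timestamp, sender, text), oldest first

    def post(self, sender, text):
        self._expire()
        self._messages.append((self.clock(), sender, text))

    def recent(self):
        """Return (sender, text) pairs that have not yet decayed."""
        self._expire()
        return [(sender, text) for _, sender, text in self._messages]

    def _expire(self):
        # Drop everything older than the TTL; nothing is ever written to disk.
        cutoff = self.clock() - self.ttl
        while self._messages and self._messages[0][0] <= cutoff:
            self._messages.popleft()
```

Nothing here touches storage at all, which is the point: exporting or saving a message is something a client does with the output of `recent()`, not a feature of the channel.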
Is the system used to coordinate real-world tasks or events? A real-world coordination system benefits most from a threaded comment/discussion system that centers around those tasks, and must only enhance but not replace the existing task-assignment and tracking features of the system. Threaded annotation is powerful here, but the threads need only persist as long as the task records do and messaging should never be mistaken for a task assignment tool (it can be leveraged as a part of task notification but never assignment or tracking). Remember ON DELETE CASCADE? This is where it is super helpful.
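Here is a small sketch of threads that live only as long as their task, using SQLite through Python’s standard `sqlite3` module. The table and column names (`task`, `task_comment`, `parent_id`) are made up for illustration; your existing schema will differ, which is one more reason to consult your DBA.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this is switched on per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE task (
            id    INTEGER PRIMARY KEY,
            title TEXT NOT NULL
        );
        CREATE TABLE task_comment (
            id        INTEGER PRIMARY KEY,
            task_id   INTEGER NOT NULL REFERENCES task(id) ON DELETE CASCADE,
            parent_id INTEGER REFERENCES task_comment(id) ON DELETE CASCADE,
            author    TEXT NOT NULL,
            body      TEXT NOT NULL
        );
    """)
    return conn


def demo_cascade():
    """Delete a task and return how many of its comments are left."""
    conn = make_db()
    conn.execute("INSERT INTO task (id, title) VALUES (1, 'Book venue')")
    conn.execute(
        "INSERT INTO task_comment (id, task_id, parent_id, author, body) "
        "VALUES (1, 1, NULL, 'ann', 'Any date preference?')"
    )
    conn.execute(
        "INSERT INTO task_comment (id, task_id, parent_id, author, body) "
        "VALUES (2, 1, 1, 'bob', 'Friday works')"
    )
    conn.execute("DELETE FROM task WHERE id = 1")
    return conn.execute("SELECT COUNT(*) FROM task_comment").fetchone()[0]
```

Deleting the task takes the whole thread with it, replies included, without any cleanup code in the messaging layer.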
Do users self-organize into groups whose sole purpose is communication? Social group systems benefit most from mail systems implemented directly within the system they use to organize themselves. Such systems may also benefit from some form of ephemeral immediate chat, but keep in mind that this indicates a need for two different messaging systems, not One Message System To Rule Them All.
Many different flavors of message systems exist between the extremes of “persistent point-to-point mail” and “ephemeral, instant, channelized (group) chat”. Consider:
- Persistent chat (Campfire, StackOverflow chat, etc.)
- Message boards (“forums” — though this term is far broader in actual meaning…)
- Usenet-style newsgroups
- Mailing list systems
- Email bridges
- IRC + bridges + bots
- Anything else you can imagine…
Another thing to think about is the nature of the users’ relationships to one another. That’s not just about communication channels and point-to-point delivery issues; it is also about what concept of identity exists within the system. Is the system one with strong authentication, total or partial anonymity, or a hybrid? This will dictate everything about your approach to permissions, from moderation, channel creation and administrative control to whether private messages are permissible, and it has a huge impact on what the implementation of a messaging system will require in terms of access to the original host system it is intended to support.
The issues of identity, authentication and public visibility are largely orthogonal to questions of persistence duration, storage and message-to-record relationship, but they can become intertwined without you realizing it when it comes time to design the storage schema, the serialization format(s), or the protocol(s). Of course these are three flavors of basically the same issue, though the modern trend is to shy away from thinking about this by hiding behind HTTP (like, uh, who even knows how to program sockets anymore? zomg!) and sticking your fingers in your ears when someone says that “schemaless JSON is the schema” or that XML can do the job of all three because it is the pinnacle of data representations. Consider whether any of these requirements may change in the future. Keep YAGNI in mind, but when it comes to schemas, serialization and protocols it is always good to design something that can be extended without requiring core modification.
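One way to get both versioning and room to grow is a small envelope with an explicit version number and a separate bag for extensions. This is a sketch under assumptions: the field names (`v`, `kind`, `body`, `ext`) and the use of JSON are illustrative choices only, and the same shape works in any serialization format.

```python
import json

SUPPORTED_VERSIONS = {1, 2}


def encode(version, kind, body, **extensions):
    """Serialize a message with an explicit version and optional extensions."""
    envelope = {"v": version, "kind": kind, "body": body}
    if extensions:
        envelope["ext"] = extensions  # new features go here, not in the core
    return json.dumps(envelope, sort_keys=True).encode("utf-8")


def decode(raw):
    """Parse a message, rejecting versions this peer does not understand."""
    envelope = json.loads(raw.decode("utf-8"))
    version = envelope.get("v")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported message version: {version!r}")
    return {
        "version": version,
        "kind": envelope["kind"],
        "body": envelope["body"],
        # Extensions are passed through untouched so that older code can
        # ignore them and newer code can act on them.
        "ext": envelope.get("ext", {}),
    }
```

A client that knows nothing about a `reply_to` extension still decodes `encode(2, "chat", "hello", reply_to=17)` without complaint, while an unknown version fails loudly instead of being silently misread.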