Tag Archives: Development

Zomp/zx: Yet Another Repository System

I’ve been working on a from-source repo system for Erlang on and off for the last few months, contributing time to it pretty much whenever real life is not interfering. I’m getting close to making a release. Now that my main data bits are worked out, the rest isn’t all that hard. I need to figure out what I want to say in an announcement.

The problem is that I’m really horrible at announcements, and this system does things in a pretty different way from other repository systems out there, so I’m not sure which things about it are going to be important to users (worth putting into an announcement) and which things are going to be important only to me because I’m the one who wrote it (and am therefore obsessed with its externally inconsequential internals). What is internally interesting about a project is almost never what is externally interesting about it. Marketing; QED. So I need to sort that out, and writing sometimes helps me sort that kind of thing out.

I’m making this deliberately half-baked, disorganized, over-long post public because Joe Armstrong gave me some food for thought the other day. I had written him my thoughts on a subject that had come up on a mailing list, but sent the message in private. I kept my message to him off-list for two reasons: first, I wasn’t comfortable with my way of expressing the idea just yet; and second, I am busy with real-life stuff and side projects, including the repo system, and don’t want to get sucked into online chatter that might amount to nothing more than bikeshedding. (I’m a world-class bikeshedder!) Joe wrote me back asking why I made the reply private, I told him my reasons, and he made me change my mind. He hopes that more people will publish their ideas all the time, good or bad, fully baked or still soggy, because the only way we ever find other people’s interesting ideas these days is by searching for them, usually as text, somewhere on the net. It isn’t as though we can’t go back and revise, and whether or not we ever do clean up our literary messes, the availability of core ideas and the exposure of thought processes are more important than polish. He has been on a big drive to make sure that he posts most of his thoughts to public mailing lists or blogs so that his ideas get at least indexed and archived. On reflection, I agree with him.

So here I am, trying to publicly organize my thoughts on my repository system.

I should start with the goals of the system.

This system is intended to smooth over a few points of pain experienced when trying to get a new Erlang project off the ground, and in particular avert the path of pain peculiar to Erlang newcomers when they encounter the “how to set up a project” problem. Erlang’s tooling is great but a bit crufty (deeply featured, but confusing to interface with) and not at all what the kool kids expect these days. And anyway I’m really just trying to scratch my own itch here.

At the moment we have two de facto standards for publishing Erlang systems: erlang.mk and Rebar. I like both of these, especially erlang.mk, but they do one thing that annoys me and never seems to quite fit my need: they build Erlang releases.

Erlang releases are great. They cut all the cruft of a release out and pack everything needed to actually run a system into a single blob of digits that you can move, in a single shot, to a new target system — including the Erlang runtime itself. Awesome! Self-contained deployment and it never misses. This has been an Erlang feature since before people even realized that they needed repeatable deployment infrastructure outside of the classic “let’s build a monolithic, static binary executable” approach. (Erlang is perpetually ahead of its time, even by today’s standards. I look at the poor kids stubbing their toes with Docker and language du jour and just shake my head — though part of that is because many shops are using Docker to solve concurrency issues that they haven’t even become cognizant of, thinking that they are experiencing “scaling” problems but missing the point entirely.)

Erlang releases are awesome when the deployment target is an embedded system, but not so awesome if the target is a full-blown operating system, VM, container, or virtual environment fully stocked with gobs of memory and storage and flush with system utilities and resources. Erlang releases sort of kitchen-sink the deployment itself. What if you want to run several different Erlang programs, all delivered as releases, all depending on the same library? You’ve got tons of copies of that library. Which is OK, but still sort of weird, because you also have tons of copies of the runtime (among other things). Each release is self-contained and lean, but in aggregate this is a bit odd.

Erlang releases make sense when you’re deploying to a phone switch or a sensor device in the middle of nowhere and the runtime is basically acting as its own operating system. Erlang releases are, in that context, analogous to putting a Gentoo stage 3 binary image on a system to leapfrog most of the toolchain process. Very cool when you’re in that situation, but a bit tinker-tacky when you’re just trying to run, say, a client program written in Erlang or test a web front-end for something that uses YAWS or Cowboy.

So that’s the siloed-kitchen-sink issue. The other issue is that newcomers are perpetually confused about releases. This makes teaching elementary Erlang hard. In my view we should really focus on escript for beginner code — just let the new guy run something out of a single file the way he is used to doing when learning a new language, instead of showing him pages of really slick code, then some interpreter stuff, and then leaping straight from that to a complex and advanced packaging setup necessarily tailored for conducting embedded deployments to slim hardware devices. Seriously. WTF. Escripts give beginners all the power of Erlang necessary for exploring the more interesting bits of code and refactoring needed to learn sequential Erlang, with the major advantage of being able to interface with the system the same way programmers from other environments are used to dealing with language runtimes like Bash, AWK, Python, Ruby, Perl, etc.
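
To make that concrete, here is the entirety of what a beginner needs to get something running (the file name and message are mine, purely for illustration):

#!/usr/bin/env escript
%% Save as "hello", then: chmod +x hello && ./hello world
main(Args) ->
    io:format("Hello! You passed: ~p~n", [Args]).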

But what about that gap between scripts and full-blown production deployments for embedded hardware?

Erlang has… nothing.

That’s right! There is no agreed-upon way to deploy or even run Erlang code in the same manner a Python coder would expect to execute a python program. There is no virtualenv type system, there is no standard answer to the question “if I’m in the project directory and type ./do_thingy it will just work, right?” The answer is always “Well, it depends…” and what actually winds up happening is that people either roll a whole release just to crank a trivial amount of code up or (quite often) implement an ad hoc way to get the same effect in a lighter-weight way. (erlang.mk shines here, actually.)

Erlang does provide a number of ways to make a system run locally from source or .beam files — and actually has quite reasonable built-in resources for this — but nothing has been built around these tools that also deals with external dependencies, argument passing in a standard way, or any of the other little things you really need if you want to claim a complete solution. Hence all the ad hoc solutions that “work on my machine” but certainly aren’t something you expect your users to use (not with broad success, anyway).

This wouldn’t be such a big problem if it weren’t for the fact that not having any standard way to “just run a program” also means that there really isn’t any standard way to deal with client side code in Erlang. This is a big annoyance for me because much of what I do is client-side code. In Erlang.

In fact, it totally boggles my mind that client-side Erlang isn’t more common, especially considering that AMD is already fielding zillion-core processors for desktops, yet most languages are fundamentally single-threaded. That doesn’t mean you can’t do concurrency and parallelism in other languages, but most problems are not parallel in nature to begin with (parallel problems are easy to write solutions to in any language) while most real-world problems are concurrent. But concurrent systems are hard to write in almost every language. Concurrent problems are the bulk of the interesting problems we’re still not very good at solving with computers. AMD is moving to make much more interesting concurrent processing hardware available on the client side (which means Intel will soon start pouring its gajillions worth of blood diamond money into a similar effort), but most languages and environments have no good way to make use of that on the client side. (Do you see why I hear Lady Fortune knocking?)

Browsers? Oh yeah. That’s a great plan. Have you noticed that most sites slowly move toward the “Single Page App” design over time (read as: the web sucks, so now we write full-but-crippled client-programs and deliver them over the web), invest heavily in do-sneaky-things-without-telling-you JavaScript and try to hog every core your system has if you allow it the slightest permission to do so? No. In the age of bitcoin miners embedded in nearly every ad this is not the direction I think we should be envisioning things going.

I want to take better advantage of the cores users have available, and that doesn’t necessarily mean make more efficient use of every cycle as much as it means to make scheduling across processes more efficient to reduce latency throughout the system overall. That’s something users care about quite a lot. This is the problem Erlang has already solved in a way no other runtime out there has. So I want to capitalize on it.

And yet, there is still no standardish way of dealing with code from source, running it locally, declaring or resolving dependencies, or even launching a client-side program at all.

So… how am I approaching it?

I have a project called “zomp” which is a repository system. It is a distributed repository system, so not everything has to be held in one place. Code in the zomp universe is held in little semantic silos called “realms”. Each realm can have whatever packages the owner (sysop) wants it to have. Each realm must have one server node somewhere that is its “prime” — the node in charge of that realm. That node is where system operator tasks for that realm take place, where packagers and maintainers submit code for inclusion, where the package index is built, and where the canonical copy of everything is stored. Other nodes configured to see that realm connect to the prime node, receive a copy of the current indexes, and are tested for availability and published as available resources for querying indexes or downloading packages.

When too many subordinate nodes connect to a prime, the prime redirects new nodes to one of its subordinates; when a subordinate gets “full” of subordinates itself, it picks one of its own subordinates for new redirects, and so on, so each realm winds up forming a resource tree of mirror nodes that connects back to the realm prime by a single path. A single node might be prime for several realms, or other nodes may act as prime for different realms — and any node can be configured to become a part of any number of realm trees.

That’s the high-level code division.

The zomp constellation is interfaced with via the “zx” program (short for “zomp explorer”, or “zomp exchanger”, or “Zomp eXtreem!”, or homage to the Sinclair ZX-81, or whatever else might lend itself to the letters “zx” that you might want to make up — I actually forget what it originally stood for, but it is remarkably convenient to type so it’s staying that way).

zx is configured to have visibility on zomp realms the same way a zomp node is (in fact, they use the same configuration files, and it isn’t weird to temporarily host a zomp node on your desktop the same way you might host a torrent node for a while — the only extra effort is that you do have to open a port; zomp doesn’t (yet) do hole punching magic).

You can tell zx to run a program using the highly counter-intuitive command:

zx run Realm-ProgramName[-Version]

It breaks the program name down into:

  • Realm (optional, defaulting to the main realm of public FOSS packages called “otpr”)
  • Name (necessary — sort of the whole point)
  • Version (which is optional and can also be partial: “1.0.3” vs just “1.0” or “1”, defaulting to the latest in a series or latest overall)

With those components it then contacts any zomp node it knows provides the needed realm, resolves the latest version number of the requested program, downloads and unpacks it, checks and downloads any missing dependencies, builds the program, and launches it. (And if it doesn’t know any active mirrors it asks the prime node and is seeded with known mirror nodes in addition to getting its query answered.)
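
So, concretely, with a hypothetical package called “coolapp” in the otpr realm (the package name is invented for illustration):

zx run coolapp            (latest version of otpr-coolapp)
zx run otpr-coolapp-1.2   (newest patch release in the 1.2 series)
zx run otpr-coolapp-1.0.3 (exactly version 1.0.3)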

The packages are kept in a local cache stored at the user level, not the system level (sort of like how browsers keep their JS and page caches) — though if you want to daemonize zomp and run it as a permanent service (if you run a realm prime, for example) then you would want to create an unprivileged system user specifically for the purpose. If you specify a fully-qualified “realm-name-version” for execution and the packages already exist and are built, zx just launches the code directly (which is the majority case, so no delay there — fast startup).

All zomp nodes carry a complete index of their configured realms and can answer queries with very little overhead, but only the prime node has a copy of all the packages for that realm.

Zomp realms are write-only. There is no facility for removing a package from a realm entirely, only for upgrading the versions of packages whenever necessary. (Removal is, of course, possible, but requires manual intervention by the sysop.)

When a zx client or zomp node asks an upstream node for a package and the upstream node does not have a copy, it will query its upstream until the request reaches a node that does have a copy. Once found, a “found” notice goes back down to the client telling it how many hops away the package is, and new “hops away” notices are sent as the package is passed downstream toward the original requestor (avoiding timeouts and giving the user some feedback about what is going on). The package is cached at each node along the way, so subsequent requests for that same package will be handled immediately without any more relay downloading.
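
In pseudo-Erlang the per-node logic amounts to something like this (module and function names here are invented for illustration; this is not zomp’s actual code):

%% Sketch only: serve from the local store, or relay upstream and cache the result.
fetch(PackageID) ->
    case package_store:lookup(PackageID) of
        {ok, Package} ->
            {ok, Package};
        not_found ->
            {ok, Package} = fetch_upstream(PackageID),
            ok = package_store:insert(PackageID, Package),
            {ok, Package}
    end.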

Because the tree of nodes is expected to be relatively ephemeral and in a constant state of flux, the tendency is for package stores on mirror nodes to be populated by only the latest, most popular packages. This prevents the annoying problem with old realms having gobs of packages that nobody uses but mirror hosts being burdened with maintaining them all anyway.

But why not just keep the latest of everything and ditch old packages?

Ever heard of “version shear”? Yeah. Me too. It sucks. That’s why.

There are no “up to” or “greater than” or “abstract version 3” type dependency declarations in zomp package metadata. As a package maintainer you must explicitly declare the complete version of each dependency in your system. In the case of diamond-shaped dependencies (where two packages in your system depend on slightly different versions of the same package) the burden is on the packagers to declare a version that works for a given release of that package. There are no dependency trees for this reason. If your package depends on X, and X depends on Y and Z then your package must be defined as depending on X, Y and Z — and fully specify the versions involved.

Semver is strictly enforced, by the way. That is, all release numbers are “Major.Minor.Patch”. And that’s it. No more, no less. This is one of the primary criteria for inclusion into a public realm and central to the way both zx and zomp interpret package semantics. If an upstream project has some other numbering scheme the packager will need to create a semver standard of his own. And actually, this turns out to not be very hard in practice. There is one weird side-effect of full, static dependency version declarations and semver: updating dependencies results in incrementing your package’s patch number, so a program with many dependencies under heavy development may wind up on version 2.3.257 without much changing other than the {deps, PackageIDs}. line in the package meta file, even if you haven’t touched its own code in ages.
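
The only meta file field actually named above is the {deps, PackageIDs}. line, so take the rest of this fragment as a guess at the shape of a package meta file rather than its real schema:

%% Hypothetical zomp.meta fragment (only the deps line is described above).
{name, "bar"}.
{realm, "otpr"}.
{version, "2.3.257"}.
{deps, ["otpr-x-1.4.2", "otpr-y-0.9.1", "otpr-z-2.0.0"]}.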

zx helps make you aware of these situations, so solving them has not been particularly difficult in practice.

Why do things this way?

The “static dependencies forever and ever, amen” decision is a tradeoff between the important feature of fully repeatable builds Erlang releases are famous for (to the point of bug-compatibility between deployment sites — which is critical in production) and the flexibility users and developers have come to expect from source repository systems like pip, pypi, CPAN, etc. Because each realm is write-only there is no danger that a package will be superseded and disappear. The way trickle-down caching works for mirror zomp nodes does not unduly burden the subordinate realm mirrors, and the local caching behavior of zx itself at launch time tends to make all of this mostly delay-free for zx clients and still gives them the option to always run “latest available version” if they want.

And on the note of “latest version”…

Client-side programs are not expected to run for terribly long stretches at a time. People shut desktop programs down, restart computers, update their kernels, etc. So even if a client program runs a long time (on the order of web, email, IRC, certain games, crypto wallets/miners, torrent nodes, Freenet, Tor, etc.) it will still have a chance to restart every few days or weeks to check for a new version (if invoked in a way that omits the version number so that it always queries the latest version).

But what about long-running server-side type programs? When zx starts, a launcher script checks the initial environment and then starts the Erlang runtime with zx as its target application, passing it the package ID of the desired program to run along with that program’s arguments. That is a mouthful, so an example is helpful:

zx run foo-bar arg1 arg2 arg3

Typing this invokes the launching script (a Bash script on Linux, BSD and OSX, a batch file on Windows — so the actual command is zx.bash or zx.cmd) with the arguments run foo-bar arg1 arg2 arg3. zx receives the instruction “run” and then breaks “foo-bar” down into {Realm, Name} = {"foo", "bar"}. Everything after that is passed through as strings which wind up being the input arguments to the program being run, “foo-bar”.
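
The breakdown itself is simple; here is a sketch of the idea (zx’s real parsing surely handles more edge cases than this):

%% Illustrative only: split a package ID into realm, name, and version.
parse_id(ID) ->
    case string:split(ID, "-", all) of
        [Realm, Name, Version] -> {Realm, Name, Version};
        [Realm, Name]          -> {Realm, Name, latest};
        [Name]                 -> {"otpr", Name, latest}
    end.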

zx registers a process called zx_daemon which remains resident in the runtime and waits for a subscription request or zomp query. Any Erlang program written with the intention of being used with zx can send a message to zx_daemon and ask it to maintain a connection to the program’s parent realm and enroll for update notifications. If the target program itself is the subject of a realm index update then it will get a message letting it know what has changed. The program can respond any way the author wants to such a notification.

In this way it is possible to write a client-side or server-side application that can enroll to become aware of updates to itself without any extra infrastructure and a minimal amount of code. In some programs I’ve used this to cause a pop up notification to appear to desktop users so they know that a new version has become available and they should restart the program (the way Firefox does on Windows). It could also be used to initiate a restart on its own, or whatever else you might come up with.
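
The actual message format is not documented here, so treat this as a sketch of the enrollment idea rather than zx_daemon’s real protocol:

%% Hypothetical sketch; the real zx_daemon messages may look different.
enroll_for_updates() ->
    zx_daemon ! {subscribe, self()},
    receive
        {package_update, PackageID} ->
            io:format("A new version is available: ~ts~n", [PackageID])
    end.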

There are several benefits to developers of using this system as well.

As a developer I can start a new project by doing zx init app [Realm-Name] or zx init lib [Realm-Name] in an existing project root directory and a zomp.meta file will be generated for it, or a new project template directory will be created (populated with a functioning sample skeleton project). I can do zx dialyze and zx will make sure a generally relevant PLT exists or is built (if not up to date) and used to check the typespecs of the project and its dependencies. zx create package [Path] will create a zomp package, sign it, and populate the metadata for it. zomp keygen will generate the kind of keys necessary to interact with a zomp server. zomp submit PackageFilePath will submit a package for review.

And so on. It is a lot easier to do most things now, and that’s the main point.

(There are commands for reviewing, approving, or rejecting package submissions, adding packagers and maintainers to package projects, adding dependencies to projects, X.Y.Z version incrementing, etc. as well.)

This is about 90% of the way I want it to be, but that means about 90% of the effort remains (pessimistically assuming the 90/10 rule, because life sucks and nobody cares). Most of that is probably going to be finagling some network lunacy, but a lot of the effort is going to be in putting polish to it.

Zomp/zx is based on a similar project I wrote for use within Tsuriai a few years ago that has much sparser features but does basically the same thing: eases packaging and repeatable deployment from source to client systems. I would never release that version publicly because it has a lot of “works for me!” level functionality, but very little polish and requires manually diddling quite a few settings files in error-prone ways (which is fine because it was just us diddling them).

My intention here is to Cadillac this out a bit so that newcomers can slide into the new language and just focus on that language after learning a minimum of tooling commands or environmental details. I think zx init app foo-bar and zx runlocal are a low enough bar for entry.

(Personal) Guidelines for Software Projects

A few guidelines for non-trivial, large projects you actually care about and want to maintain for more than a month or so.

1. Typespecs

Learn to use them. If you are writing a large, complex project in a language that doesn’t support this or have tooling for it then use a different language. Yes, it actually saves so much heartache that it is important enough to switch.

Why? Because for-real type checking can tell you, without the futility or religious interference of unit testing, whether or not your program is valid. A valid program is not necessarily a correct program, but an invalid program is necessarily an incorrect one. (Also, it is worth keeping in mind that classes are not types. There is a subtle, and critical, difference.)
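
Erlang’s version of this is a one-line spec above the function, checked project-wide by Dialyzer (the function below is a made-up example):

%% Dialyzer will flag callers that pass a non-binary key or ignore
%% the possibility of an error return.
-spec lookup(binary(), map()) -> {ok, term()} | {error, not_found}.
lookup(Key, Table) ->
    case maps:find(Key, Table) of
        {ok, Value} -> {ok, Value};
        error       -> {error, not_found}
    end.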

2. Property testing, not unit testing

Don’t simply write a few “unit tests” and assume things work. They don’t. As Rich Hickey (the creator of Clojure) so aptly put it: “What is the one thing that is true about all bugs found in the wild? Every one of them passed all the tests!” It can be useful to engage in regression testing, but regression testing is a subset of integration testing and even crosses over with user testing (the ultimate test of all) and project documentation and history management.

When you write code, it has bugs.

  • Some are syntactic: You forgot some ant poop somewhere (things like : ; . ,), failed to close a brace or paren, or misspelled something.
  • Some are structural: You passed in a foo type but the function is defined as accepting bar (statistically this is the greatest category of compilable, invisible errors — reference point 1 above).
  • Some are scheduling and timing: You have races and deadlocks all over the place and never knew it because they don’t usually get triggered and are super complex to work out in your head.
  • Some are semantic: The program does precisely what you told it to do, but you told it to do the wrong thing (the most frequent place where protocol failures creep in).

You write every one of these kinds of bugs into your programs every time you write a non-trivial program. I can’t just tell you to knock it off and tighten your shot group, because I do the same stuff; it is impossible to avoid! If you write all these stupid bugs into your programs, what do you think lurks in your hand-written test code? MORE BUGS!

So what do?

In the same way that we can write a type specification for a function (declare its domain and codomain, basically) we can also write a specification for the function’s valid inputs and outputs, and the rules the output is expected to follow (its range and image, basically). This defines the properties of the function.

Neat-O. But what would we do with such a specification? Property declarations are like me explaining to you what a function does, but not how it manages to do it. To test whether our implementation of the function does the expected thing and lacks corner cases, however, we can use a property-based testing system to generate tests for us on the fly and run them to check whether the expected properties of the function hold true. Better still, smart property-based testing systems not only find bugs (values that are defined as valid but produce invalid results that violate the property specification) but can quite often home in on specific broken cases and give you a good indication of what sorts of values are problematic. That is to say, a property-based testing engine equipped with good property definitions can locate the corner cases for you.

Why wouldn’t we do this by hand? Because typically unit tests cover a handful of most-common cases with their expected values and that’s about it. Property based testing is much less merciful and also much less prone to error because a property based tester will generate an endless stream of tests according to the provided properties and run them for as much CPU time as you’re willing to give for testing. You are never going to write millions of different test cases for your code. A property based testing engine will do precisely that if you give it the CPU time to do so. Compared to how testing is done in most projects this is like having nuclear power in the age of wooden stoves.
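
In Erlang-land this is the territory of QuickCheck, PropEr, and Triq. Here is a minimal property written for PropEr (the list-reversal property is a stock example, nothing profound):

-module(reverse_props).
-include_lib("proper/include/proper.hrl").

%% Property: reversing a list twice yields the original list.
prop_reverse_twice() ->
    ?FORALL(List, list(integer()),
            lists:reverse(lists:reverse(List)) =:= List).

Calling proper:quickcheck(reverse_props:prop_reverse_twice()) will generate and run as many random cases as you are willing to wait for (100 by default).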

This is magical.

3. DO USER TESTING

When you release something that has merely worked for you so far, that warrants about as much confidence as you should put in an alpha release. “Works for me!” are the bold last words of many an abandoned project.

Don’t be That Guy. Don’t release That Project as a final. Be clear it’s a beta or even an alpha, and that development is an ongoing thing, forever. Manage expectations; your users (paying or community) will reward you for being honest.

When you release a project, understand that this is your beta period, even if you’re on a relatively mature version. In a sense all significant features go through their own little beta phase. This is true in part because you’ve no clue whether power users are going to find a way to break it (they will) or whether it will be instantly appreciated and adopted by the userbase (random gamble there). Whatever you think is important or intuitive might never have even occurred to them.

Power users are going to push the button the wrong way and won’t know how to deal. That’s actually a good thing if you maintain a relationship with your users, because you’re basically getting directions straight from the affected party about how to make your program better. This is important whether you’re doing community open source for some sweet Ego Points, or trying to feed the kids at your soul-crushing job.

No amount of unit testing (which we’ve already sort of debunked — write typespecs, don’t blindly churn out unit tests) or property testing (which is vastly superior to unit testing, but misses a lot of side-effecty issues, which are often the central purpose of your program) can catch everything. No amount of integration testing will uncover everything that is wrong with your program. None of these tests will tell you whether your program sucks to use and is reviled by users. But user testing will.

4. Don’t be afraid to change stuff

You have a version control system for your code. You use git. Or something. It doesn’t matter, though, because you have something that does version control for you and creating a new branch is painless. (Unless you’re not using a version control system… then you really need to start. You don’t have to submit to the dark cabal of Ruby hipsters that controls github, but you should at least be using git locally.)

If you have an idea, try it out. It is probably a great idea in spirit but won’t be so great in reality until you’ve shaken a bit of the stupid, self-indulgent fantasy out of it. You can’t do that without exploring the idea in actual implementation, and that sort of exploration requires hacking up your pristine project a bit until you discover exactly why, in mechanical terms, the Universe hates your idea. Once you know exactly why the Universe hates you and your ideas you can adjust your plan to accommodate the whims of the math gods, tame the vagaries of digital magika, and tap out the proper incantations in much less time than you ever could by holding endless meetings about it.

Break stuff. Remember the Cardinal Rule of Hacking:

“If you understand what you’re doing, you’re not learning anything.”
– Some guy (who was not actually Abraham Lincoln)

Sometimes the best sign of progress is a change in the error messages you are getting.

Simplicity follows complexity. Until you write a godawful fugly version of your solution you don’t really understand the problem. If you don’t fully grok the problem how can you ever hope to come up with a solution? Only after you have encountered all the little gotchas that made the code ugly in the first place are you ready to rewrite that steaming pile of (working) poo into an elegant solution that is almost guaranteed to have fewer bugs if for no other reason than increased transparency and better organization of the code.

(But note that you could stick with the ugly version for a bit in a pinch — so not all is lost. Getting something working at all is better than having a bunch of great ideas that don’t exist in reality.)

5. Don’t be afraid of new languages

At this point in my life I’ve written code in about 30 or 40 languages. I don’t know the exact number. I have written a lot of code and gained intimacy with about 10 of those. That’s a lot of languages by some standards and not many at all by others. It is enough, though, that I have come to realize that most languages are minor syntactic variations on a couple of basic paradigms, and really none of that crap matters too much.

It’s all shitty. All languages suck. Some suck a little less than others. Try to find one from the handful that sucks dramatically less than others in a specific domain, then get comfortable with it as a go-to tool for that domain. But remember that it is just a tool. Jackhammers are tools, but I don’t see anyone building houses with them.

When you hop on to a new project that someone is already working on you’re going to have to pretty much adhere to the rules of their house, and that means dealing with whatever annoying language they wrote their awesome project in.

Want to hack on Freenet‘s core implementation? Better not mind dealing with network code and file operations in Java (eek!). And what if you don’t even know Java or it has been years since you saw it last and everything is different now? This is the concern that should worry you the least of all.

If you squint a little, projects basically are languages. They have their own semantics (the project libs, its functions, its type specifications, its class definitions, its decision tables, its… whatever it’s got that is relevant). They have their own sort of syntax. In fact, every very large project I’ve ever worked on tended to actually follow Greenspun’s Tenth Rule, and if it was a concurrent system (so common today) it even tended to follow Virding’s First Rule. (That becomes less of a joke and more of a law of nature the longer you do this and the more you know about both lisp and OTP.)

What does this mean? It means that learning the language a program is written in is the easy part. Learning the libs of that language tends to take about twice as long as learning the language itself. Learning the internals of a large project, however, tends to take about ten times longer than that. So where is the real cost in effort here? It isn’t in the adoption of a new language. It is in the adoption of a new project, because every project is a tarbaby.

6. JUST OPEN YOUR EDITOR YOU PROCRASTINATING SACK OF POO!

The hardest part of writing anything, whether prose, code, or poetry, is sitting down and typing out something.

How to tackle the procrastination problem? Easier said than done: OPEN YOUR EDITOR

3 to 5 letters is all you need: `vim` or `emacs` and away you go!

Once you’re fully in the Matrix, write a function or spec or something. It doesn’t matter what you try to do: it will be wrong. And then you’ll have been wrong, but not exhausted yet. And suddenly you’ll realize that you are the one being wrong on the internet today and that situation just cannot stand. So you’ll start fixing it. And tinkering on it. And before you know it you’ll actually have done something productive, the curse of social media will be temporarily suspended, and you’ll finally stop feeling so crap about yourself (for a few minutes, anyway).

Web Designers: Stop making SPAs for inherently web 1.0 style sites

It is 2017. What’s with the drive to make everything an SPA whether it needs to be or not? This is getting a little ridiculous. I’m going to ramble on below a bit because I’ve got a hankering to do so — pay this no mind.

All around the web I see sites that are best represented as a collection of inter-linked documents, and all around the web I see many of those being changed into single-page applications (SPAs). Even more stupid is when the SPA in question was built by some naive dope who included a little bit of almost every JS framework in existence — including a random selection from the thousands of obsolete and dead ones.

What is the goal? What’s the deal? Do web authors today not know how the web was actually intended to work originally? That document publication is actually its reason for existence in the first place and that “web applications” are a new thing that is a backhack to an incomplete standard that only sorta-kinda-works?

Granted, the reason it only sorta-kinda-works is due mostly to the problems inherent in the fact that only a single language is allowed in scripts… which is ridiculous. Was nobody paying attention to the Guile2 approach all those years? The only lesson learned from the Java applet and Flash experience seems to have been that “it sucks to force users to install runtimes as plugins”. Ugh.

Anyway, back to web applications…

I get it. For the moment we don’t have a solid distinction between “a document browser” and “an application browser” so we are stuck with this insufficient worst-of-both-worlds nether region of “applications that masquerade as documents”. And that drives anyone nuts who has given this much thought.

Not that a lot of people have considered the difference deeply. I imagine that is probably because very few new coders today have ever written more than a line or two of code intended to run natively on a user’s local system. Nearly everyone has written thousands of lines of code intended to run natively on server-side systems, but even that is getting wonky, because many youngsters today don’t know how to deploy without using Docker, yet lack the faintest inkling as to what problems Docker is actually intended to solve, and wind up bypassing better solutions when they exist.

Tools shine when they are used in a focused way, performing the job for which they were intended. The web is the same way. Yes, it is a big jumble of crap. So let’s just leave that there. Networks are a big jumble of crap, too, and so are human societies — so we’ve adopted dirty ways of dealing with the dirt. The jumbly pile of shit that is the web is one of our ways of dealing with that. Everything times out. Everything is sent in text. Protocols are bloated and redundant. There isn’t even a proper definition of what “valid” HTML and XML and JSON and whatever else is in most cases. It’s all racing toward a singularity where everything is uniformly stupid. But… whatever, it sort of kind of still works — and humans just barely work themselves, so that’s par for the course.

The original web was designed to function as an insecure document publication system where documents could be interlinked. We realized that we could include more interesting stuff by expanding the definition of “document” to include more than just text, and quite recently with HTML5 the way in which documents can be written is only a few orders of magnitude behind, say, LaTeX, in its ability to arrange things on the screen (that feature lag is not entirely the fault of the HTML5 authors).

This gives a lot of freedom to website authors — perhaps too much.

If a website is a set of news articles or academic papers (or even tweets) then you really don’t need a SPA, you need a more traditional sort of “web site”. It can be dressed up all pretty with shiny things sprinkled around, of course, but we don’t want a SPA that mysteriously changes state in ways that prevent users from bookmarking things or easily sending one another links to specific resources (something Twitter got right despite some initial confusion over how to frame their content), etc.

If a website is actually just a delivery front end for a graphical RPG, well, obviously the game part of the site is probably best designed as a SPA, but the rest of the site — the forums, armory, character pages, bestiary, fan wiki, manual, guild rankings, lore pages, etc. — are absolutely best presented outside of that SPA as an actual website.

See the difference?

The game example is actually quite useful to contemplate for a variety of reasons. I’ll probably come back and cut this post down to just that part. Either that or eventually come back and rewrite the first bits to more accurately convey the humor with which I, as a graybeard resident in cyberspace for about 30 years now, view the state of the web today.

Whatever you do, dear reader, have fun coding, and remember: Don’t outsmart yourself!

Erlangers! USE LABELS! (aka “Stop Writing Punched-in-the-Face Code Blocks”)

Do you write lambdas directly inline in the argument list of various list functions or list comprehensions? Do you ever do it even though the fun itself, or the other arguments or return assignment/assertion for the call are too long and force you to scrunch that lambda’s definition up into an inline-multiline ball of wild shit? YOU DO? WTF?!?!? AHHHH!

First off, realize this makes you look like a douchebag for not being polite to other people or your future self whenever you do it. There is a big difference for the human reader between:

%%% From shitty_inline.erl

do_whatever(Keys, SomeParameter) ->
    lists:foreach(fun(K) -> case external_lookup(K) of
                  {ok, V} -> do_side_effecty_thing(V, SomeParameter);
                  {error, R} -> report_some_failure(R)
                end
          end, Keys
    ).

and

%%% From shitty_listcomp.erl

do_whatever(Keys, SomeParameter) ->
    [fun(K) -> case external_lookup(K) of
        {ok, V} -> do_side_effecty_thing(V, SomeParameter);
        {error, R} -> report_some_failure(R) end end(Key) || Key <- Keys],
    ok.

and

%%% From less_shitty_listcomp.erl

do_whatever(Keys, SomeParameter) ->
    ExecIfFound = fun(K) -> case external_lookup(K) of
            {ok, V} -> do_side_effecty_thing(V, SomeParameter);
            {error, R} -> report_some_failure(R)
        end
    end,
    [ExecIfFound(Key) || Key <- Keys],
    ok.

and

%%% From labeled_lambda.erl

do_whatever(Keys, SomeParameter) ->
    ExecIfFound =
        fun(Key) ->
            case external_lookup(Key) of
                {ok, Value}     -> do_side_effecty_thing(Value, SomeParameter);
                {error, Reason} -> report_some_failure(Reason)
            end
        end,
    lists:foreach(ExecIfFound, Keys).

and

%%% From isolated_functions.erl

-spec do_whatever(Keys, SomeParameter) -> ok
    when Keys          :: [some_kind_of_key()],
         SomeParameter :: term().

do_whatever(Keys, SomeParameter) ->
    ExecIfFound = fun(Key) -> maybe_do_stuff(Key, SomeParameter) end,
    lists:foreach(ExecIfFound, Keys).

maybe_do_stuff(Key, Param) ->
    case external_lookup(Key) of
        {ok, Value}     -> do_side_effecty_thing(Value, Param);
        {error, Reason} -> report_some_failure(Reason)
    end.

Which versions force your eyes to do less jumping around? How about which version lets you most naturally understand each component of the code independently? Which is more universal? What does code like this translate to after erlc has a go at it?

Are any of these difficult to read? No, of course not. Every version of this is pretty darn basic and common — you need a listy operation but require a closure over some in-scope state to make it work right, so you really do need a lambda instead of being able to look all suave with a fun some_function/1 type thing. So we agree, taken by itself, any version of this is easy to comprehend. But when you are reading through hundreds of these sorts of things at once to understand wtf is going on in a project, while also remembering a bunch of other shit code that is laying around and has side effects, while trying to recall some detail of a standard, while the phone is ringing… things change.

Do I really care which way you do it? In a toy case like this, no. In actual code I have to care about forever and ever — absolutely, yes I do. The fifth version is my definite preference, but the fourth will do just fine also.

(Or even the third, maybe. I tend to disagree with the semantic confusion of using a list comprehension to effect a loop over a list of values purely for its side effects without returning a value: partly because this is semantically ambiguous, and also because whenever possible I like every expression of my code to be either an assignment or an assertion (so nearly every line should have a = on it). In other words, use lists:foreach/2 in these cases, not a list comp. I especially disagree with using a listcomp here because the main utility of an inline lambda is normally to achieve a closure over local state, but in this case we are just calling another closure — so semantic fail there, twice.)

But what about my lolspeed?!?

I don’t know, but let’s see. I’ve created five modules, based on the above examples:

  1. shitty_inline.erl
  2. shitty_listcomp.erl
  3. less_shitty_listcomp.erl
  4. labeled_lambda.erl
  5. isolated_functions.erl

These all call the same helpers that do basically nothing important other than having actual side effects when called (they call io:format/2). What we are interested in here is the generated assembler. What is the cost of introducing these labels that help the humans out VS leaving things all messy the way we imagine might be faster for the runtime?

It turns out that, just like with using assignments to document your code, there is zero cost to labeling your lambdas. For example, here is the assembler for shitty_inline.erl side-by-side with labeled_lambda.erl:

Oooh, look. The exact same stuff!

(This is a screenshot, a text file with the contents shown is here: label_example_comparison.txt)

See? All that annoying-to-read inline lambdaness buys you absolutely nothing. You’re not helping the compiler, you’re not helping the runtime, and you are hurting your future self and anyone you want to work with on the same code later. (Note: You can generate precompiler output with erlc -P and erlc -E, and assembler output with erlc -S. Here is the manpage. Play around with it a bit, BEAM and EVM are amazing platforms, wide open for exploration!)

So use labels.

As for execution speed… all of these perform basically the same, except for the last one, isolated_functions.erl. Here is the assembler for that one: isolated_functions.S. This outperforms the others, though to a relatively insignificant degree. Of course, it is only an “insignificant degree” until that part of the program is the most critical part of whatever your program does — then even a 10% difference may be a really huge win for you. In those cases it is worth it to refactor to test the speed of different representations against each version of the runtime you happen to be using — and all thoughts on mere style have to take a backseat. But this is never the case for the vast majority of our code.

(I’ve read reports in the past that indicate 99% of our performance bottlenecks tend to reside in less than 1% of our code by line count — but I can’t recall the names of any just now. If you happen to find a reference, let me know so I can update this little parenthetical blurb with some hard references.)

My point here is that breaking every lambda out into a separate named function isn’t always worth it — sometimes an in-place lambda really is more idiomatic and easier to understand, simply because you can see everything right there in the same function body. What you don’t want to see is multi-line lambdas squashed into argument lists that make things hard to read and give you the exact same result once compiled as labeling that lambda with a meaningful variable name on another line in the code and then referring to it where it is invoked later.

zUUID: An Example Erlang/OTP Project

I was talking with a friend of mine yesterday about how UUID v2 seems to have evaporated. We looked into things further and found it’s not actually included in RFC 4122! One thing led to another and I wound up writing an example project that is yet another UUID generator/utility in Erlang — but this time it actually has duplicate v1 and v2 detection/correction, and implements UUID version 2 values as closely to their (rather elusive) definition as I could manage.

As there are already plenty of UUID projects around I focused on making this one as readable as I possibly could — to include exported documentation, in-source documentation, obvious variable names, full typespecs, my silly little “pure” notation, blatantly obvious bitstring syntax, and the obligatory github presence.
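
As a taste of the “blatantly obvious bitstring syntax” part, here is roughly how the RFC 4122 field layout falls out in Erlang (an illustration of the style, not zuuid’s actual code):

%% The five RFC 4122 fields of a UUID, pulled apart in a single match.
fields(<<TimeLow:32, TimeMid:16, TimeHiAndVersion:16, ClockSeq:16, Node:48>>) ->
    {TimeLow, TimeMid, TimeHiAndVersion, ClockSeq, Node}.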

Hopefully some folks newish to Erlang will come along and explain to me what confuses them about that code, the process of writing it, the documentation conventions, etc. so that I can become a better literate programmer. Of course, since the last thing the world needs is another UUID implementation I suppose I would have had better luck with something at least peripherally related to the web. (>.<)

Methodologies in open source development

Prompted by an article on opensource.com about scrum in foss projects. This is an incomplete, impulsive rough draft and requires some major revisions.

Nothing makes me facepalm faster than when I get hired to work on a project and the first thing anyone tells me is “This project is going to be great because we are using the best methodologies, you know, like TDD and Scrum and everything. Totally perfect.”

WTF!? How about telling me about the problem the project solves or why it will be an improvement over whatever came before or talking about the architecture or something? There are only three conclusions I can draw from seeing that the only thing someone can think to say when I first enter a project is a word about methodology:

  1. They have no clue what the project actually does.
  2. There are no project goals.
  3. They are hiring me specifically because they have not yet encountered any other competent developers and they know it, yet (amazingly) they feel confident in dictating whatever they decided are “best practices” to me.

AHHHHHH!

Often this results in me backing out of a job, or leaving if I had tentatively agreed to it already — unless they are just going to pay enough to make watching a project tie itself in knots worth my time.

(Incidentally, I was talking with a guy from Klarna the other day and had the exact opposite experience. They know what problem their platform tackles and why it is a really good idea. So it’s not hopeless everywhere, just most places. What is really amazing is that the guy I was speaking to wasn’t even a developer but understood what their main product is all about. That’s almost a reason to have actual hope for the future of that particular company. I don’t think I’ve ever written the word “hope” without prefacing it with a word like “unreasonable”, “false”, “sad” or “vain” in an article about software development.)

Today there is a problem with methodologies infecting and then blinding software projects (foss and otherwise). It’s like competing flavors of ISIS taking over the ethos of various management and teams while sapping the reason from their adherents. It’s not that Scrum or Agile or TDD are necessarily bad — there are a lot of good ideas that can be pulled from them — it is problematic that things like Scrum have become religions. Let’s put that in the big letters:

The problem isn’t this or that particular practice, it is the religion part.
— Me, just now

When it is absolutely unacceptable to question the utility of a methodology or specific coding practice you have a major problem. If you select a particular set of practices without any regard to the context within which the methods or techniques are to be applied, you are simply making uncritical, faith-based decisions. That is like hoping that a man who lives in the clouds will make everything better for you because you move a wooden block back and forth three times a day or kiss the ground a lot.

No, never mind, actually it is worse than that, because you’ll only lose a few moments of your day moving a little block of wood around, and kissing the ground is pretty quick. Neither has to bother anyone else who is trying to be productive right then. The real trouble starts when you schedule daily or weekly meetings about block-moving techniques, force everyone to take a “vacation/working retreat” (oxymoron much?) to the tune of hundreds of thousands of dollars in company money to attend seminars given by charlatans about ground-kissing, and schedule weekend work time and “fun events” like 24-hour “hackathons” and “weekend sprints” to make up for the time lost in coordinating all the above activities. That’s way worse.

(Note that in the ranty paragraph above I’m not calling all scrum/agile/whatever coaches charlatans. The real coaches I’ve met (who have themselves written successful software) would agree entirely with what I’m writing here and tend to say straight out that chosen practices must match the context of the project. I am calling the seminar circuit and “methodology certification” guys charlatans. Those shitbags have learned that they can make crazy money by telling sweet, long, loud lies to management in a culture desperate for something to point value-blind investors at as a reason to throw good money after bad. Of course, this also implies that quite a bit of tech management is incompetent and most tech investors are just shooting in the dark.)

Without the ability to question a practice it becomes impossible to evaluate its impact on your goals. For example, sometimes TDD makes a lot of sense, especially when the product is a library. Srsly, if you write libraries and you don’t do TDD then you had better have some other word for something that amounts to the same thing. But sometimes tests — especially when they are, by necessity, integration tests of a non-trivial, comprehensive nature — can be a massive distraction, totally unhelpful, and a noticeable net loss to a project (especially project components that are literally untestable before the heat death of the universe).

The choice of a methodology or technique has to be a goal-based decision. Very often this means a business decision, but sometimes it’s a project-goals-and-ethos type decision. Business decisions are the easiest because they are the most straightforward and the goals are relatively simple. It is a bit different in a foss project where adherence to a particular practice might be an underlying goal or a core part of the social value system that surrounds it. But even in a foss project it isn’t so hard to pin down a goal and determine how a particular practice, when imposed as a rule, will either help, hinder, or confuse the goal.

There is a general incongruity between the context scrum was designed around and the context within which most foss projects exist. Scrum (and agile in general) is about customer-facing code; specifically the case where the customer is a paying entity that is inexpert in the software being developed, while the developers are inexpert in the problem domain being solved. That is the primary Agile use case. It works well there — but this does not describe the situation of most foss projects.

Most foss projects are intended to be used by other software developers or techie users (system operators, interface specialists, DBAs, etc.), and are especially focused around infrastructure. In this case Agile simply doesn’t fit. Some documentation policies don’t even fit: there is a lot of blending among the ideas that “code should be documented”, “code should be commented”, “comments should adhere to a markup that generates documentation”, “the API documentation is the manual” and “there is a product manual”. These are not at all the same things in certain markets, but in the open source world there are shades of meaning there spanning “the code is the documentation” to “literate code” to “we have a separate documentation project”.

I use foss software almost exclusively for work, but my customers don’t. They have no idea of the difference, really, and don’t care over the span of a purchasing decision whether they are getting foss software or proprietary software — they only care that it works to solve their immediate business problem (that closed source sometimes introduces a different, longer-term business problem is not something they are generally prepared to understand the arguments for). In this case I have non-developer users paying for my (incidentally foss) software — and scrum/agile/whatever works very well there as a set of guidelines and practices to draw from when outlining my group’s workflow.

But for the infrastructure stuff we do that sits behind all that, scrum/agile/whatever is totally insane. There is no “sitting in with the customer to develop use cases” or holding a daily scrum meeting when it comes to the network protocols that underlie the whole system, or the crypto suites that render the whole thing safe, or the database schemas that decompose the raw data into meanings relevant on the back-end. That sort of stuff is generally core competency for foss developers, but it doesn’t fit the scrum or agile methodologies at all.

Only when you are working on a wicker-castle of a web project where concerns are so totally interwoven and mishmashed together that front-end customer-facing concerns become backend concerns does agile make any sense — but in this case it only makes sense because you are shipping a giant ball of mud and your hair is on fire every single day because you’ll never catch up with breakages (welcome to the actual modern paradigm that rules 90% of software projects I’ve seen…).

I’ve poked at a few different sides of development. What I’m really getting at is that there are different worlds of software development, and each world is suited better or worse by the different methodologies that have been developed over time. (As an extreme case, consider the lunacy of scrum in most embedded projects.) No methodology I’ve ever seen fits any given project very well, and all of them absolutely work against the interests of a project that exists outside the context from which they originated. That could be because of time constraints (consider the infamous “docs don’t pay” attitude — it shouldn’t be dismissed outright, because it is sometimes a valid business concern), because the context of use doesn’t fit, or whatever.

In the same way that we (should) try to pick the best tools (languages, data stores, data structures, network paradigms, etc.) for a given job based on the nature of the task, we should approach selection of development methodology and team workflow based on the nature and context of the project.

Fedora: A Study in Featuritis

It’s a creeping featurism! No, it’s a feeping creaturism! No, it’s an infestation of Feature Faeries! No, it’s Fedora!

I’ve been passively watching this thread (link to thread list) on the Fedora development list and I just can’t take it anymore. I can’t bring myself to add to the symphony, either, because it won’t do any good — people with big money have already funded people with big egos to push forward with the castration of Fedora, come what may. So I’m writing a blog post way out here in the wilds of the unread part of the internet instead, mostly to satisfy my own urge to scream. Even if alone in the woods. Into a pillow. Inside a soundproof vault.

I already wrote an article about the current efforts to neuter Unix, so I won’t completely rehash all of that here. But it’s worth noting that the post about de-Nixing *nix generated a lot more support than hatred. When I write about political topics I usually get more hate mail than support, so this was unique. “But Unix isn’t politics,” you might naively say — but let’s face it, the effort to completely re-shape Unix is nothing but politics; there is very little genuinely new or novel tech going on there (assloads of agitation, no change in temperature). In fact, that has ever been the Unix Paradox: most major developments are political, not technical, in nature.

As an example, in a response to the thread linked above, Konstantin Ryabitsev said:

So, in other words, all our existing log analysis tools have to be modified if they are to be of any use in Fedora 18?

In a word, yes. But what is really happening is that we will have to replace all existing *nix admins, or at a minimum replace all of their training and habits. Most of the major movement within Fedora over the last year or so has been an attempt to un-nix everything about Linux as we know it, and especially as we knew it as a descendant of the Unix tradition. If things keep going the way they are, OS X might wind up being more “traditional” than Fedora in short order (I never thought I’d write that sentence — so that’s why they say “never say ‘never’”).

Log files won’t even be really plain text anymore? And not “just” HTML, either, but almost certainly some new illegible form of XML by the time this is over — after all, the tendency toward laughably obfuscated XML is almost impossible to resist once angle brackets have made their way into a format for any reason. Apparently having log files stored in Postgres wasn’t good enough.

How well will this sit with embedded systems, existing utilities, or better yet, embedded admins? It won’t, and they aren’t all going to get remade. Can you imagine hearing phrases like these and not being disgusted/amused/amazed: “Wait, let me fire up a browser to check why the router board that only has a serial terminal connection can’t find its network devices”; or even better, “Let me fire up a browser to check what happened in this engine’s piston timing module”?

Unless Fedora-derived systems completely take over all server and mobile spaces (and hence gain the “foist on the public by fiat” advantage Windows has enjoyed in spite of itself), this evolutionary branch is going to be marginalized and dumped by the community, because the whole advantage of being a *nix admin was that you didn’t have to retrain on everything every release like with Windows — and now that’s out the window (oops, bad pun).

There was a time when you could pretty well know what knowledge was going to be eternal (and probably be universal across systems, or nearly so) and what knowledge was going to change a bit per release. That was always one of the biggest cultural differences between Unix and everything else. But those days are gone, at least within Fedoraland.

The original goals for systemd (at least the ones that allegedly sold FESCO on it) were to permit parallel service boot (the biggest point of noise by the lead developer initially, with a special subset of this noise focused around the idea of Fedora “going mobile” (advanced sleep-states vs. insta-boot, etc.)) and sane descendant process tracking (second most noise, and a solid idea), with a little “easy to multi-seat” on the side to pacify everyone else (though I’ve seen about zero evidence of this actually getting anywhere yet). Now systemd goals and features have grown to cover everything up to and including logging. The response from the systemd team would likely be “but how can it not include logging?!?” Of course, that sort of reasoning is how you get monolithic chunk projects that spread like cancer. It’s ironic to me that when systemd was introduced, HAL was held up as the perfect example of what not to do when writing a sub-system, specifically because it became such an octopus — but at least HAL stayed within its govern-device-thingies bounds. I have no idea where systemd’s zone of responsibility ends and the kernel or userland begins anymore. That’s quite an achievement.

And there has been no end of resistance to systemd, and not just because of init script changeover and breakages. There have been endless disputes about the philosophy underlying its basic design. But don’t let that stop anybody, or make them think. Not so dissimilar to the Gnome3/Unity flop.

I no longer see a future where this distro and its commercially important derivative are the juggernaut in Linux IT — particularly since it really won’t be Linux as we understand it; it will be some other operating system running atop the same kernel.

Come to think of it, changing the kernel would go over better than making all these service and subsystem changes — administrators and users would at least still mostly know what was going on, and with a change in kernel the type of things that would likely be different (services) would be expected, and even well-received if they represented clear improvements over whatever had preceded them.

Consider how similar administering Debian/Hurd is to administering Debian/Linux, or administering Arch/Hurd is to administering Arch/Linux — and then consider how similar administering AIX or HP-UX is to administering, say, RHEL 6. We’re making such invasive changes through systemd that a change of kernel from a monolithic to a microkernel design would actually be more sensible — after all, most of the “wrangle services atop a kernel a new way” ideas are already handled in a more robust way as part of the kernel design, not as an intermediate wonder-how-it’ll-work-this-week subsystem.

Maybe that is simpler. But it doesn’t matter, because this is about deliberately divisive techno-politicking on one side (in the vain hope that “if our wacko system dominates the market, we’ll own the training market by default even if Scientific Linux and CentOS still dominate in raw numbers!”), and ego masturbation on the other (“I’ll be such a rebel if I shake up the Unix community by repeatedly deriding so-called ‘Unix traditions’ as outdated superstitions and generally giving the Unix community the bird!”).

Here’s a guide to predicting the most likely outcomes:

  • To read the future history* of how these efforts work out as a business tactic, check the history of Unix from the mid-1980s to the early 2000s and see how well “diversification” in the interest of carving out corporate empires works. I find it strikingly suitable that political abuse of language has found its way into this effort — conscious efforts at diversification (defined as branching away from every other existing system, even your own previous releases) are always performed under the label of “standardization” or “conformance to existing influences and trends”. Har har. Joke’s on you, though, Fedora. (*Yeah, it’s already written, so you can just read this one. Easy.)
  • To predict the future history of a snubbed Unix community, consider that the Unix community is so used to getting flipped the bird by commercial interests that lose their way that it banded together to write Linux and the entire GNU constellation from scratch. Consider also that the original UNIX was started by developers who were snubbed and felt ill at ease with another, related system whose principal flaw was (ironically) none other than the same featuritis the Linux community is now enduring.

I don’t see any future where Fedora succeeds in any of its exponentially expanding goals as driven by Red Hat. And with that, I don’t see a bright future for Red Hat beyond v7 if they don’t get this and other priorities sorted**. As a developer who wishes for the love of everything holy that I could just focus on developing consumer business applications, I’m honestly sad to say that I’m having to look for a new “main platform” to develop for, because this goose looks about cooked.

** (sound still doesn’t work reliably — Ekiga is broken out of the box, Skype is owned by Microsoft now — Fedora/Red Hat don’t have a prayer of getting on mobile (miracles aside) — nobody is working on anything solid to stand a business on once the “cloud” dream bubble pops — virtualization is already way overinvested in and done better elsewhere anyway — easy-to-fix media issues aren’t being looked at — and a new init system makes everything above worse, not better, is distracting, and requires admins to completely retrain besides…)

Lean on Your Database

Doing computations in the database is almost always the right answer. Writing your own database procedures instead of relying on an ORM framework is even better. These go hand in hand, since ORMs don’t allow you enough control over your data schema to define calculations in the first place (most can’t even properly handle multi-column primary keys, and instead invent a meaningless integer ID for everything; this should tell you something). Many, maybe most, web developers (at least that I’ve met) don’t even know what sorts of calculations are available to be performed in the database, because they’ve been taught, in an absolute vacuum of personal experience, that “SQL hard. Relational thinking hard. OOP good. Trust framework”. There is so much missing here.
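
As a tiny, hypothetical illustration of what that looks like in practice (a sketch only: the table, column and credential names are invented, and epgsql is just one of several Postgres drivers for Erlang), here is a grouped, ranked aggregation done where the data lives instead of in application code:

    %% A sketch of leaning on the database (PostgreSQL via the epgsql driver).
    %% Every name below is hypothetical; only the shape of the idea matters.
    -module(sales_report).
    -export([totals_by_region/0]).

    totals_by_region() ->
        {ok, C} = epgsql:connect("localhost", "app_user", "secret",
                                 [{database, "sales"}]),
        %% Grouping, summing and ranking all happen inside the database;
        %% the application only ever sees the finished rows.
        {ok, _Cols, Rows} = epgsql:equery(C,
            "SELECT region, SUM(amount) AS total,"
            "       RANK() OVER (ORDER BY SUM(amount) DESC) AS sales_rank"
            "  FROM order_lines"
            " GROUP BY region", []),
        ok = epgsql:close(C),
        Rows.

Compare that with an ORM dragging every order line across the wire so it can sum them in a loop; the difference stops being subtle the moment the table gets big.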

Frameworks try to be “database agnostic”. This is fundamentally flawed thinking. It implies that the data layer is merely there for “persistence”, and that the “persistence layer” can be whatever — all RDBMSes “aren’t OO and therefore suck”, so any old database will do. If that were true it would follow that frameworks and applications should be designed to work against any database system and not delve too deeply into any specific feature set — after all, the application functionality is the focus, not the data, right? This is exactly backwards.

Even forgetting that this condemns you to least-common-denominator data design, it is still exactly backwards. Let me put the right way on a line all by itself, because it is just that important:

Data designs should strive to be application agnostic.

Data drives everything. Your functions are what you can change around easily, but your data schema is critical and represents everything about your system logic. If you show me a well-labeled data schema I can probably guess what you are trying to do, but if you show me just your functions and objects I’ll require either a code tour or a lot of familiarization time before getting anything serious done (that project documentation will be lacking is a truism not worth addressing here).

Consider that changing your app code is cake, whereas changing your data schema is major project surgery. OOP has us in such a stupor that we think if we just get our objects right everything will be fine. It won’t. Ever. As long as you think that, you’ll also believe other crap, like the idea that each object should map directly to a table. There are certain basic truths about certain types of data. It is striking that I can give a data requirement to two DBAs schooled on two different RDBMSes and ask for a normalized data model (let’s just say 3NF for argument’s sake) and get back two very similar looking schemas, but I can give a feature requirement to two Java programmers and get back radically different system designs.

This should tell us something. In fact, it screams the truth that data is a foundation from which you must work up toward the application code, not the other way around. The database layer is the most important place to make sound choices. The choice of database system itself should be based on project requirements, because that choice matters. Most critically, I’ll say it again here because it is so important and implies so much on contemplation: the database designs should strive to be application agnostic.
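
To put a concrete face on “application agnostic”, here is a hypothetical sketch of the kind of schema I mean, pushed at Postgres from Erlang (again via epgsql, and again with invented names). Note the composite primary key: the thing the average ORM cannot cope with, and exactly the sort of truth about the data that exists independently of any one application:

    %% A sketch of an application-agnostic schema (same epgsql assumption as
    %% the earlier example; C is an open connection). Names are invented; the
    %% point is the natural (order_id, line_no) key instead of a serial ID.
    -module(schema_setup).
    -export([create_order_lines/1]).

    create_order_lines(C) ->
        {ok, _, _} = epgsql:squery(C,
            "CREATE TABLE order_lines ("
            "  order_id  integer NOT NULL,"
            "  line_no   integer NOT NULL,"
            "  sku       text    NOT NULL,"
            "  amount    numeric NOT NULL CHECK (amount >= 0),"
            "  PRIMARY KEY (order_id, line_no)"
            ")"),
        ok.

Any application (a web face, a reporting job, a bulk importer) can work against that table without caring which of them created it.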

Racing to remove the last Nix

This post was prompted by a discussion on ScientificLinuxForum. The subject of this post diverges significantly from the original discussion, so I’ve placed it here instead. The thread was initially about the release of RHEL 6.3, but discussions there have a tendency to wander, particularly since many are worried we are in the last days of free computing, what with the advent of UEFI lock-down, DRM-Everything and new laws which prevent the digital equivalent of changing your own oil. In any case, this post just doesn’t belong in the thread and may be of interest to a more general audience.

Unix philosophy is shifting. We can see it everywhere. Not too long ago on a Fedora development list there was an exchange equivalent to:

“I wanna do X, Y, and Z this new way I just made up and everyone says it’s bad. Why?”
“It breaks with everything Unix has done for 40 years that is known to work.”
“I don’t care what Unix has done. I want to make it work this way instead.”
“It’s insecure.”
“ummm… oh…”
“Besides, it introduces binary settings, so the user can’t adjust or fix them manually if the system goes to poo. Users can’t write scripts to change their settings without going through an API you’ve yet to even consider writing, which causes more work for everyone while security suffers. And telling someone to reinstall their OS because one file got corrupted is not acceptable by our way of thinking.”
“uhhhhh… oooh?”

Let me be clear: there is the world of Unixy operating systems, there is the Unix design philosophy, and then there is the Unix religion. Part of the fun in a flame war is detailing how your opponent is a proponent of whatever part of the spectrum would most undermine their position at the time (usually the religion accusation is thrown, unless someone makes a dive straight to Nazism). The problem with dividing the world of grown-up operating systems into three stripes that way, though, is that it misses why a religion evolved in the first place.

Religion is all about belief, in particular a belief in what is safe and reliable. If I don’t offend God I’m more likely to get into Heaven — that’s safe and reliable. If I don’t give every process the ability to write arbitrarily then I’m less likely to have problems — that’s safe and reliable. Whatever God is up to I’m not really sure (he hasn’t let me in on it all), but that restricting write access prevents things like a rogue process (malicious, buggy or deliciously buggy) from DoS’ing the system by filling it up with garbage is something I can understand.

But not everyone can understand that, just like I can’t understand God. That’s why we have guidelines… er, religions. The fun part about the Unix religion is that it’s got a cultish flair, but the most functional part about it is that its effects can be measured and generally proved (heuristically or logically, if not formally) to be better or worse for system performance and service provision.

It is good to question “why” and be a rebel every so often, but you’ve got to have a point to your asking, and you’ve got to be prepared to hear things you may not have expected — like the response “It’s insecure”, which may be followed by an ego-demolishing demonstration. But people don’t like having their egos demolished, they certainly hate studying up on new-things-that-are-actually-old, and yet they still adore the question “why” because it sounds so revolutionary and forward-thinking.

But IT people are educated, right? They are good at dealing with detailed situations and evaluating courses of action before committing to this or that plan, right? It’s all about design, right?

I’m here to tell you that we’ve got problems.

We are absorbing, over time, less talented and grossly inexperienced developers across all of techdom. It started with tossing C in favor of Java, and now even that is being tossed in favor of Ruby in some places, because it’s like “easier Java… and… hey, Rails!”. (This isn’t to say that Ruby is a bad language, but certainly it shouldn’t be the only one you know, or even the first one you learn.) Almost no universities treat hard math or electrical engineering courses as prerequisites for computer science any more. In fact, the whole concept of putting hard classes first to wash out the stupid or unmotivated has nearly evaporated. This is not just in CS courses, but the dive has been particularly steep there. These days, as ironic as it may seem, the average programmer coming out of school knows next to nothing about what is happening within the actual machine, whereas a hobbyist or an engineer coming from another field who is fascinated with machine computation understands quite a bit about such things.

Part of it probably has a lot to do with motivation. A large percentage of university students are on a conscious quest for paper, not knowledge, and want to get rich by copying what is now an old idea. That is, they all dream of building the next Facebook (sorry, can’t happen, Facebook will version up; at best you might get hired by them, loser). On the other hand every hobbyist or out-field engineer who spends personal time studying sticky problems in computer science on their own time is genuinely interested in the discipline itself.

It is interesting to me that most of my self-taught friends have either worked or are working through the MIT open coursework on SICP, K&R, Postgres source tours, and a variety of other fairly difficult beginner and advanced material (and remember their reference points remarkably well), while most of the CS graduates I know are more interested in just chasing whatever the latest web framework is and can’t explain what, say, the C preprocessor does. Neither group spends much time writing low-level code, but the self-educated group tends to have some understanding at that level and genuinely appreciates opportunities to learn more, while many of the provably educated folks don’t know much, and don’t care to know much, about what is happening within their machines. (That said, I would relish the chance to go back to school — but since I know I’ll never have the chance I’ve just got to read the best sources I can find and have my own insights.)

This has had a lot of different effects. In the past, as a community, we had a problem with the Not Invented Here syndrome (aka NIH — yes, it’s got its own acronym (and sometimes there are good reasons to make NIH a policy)) and sometimes deliberate reinventing of the wheel. Now we have the even worse problems of Never Heard of That Before and Let’s Jam Square Pegs Where They Don’t Belong (like trying to coerce the Web into being an applications development framework instead of the document linking and publication service it was designed to be, for example).

A lot of concerns have been raised over the last few years about the direction that Unix has been headed in (or more specifically, a few very popular consumer-oriented distributions of Linux which represent the majority of Unix in desktop and tablet use today). There are issues ranging from attempts to move settings files from plain text to binary formats, to efforts to make the desktop into one giant web page, to efforts to make the system behave more Windows-like (give anyone the privileges to install whatever packages they want into unrestricted environments (protip: toy with the last two words here — there is a solution…)), and many other instances of misinterpreting something that is near to someone’s experience (“easy”) as being less complex (“simple”). Some of these are just surface issues; others are not. But most grind against the Unix philosophy, and for good reason.

Most of these un-Unixy efforts come from the “new” class of developer. These are people who grew up on Windows and seem determined to emulate whatever they saw there, but within Unix. Often they think that the way to get a Unix to feel like Windows is to muck with the subsystems. Sometimes this is because they think they know better; sometimes it is because they realize that the real solutions lie in making a better window manager, but since that is hard, subsystems are the easier route (and this feels more hackish); but most often it is simply because they don’t understand why things work the way they do and lack the experience to properly interpret what is in front of them. What results are thoughts like “Ah, I wish that as an unprivileged user I could install things via binary bundle installers, like off downloads.com in Windows, without remembering a stupid password or using some stupid package manager, and get whatever I want. I can’t remember my password anyway because I have the desktop set to auto-login. That would put me in charge as a user!” Of course, they think this without ever realizing that this exact situation in Windows is what puts Eastern European and Chinese government crackers in charge of Windows worldwide.

This gets down to the core of operating system maintenance, and any system administrator on any operating system knows that, but the newcomer who wants to implement this “feature” doesn’t. What they think is: “Linux permissions are preventing me from doing that? Then Linux permissions must be wrong. Let’s just do away with them.” And they go on to write an “extension” which isn’t an extension at all, but rather a huge security flaw in the system. And they do it deliberately. When others say “that’s a bad idea” they say “prove it”, and accusations of religious fundamentalism soon follow.

But there could have been a better solution here. Group permissions, for example, were invented for just this purpose. There is (still) a wheel group in every Linux I’ve seen. There’s even still a sys group. But I’ve seen them actually used properly only once or twice, ever — instead we have another triangular wheel which has been beaten round over the years, called sudo, and a whole octopus of dangly thingies called PAM and SE domains and… and… and… (do we really want one more?)

Anyway, {groups, [insert favorite permissions system]} aren’t a perfect solution, but they go a long way toward doing things right in a simple manner without a lot of mucking about with subsystem changes. Back in the Old Days users had the same concerns, and these systems were thought out way back then. But people don’t go back and research this sort of thing. Learning old, good ideas is hard. Not so much hard to do, but sitting still and thinking long enough to understand is hard for a lot of people. There is a wealth of knowledge scattered throughout the man pages, info docs and about a bajillion websites, open source books, mailing list archives, newsgroup archives, design documents, formal treatments, O’Reilly books, etc. (what?!? books?!? How old fashioned! I’m not reading a word further!) but few people take the time to discover these resources, much less actually use them.

SELinux is another good idea someone had. But it’s not immediately obvious to newcomers, so most folks just turn it off because that’s what someone else said to do. This is totally unnecessary, but it’s what a lot of people do. It also gets very little development attention on Ubuntu, the most Windows-like Linux distro, because that distro has the highest percentage of uneducated ex-Windows users. You know what most exploits are written for? SELinux-disabled Ubuntu boxes running a variety of closed-source software (Adobe products are pretty high on the list, but there are others) and unsecured web services (PHP + MySQL (i.e. hacked-up Drupal installations) top the list, but to be fair they are also the most prolific). An example of the misconceptions rampant in the Ubuntu community is the idea that running something in a chroot makes it “secure” because it is colloquially called a “chroot jail”. When told that chroot doesn’t really have anything to do with security, and that a process can escape from a chroot environment if it wants to, they get confused or, even funnier/sadder, want to argue. They can’t imagine that subsystems like mock depend on chroot for reasons other than security.

Why on earth would anyone disable a tool like SELinux if they are going to digitally whore their system out all over the internet by exposing the sensitive bits the way PHP programs do? Because they just don’t know. Before turning it off, no Apache screen. After turning it off, feathers! Before turning off SELinux and installing Flash, no pr0nz on the screen, just a black box that said something was broken on pornhub.com. After turning it off, tits! The immediate effect of turning it off is simple to understand; the long-term effect of turning it off is hard to understand; and learning the system itself requires grokking a new concept, and that’s hard. That’s why. And even better, the truly uninformed think that setenforce 0 is some slick haX0r trick because it’s on the command line… oooh.

So, simply put, pixels.

Pixels are the interest these days. Not performance, not sane subsystems, not security, not anything else. The proper arrangement of pixels. Pixels can put tits on the screen; security subsystems and text configuration files can’t do that — at least, the connection between the two is impossible to manage for the average ex-Windows user.

The new users coming to Linux trying to Dozify it are doing so in the pure interest of pixels and nothing more. They don’t know much about information theory, relational data theory or any of the other things that people used to be compelled to study (“nuh uh! I learnt how to make Drupal show words on the screen, so I know about RDBMSs!”). Many mistake the information in a howto on a blog for systems knowledge, and most will never actually make the leap from knowledge to wisdom. They tinker with Linux but most of that tinkering doesn’t involve exploration as much as it involves trying to reshape it in the image of an OS they claim to be escaping. They can tinker with Linux because you just can, and you can’t with OS X or Windows.

You can make Linux your own. This is the right reason to get involved; whether your motivation is primarily pixels or whatever, any reason is a good reason to be interested in new development. But you can’t roll in assuming you know everything already.

And that’s the core problem. Folks show up in Linux land thinking they know everything, willing to break over 40 years of tremendous computing success and tradition. Some people even go so far as to arrive with prior intent to break things just for the social shock value. But ultimately it’s all in the interest of pixels.

But we don’t have to compromise the underlying OS and subsystems to get great desktop performance, run games, get wacky interactive features that aren’t available anywhere else, do multimedia (legally via Fluendo or via more natural means), or even just put tits on the screen. In fact all those things were possible (even easy) about a decade ago on Linux, but few people knew enough about the different components to integrate them effectively. What we need is developers who are educated enough about those separate systems to develop competently within and atop them without creating n00beriffic, monolithic junk designs that spread dependencies like cancer across the entire system.

The original triad of RPM, Yum and PackageKit was a great example of how to do it right — not perfect, but very nearly. They were linearly dependent, and the dependencies were exclusively top-down, excepting necessary core system libraries/runtimes (the presence of Python, openssh and Bash, for example, is not an unreasonable expectation even on a pretty darn slim system).

But then someone comes along who wants to make PackageKit able to notify you with an audio alert when there is something worth updating — and instead of developing a modular, non-entangled extension that is linearly dependent on PackageKit, not knowing well enough how to design such things and unwilling to take the time to read PackageKit and grok it first, the developer decides to just “add a tiny feature to PackageKit” — which winds up making it grow what at first appears to be a single, tiny dependency: PulseAudio.

So now PackageKit depends on a whole slew of things via PulseAudio that the new feature developer never realized, and over time those things grow circular dependencies which trace back to the feature in PackageKit which provided such a cute little audio notifier. This type of story gets even more fun when the system becomes so entangled that, though each component comes from wildly differing projects, no individual piece can be installed without all the others. At that point it matters not whether a dependency is officially up, down or sideways relative to any other piece; they all become indirectly dependent on everything else.

HAL got sort of like that, but not through external package dependencies — its dependencies got convoluted on the inside, within its own code structure, which is just a different manifestation of the same brand of digital cancer. Actually, gcc is in need of some love to avoid the same fate, as is the Linux kernel itself (fortunately the corrosion of both gcc and the kernel is slower than HAL’s, for pretty good reasons). This sort of decay is what prompts Microsoft to ditch their entire code base and start over every so often — they can’t bear to look at their own steaming pile after a while, because maintaining it gets really, really hard, and that means really, really expensive.

In the story about PackageKit above I’m compressing things a bit, and audio alerts are not the way PackageKit got to be both such a tarbaby and grow so much hair at the same time (it is still cleanly detachable from yum and everything below) — but it is a micro example of how this happens, and it happens everywhere that new developers write junk add-on features without realizing that they are junk. A different sort of problem crops up when people don’t realize that what they are writing isn’t the operating system but rather something that lives among its various flora, and that it should do one thing well and that’s it.

For example, I’m a huge fan of KDE — I think when configured properly it can be the ultimate desktop interface (and isn’t too shabby as a tablet one, either) — but there is no good reason that it should require execmem access. Firefox is the same way. So is Adobe Flash. None of these programs actually requires access to executable memory — they can run whatever processes they need to within their own space without any issue — but they get written this way anyway, and so this need is foisted on the system arbitrarily by a must-have application. Why? Because the folks writing them forgot that they aren’t writing the OS; they are writing an application that lives in a space provided by the OS, and they are being bad guests. Don’t even get me started on Chrome. (Some people read an agenda into why Flash and Chrome are the way they are — I don’t know about this, but the case is intriguing.)

Some distros are handling these changes better than others. The ones with strong guidelines, like Fedora, Arch and Gentoo, are faring best. The ones much further on the “do whatever” side are suffering a bit more in sanity. Unfortunately, though, over the last couple of years a few of the guidelines in Fedora have been changing — and sometimes not just changing a little because of votes, but changing because things like Firefox, systemd, PulseAudio, PackageKit, etc. require such changes in order to function (they haven’t gone as far as completely reversing the library bundling rules to let Chrome into the distro, but it’s a possibility).

To be polite, this is an interesting case of it being easier to re-write the manual than to fix the software. To be more blunt, this is guideline reversal by fiat instead of by vote. There is clear pressure from obviously well-monied quarters to push things like systemd, Gnome3, filesystem changes and a bunch of other things that either break Fedora away from Linux or break Linux away from what Unices have always been. (To be fair, the filesystem changes are mostly an admission of how things work in practice and an opportunistic stab at cleaning up /etc. Some of the other changes are not so simply or innocently explained, however.)

This is problematic for a whole long list of technical reasons, but what it says about the business situation is a bit disconcerting: the people with the money are throwing it at people who don’t grok Unix. The worst part is that breaking Linux in an effort to commit such userland changes is completely unnecessary.

Aside from a very few hardware drivers, we could freeze the kernel at 2.6, freeze most of the subsystems, focus on userland changes, and produce a better result. We’re racing “forward” but I don’t see us in a fundamentally different place than we were about ten years ago on core system capabilities. This is a critical problem with a system like Windows, because customers pay through the nose for new versions that do exactly what the old stuff did. If you’re a business you have a responsibility to ask yourself what you can do today with your computers that you couldn’t do back in the ’90s. The idea here is that the OS isn’t what users are really interested in; they are interested in applications. It’s harder to write cool applications without decent services being provided, but those are two distinctly different sets of functionality that have no business getting mixed together.

In fact, Linux has always been a cleanly layered cake and should stay that way. Linux userland lives atop all that subsystems goo. If we dig within the subsystem goo itself we find distinct layers there as well that have no business being intertwined. It is entirely possible to write a new window manager that does crazy, amazing things unimagined by anyone else before without touching a single line of kernel code, messing with the init system, or growing giant, sticky dependency tentacles everywhere. (Besides, any nerd knows what an abundance of tentacles leads to…)

The most alarming issue over the longer term is that everyone is breaking Linux differently. If there were a roadmap I would understand. Sometimes it’s just time to say goodbye to whatever you cling to and get on the bus. But at the moment every project and every developer seems to be doing their own thing to an unprecedented degree. There has been some rumbling that a few things emanating from RH in the form of Fedora changes are deliberate breaks with Unix tradition and even the Linux standard, and that perhaps this is an effort to deliberately engender incompatibility with other distros. That sounds silly in an open source world, but the truth of the business matter with infrastructure components (and to be clear, platform equates to infrastructure today) is that while you can’t stop small competitors from emerging or users from doing what they want, no newcomer can make a dent in the direction of the open source ecosystem without very deep pockets.

Consider the cost of supporting just three good developers and their families for two years in a way that leaves them feeling comfortable about their career prospects after those two years. This is not a ton of money, but I don’t see a long line of people waiting to plop a few hundred thousand down on a new open source business idea until after it’s already been developed (the height of irony). There are a few thousand people willing to plop down a few million each on someone selling them the next already worn-out social networking scheme, though. This is because it’s easy to pitch a big glossy brochure of lies to suckers using buzzwords targeting an established market, but difficult to pitch the creation of a new market, because that requires teaching a new idea; as noted above, people hate having to work to grasp new ideas.

Very few people can understand the business argument for keeping Linux a Unixy system and how that can promote long-term stability while still achieving a distro that really can do it all — be the best server OS and maintain tight security by default while retaining the ever-critical ability to put tits on home users’ screens. Just as with developers, where effort and time aren’t the problem but rather understanding, with investors the problem isn’t a lack of resources but rather a lack of comprehension of the shape of the computing space.

Ultimately, there is no reason we have to pick between having a kickass server, a kickass desktop, a kickass tablet or a kickass phone OS, even within the same distro or family. Implementing a sound computing stack first and giving userland wizards something stable to work atop is paramount. Breaking everything to pieces and trying to make, say, the network subsystem for “user” desktops work differently than for servers or phones is beyond missing the point.

Recent business moves are reminiscent of the dark days of Unix in the ’80s and early ’90s. The lack of direction, and the deliberate backbiting and side-dealing with organizations which were consciously hostile to the sector in the interest of short-term gain, set back not just Unix but serious computing on small systems for decades. Not to mention that it guaranteed the general population became acquainted with pretty shoddy systems and was left wide open to deliberate miseducation about the role of computers in a work environment.

It’s funny/scary to think that office workers spend more hours a day touching and interacting with computers than carpenters spend interacting with their tools, yet understand their tools almost not at all, whereas the carpenter holds a wealth of detailed knowledge about his field and the mechanics of it. And before you turn your pasty white, environmentally aware, vegan nose up at carpenters with the assumption that their work is simple or easy to learn, let me tell you from direct experience that it is not. “Well, a hammer is simpler than a computer and therefore easier to understand.” That is true about a hammer, but what about the job the carpenter is doing, or his other tools, or more to the point, the way his various tools and skills interact to enable his job as a whole? Typing is pretty simple, too, but the scope of your job probably is not as simple as typing. Designing or even just building one house is a very complex task, and yet it is easier to find a carpenter competent at utilizing his tools to build a house than an office worker competent at utilizing his tools to build a solution within what is first and foremost an information management problem domain.

That construction crewmen with a few years on the job hold a larger store of technical knowledge to facilitate their trade than white-collar office workers with a few years on the job do to facilitate theirs is something that never seems to occur to people these days. And when it does occur to someone, it doesn’t strike them that something is seriously wrong with that situation. Nothing seems out of place whether the person perceiving this is an office worker, a carpenter or a guy working at a hot dog stand. We have simply accepted as a global society that nobody other than “computer people” understands computing, the same way that Medieval Europeans accepted that nobody other than nobility, scribes and priests could understand literacy.

It is frightening to me that a huge number of college-educated developers seem to know less about how systems work than many Linux system administrators do, unless we’re strictly talking Web frameworks. That equates to exactly zero durable knowledge, since the current incarnation of the Web is built exclusively from flavor-of-the-week components. This is all to the benefit of the few top players in IT and to the detriment of the user, if not actually according to a creepy plan somewhere. There probably was never a plan that was coherent and all thought up at once, of course, but things have clearly been pushed further in that direction by those in the industry who have caught on, since the opportunity has presented itself. The “push” begins with encouraging shallow educational standards in fundamentally deep fields. It’s sort of like digital cancer farming.

Over in my little corner of the universe I’m trying hard to earn enough to push back against this trend, but my company is tiny at the moment and I’m sure I’ll never meet an investor (at least not until long after I really could use one). In fact, I doubt any exist who would really want to listen to a story about “infrastructure” because that admits that general computing is an appliance-like industry and not an explosive growth sector (well it is, but not in ways that are hyped just now). Besides, tech startups are soooo late 90’s.

Despite how “boring” keeping a stable system upon which to build cool stuff is, our customers love our services and are willing to pay out the nose for custom solutions to real business problems — and these are SMBs who have never had the chance to get custom anything, because they aren’t huge companies. Basically all the money that used to go to licensing now goes to things that actually save them money by reducing total human work time, instead of merely relocating it from, say, a typewriter or word processor to a typewriter emulation program (like Writer or Word). This diversion of money from the same-old-crap to my company is great, but it’s slow going.

For starting from literally nothing (I left the Army not too long ago) this sounds like I’ve got one of those classic Good Things going.

But there is a problem looming. We’re spending all our time on custom development when we should be spending at least half of that time on cleaning up our upstream (Fedora, Vine and a smattering of specific upstream projects) to get the great benefit of having both awesome userland experiences and not squandering the last Nix left. If we can’t stick to a relatively sane computing stack, a lot of things aren’t going to work out well over the long term. Not that we or anyone else is doomed, but as a community we are certainly going to spend a lot of time in a digital hamster wheel, fixing all the crap that the new generation of inexperienced developers is working overtime to break today.

As for my company, I’d like to hop off this ride. I know we’re going to have to change tack at some point, because the general community is headed to stupid land as fast as it can go. The catch, though, is answering the question of whether or not I can generate enough gravity in-house to support a safe split or a re-centering around something different. Should I take over Vine by just hiring all the devs full-time? Revolve to HURD? Taking over Arch or Gentoo might be a bit much, but they’ve got some smart folks who seem to grok Unix (and aren’t so big that they’ve grown the Ubuntu disease yet). Or can I do what I really want to do: pour enough effort into sanifying Fedora and diversifying its dev community that I can use that as a direct upstream for SMB desktops without worry? (And I know this would benefit Red Hat directly, but who cares — they aren’t even looking at the market I’m in, so this doesn’t hurt anybody, least of all me. Actually, just for once we could generate the kind of synergistic relationship that open source promised in the first place. Whoa! Remember that idea?!?)

Things aren’t completely retarded yet, but they are getting that way. This is a problem deeper than a few distros getting wacky and attracting a disproportionate number of Windows refugees. It is evident in the fact that I’ve had to cut my hiring requirement down to “smart people who get shit done” — however I can get them. I have to train them completely in-house in the Dark Arts, usually by myself and via surrogate example, because there are simply no fresh graduates who know what I need them to know or think the way I need them to think. It is impossible to find qualified people fresh from school these days. I’ve got a lot of work to do to make computing as sensible as it should be in 2012.

I might catch up to where I think we should be in 2012 by around 2030. Meh.

A Note on “Web Applications”

This is a subject worthy of a series of its own, but a short scolding will have to do for now…

The “Web” is not an applications environment. At all. Web security is a joke. The whole idea is a joke. That’s like trying to have “newspaper security”, because the whole point of HTTP is untrusted, completely public, unthrottled publication of textual data, and that’s it. Non-textual data is linked in, not even native within a document (ever heard of, say, an <img> tag, or that whole series that starts <a something="">?), and various programs that can use HTTP to read HTML and related markup can (or can’t) fetch the extra stuff, but the text is the core of it. Always.

People get wrapped around the axle trying to develop applications that run within a browser and always hit funny walls when trying to deliver interactivity. We’ve got a whole constellation of backhacks to HTTP now that are used to pretend that a website can be an application — in fact at this point I think probably more time is spent working within this kludged edge case than any other specific area of computing, and that’s really saying something. It says a lot about the need for an applications protocol that really can do the things we wish the Web could do and it speaks volumes about how little computing is actually understood by the majority of people who call themselves developers today.

If “security” means being selective about who can see what, then you need to not be using the web at all. Websites are all-or-nothing, whether we delude ourselves that we can put layers of “security” backhacks over them to act against their nature or not.

If “security” means being selective about who can make changes to the data presented on a website, you need to make modifications to data by means other than web forms. And yes, I mean you should be doing the “C”, “U” and “D” parts of CRUD from within a real program working over a protocol that is made for this purpose (and is therefore inherently securable, not hacked up to be sort-of securable like HTTPS) and ditch your web admin interfaces*. That means the Web can be used as a presentation face for your data, as it was intended (well, dynamic pages are a bit of a hack of their own, but they don’t fundamentally alter the concept underlying the design of the protocols involved), and your data input and interactivity faces are just applications other than the browser. Think World of Warcraft, the game, the WoW Armory, and the WoW Forums. Very different faces to match very different use cases and security contexts, but all are connected by a unified data concept centered on the character.

Piggybacking ssh is a snap, and the mostly forgotten ASN.1 encoding is actually easier to write against than JSON or XML-RPC. You might have to write your own library handler, but that can be true even for JSON, depending on your environment.
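
For the curious, here is a minimal sketch of the piggybacking idea using the ssh application that ships with Erlang/OTP. The host, user and remote command are all made up for illustration; a real tool would authenticate with keys and speak something structured (ASN.1-encoded records, say) over the channel rather than exec’ing a shell command:

    %% A sketch of doing the C/U/D side of CRUD over ssh instead of a web form.
    %% Hostname, user and command are invented; real code would use key-based
    %% auth (and a structured wire format) rather than running a shell command.
    -module(admin_update).
    -export([run/0]).

    run() ->
        ok = ssh:start(),
        {ok, Conn} = ssh:connect("db-admin.example.com", 22,
                                 [{user, "opsuser"},
                                  {silently_accept_hosts, true}]),
        {ok, Chan} = ssh_connection:session_channel(Conn, 5000),
        %% Kick off one remote command and gather whatever it prints.
        success = ssh_connection:exec(Conn, Chan, "update_prices --dry-run", 5000),
        Output = collect(Conn, Chan, []),
        ok = ssh:close(Conn),
        Output.

    %% Channel output arrives as messages to the process that opened it.
    collect(Conn, Chan, Acc) ->
        receive
            {ssh_cm, Conn, {data, Chan, _Type, Data}} ->
                collect(Conn, Chan, [Data | Acc]);
            {ssh_cm, Conn, {closed, Chan}} ->
                list_to_binary(lists:reverse(Acc))
        after 10000 ->
            list_to_binary(lists:reverse(Acc))
        end.

The point is not this particular snippet; it is that an ordinary program speaking over an inherently securable channel is both less code and less exposure than a web admin panel bolted over HTTP.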

[*And a note here: the blog part of this site is currently running on WordPress, and that’s a conscious decision against having real security, based on what I believe the use case to be and the likelihood of someone taking a hard look at screwing my site over. This approach requires moderation and a vigilant eye. My business servers don’t work this way at all; they can serve pages built from DB data and serve static pages built from other data, but that’s it — there is no web interface that permits data alteration at all on those sites I consider to be actually important. Anyway, even our verbiage is screwed up these days. Think about how I just used the word “site”. (site ≠ server ≠ service ≠ application)]

I can hear you whining now: “But that requires writing real applications that can touch the database or file system or whatever, and that’s hard and requires actual study! And maybe I’ll have to write an actual data model instead of rolling with whatever crap $ORM_FRAMEWORK spatters out, and maybe that means I have to actually know something about databases, and maybe that means I’ll discover that MySQL is not good at handling normalized data, and that might lead me to use Postgres or DB2 on projects, and management and marketing droids don’t understand things that aren’t extensively gushed over by the marketing flacks at EC trade shows and… NOOOoooo~~~!” But I would counter that all the layers of extra bullshit involved in running a public-facing “Web 2.0” site today are way more complicated, convoluted and extremely removed from a sane computing stack, and therefore, at this point, much more involved, costly and “hard” to develop and maintain than real applications are.

In fact, all of your “web applications” are built on top of a few basic parts, but in between most web developers pile on gigantic framework sets that are a lot larger, harder to learn and far shorter-lived than the underlying stack. Your job security and your application’s security will both improve if you actually learn the underlying SQL, protocol workings, and interfaces involved in the utility constellation that comprises your “application”, instead of just leaving everything up to the mystery of whatever today’s version of Rails, Dreamweaver or $PRODUCT is doing to/with your ideas.

I’m not bashing all frameworks; some of them are helpful, and anything that speeds proper development is probably a good thing (and “proper” is a deliberately huge and gray area here — which is why development is still more art than science and probably always will be). I am, however, bashing the ideas that:

  1. The web is an applications development environment
  2. Powertools can be anything other than necessarily complex

At this point the mastery burden for web development tools in most cases outstrips the mastery burden for developing normal applications. This means they defeat their own purpose, but people just don’t know that because so few have actually developed a real application or two and discovered how that works. Instead most have spent lots of time reading up on this or that web framework and showing off websites to friends and marks telling them they’re working on a “new app”. Web apps feel easier for reasons of familiarity, not because they are actually less complex.

Hmm… this is a good sentence. Meditate on the last sentence in the above paragraph.

The dangerously seductive thing about web development is that the web is very good at giving quick, shallow results measurable in pixels. The “shallow and quick” part is the dangerous bit and the “pixels” the seductive bit. There is a reason that folks like Fred Brooks have insinuated that the drive to “just show pixels” is the bane of good programming, and have called pixels and pretty screens “chicken lipstick”. Probably the biggest challenge to professional software development is the fact that if you’re really developing original works, screens are usually the last thing you have to show, after a huge amount of the work is already done. Sure, you can show how the program logic works, and you can trace through routines in an interpreter if you’re using a script language somewhere in the mix like Python or Scheme, but that sort of concrete progress from a programmer’s perspective doesn’t connect to most clients’ concerns or imaginations. And so we have this drive to show pixels for everything.

Fortunately I’m the boss where I work, so I can get around this to some degree, but it’s extremely easy to see how this drive to show pixels starts dicking things up right from the beginning of a project. It’s very tempting to tell a client “yeah, give me a week and I’ll show you what things should look like”. Because that’s easy. It’s a completely baseless lie, but it’s easy. As a marketer, project manager, or anyone else who contacts clients directly, it is important to learn how to manage expectations while still keeping the client excited. A lot of that has to do with being blunt and honest about everything, and also explaining at least a little of how things actually work. That way, instead of saying “I’ll show you how it will look” you can say “I’ll show you a cut-down of how it will work”, and then go write a script that does basically what your client needs. You can train them to look for the right things from an interpreter, and by showing them a working model instead of just screens (which are completely functionless for all they know) you as a developer get a chance to do two awesome things at once: prototype and discover early on where your surprise problems are probably going to come from, and, if you work hard enough at it, write your “one to throw away” right out of the gate at customer-sale prototyping time.

But web screens are still seductive, the only thing most people read books about these days (if they read at all), and the mid-term future of general computing looks every bit as stupid, shallow and rat-race driven as the Web itself.

I can’t wait to destroy all this.

Haha, just kidding… but seriously…