Recursive Lineage Queries in SQL and RyuQ

Without explaining every detail, here is a “proper” (as in, non-NULL dependent) method for handling tree data within a recursive query. In the example below table foo has an id, a parent and a dtg (date-time-group). Obviously real world examples will have more fields, but this illustrates the technique. This sort of thing is quite handy in a huge variety of situations as quite a lot of data is hierarchical in nature.

Adding a layer type attribute on the view is also a good way to accomplish things like, say, providing visual layout variables for threaded comments on a website, even if your (stupid) web framework doesn’t understand such things — you can bypass the framework garbage by defining a lineage-type view directly in the database and a “model” on top of it.

CREATE FUNCTION default_parent() RETURNS TRIGGER AS
  $$
  BEGIN
    IF NEW.parent IS NULL THEN
      NEW.parent := NEW.id;
    END IF;
    RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

CREATE TABLE foo
 (id         SERIAL PRIMARY KEY,
  parent     integer REFERENCES foo ON UPDATE CASCADE ON DELETE SET NULL NOT NULL,
  created_on timestamptz DEFAULT current_timestamp NOT NULL);

CREATE TRIGGER foo_default_parent
  BEFORE INSERT ON foo
  FOR EACH ROW EXECUTE PROCEDURE default_parent();

CREATE TRIGGER foo_reset_parent
  BEFORE UPDATE ON foo
  FOR EACH ROW EXECUTE PROCEDURE default_parent();

CREATE VIEW lineage AS
  WITH RECURSIVE
  parents AS
    (SELECT * FROM foo WHERE parent = id),
  children AS
    (SELECT * FROM foo WHERE parent != id),
  tree AS
   (SELECT id,
           id::text AS branch,
           0 AS layer,
           parent
      FROM parents
   UNION ALL
    SELECT c.id,
           t.branch || '.' || c.id::text AS branch,
           t.layer + 1 AS layer,
           c.parent
    FROM children AS c, tree AS t
    WHERE c.parent = t.id)
  SELECT * FROM tree ORDER BY branch;

CREATE OR REPLACE FUNCTION show_descendants(parent integer) RETURNS TABLE
   (id      integer,
    lineage text,
    layer   integer,
    parent  integer) AS
  $$
    WITH RECURSIVE
    parents AS
      (SELECT * FROM foo WHERE id = $1),
    children AS
      (SELECT * FROM foo WHERE parent != id),
    tree AS
     (SELECT id,
             id::text AS branch,
             0 AS layer,
             parent
        FROM parents
     UNION ALL
      SELECT c.id,
             t.branch || '.' || c.id::text AS branch,
             t.layer + 1 AS layer,
             c.parent
      FROM children AS c, tree AS t
      WHERE c.parent = t.id)
    SELECT * FROM tree ORDER BY branch;
  $$ LANGUAGE SQL STABLE;

CREATE OR REPLACE FUNCTION
  get_lineage(parent_id integer DEFAULT 0, hide_parent boolean DEFAULT false)
  RETURNS TABLE
   (id      integer,
    lineage text,
    layer   integer,
    parent  integer) AS
  $$
    BEGIN
      IF parent_id > 0 AND hide_parent IS true THEN
        RETURN QUERY EXECUTE $query$ SELECT * FROM show_descendants($1) OFFSET 1 $query$
          USING parent_id;
      ELSIF parent_id > 0 AND hide_parent IS false THEN
        RETURN QUERY EXECUTE $query$ SELECT * FROM show_descendants($1) $query$
          USING parent_id;
      ELSE
        RETURN QUERY EXECUTE $query$ SELECT * FROM lineage $query$;
      END IF;
    END;
  $$ LANGUAGE plpgsql STABLE;

Let’s see some invocations:

insert into foo values (default);
insert into foo values (default);
insert into foo values (default);
insert into foo values (default, 1);
insert into foo (parent) values (1);
insert into foo (parent) values (1);
insert into foo (parent) values (1);
insert into foo (parent) values (2);
insert into foo (parent) values (3);
insert into foo (parent) values (3);
insert into foo (parent) values (4);
insert into foo (parent) values (5);
insert into foo (parent) values (7);
insert into foo (parent) values (7);
insert into foo (parent) values (10);
insert into foo (parent) values (15);

select * from lineage;
 id |   branch   | layer | parent 
----+------------+-------+--------
  1 | 1          |     1 |      1
  4 | 1.4        |     2 |      1
 11 | 1.4.11     |     3 |      4
  5 | 1.5        |     2 |      1
 12 | 1.5.12     |     3 |      5
  6 | 1.6        |     2 |      1
  7 | 1.7        |     2 |      1
 13 | 1.7.13     |     3 |      7
 14 | 1.7.14     |     3 |      7
  2 | 2          |     1 |      2
  8 | 2.8        |     2 |      2
  3 | 3          |     1 |      3
 10 | 3.10       |     2 |      3
 15 | 3.10.15    |     3 |     10
 16 | 3.10.15.16 |     4 |     15
  9 | 3.9        |     2 |      3
(16 rows)

select * from get_lineage(3);
 id |  lineage   | layer | parent 
----+------------+-------+--------
  3 | 3          |     0 |      3
 10 | 3.10       |     1 |      3
 15 | 3.10.15    |     2 |     10
 16 | 3.10.15.16 |     3 |     15
  9 | 3.9        |     1 |      3
(5 rows)

select * from get_lineage(3, true);
 id |  lineage   | layer | parent 
----+------------+-------+--------
 10 | 3.10       |     1 |      3
 15 | 3.10.15    |     2 |     10
 16 | 3.10.15.16 |     3 |     15
  9 | 3.9        |     1 |      3
(4 rows)

Notice that there are no NULLs. Most formulations of tree type data uses NULL to represent not having a parent — but this is wrong. NULL means something is unknown. If the top-level nature of an entity is known, then NULL is the wrong value and complicates any logic involving that field unnecessarily. If, instead, we define a top level entity as any entity which is its own parent, then we have no ambiguity in the structure, and we are providing a correct model. For this reason the parent column above is NOT NULL. If we wanted to provide the possibility of maintaining records where the parent was actually unknown, we could remove the NOT NULL constraint and unset the trigger.

That may sound nitpicky, but there are several places where this simplifies things or makes query execution downright unreliable because of a dependence on conditional logic shortcuts. Consider the common SQL function solution comparable to show_descendants() above:

CREATE OR REPLACE FUNCTION show_descendants(parent integer) RETURNS TABLE
   (id      integer,
    lineage text,
    layer   integer,
    parent  integer) AS
  $$
    WITH RECURSIVE tree AS
     (SELECT id,
             id::text AS branch,
             0 AS layer,
             parent
        FROM foo
        WHERE (parent IS NULL AND $1 = 0) OR (parent = $1)
     UNION ALL
      SELECT c.id,
             t.branch || '.' || c.id::text AS branch,
             t.layer + 1 AS layer,
             c.parent
      FROM children AS c, tree AS t
      WHERE c.parent = t.id)
    SELECT * FROM tree ORDER BY branch;
  $$ LANGUAGE SQL STABLE;

This works just fine in Postgres 9.2 — but that is a quirk of implementation. SQL is supposed to be declarative in nature, and as such there is no guarantee which side of the OR condition in WHERE will be executed first. It is cheaper to check the single condition on the right side than the AND condition on the left, so it is likely a future execution optimization will detect that and run the query this way. The problem is that the right side expression can be true in invalid cases. There is no way to write this query safely as long as NULL is a possible value for parent.

Another problem is that using a NULL parent value in any calculation will nuke the entire operation because of NULL propagation. Say you want to do “parent::text || foo” somewhere later. Surprise! Any NULL is going to kill the entire string. There are convoluted ways to avoid that, but we’re fighting against the nature of an unnatural data definition instead of providing a natural definition at the outset and letting the positive consequences follow.

An argument can be made that handling for-real parent values instead of just using NULL definitions complicates the code above unnecessarily. Its true, in SQL you have to define a trigger to handle the top-level case and be a little more specific in the view and function definitions that follow. But that’s the end of it. We’ve trapped the NULL and completely contained the complexity involved in modeling an accurate parent relationship, not mention guarded against the undefined nature of logical OR execution (this is a huge issue on its own). If we were to permit a NULL here and didn’t really intend for that to mean unknown, then we would have traded the convenience of writing a quick and dirty definition for letting the complex and unpredictable side effects escape our model and propagate throughout the rest of the schema and every application that touches this table. With that in mind it is obviously a better solution to pay the complexity tax up front and write the insert trigger and function definitions and nip the issue in the bud.

In RyuQ this can be defined as:

create [relation]
  foo
    attributes
      id         Serial
      parent     Integer
      created_on TimestampTZ
    conditions
      pk id
      fks
        parent -> foo.id
      defaults
        parent = id
        created_on = current_timestamp

And define some elabels to operate on the data:

tree = union
         project [id, lineage = id::text, layer = 1, parent]
           select [parent == id] foo
         project [id = foo.id,
                  lineage = tree.lineage + '.' + foo.id,
                  layer = tree.layer + 1,
                  parent = foo.parent]
           join
             tree
             select [parent != id] foo

branch ID =
       union
         project [id, lineage = id::text, layer = 1, parent]
           select [id == ID] foo
         project [id = foo.id,
                  lineage = branch.lineage + '.' + foo.id,
                  layer = branch.layer + 1,
                  parent = foo.parent]
           join
             branch
             select [parent != id] foo

headless ID = select [parent != id] branch ID

And that’s it. Notice that there are no NULLs here either; we are not using any MaybeType attributes. If we wanted to make the definition of a top-level entity “Something without a parent” then using the type NoneInteger would be correct. If we wanted to include the possibility that a parent relationship is unknown (but that this does not define a top-level entity) we could use MaybeInteger. If we want to include both then we should define a domain that includes integers, None, and _|_ as members, possibly calling it MaybeNoneInteger.

Invokation:

insert [foo]
  {id}
  (default), (default), (default)
  {parent}
  (1), (1), (1), (2), (3), (3), (4), (5), (7), (7), (10), (15)

tree
 id |   branch   | layer | parent 
----+------------+-------+--------
  1 | 1          |     1 |      1
  4 | 1.4        |     2 |      1
 11 | 1.4.11     |     3 |      4
  5 | 1.5        |     2 |      1
 12 | 1.5.12     |     3 |      5
  6 | 1.6        |     2 |      1
  7 | 1.7        |     2 |      1
 13 | 1.7.13     |     3 |      7
 14 | 1.7.14     |     3 |      7
  2 | 2          |     1 |      2
  8 | 2.8        |     2 |      2
  3 | 3          |     1 |      3
 10 | 3.10       |     2 |      3
 15 | 3.10.15    |     3 |     10
 16 | 3.10.15.16 |     4 |     15
  9 | 3.9        |     2 |      3
(16 rows)

branch 3
 id |  lineage   | layer | parent 
----+------------+-------+--------
  3 | 3          |     0 |      3
 10 | 3.10       |     1 |      3
 15 | 3.10.15    |     2 |     10
 16 | 3.10.15.16 |     3 |     15
  9 | 3.9        |     1 |      3
(5 rows)

headless 3
 id |  lineage   | layer | parent 
----+------------+-------+--------
 10 | 3.10       |     1 |      3
 15 | 3.10.15    |     2 |     10
 16 | 3.10.15.16 |     3 |     15
  9 | 3.9        |     1 |      3
(4 rows)

Pretty simple. I find the RyuQ easier and more clear, but obviously I’m biased.

More on a new Data Language

I’ve given more thought to the new data language I’m working on, and finally decided on a name as well.

The new baby shall be called RyuQ — a reference to where I live respelled to accommodate the mandatory “Q” in any shiny new query language. I’ve settled on a modified form of S-expressions, calling them SP-expressions to distinguish between real S-expressions and my mutated version. The (still infantile) language description is peppered with examples, so you can see pretty quickly if SP-expressions feel comfortable or more alien relative to SQL.

I have to say, the further I go on this the more I wonder what was going through the minds on the committee that created SQL. So many things are just so obvious when sticking to an algebraic notation rather than trying to form some new thing that is neither relational algebra nor relational calculus.

Until I have a complete implementation worked into a fork of Postgres I won’t be able to do anything but toy with this language — but the more thought I give it the less I am satisfied with the work I am currently compelled to do in SQL.

Thinking About a Data Langauge

I’ve been thinking a lot lately about how a query and data definition language would look if I were able to write one myself. Well, it turns out I can write one myself, it just takes a lot of time. I don’t have a lot of time, but I’ve written down some of my ideas to clarify them and placed them here. This is mostly just a text-friendly way of writing relational algebra, but it has a few extras that would make it much nicer to use than SQL.

The mid-term goal is to implement at least some of the query language either as part of a runtime that sits outside of Postgres and feels like psql with something better than SQL, or hack an alternate parser into Postgres that would provide parsed Query trees to the optimizer. I haven’t decided which will be more time consuming in the long run just yet.

I’ll keep that area updated as I have time to work on the language spec more. There are quite a few ideas I’ve got left to commit to writing but just can’t yet due to time.

Version String Comparison in Python

Comparing and sorting version strings in Python scripts has come up a few times lately. Here are some simple approaches. These examples assume that the version strings will be numeric representations and not include elements like “rc-1″ or “alpha” or whatever. If your problem includes these kinds of elements, don’t worry — they are solvable by applying a touch of regex-fu to the processes below.

First, sorting a bunch of version number strings. The problem is that many version strings reported by various packages are just that: strings. Usually the only way to get version information is to ask for it (“[prog] –version” in a shell script, or “SELECT version();” from a database, or whatever) and interpret whatever gets sent to stdout. Those strings don’t mean the same thing to comparison operators that they mean to us so long as they remain strings. For example, the string ’3.5.10′ is greater alphabetically than ’3.15.1′ but is a higher version. So we need to convert them to tuples of integers to make comparison of them natural (again, all these examples assume integer-only version strings, but minor changes to the process can allow you to compare anything — and assigning a custom collation order can allow you to sort against any arbitrary order of arbitrary symbols, but that’s beyond the scope of the basic nature of the problem I’m addressing here):

>>> vers
['3.5.10', '3.15.1', '2.5.7', '0.20.0', '2.12.5', '10.4.3']
>>> s_vers = [tuple([int(x) for x in n.split('.')]) for n in vers]
>>> s_vers
[(3, 5, 10), (3, 15, 1), (2, 5, 7), (0, 20, 0), (2, 12, 5), (10, 4, 3)]
>>> vers[0] > vers[1]
True
>>> s_vers[0] > s_vers[1]
False
>>> cmp(vers[0], vers[1])
1
>>> cmp(s_vers[0], s_vers[1])
-1
>>> s_vers.sort()
>>> s_vers
[(0, 20, 0), (2, 5, 7), (2, 12, 5), (3, 5, 10), (3, 15, 1), (10, 4, 3)]

The list comprehension (actually, two nested list comprehensions) assignment to s_vers is the important part of this. Once that is done you can compare whatever you want. If the version number is buried as an element in a dict or larger list (likely) you can do this conversion in place by adding a new element to the contained structures and then sort the greater list based on that element:

>>> packages
[{'version': '3.5.10', 'name': 'foo'}, {'version': '3.15.1', 'name': 'foo'}, {'version': '2.5.7', 'name': 'foo'}, {'version': '0.20.0', 'name': 'foo'}, {'version': '2.12.5', 'name': 'foo'}, {'version': '10.4.3', 'name': 'foo'}]

OK, that’s pretty ugly (uglier depending on how your browser renders <pre> type text), so I’ll print them in order so we can watch the list change more easily.

>>> for p in packages:
...   print p
...
{'version': '3.5.10', 'name': 'foo'}
{'version': '3.15.1', 'name': 'foo'}
{'version': '2.5.7', 'name': 'foo'}
{'version': '0.20.0', 'name': 'foo'}
{'version': '2.12.5', 'name': 'foo'}
{'version': '10.4.3', 'name': 'foo'}
>>> for p in packages:
...   p.update({'version_tuple': tuple([int(x) for x in p['version'].split('.')])})
...
>>> for p in packages:
...   print p
...
{'version_tuple': (3, 5, 10), 'version': '3.5.10', 'name': 'foo'}
{'version_tuple': (3, 15, 1), 'version': '3.15.1', 'name': 'foo'}
{'version_tuple': (2, 5, 7), 'version': '2.5.7', 'name': 'foo'}
{'version_tuple': (0, 20, 0), 'version': '0.20.0', 'name': 'foo'}
{'version_tuple': (2, 12, 5), 'version': '2.12.5', 'name': 'foo'}
{'version_tuple': (10, 4, 3), 'version': '10.4.3', 'name': 'foo'}
>>> packages.sort(key = lambda x:x['version_tuple'])
>>> for p in packages:
...   print p
...
{'version_tuple': (0, 20, 0), 'version': '0.20.0', 'name': 'foo'}
{'version_tuple': (2, 5, 7), 'version': '2.5.7', 'name': 'foo'}
{'version_tuple': (2, 12, 5), 'version': '2.12.5', 'name': 'foo'}
{'version_tuple': (3, 5, 10), 'version': '3.5.10', 'name': 'foo'}
{'version_tuple': (3, 15, 1), 'version': '3.15.1', 'name': 'foo'}
{'version_tuple': (10, 4, 3), 'version': '10.4.3', 'name': 'foo'}

We started out with a list of dictionaries, each containing a package name and a version string. The first loop updates each dictionary to include a version tuple, and the next orders the dictionaries within the list by the tuple values. Viola! We have a list of dictionaries sorted by version number. Of course, if there are more than one package name involved you will want to sort on the package name first, then the version tuple as a secondary criteria (so you don’t compare versions of package ‘foo’ against versions of package ‘bar’, or sort glibc against firefox, for example).

If lambdas are unfamiliar to you, don’t be scared off by the package.sort() line up there — lambdas are perfectly safe, reliable and quite concise once you understand the way they are used.

From here writing a sort function for lists of version strings should be pretty obvious. And… that means that writing a comparison function for two individual elements that works the same way the built-in cmp() function works is trivial:

>>> def ver_tuple(z):
...   return tuple([int(x) for x in z.split('.') if x.isdigit()])
...
>>> def ver_cmp(a, b):
...   return cmp(ver_tuple(a), ver_tuple(b))
...
>>> vers
['3.5.10', '3.15.1', '2.5.7', '0.20.0', '2.12.5', '10.4.3']
>>> ver_cmp(vers[0], vers[1])
1
>>> ver_cmp(vers[0], vers[0])
0
>>> ver_cmp(vers[3], vers[4])
-1

Nice and easy.

Now I can’t figure out why comparison functions I’ve seen floating around occupy so much space and are hard to follow — full of class declarations and exec loops within exec loops (!!!) and other nonsense. At the most you will need to add some regular expression matching to extract/split on the correct substrings from the version string. That means you would have to import the re module and the list comprehension will grow by a few (maybe 10) characters.

Most common Bash date commands for timestamping

From time to time I get asked how to use the date command to generate a timestamp. Here is an idiot-friendly script you can post for reference in your team’s bin/ if you get interrupted about timestamp questions or have an aversion to typing phrases like “man date” (with or without a space).

All but the first and last two produce filename-friendly strings. (Thanks to Rich for the reminder to include UTC and timezoned stamps here):

#! /bin/bash

# An overly obvious reference for most commonly requested bash timestamps
# Now all you Mac fags can stop pestering me.

cat << EOD
        Format/result         |       Command              |          Output
------------------------------+----------------------------+------------------------------
YY-MM-DD_hh:mm:ss             | date +%F_%T                | $(date +%F_%T)
YYMMDD_hhmmss                 | date +%Y%m%d_%H%M%S        | $(date +%Y%m%d_%H%M%S)
YYMMDD_hhmmss (UTC version)   | date --utc +%Y%m%d_%H%M%SZ | $(date --utc +%Y%m%d_%H%M%SZ)
YYMMDD_hhmmss (with local TZ) | date +%Y%m%d_%H%M%S%Z      | $(date +%Y%m%d_%H%M%S%Z)
YYMMSShhmmss                  | date +%Y%m%d%H%M%S         | $(date +%Y%m%d%H%M%S)
YYMMSShhmmssnnnnnnnnn         | date +%Y%m%d%H%M%S%N       | $(date +%Y%m%d%H%M%S%N)
Seconds since UNIX epoch:     | date +%s                   | $(date +%s)
Nanoseconds only:             | date +%N                   | $(date +%N)
Nanoseconds since UNIX epoch: | date +%s%N                 | $(date +%s%N)
ISO8601 UTC timestamp         | date --utc +%FT%TZ         | $(date --utc +%FT%TZ)
ISO8601 Local TZ timestamp    | date +%FT%T%Z              | $(date +%FT%T%Z)
EOD

If executed, it will produce the (obvious) output:

        Format/result         |       Command              |          Output
------------------------------+----------------------------+------------------------------
YY-MM-DD_hh:mm:ss             | date +%F_%T                | 2013-05-17_10:16:09
YYMMDD_hhmmss                 | date +%Y%m%d_%H%M%S        | 20130517_101609
YYMMDD_hhmmss (UTC version)   | date --utc +%Y%m%d_%H%M%SZ | 20130517_011609Z
YYMMDD_hhmmss (with local TZ) | date +%Y%m%d_%H%M%S%Z      | 20130517_101609JST
YYMMSShhmmss                  | date +%Y%m%d%H%M%S         | 20130517101609
YYMMSShhmmssnnnnnnnnn         | date +%Y%m%d%H%M%S%N       | 20130517101609418928482
Seconds since UNIX epoch:     | date +%s                   | 1368753369
Nanoseconds only:             | date +%N                   | 427187053
Nanoseconds since UNIX epoch: | date +%s%N                 | 1368753369431083605
ISO8601 UTC timestamp         | date --utc +%FT%TZ         | 2013-05-17T01:16:09Z
ISO8601 Local TZ timestamp    | date +%FT%T%Z              | 2013-05-17T10:16:09JST

Interview from Another Dimension

I was asked if I was interested in covering a temporary administration position a few days ago because finding bilingual Unix people is pretty hard here in Japan. It sounded marginally interesting and stood a chance of getting me in touch with the local Unix community, so I said sure, have the interviewer give me a call.

One day the positioning agency asked for a resume. I sent one in. The next day at 3pm I got a call saying that I would get a call an hour later to conduct a phone interview.

At 4pm I didn’t get a call.

At 5:30 I called their office back to say that I didn’t get a call. They called me back asking if I’m still available today — I tell them that if its OK that I’ll be playing with my kids then I’m game. They call back again telling me that the company is really going to call this time but from the office in Yokohama, not Okinawa — I’m fine with that. They also told me that the guy calling would “be a foreigner, like you” — I’m fine with that, too.

Not a minute later I did get a call but not from Yokohama, and from a foreigner but not “like me”. The call was from India over the world’s worst connection.

This amazed me. For one thing it was 2013. I expected bad connections when calling across multiple satellite hops from contested jungle territory in Southeast Asia in 2004. But this was a lot worse than that, and this guy was supposed to be calling from an office. And he supposedly works for a high-tech company looking to contract me. It bears mentioning that you could get crystal-clear cell connections from most of Afghanistan in 2010.

So that was the first weird smell. The second hint of rotten tuna was the voice. I couldn’t, for the life of me, understand most of what he was trying to say. I’ve never been one of those “You gotta speak ‘merican!” types (hard to justify it being an expatriot myself), but if you’re going to speak English it should be English and should be intelligible, if not at least generally correct. Otherwise speak Japanese, or German, or get an interpreter, or have someone else do the interview — I’m open to any of the above. If you do know English but have a heavy accent, just slow down. But such ideas are lost on some people.

His speech had a magical pattern to it. Merely missing syllables or mushing sounds together like most non-native speakers was beneath this guy. He set a new standard for unintelligible second-hand language by injecting new syllables and sounds into each word.

The deft ease and fleet pace at which he mangled the language makes me think in retrospect that he probably considered English to be his first language. Maybe it was just taught to him wrong as some sort of cosmic joke. It was what speech would sound like if you could somehow hear a hash salt being added to it. This blew Pig Latin out of the water.

An abridged transcript of the conversation follows:

indian_guy_voice

Him: “Dis is Gumbntator Hlalrishvkttsh koling flum Ueeplo en Eendeya an ayam surchelin Mestarh Kleg Ewurlet?”
I could sort of make out what he was trying to say.
Me: “This is he.”
Him: “Ah see. But dis is Governator Ralrishevdish koling flum Weepro en Indeya an ayam surchen Mestarh Kleeg Iwuuret?”
Perhaps he couldn’t make out what I was trying to say?
Me: “Yes, I am the person you are looking for.”
Him: “OK.”
Me: “…”
Him: “…”
Me: “You are calling about the interview?”
Him: “So ifna kolik abbaud arun foha.”
Me: “I’m sorry, the line is echoing very badly, can you please say that again?”
Him: “So if colling aboud arun four?”
So here I think he’s calling to schedule a call at four because they screwed up today’s schedule already.
Me: “Tomorrow? Yes, you can call me at four.”
Him: “OK. So hou abbaud you al habing eksperens an de Sulrais Ziss?”
Now I don’t know what he’s saying, but I know its not a scheduling question.
Me: “Can you please say that again? This connection must be very bad.”
Him: “You al hawing eksperens wit Lenaks an de Sularis swistems?”
Me: “Yes, I have experience on Linux and Solaris systems. Mostly Linux, though, because that is the platform I develop on.”
And here it began to dawn on me that this was the actual interview. In Indo-Pig Latin.
Him: “Okai. Bud wud abaudd yor kulanted lol on de dekuhnikal missm?”
Me: “I must be having a bad phone day. Please give me a moment to get to a quieter room so I can hear you.”
Him: “So komaing fru dat ayem ah phookink zandngar an…” [and so on...]
He kept babbling on and on about something that I couldn’t hear as I moved to an environment better suited to auditory-verbal cryptanalysis. Hope I didn’t miss anything paradigm shifting.
Him: “…[continued spacetalk]…”
Me: “What would you like to know about my experience?”
Him: “Inna suba sisesutm hau ew mak da pashink?”
Me: “The reception is poor again, can you please say that again?”
Him: “Inna subaa susutem hao eww poot a pach?”
Me: “Patching? Are you asking me how to patch a server? It depends on what you mean by ‘patching’. Are we patching sources to rebuild a program, or installing upgraded binaries through a package manager or performing an automated patch and rebuild the way ebuilds and ports work?”
Him: “Yesss. Inna sabaa, hou eww poot a pach?”
Me: “What system are we talking about?”
Him: “Inna sauce.”
Me: “Sauce? In source? Oh,  Solaris? If we are receiving updated binaries I would use the package manager. I haven’t seen people bypass IPS and use the patch manager directly for a while.”
Him: “Zo uatt ai am gunda be dou nuh is abbauda passhin inna sabaa. Hau yu du?”
Me: “I’m sorry, I think you are asking me how I would patch a Solaris server, and without knowing anything else about the question I think you mean we are receiving updates from a repository. My answer is that I would use the package manager, probably IPS, or if just patches then the old patch manager. But I don’t really understand your question. It is really broad.”
Him: “SO hao eww do?”
Me: “You mean the command sequence?”
Him: “Yeis.”
Me: “You want me to spell it out over the phone?”
Him: “Yeis.”
I couldn’t help but snicker a little… is this really the way system administration interviews go?
Me: “OK, which version of Solaris?”
Him: “Inna sabaa.”
Me: “I understand in a server, but that doesn’t really change the question much, unless I’m missing something. Which version of Solaris? We are talking about Solaris, right?”
Him: “Zo vot ah em denkning niss uii nut dokkin abbaud da deweropent zicheeshn. Dust a passh a sabaa.”
Me: “Right, not a development situation, just patching a server. But this is a difficult question to answer unless I know what system we are talking about. They don’t all work the same way.”
Him: “Du eww habba poosiija fou da makkink na fou da af emma lepozitorian?”
Me: “I’m sorry, the phone is being worse than usual again, can you please ask the question again?”
Him: “Enna proosiija fou passhing. Eww habba lepozitori an poosiija. Du garanti ob da safti?”
Me: “My procedure to guarantee the safety? You mean during patching? If I make a repository? Was that part of the question?”
Him: “Yeis.”
Me: “OK, yes, in a production environment I would expect that we have separate testing and production repositories at least. I would patch or update the test servers, run applicable tests for whatever application or server software we have installed, and then deploy the update to the production servers. But this is a really basic thing to say, and I can’t give you any details without knowing what system we are talking about. Is this even a Solaris question?”
Him: “So abbaudda Lennuks.”
Me: “Linux? The question is about Linux?”
Him: “Onna Lenuks hau eww makka lepozitori?”
Me: “Repositories on Linux? Which distro?”
Him: “Onna Lenuks.”
Me: “OK… What package manager are we talking about? RPM, yum, smitty, portage, aptitude, they all do things very differently. Even RPM is different on different distros that use it.”
Him: “Yeis. Onna Lenuks. Hau eww mak da lepozitori?”
Me: “Just assuming you mean Red Hat or CentOS or something else derived from Fedora, I would collect the RPMs we want to distribute, sign them, write a meta RPM for yum installation that has the public key and config file in it and build the repository metadata with createrepo. But if this is not a development environment we’re probably just mirroring an existing repository, so most of the time syncing with the master is sufficient. If not we could sync, re-sign, and recreate the repodata with createrepo.”
Him: “So hau eww mak da lepositori?”
Me: “I think I just told you. I have maintained several software repositories in the past and using createrepo is by far the easiest and most reliable way to do it, if we are talking about a yum repository full of RPMs for a distro like Red Hat Enterprise Linux.”
Him: “Yeis. So da Redhat.”
Me: “Maybe I don’t understand the question. You want me to tell you how to create a repository?”
Him: “Inna Lenuks hau eww mobbing fom weri zmar drraib enna rojikalworuum?”
Me: “Sorry, I can’t hear the question very well, the phone is full of echoes. You are asking me in Linux how to do something?”
Him: “Mobbing werri zmorr drraib anna rojikalworuum.”
Me: “Moving a small drive in Logical Volume Manager?”
Him: “Yeis.”
And here is where it dawned on me that I should have hung up at the first sign of weirdness. Instead I had hung on and now I was really along for the ride. Until the bittersweet end…
Me: “Do you mean changing a physical block device from one volume to another, or moving the volume itself?”
Him: “Retzsai eyabba  werri zmorr drraib anna wanna denk u poot enna rojikalworuum. Hau kann godu boot?”
Me: “You are asking me how to move a Linux installation from a small drive onto a logical volume, and then boot it later?”
Him: “Yeis.”
Me: “Assuming this is a simple case I would copy the filesystem to a new partition within the logical volume and add an entry to the bootloader so that we could boot it from the new location. But what bootloader we are using in this case? Grub or LILO or Grub2?”
Him: “Inna Lenuks.”
Me: “Right, in Linux, but which bootloader are we using?”
Him: “In da Lenuks.”
Me: “Right, but are we using Grub or LILO?”
Him: “LILO. Inna Lenuks.”
At this point I was relieved just to get something other than “Inna Lenuks” by itself out of him.
Me: “OK, assuming that the version of LILO we are using is logical volume aware, I would add the entry to the LILO configuration file that points to the location of the kernel on the relocated installation.”
Him: “Wat fail?”
Me: “What fail? You mean what file? The LILO configuration file.”
Him: “So wat fail?”
Me: “You mean where is it? Its usually in ‘slash E T C slash L I L O dot C O N F’.”
Him: “Inna Redhadd.”
Me: “In Red Hat? LILO isn’t a part of that distro any more. They use Grub2 now.”
Him: “Uadda za komunt fur addikt inna neu intree?”
Me: “The command for adding the new entry? There is not a command to add a new LILO entry, you have to edit the configuration file directly. Grub2 has some commands like grub-install and grub-update. But you still have to check the configuration file to make sure things are in the right place. Is that what you mean?”
Him: “Inna Lenuks?”
Crap! We’re back to this again. I really don’t know how to debug this guy. He’s worse than the Emacs Psychoanalyst.
Me: “Yes, in Linux. But this is not exactly a Linux question. The bootloader can load anything, so I don’t know what you mean.”
Him: “Adnanujinnadundaweenananndana…[A good five-minute bunch of spacetalk that I completely cannot understand. It was riveting, though. Like a symphony it had its own movements. Initially with the monotone of a public announcement, then to the lively staccato of a friend relating a happy story, capping with a crescendo of alternate gravelly and soft sounds unique to Indian speakers, and ending with a friendly chuckle -- as if he had enjoyed himself and was ready to say goodbye.]…”
Me: “OK, thank you for the call.”

I have no idea what most of that was about. I got the feeling he asked me some Solaris questions and some Linux questions and some general installation-wide question at the end that I never quite got a fix on. Actually, I never quite got a fix on anything at all, and I don’t think he did either.

This was the weirdest interview experience in my life. It is like a trick they would pull you at Robin Sage but this guy was for real; no OC is going to come evaluate me on how I did and counsel me how to better deal with the crazy and ambiguous.

Now for the scary part. This is the new face of IT outsourcing. Think long and hard whether you want to trust your data integrity and the construction of business systems you expect to get reliable answers out of to companies that have trouble communicating with their own (prospective, in this case) subcontractors and employees.

Since this is Japan, I wonder how on earth they manage to conduct interviews of Japanese people?

Am I alone here? Has anyone else ever experienced this sort of thing? (Other than when calling Dell or Microsoft tech support and being redirected to India, that is.)

Freenode Year-End Weather Review and 2013 Forecast

##c, ##c++, ##java, ##javascript and almost all other channels named after an Algol-descended language remained strong in n00b angst and help-vampire congestion (strong counter example is #bash, see below), rendering them useless for anything other than observing flame wars amongst programming newbies arguing over terms they’ve only just discovered on Wikipedia. Expect no change in temperature or inclination for 2013, and as always prepare for flurries of students hoping to get their homework done for them throughout August, October, December, March and May.

#bash took first place for overall 24-hour activity within its stated zone in 2012 — quite an achievement. This was enabled by nothing more than the militant purism of its main participants who happen to actually know (most of) what they are talking about. The intensity of discussion in #bash is likely in increase over 2013 as realization dawns on more new *nix admins and even OS X users that their systems represent a complete programming environment. A corresponding increase in the volume of beginner reference links in-channel is likely — with an associated increase in RTFM calls directed at those who don’t read links or delivered by the less patient/coddling of the regulars.

#fedora, #ubuntu, #centos, and other distribution-named channels fell into two categories in 2012:

  1. overrun with help-vampires asking the same 3 new-release migration questions
  2. overwhelmed with utter silence

The channels #ubuntu and #centos took the first-place poo cake for overall deafening off-topic, RTFM-worthy and amateur architectural astronautic clamor while #archlinux, #gentoo and #fedora managed to achieve a much better signal-to-noise ratio, mostly due to a greater percentage of knowledgeable participants. Expect very little change in 2013 with the exception of #ubuntu and #fedora. The former may grow even worse as the population of those who don’t know any better flock to Ubuntu as Steam picks up, er, steam and the latter may grow gradually quieter as new changes implemented in Fedora 18 cause a probable nose-dive in that distribution’s popularity across the year.

#django was one of the strongest on-topic, 24-hour hour activity channels focused around getting actual work done, with the vast majority of interaction involving at least marginally researched questions and a great deal more courtesy than usual this millennium. This indicates that the Django project has likely reached its Goldilocks point as a project where it is just enough below the radar that the “new thing” from 1~2 years past is still soaking up the n00bs, b00bs and help-vampires (in this case, #RubyOnRails) and enough srsly gentlemen have noticed it to make it a usefully mainstream place to work. If no unexpected storms of blockbuster “Lern da Web wit Djago in 10 Dais!!1!” tutorials or books occur across YouTube and bookstores expect #django to experience only a slight increase in temperature and no bumpkin brain blizzards or humility hurricanes. The status of Django on Python3 is the most likely leading indicator of trouble here (see below).

#django-dev was boring and dead for the most part, aside from the occasional thin mist of packager discussion and “why doesn’t the TLS setting for mail mean real TLS on the correct port?” talk (nonsense!). Some rumblings of the impending Python3 reckoning could be heard, but were still far enough in the distance as to avoid a full-blown #fedora style storm in 2012. Expect this to change in 2013, as Python3 will finally give Django devs enough to talk about to wake kick them off of the ML and into IRC activity. The action is likely to be a bit below storm-strength due to the project’s (general) adherence to its own release guidelines, but may from time to time bear watching.

#RubyOnRails and related channels were clogged with help-vampires and n00bs in similar fashion to the Algol-language and distro channels. This has remained fairly steady since 2009 or so, with the effect being bolstered by the presence of all those people who gave up on mobile programming just before they might have actually figured out how native applications work. Save a major drive to some other fascinating technical mistake (“Web 3.0?” “cloud vX”?) that goes viral, the Rails community will likely continue to experience idiot floods and hails of stupidity through 2013. For the serious who are in need of actual relevant discussion, forums, IRL meetings with Real People You Know and project-specific channels for projects that happen to be built around Rails will be the only places to find it.

#guile managed a slight edge over both #lisp and #scheme last year in Occasional Wizardry, but the overall volume of discussion was far lower than either #lisp or #scheme — giving #guile the best signal-to-noise ratio anywhere but also rendering it an incredibly boring place to hang out on an average day (as in, #guile remains a statistical outlier, though an interesting one). It is uncertain whether the effects of a new project, a new major version or a new implementation of Guile, Scheme or Common Lisp will have any effect or even be noticed by anyone, anywhere, so a prediction for 2013 is beyond me. I have a sneaking suspicion that someone might eventually catch on that guile2 includes a webserver ready-made for scripting in a functional language (among other features), but the population of paren-loving teens is so low at the moment and the current infatuation with the Web and the Java religion of Absolutely Everything Must Be An Object (Amen) still so strong with the sort of computer science faculty that thinks that every student should get a gold sticker for showing up that it is hard to see if anything short of a viral breakout video complete with tits, violence and gore would be noticed.

#haskell took first prize in 2012 for overall, unadulterated, near-constant uber geekness and Deep Black Magic. Three factors influenced this strongly: the near exclusive population of serious math nerds who like to flaunt their grokness, the tendency of such people to never admit they don’t grok a mind-melting snippet in channel and instead boil in silence until something makes sense to them, and the tendency for newcomers to either struggle unflaggingly until they earn their place among the immortals or simply give up and never, ever venture into #haskell again. In this, the uniqueness of Haskell as a language serves a positive filtration role in the community much the way that the old “be smart or go home” sort of freshman math classes did back when it was OK to admit that computer science wasn’t for everybody. Expect very little change to this trend in 2013, though by the end of the year commercial projects using Haskell may be revealed as actually using Haskell, and this may drive a slight, temporary increase in interest.

#erlang was a bit like #haskell, but more average in every aspect: less magic, more noise, fewer quitters, more eternal (but not really annoying) n00bs. This is mostly due to the revelation among high schoolers and college language hipsters that Facebook uses Erlang for a smattering of projects that can’t afford downtime and how Erlang can cope with such requirements in a novel way. Other functional language channels generally fell into the pattern of the lisps and Haskell and Erlang, but these last two deserved particular mention. In 2013 Erlang stands a very small chance of sucking brains away from other interesting languages such as Lua and anything matching .*ML.*. In that case expect Erlang to eventually grow more like #bash in nature over 2013, with a particular threshold being crossed if #erlang itself becomes a bothersome place to hang out due to an excess of help-vampires and alternative Erlang-based project channels becoming the alternative arteries of community brilliance. Saving such a spontaneous increase in notoriety, however, #erlang is likely to follow or return to the majority patterns of 2012.

This has been the Freenode Year-End Weather Review and 2013 Forecast. All other networks either suck or were set up with specific crowds in mind (such as botnets).

East Europe: Why Historical Fears of War are Well Founded

I received an email full of anti-Obama sentiment the other day. It was an indirect re-hashing of a commonly echoed sentiment most concisely summed up as

The American Republic will endure, until politicians realize they can bribe the people with their own money.

The above sentiment has been variously attributed to Benjamin Franklin, Alexander Fraser Tyler, Alexis de Toqueville and P.J. O’Rourke. I have no clue which one is correct, but that each has fascinating and well thought out things to say is evident if you read some of the things on each of their quote pages.

That the message (pasted below) came from a Czech says a lot.

Czech, Poland, Slovakia, Romania, Hungary, the Baltics…, East Europe that lies on the line between the North Sea and the Black Sea — these are the places that always see the worst of trans-European war. This is such a significant and well defined trend in history that there is a name for the place: The Intermarum.

A few years ago Czech, Hungary and Poland broke with European defense policy (a worthless concept unless an American general or President is doing the talking) and formed their own for-real military alliance called the Visegard Group. While European NATO members argue about who is going to pay for what during a projected training exercise the East Europeans are sending their tiny military forces out to the training range on their own. So the Intermarum has now organized itself into a tiny military alliance that isn’t about building bigger and nicer offices for itself in Belgium every year on American tax money. Compared to the last seven decades of abject military complacency and external domination this is quite a break from the status quo.

When France or Germany invades Russia or Russia invades anywhere in Europe, they march through Poland and Czech and have to deploy mountain troops to Slovakia, Czech and Hungary and at least fight a blocking action in Romania. These countries have no direct input on the reasons France or Germany and Russia might be at war, but they are the battlegrounds where the blood is actually spilt and fighting is heaviest. This is, of course, devastating since the combined might of two modern armies always clashes right there in Eastern Europe. The speedbumps always see the worst wear.

Americans forget this because we think of WWII as a fight that happened in France, Italy and Germany because that’s the part we took part in for less than a year. This generation of Americans is only aware of WWII as far as it has watched Saving Private RyanInglorious Bastards, and  Band of Brothers. Ask a random person to show you on a map where Normandy is or what “the Ardennes” means and you’ll see how far from reality most people’s cinema-driven imaginings are from reality. Asking them “why did America invade Europe in WWII” is even more comical/sad.

East Europe was pulverized in 1938 and remained a impact zone (literally — as in bullets, shells and bombs impacting it) for seven more years until 1945, at which point the Russian reaction to German invasion was a counter-invasion of Germany, which meant invading back across East Europe on its way to Germany. East Europe got screwed.  Twice. The place was completely devastated and much of it actually flattened.

After all that East Europe didn’t get straightened out for 60 more years because the Soviets needed East Europe as a buffer zone against the American military in Europe. They didn’t have any need to improve the area, of course, they just needed to hold it for use as a surrogate battlefield in the event of WWIII, because battlegrounds are horrible places and they would much prefer the horrors occur there than in Russia itself.

Considering how geography forces those historical realities on the Intermarum countries it is easy to see why the Euroskeptic, anti-collectivist, pro-American Constitutionalism part of Europe happens to be East Europe. They are very worried about an EU collapse and fragmentation which becomes more likely every day. This isn’t because they care whether the EU member states continue to tell sweet lies of hope and harmony to each other or not, but because European socialism has at this point so rotted out the EU financial position that its has had concrete effects on what is available in the market. That means the market across all of Europe is so unhealthy that it has started to affect things we normally don’t think as being part of economics.

There is a fuzzy line between a need for a market correction and a need for a war to (re)establish natural property assertion by violence (land, commodities, rights, people) to recover from an imbalanced market that has severed itself from reality through rhetoric — a trend which has its roots in government interference in the market to begin with. Violence is the only way to settle property and control issues between parties once financial devices and/or the assent of parties to accept what another party asserts is true have failed. It doesn’t matter whether “party” refers to a person, a clan, an organization or a state; when the market is interfered with it gets sick, eventually that disease infects the state and law fails, and then the only law is what is enforceable through direct action. That is a part of what has driven war from time immemorial.

There really isn’t any separation between politics, economics and war from a geopolitical perspective and the East Europeans are worried that the American system, which has enabled the European system to exist for the last two generations, is on its way toward a type of dysfunction which will require the Americans to withdraw their concrete forms of attention from European affairs due to lack of resources and/or political bandwidth. We would withhold our attention and resources the same way the vascular system withholds blood from the extremities when threatened or ill.

An American decline, the definite separation of French and German interest from the made-up concept of “the European interest” and the resurgence of Russia is alarming to the Czechs and especially the Poles for reasons that people not familiar with history, geopolitics and the logistics of war do not understand.

The original message (unattributed other than “from someone in the Czech Republic”):

The danger to America is not Barack Obama but a citizenry capable of entrusting a man like him with the Presidency. It will be far easier to limit and undo the follies of an Obama presidency than to restore the necessary common sense and good judgment to a depraved electorate willing to have such a man for their president. The problem is much deeper and far more serious than Mr. Obama, who is a mere symptom of what ails America . Blaming the prince of the fools should not blind anyone to the vast confederacy of fools that made him their prince. The Republic can survive a Barack Obama, who is, after all, merely a fool. It is less likely to survive a multitude of fools such as those who made him their president.

ODFapper 0.04 for Django on Unix

I’ve pulled the most common bits out of the Django views where I had rendered ODFs before and have the itty-bitty beginnings of an ODF handling/rendering library started now. “Library” is a bit of a stretch, since its only a few functions and a Bash script, but it abstracts the most mysterious/tedious parts of ODF handling already.

This is just a simple tarball right now. It unpacks like a Django app would, but only contains a copy of odfapper, funcs.py and a templatetags/odf_tags.py. But since we are doing template tag registration you need to include it in your INSTALLED_APPS in settings.py and add a new variable ODFAPPER_PATH which needs to be a string with the absolute path to /your/project/location/odfapper/odfapper. Importing into a views.py (or wherever) that you want to render ODFs in is done with from odfapper.funcs import render_odf, and I go into a bit more detail in the README included in the tarball. At the moment this post and the stuff in README (which is just an expansion of internal notes) is all the documentation.

I’ve got a ton more work I’d like to do on this. If there is any interest I could put a GitHub repo up — but other (paying) work calls to which this isn’t central, so… let me know if this is a direction anyone wants to head in and I’ll keep it going. There are a bajillion tiny things that are not so hard to do that would make ODF handling enormously more intuitive.

Link to 0.04 archive: odfapper_django-0.04.bz2
Link to just the shell update: odfapper-0.04.bz2

It bears mentioning that this is tested against Django 1.4, but not 1.5 yet. Unless the template loader classes have changed a lot then this should still work fine anyway, though.