Turkey Responds to Russia: “No”

As I mentioned when Russian airstrikes in Syria began, the airstrikes have nothing to do with Assad and everything to do with keeping Washington distracted, maintaining the status quo in Syria (that is, prolonging the conflict), and pressuring Turkey (as an expansion on the already decades-old play of keeping Armenia at odds with Turkey and Azerbaijan).

The Russians did what militaries so often do when they want to present a pressuring posture and forced the issue by violating a political target’s airspace while in the course of some other operation (consider the US Navy’s recent deliberate disregard of what the Chinese claim are their “territorial waters” in the South China Sea, though the issue there is almost exactly reversed: the Chinese are the aggressors in the sense that they are laying claim to broad swathes of ocean over which Beijing has never had any practical control). Turkey decided to take the opportunity to send a message to both Moscow and Washington by shooting down a Russian jet.

The important message Ankara is sending is that it will not cooperate on any terms with Moscow and that it still considers itself a Western ally. Perhaps most interestingly, Ankara is also forcing the public dialog to become, at least temporarily, about the geopolitical game that is going on instead of the incidental and petty distraction of Assad and ISIS that has been filling the news. ISIS has used terror tactics to get in the news lately (Paris made a big splash, after all), and now Turkey has used a similar technique, though not terrorism by any stretch, to change the focus of public reporting for at least a few days.

If Washington was waiting for a green light in the region before surprising everyone with a sudden shift from Arab to Persian support, this was it. The best move right now would be for Obama to show up in Tehran tomorrow, and Washington to flip sides overnight, both with regard to Tehran/Riyadh and ISIS/Assad. If Washington gets on the Persian side of things, Russia has nowhere to go, loses its lever in Iran, and has to (for the first time in two decades) react to Washington instead of being the initiator. The Israelis and Egyptians will play ball; they have before and they will again (and judging by Bibi’s deft use of hyperbolic rhetoric over the last few years, he’s ready to make a deal that lets Tel Aviv relax), and Turkey is all but shouting out loud in plain language that it’s time to pinch the destabilizing issues at their source.

Whether anyone who is allowed to make a decision is paying attention is anyone’s guess — the last several years of American policy make me wonder if anyone is paying any attention at all… which is probably why Ankara is trying its hardest to force a focus on the strategic issues that underlie the future-changing alignment shifts in the region instead of letting the public dialog remain purely about peripheral issues like ISIS and Assad.

Methodologies in open source development

Prompted by an article on opensource.com about scrum in foss projects. This is an incomplete, impulsive rough draft and requires some major revisions.

Nothing makes me facepalm faster than when I get hired to work on a project and the first thing anyone tells me is “This project is going to be great because we are using the best methodologies, you know, like TDD and Scrum and everything. Totally perfect.”

WTF!? How about telling me about the problem the project solves or why it will be an improvement over whatever came before or talking about the architecture or something? There are only three conclusions I can draw from seeing that the only thing someone can think to say when I first enter a project is a word about methodology:

  1. They have no clue what the project actually does.
  2. There are no project goals.
  3. They are hiring me specifically because they have not yet encountered any other competent developers and they know it, yet (amazingly) they feel confident in dictating whatever they decided are “best practices” to me.


Often this results in me backing out of a job, or leaving if I had tentatively agreed to it already — unless they are just going to pay enough to make watching a project tie itself in knots worth my time.

(Incidentally, I was talking with a guy from Klarna the other day and had the exact opposite experience. They know what problem their platform tackles and why it is a really good idea. So it’s not hopeless everywhere, just most places. What is really amazing is the guy I was speaking to wasn’t even a developer but understood what their main product is all about. That’s almost a reason to have actual hope for the future of that particular company. I don’t think I’ve written the word “hope” without prefacing it with a word like “unreasonable”, “false”, “sad” or “vain” in an article about software development.)

Today there is a problem with methodologies infecting and then blinding software projects (foss and otherwise). It’s like competing flavors of ISIS taking over the ethos of various management and teams while sapping the reason from their adherents. It’s not that Scrum or Agile or TDD are necessarily bad (there are a lot of good ideas that can be pulled from them); the problem is that things like Scrum have become religions. Let’s put that in the big letters:

The problem isn’t this or that particular practice, it is the religion part.
— Me, just now

When it is absolutely unacceptable to question the utility of a methodology or specific coding practice you have a major problem. When you select a particular set of practices without any regard to the context within which the methods or techniques are to be applied, you are simply making uncritical, faith-based decisions. That is like hoping that a man who lives in the clouds will make everything better for you because you move a wooden block back and forth three times a day or kiss the ground a lot.

No, never mind, actually it is worse than that, because you’ll only lose a few moments of your day moving a little block of wood around and kissing the ground is pretty quick. Neither has to bother anyone else who is trying to be productive right then. The real damage comes when you schedule daily or weekly meetings about block-moving techniques, force everyone to take a “vacation/working retreat” (oxymoron much?) to the tune of hundreds of thousands of dollars in company money to attend seminars given by charlatans about ground-kissing, and schedule weekend work time and “fun events” like 24-hour “hackathons” and “weekend sprints” to make up for the time lost in coordinating all the above activities. That’s way worse.

(Note that in the ranty paragraph above I’m not calling all scrum/agile/whatever coaches charlatans. The real coaches I’ve met (who have themselves written successful software) would agree entirely with what I’m writing here and tend to say straight out that chosen practices must match the context of the project. I am calling the seminar circuit and “methodology certification” guys charlatans. Those shitbags have learned that they can make crazy money by telling sweet, long, loud lies to management in a culture desperate for something to point value-blind investors at as a reason to throw good money after bad. Of course, this also implies that quite a bit of tech management is incompetent and most tech investors are just shooting in the dark.)

Without the ability to question a practice it becomes impossible to evaluate its impact on your goals. For example, sometimes TDD makes a lot of sense, especially when the product is a library. Srsly, if you write libraries and you don’t do TDD then you had better have some other word for something that amounts to the same thing. But sometimes tests — especially when they are, by necessity, integration tests of a non-trivial, comprehensive nature — can be a massive distraction, totally unhelpful, and a noticeable net loss to a project (especially project components that are literally untestable before the heat death of the universe).

The choice of a methodology or technique has to be a goal-based decision. Very often this means a business decision but sometimes it’s a project-goals-and-ethos type decision. Business decisions are the easiest because they are the most straightforward and the goals are relatively simple. It is a bit different in a foss project where adherence to a particular practice might be an underlying goal or a core part of the social value system that surrounds it. But even in a foss project it isn’t so hard to pin down a goal and determine how a particular practice when imposed as a rule will either help, hinder, or confuse the goal.

There is a general incongruity between the context scrum was designed around and the context within which most foss projects exist. Scrum (and agile in general) is about customer-facing code; specifically the case where the customer is a paying entity that is inexpert in the software being developed, but the developers are inexpert in the problem domain being solved. That is the primary Agile use case. It works well there, but this does not describe the situation of most foss projects.

Most foss projects are intended to be used by other software developers or techie users (system operators, interface specialists, DBAs, etc.), and are especially focused around infrastructure. In this case Agile simply doesn’t fit. Some documentation policies don’t even fit: there is a lot of blending among the idea that “code should be documented”, “code should be commented”, “comments should adhere to a markup that generates documentation”, “the API documentation is the manual” and “there is a product manual”. These are not at all the same things in certain markets, but in the open source world there are shades of meaning there spanning “the code is the documentation” to “literate code” to “we have a separate documentation project”.

I use foss software almost exclusively for work, but my customers don’t. They have no idea what the difference is, really, and don’t care over the span of a purchasing decision whether they are getting foss software or proprietary software; they only care that it works to solve their immediate business problem (that closed source sometimes introduces a different, longer-term business problem is not something they are generally prepared to understand the arguments for). In this case I have non-developer users paying for my (incidentally foss) software, and scrum/agile/whatever works very well there as a set of guidelines and practices to draw from when outlining my group’s workflow.

But for the infrastructure stuff we do that sits behind all that, scrum/agile/whatever is totally insane. There is no “sit in with the customer to develop use cases”, and no point in holding a daily scrum meeting, when it comes to network protocols that underlie the whole system, or the crypto suites that render the whole thing safe, or the database schemas that decompose the raw data into meanings relevant on the back-end. That sort of stuff is generally core-competency for foss developers, but it doesn’t fit the scrum or agile methodologies at all.

Only when you are working on a wicker-castle of a web project where concerns are so totally interwoven and mishmashed together that front-end customer-facing concerns become backend concerns does agile make any sense — but in this case it only makes sense because you are shipping a giant ball of mud and your hair is on fire every single day because you’ll never catch up with breakages (welcome to the actual modern paradigm that rules 90% of software projects I’ve seen…).

I’ve poked at a few different sides of development. What I’m really getting at is that there are different worlds of software development, and each world is suited better or worse by different methodologies that have been developed over time. (As an extreme case, consider the lunacy of scrum in most embedded projects.) No methodology I’ve ever seen fits any given project very well, but all of them absolutely work against the interests of a project that exists outside the context from which they originated. That could be because of time constraints (consider the infamous “docs don’t pay” attitude — yet it shouldn’t be dismissed outright because it is actually a valid business concern sometimes), the context of use doesn’t fit, or whatever.

In the same way that we (should) try to pick the best tools (languages, data stores, data structures, network paradigms, etc.) for a given job based on the nature of the task, we should approach selection of development methodology and team workflow based on the nature and context of the project.

We got a different Star Wars Trailer in Japan

I live in Japan. The new Japanese trailer for Star Wars that I saw today was not the same as the new English one that I was shown on a website. Huh?

Disney is apparently doing a striptease by varying what they show in different language versions. Sneaky, and probably a really good strategy. As immune as I generally am to trailers, this is still pretty badass. It’s Star Wars after all:

In marketing as in sex, it seems, being teased is at least half the fun.


I’m going to be enormously pissed if I actually go to the trouble to see this in the theater out here and it sucks. I don’t imagine it will suck, but that’s the problem with being teased too intensely — once the main event has begun all that teasing just leads to a huge letdown if she’s only as exciting as a pretty pillow (yeah that’s a real thing, with its own wikipedia page…). Too much anticipation can make an otherwise pretty good experience seem cheap.

I really hope this isn’t Episode I all over again.

A More Likely Calculus Behind Russian Airstrikes in Syria

The media has been abuzz with talk about the Russian airstrikes in Syria. More than a few people have asked me about it. This is a record of my thoughts immediately after hearing the first news.

I haven’t gone to any great trouble to find out the names of places hit or who did what when or whatever. I don’t really need to. I have been expecting Russia to become (more overtly) involved in the Syrian/Iraqi conflict for quite a while now. The news reports I’ve read confirmed the expected: Moscow is not yet picking sides, but is definitely picking targets that will provoke Washington to double-down on its already deep investment in meaningless, expensive actions in the Middle East.

The frustrating part about those “news reports”, however, is that they purport to be news reports but are just pages and pages of unfounded speculation designed to satisfy emotional needs. None attempt to explain how Moscow’s actions may fit into the framework of this or that possible ongoing strategy, evaluate western assumptions about what is going on, determine whether or not Moscow’s activity supports or challenges those assumptions, or highlight any areas where Russian actions require further analysis due to some evident incongruity with whatever was previously assumed to be happening in the world.

Obviously we’ve gotten something wrong or else we would have seen this coming (well, a few of us did, but we aren’t the ones anyone pays attention to). None of that is addressed in the media. But then again neither truth nor analysis nor truthful analysis is the business that media is in. It is in the business of selling advertising, impressions, user data and click metrics as cheaply as possible, and sensationalism is the best tool at hand for that (other than porn, but that’s already a saturated market).

The explanation the media seems obsessed with is that Putin’s goal is to support his good buddy Assad. The slightly more interesting version goes on to explain that neither Assad nor Putin has attacked ISIS directly; both have instead attacked the smaller factions. Some speculate that this is so that Assad can force any decision about foreign support to be a polar decision between himself and ISIS. By excluding other factions as viable alternatives he can appear to be the only reasonable choice by comparison, the lesser of two evils. At least that is an interesting take on things, and probably not far from the truth. Assad’s truth, anyway. But it doesn’t explain Putin at all; he doesn’t have a horse in that race unless we assume that he just really, really likes Assad.

To believe in this deep and abiding love between those two naughty star-struck dictators we have to answer a few difficult questions: Why is the only support a few airstrikes on minor targets? Why hasn’t Putin leveraged any of his other influence in the region to gain support for Assad? Where is Kadyrov & co. when they are needed? Why hasn’t Tehran been empowered/prodded by Moscow to do anything about their (supposed) mutual pal? Where is the old Hezbollah magic when it’s needed? Why has he waited this long to actually do anything?

I could go on, but suffice to say this isn’t about any Putin-Assad bromance as much as it is about distracting Washington. It’s pretty obvious which of those two goals is more important to Russian strategy.

“Hmmm… should our strategy focus on Washington or Damascus… I just can’t make up my mind! Man, this political stuff is really hard! Decisions decisions…”

— What is not going through Putin’s mind

It is silly to concoct an explanation which consists purely of a flimsy, unsupported assertion (“to support Assad”) and then drag a reader through page after page of humanistic moralizations, vague calls for the “international community to act”, regurgitation of random violence statistics, fun factoids about how shitty life in the Middle East might be, or even go off on a long explanation about target selection without addressing why Assad would be Putin’s choice. Even that is premature without addressing the possibility that perhaps Putin has not made a choice. It should absolutely be explained that for Moscow there does not have to be a meaningful distinction of choice. Neither of these guys is an amateur, nor is either stuck in the Geopolitics lvl1 Tutorial Playground of this particular game. (Washington, on the other hand…)

I saw some rather lengthy articles about the Russian airstrikes, many well over 4 pages. None of them referenced the history of external influence in the region. Not a single mention of the Turks, not a single mention of how WWI impacted the region, not a single mention of the whole King Faisal I thing (no, not that set of Faisals, the Iraqi/Syrian ones), no reference to how the typical “put a minority in power to foster future political dependency on you” play works (ever wonder how the tiny Alawite minority came to be in charge?). Not any of that. The media simply makes it appear as though Putin is desperately in love with his long-time buddy Assad — two great pals against the world, backs against the wall, willing to do anything for each other. Which is ridiculous.

Off the cuff I would say Putin is definitely angling to create space for Assad, but the reason for that is probably to achieve two goals, neither being “to support Assad”:

  1. Tie Washington down in a pointless game (or rather deepen its investment in the ongoing one).
  2. Maintain the status quo in Syria.

(There are very likely peripheral goals and incidental benefits to any action Putin takes, and some of them may turn out to have interesting long-term repercussions.* These are just the goals that fit this action in this place at this time the most closely.)

The West, and especially the US, has already invented a rhetoric that mandates unlimited political agitation whenever anything unrelated to Europe or the US happens in the Middle East. Lighting a fire larger than a BBQ grill, for example, may cause a 4-hour special on some news channel, or maybe even a Muslim riot in London or Paris (or more likely a cartoon which itself prompts such rioting — strange that lengthy, detailed, deliberate editorials don’t have the same effect). This rhetoric prompts Washington to invest ever more deeply in pointless actions designed to deflect other pointless actions which may or may not come to pass in the Middle East (like countering Russian influence with Assad, for example). The West is a gigantic, over-charged Van de Graaff generator right now, rubbing itself to pieces internally with angst, just waiting for some poor lab student to come too close. This is as easy for Moscow to exploit as unattended lab equipment is for mischievous high schoolers.

By attacking anything that is both not Assad and not ISIS in Syria, Putin is subject to the following costs, gains, risks and effects:

Cost:

  • Very little money
  • Very little military supply
  • Exactly zero international standing (any faction that matters has already decided their support/neutrality/opposition to Moscow)

Gain:

  • Actual pilot experience
  • A live, public, and very well photographed showcase for Russia’s new aircraft and weapons (Hey! It’s not 1988 anymore!)
  • Vastly improved domestic political standing (He made a point of being seen sidestepping both the “save face at the UN before doing whatever we were going to do anyway” and “coalition building” games. In fact, he made the “international community” in general and the UN in particular look like troublesome trivialities to Moscow. Russians love this. Incidentally, Americans would too…)

Risk:

  • Nothing (the probability that Washington will manage to comprehend what is actually going on and turn it around on Moscow is very close to zero)

Effects:

  • Massive media blather and especially social media buzz in the US (Obama’s kryptonite seems to be social media)
  • Massive white-knighting around the world about how “someone should do something!” where “someone” always really means “Americans” and “do something” always means “blow something/someone up (but without actually offending or hurting anyone or actually blowing anything up… on second thought, just talk convincingly tough about taking action and then censor the media to make it appear that things that don’t affect my life at all but rile me up all the same have actually ceased to occur)”
  • The US to double-down on its “anti-terror” investment in one form or another
  • The US to bleed money it doesn’t have
  • The US to commit resources it can’t afford
  • Prolong ongoing American strategic distractions in politically irrelevant areas
  • Prolong ongoing developmental and structural distractions within the US military (Washington has a strategic need to widen the gap in space tech, turn the Air Force into the Space Force and extend its naval dominance to space, not yet another multi-billion-dollar plan for a truck design that would be great if we get in a time machine and re-occupy Iraq but useless in actual force-on-force infantry operations.)

Distracting Washington is the primary goal, not actually supporting Assad or causing problems for this or that American-aligned faction in Syria. Support for Assad and causing problems for whichever groups happen to be American proxies this week is incidental to the goal of cheaply stirring shit up. He’s trying to suck the Americans in somewhere that is cheap for Moscow but very expensive for Washington (both politically and financially).

This strategy worked very well for him in Afghanistan. It bought him an opening to get his way with Poland (after the demise of almost the entire Polish government in a profoundly well-timed plane crash inside of Russia), invade Georgia, take over Ukraine and demonstrate that American security promises are empty whenever Washington is distracted. These actions have had the side effect of deflating the European economy, prompting France to create an opening for itself to re-establish its West African empire and destroy ENI’s main gas alternative (by blowing up Libya; and before you ask, no, that had nothing at all to do with Gaddafi, freedom or the “Arab Spring”).

Making your opponent expend massively more effort than you do is a winning strategy. The US actually used to be very good at making this sort of play itself, but has apparently lost the touch ever since it bought into the totally bullshit idea that peace was about to break out all over the world with the fall of the Iron Curtain. Oops.

Interestingly, one of the most outspokenly “pro-peace” sort of nations, France, has recovered its knack for both low-cost/high-yield military operations and empire building all while continuing to make the Americans look like the “real bad guy” most of the time (even if it’s half tongue-in-cheek). Paris and Moscow are both riding pretty sizeable winning streaks achieved through some heavy-duty, but subtle, geopolitical maneuvering over the last fifteen years**. Impressive.

The appearance of support for Assad works in Putin’s favor because it makes Washington get even more intense about not being his supporter, and reality be damned because politicians are absolutely going to be tripping over each other to be the first to condemn and then be seen as acting against these “ill intended and dangerous Russian activities”.

The air strikes aren’t designed to actually support Assad winning the civil war, they are designed to create space for him. This delays any conclusion to a conflict which is itself useful to Moscow. This gives Moscow time to decide whether it is worth the trouble to support the Alawites once things are over, and judging by how easy it was to dupe Washington into blowing a decade of prosperity in Afghanistan for absolutely no reason at all, it may just be able to turn this strategy around again in Syria.

The Turks, Sunnis and Kurds are all much more important geopolitically. Being a sponsor of Assad’s Syria would turn out the same way Russia’s “sponsorship” of Iran has turned out (lukewarm on the hottest of days). The Arabs are fundamentally more important to Moscow, so the Persians will only get anything from Moscow when it would hurt Washington to give Tehran anything. Other than that, they are just the red-headed stepchildren Russia doesn’t really have much use for. These are “meh, take-em-or-leave-em” allies. The Alawites in Syria are in very nearly the same situation, as evidenced by several political generations of foreign influence now.

[* This would be true of any of a list of potential Russian military moves right now. Listed above are the two primary short-term goals against which the decision was made to actually order sorties in Syria and at this time. Consider that an incidental outcome of invading Georgia at the end of August 2008 was crashing the European economy. Once it was demonstrated that the East European side of the Yen carry trade was not a secure way to underwrite Western European securities, that carry trade unwound and with it quite a few other things that happened to be very ready to get blown over at the next strong wind. People were very deeply invested, both emotionally and financially, in being blind to this risk. If this were a real risk it meant that the pan-European dream didn’t make sense. It meant that war was not actually “a thing of the past”. It meant that pure egalitarianism was unworkable. The reality that offensive power still matters more than the opinions of intellectuals who spend most of their time trying to not offend one another is downright scary. This emotional barrier gave birth to a worldview which made the bizarre explanations that invoke a mysterious “American economic contagion” from two years prior sound like a reasonable definition of “the problem”; the economy is full of highly technical issues and mysterious pitfalls, after all. It is much more comfortable to think that “the Americans” might be the problem than either “the Russians” or the chance that the European economy itself might be inherently unsustainable. Of course, these explanations miraculously avoided any mention of the unopposed Russian invasion aimed at one of two non-Russian pipelines feeding Europe’s economy the week before the crash. So while Putin’s reasoning behind making the attacks where and when he did focuses on the two goals above, this is a period in which he stands to gain a lot by making very public demonstrations of political and military strength.]

[ ** What France and Russia have been doing is not evil. Geopolitics is what it is, and it’s not going to change for you or me. You can’t start moralizing about it just because your side is on the losing end of some issue, or some particular aspect of history is emotionally significant to you (right now), or because you really, really want the world to be some great centrally-administered perfect Utopia. That sort of thinking doesn’t get anyone anywhere. Letting your emotions get the better of you in politics (thinking in terms of “what should be” instead of “what is”) only confers blindness.]

Iterators? We Don’t NEED No Stinking Iterators!

Every so often a request for “implementation of iterators for maps” (or hashes, dicts, or some other K-V data structure) appears on the mailing list for a functional language. I’ve spent years making heavy use of iterators in imperative languages, and the way they fit into Python is really great. For Python. I totally understand where some of these folks are coming from, they just don’t realize that functional languages are not where they came from.

So… “Is this post the result of some actual event”? Yeah, you got me. It is. On the erlang-questions mailing list someone asked “Are maps ever going to get an iterator?” Again.

Erlang is definitely not Kansas, but people thinking either that it is or (more dangerously) that it should be and then trying to influence the maintainers to make it that way (and then the powers-that-be getting in a panic over “market share” and doing horrible things to the language…) worries me a bit.

There is no faster way to paint a functional language into a corner than to try making it occasionally imperative. Conversely, consider the syntactic corner C++ and Java have painted themselves into by trying to include functional features as after-thoughts where they really didn’t belong.

(I know, I know, death-by-kitchen-sink is a proud C++ tradition. It is becoming one for Java. Even though I hate Java there is no sense in making it worse by cluttering its syntax and littering it with gotchas and newbie-unfriendly readability landmines in the interest of providing features few Java coders understand the point of, especially when the whole concept of state management in a bondage-and-discipline OOP language like Java is to keep everything in structs with legs (not anonymous closures over state that is temporarily in scope…). The lack of such problems was previously one of the main points that favored Java over C++… well, that and actual encapsulation. Hopefully Rust and D can resist this temptation.)

This frustrates me. It is almost as if instead of picking a tool that matches a given job, people learn one tool and then try over time to make a super-powered Swiss Army knife of it. This never turns out well. The result is more Frankenstein’s Monster than Swiss Army knife and in the best case it winds up being hard to learn, confusing to document and crap at everything.

What’s worse, people assume that the first tool they learned well is the standard by which everything else should be judged (if so, then why are they learning anything else?). It follows, then, that if a newly studied LangX does not have a feature of previously used LangY then it must be introduced because it is “missing”. (I do admit, though, to wishing other languages had pattern matching in function heads… but I don’t bring this up on mailing lists as if it’s a “missing feature”; I do, however, cackle insanely when overloading is compared with matching.)

Let’s say we did add iterators for maps to Erlang (whatever an “iterator” is supposed to mean in a lists-are-conses type functional language). What would that enable?

-spec foreach(fun(), map()) -> ok.

That sort of looks pointless. It’s exactly the same as lists:foreach(Fun, maps:to_list(Map)) or maybe lists:foreach(Fun, maps:values(Map)). Without knowing whether we’re trying to build a new map based on the old one or get some side effect out of Fun, it’s hard to know what the point is.
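To make the equivalence concrete, here is a minimal sketch of that foreach written with nothing but the existing library calls. The module name map_foreach and the function foreach/2 are made up for illustration and are not part of OTP.

```erlang
-module(map_foreach).
-export([foreach/2, demo/0]).

%% Hypothetical helper: apply F to every {Key, Value} pair purely for
%% its side effects. The traversal order is whatever maps:to_list/1
%% happens to return, so nothing here should rely on it.
-spec foreach(fun(({term(), term()}) -> term()), map()) -> ok.
foreach(F, Map) ->
    lists:foreach(F, maps:to_list(Map)).

%% Prints each pair of a small map, one per line.
demo() ->
    Print = fun({K, V}) -> io:format("~p => ~p~n", [K, V]) end,
    foreach(Print, #{a => 1, b => 2}).
```

The whole "feature" is a one-line wrapper, which is roughly the point being made above.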


-spec map(fun(), OldMap :: map()) -> {ok, NewMap :: map()}.

But… wait, isn’t that just maps:map/2 all over again?
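To make that concrete, here is a minimal sketch of the three things people usually seem to mean by "iterating over a map", each done with functions that are already in the standard library. The module name map_walk and its function names are made up for illustration.

```erlang
-module(map_walk).
-export([values_doubled/1, sum_values/1, visit/2]).

%% Build a new map from an old one: maps:map/2 already does this.
values_doubled(Map) ->
    maps:map(fun(_Key, Value) -> Value * 2 end, Map).

%% Reduce a map to a single value: maps:fold/3 already does this.
sum_values(Map) ->
    maps:fold(fun(_Key, Value, Acc) -> Acc + Value end, 0, Map).

%% Run Fun purely for its side effects on every {Key, Value} pair.
%% The visiting order is whatever maps:to_list/1 happens to return.
visit(Fun, Map) ->
    lists:foreach(Fun, maps:to_list(Map)).
```

None of these needs a new language feature, which is sort of the point.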

I think I know where this is going, though. These people really wish maps were ordered dictionaries, because they want keys to be ordered. So they want something like this:

-spec side_effects_in_order_dammit(fun(), map()) -> ok.
side_effects_in_order_dammit(F, M) ->
    Ordered = [{K, maps:get(K, M)} || K <- lists:sort(maps:keys(M))],
    ok = lists:foreach(F, Ordered).

But wait, what order should the keys be in, anyway?
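The only order you get for free is Erlang's built-in term order, which is well defined but probably not what anyone asking for "ordered keys" has in mind once the keys are of mixed types. A tiny sketch (the module name key_order is hypothetical):

```erlang
-module(key_order).
-export([sorted_keys/1]).

%% Sort a map's keys by the standard Erlang term order, in which
%% numbers sort before atoms, atoms before tuples, tuples before
%% lists, and lists before binaries.
sorted_keys(Map) ->
    lists:sort(maps:keys(Map)).
```

Calling key_order:sorted_keys(#{<<"z">> => 1, "a" => 2, b => 3, 2 => 4, {x} => 5}) returns [2, b, {x}, "a", <<"z">>]. Anything else (locale-aware strings, insertion order, most-recently-used) is a policy decision somebody has to make and write down.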

This is a slow, steady march to insanity. “Give me iterators” begets “Let’s have ordered maps” begets “Let’s have ordered iterators for maps” and so on, and eventually you wind up with most of the Book of Genesis in the Devil’s Bible of Previously Decent Functional Languages. All the while, totally forgetting that these things already exist in another form. There are more data structures than just maps for a reason.
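As a rough illustration of "these things already exist", here is one way to get ordered traversal from a structure that was designed for it, using gb_trees from the standard library. The module and function names are invented for this sketch, and gb_trees is just one candidate (orddict alone would also do for small data).

```erlang
-module(ordered_walk).
-export([from_map/1, in_key_order/2]).

%% Convert a map into a gb_tree. orddict:from_list/1 returns the pairs
%% sorted by key, which is the input gb_trees:from_orddict/1 expects.
from_map(Map) ->
    gb_trees:from_orddict(orddict:from_list(maps:to_list(Map))).

%% Apply Fun to every {Key, Value} pair in ascending key order.
%% gb_trees:to_list/1 returns the pairs already ordered by key.
in_key_order(Fun, Tree) ->
    lists:foreach(Fun, gb_trees:to_list(Tree)).
```

If ordered traversal is a core requirement, it probably makes more sense to keep the data in the tree from the start than to convert from a map every time.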

This just gets ridiculous, and it isn’t even what hashes are about to begin with.

Horrible, Drunk Code (but it works and demonstrates a point)

Over on StackOverflow otopolsky was asking about how to make an Erlang program that could read selected lines in a huge file, offset from the bottom, without exploding in memory (too hard).

I mentioned the standard bit about front-loading and caching the work of discovering the linebreak locations, the fact that “a huge text file” nearly always means “a set of really huge log files” and that in this case tokenizing the semantics of the log file within a database is a Good Thing, etc. (my actual answer is here).

He clearly knew most of this, and was hoping that there was some shortcut already created. Well, I don’t know that there is, but it bothered me that his initial stab at following my advice about amortization of linebreak discovery resulted in an attempt to read a 400MB text file in to run a global match over it, and that this just ate up all his memory and made his machine puke. Granted, my initial snippet was a naive implementation that didn’t take size into account at all, but…

400MB? Eating ALL your memory? NO WAY. Something must be done… A call to action!

The main problem is I’m already a bit jeezled up because my wife broke out some awamori (泡盛) earlier (good business day today)… so any demo code I produce will be, ahem, a little less than massive-public-display worthy (not that the 6 or 7 guys on the planet who actually browse Erlang questions on SO would care, or don’t already know who I am). So… I’m posting here:


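-module(goofy).
-export([linebreaks/1]).

%% Return [{LineNumber, Position}], where Position is the 1-based byte
%% position at which that line starts. The file is read Step bytes at a time.
linebreaks(File) ->
    {ok, FD} = file:open(File, [raw, read_ahead, binary]),
    Step = 1000,
    Count = 2,  % line 1 is added by hand below, so the first break found starts line 2
    Loc = 1,
    Indexes = [],
    Read = file:read(FD, Step),
    {ok, Result} = index(FD, Read, Count, Loc, Step, Indexes),
    ok = file:close(FD),
    [{1, Loc} | Result].

%% Index the linebreaks in one chunk, then read the next chunk until eof.
index(FD, {ok, Data}, Count, Loc, Step, Indexes) ->
    NewLines = binary:matches(Data, <<$\n>>),
    {NewCount, Found} = indexify(NewLines, Loc, Count, []),
    Read = file:read(FD, Step),
    index(FD, Read, NewCount, Loc + Step, Step, [Found | Indexes]);
index(_, eof, _, _, _, Indexes) ->
    %% Chunks and their entries were both accumulated in reverse order.
    {ok, lists:reverse(lists:flatten(Indexes))};
index(_, {error, Reason}, _, _, _, Indexes) ->
    {error, Reason, Indexes}.

%% Turn chunk-relative match positions into absolute line-start positions.
indexify([], _, Count, Indexed) ->
    {Count, Indexed};
indexify([{Pos, Len} | Rest], Offset, Count, Indexed) ->
    NewCount = Count + 1,
    indexify(Rest, Offset, NewCount, [{Count, Pos + Len + Offset} | Indexed]).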

As ugly as that is, it reads the file in constant space. The index list it produces on a 7,247,560 line, 614,754,920 byte file does take a bit of space (a few dozen MB for the 7,247,560 element list…) and temporarily requires a bit more during part of the return operation (a very sudden, brief spike in memory usage right at the end as it returns). But it works, and returns what we were looking for in a way that won’t kill your computer. And… it only takes 14 seconds or so on the totally crappy laptop I’m using right now (an old dual-core Celeron).

Much better than what otopolsky ran into when his computer ran for 10 minutes before it started swapping after eating all his memory!


ceverett@changa:~/foo/hugelogs$ ls -l
合計 660408
-rw-rw-r-- 1 ceverett ceverett       928  9月  3 01:31 goofy.erl
-rw-rw-r-- 1 ceverett ceverett  61475492  9月  2 23:17 huge.log
-rw-rw-r-- 1 ceverett ceverett 614754920  9月  2 23:19 huger.log
ceverett@changa:~/foo/hugelogs$ erl
Erlang/OTP 18 [erts-7.0] [source] [64-bit] [smp:2:2] [async-threads:10] [kernel-poll:false]

Running $HOME/.erlang.
Eshell V7.0  (abort with ^G)
1> c(goofy).
2> {HugeTime, HugeIndex} = timer:tc(goofy, linebreaks, ["huge.log"]).
3> {HugerTime, HugerIndex} = timer:tc(goofy, linebreaks, ["huger.log"]).
4> HugerTime / 1000000.
5> HugeTime / 1000000.
6> lists:last(HugeIndex).
7> lists:last(HugerIndex).

Rather untidy code up there, but I figure it is readable enough that otopolsky can get some ideas from this and move on.
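For completeness, here is a rough sketch of how such an index might be used to pull a single line back out of the file without reading the rest of it. It assumes an index of {LineNumber, Position} pairs with unique line numbers and 1-based byte positions, as described above. The module name line_fetch and the no_such_line error are made up for the example, and the linear lists:keyfind/3 lookup is only there to keep the sketch short.

```erlang
-module(line_fetch).
-export([fetch/3]).

%% Read line LineNo from File using a prebuilt index.
%% Index is [{LineNumber, Position}], where Position is the 1-based
%% byte position at which that line starts (an assumption of this sketch).
fetch(File, Index, LineNo) ->
    case lists:keyfind(LineNo, 1, Index) of
        false ->
            {error, no_such_line};
        {LineNo, Start} ->
            case line_length(File, Index, LineNo, Start) of
                Length when Length > 0 ->
                    {ok, FD} = file:open(File, [raw, read, binary]),
                    %% file:pread/3 takes a 0-based offset, hence Start - 1.
                    Result = file:pread(FD, Start - 1, Length),
                    ok = file:close(FD),
                    Result;
                _ ->
                    eof
            end
    end.

%% A line runs up to the start of the next indexed line, or to the end
%% of the file when it is the last entry in the index.
line_length(File, Index, LineNo, Start) ->
    case lists:keyfind(LineNo + 1, 1, Index) of
        {_, NextStart} -> NextStart - Start;
        false          -> filelib:file_size(File) - (Start - 1)
    end.
```

Counting from the bottom is then just arithmetic on the line numbers before calling fetch/3. For an index with millions of entries, something with faster lookup than a list (an ETS table, say) would be the obvious next step.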

The Bicycle v3.0


So, for future reference, I’m posting a photo of it here (with child seat).


Vicarious Reminiscence

Remember being 4 years old and being taught by your parents how to read? Just a vague memory for most of us, but an ongoing process for my own kids today…

Today, some Sanrio stationery; tomorrow, THE WORLD! (…or more likely some other blog in the digital wasteland, but we all have to start somewhere.)

XML: Xtensively Mucked-up Lists (or “How A Committee Screwed Up Sexps”)

Some folks are puzzled at why I avoid XML. They ask why I “try so hard to avoid XML” and do crazy things like write ASN.1 specs, use native language terms when possible (like Python config files in Python dicts, Erlang configs in Erlang terms, etc.), consider YAML/JSON a decent last resort, and regard XML as a non-option.

I maintain that XML sucks. I believe that it is, to date, the most perfectly subtle (or not-so-subtle) way to screw up sexps. Consider that: “screw up sexps”. What a thing to say! How in the world would one even propose to screw up such a simple idea? Let’s consider an example…

Can you identify the semantic difference among the following examples?
(Inspired by the sample XML in the Python xml.etree docs)

Version 1

<country name="Liechtenstein">
  <neighbor name="Austria" direction="E"/>
  <neighbor name="Switzerland" direction="W"/>
</country>

Version 2


Version 3

<country name="Liechtenstein" rank="1" year="2008" gdppc="141100">
  <neighbor name="Austria" direction="E"/>
  <neighbor name="Switzerland" direction="W"/>
</country>

Version 4

And here there is a deliberate semantic difference, meant to be illustrative of a certain property of trees… which is supposedly the whole point.

  <country rank="1" year="2008" gdppc="141100">
      <name direction="E">Austria</name>
      <name direction="W">Switzerland</name>
  </country>

Which one should you choose for your application? Which one is obvious to a parser? From which could you more than likely write a general parsing routine that could pull out data that meant something? Which one could you turn into a program by defining the identifier tags as functions somewhere?

Consider those last two questions carefully. The so-called “big data” people are hilarious, especially when they are XML people. There is a difference between “not a large enough sample to predict anything specific” and “a statistically significant sample from which generalities can be derived”, certainly, but that has a lot more to do with how representative the sample is than with the sheer number of petabytes you have sitting on disk somewhere. “Big Data” should really be about “Big Meaning”, but we seem to be so obsessed over the medium that we miss the message. Come to think of it, this is a problem that spans the techniverse; it just happens to be particularly obvious and damaging in the realm of data science.

The reason I so hate XML is that the complexity and ambiguity introduced in an effort to make the X in XML mean something have crippled it in terms of clarity. What is a data format if it confuses the semantics of the data? XML is unnecessarily ambiguous to the people who have to parse (or design, document, discuss, edit, etc.) XML schemas, and it kills any hope of readily converting some generic data represented as XML into a program that can extract its meaning without going to the extra work of researching a schema, which throws the entire concept of “universality” right out the window.

It’s all a lie. A tremendous amount of effort has been wasted over the years producing tools that do nothing more than automate away the mundane annoyances of dealing with the stupid way in which the structure is serialized. These efforts have been trumpeted as a major triumph, and yet they don’t tell us anything about the resulting structure, which itself is still more ambiguous than plain old sexps would have been. It’s not just that it’s a stupid angle-bracket notation when serialized (that’s annoying, but forgivable: most sexps are annoying paren, obnoxious semantic whitespace, or confusing ant-poop delimited, and there just is no escape from the tyranny of ASCII). XML structure is broken and ambiguous, no matter what representation it takes as characters in a file.
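For contrast, here is a sketch of the same country data as a plain Erlang term, the kind of native-term representation mentioned at the top of this post. The tuple layout, the module name country_terms and the atoms east and west are my own choices for illustration, not a standard of any kind. The point is only that once a layout is picked, tags, attributes and children cannot be confused with one another.

```erlang
-module(country_terms).
-export([example/0, neighbors/1]).

%% One possible layout: {country, Name, Properties, Neighbors}.
%% The values are the ones used in the XML samples above.
example() ->
    {country, "Liechtenstein",
     [{rank, 1}, {year, 2008}, {gdppc, 141100}],
     [{neighbor, "Austria", east},
      {neighbor, "Switzerland", west}]}.

%% Pull the neighbor names out by pattern matching on the structure.
neighbors({country, _Name, _Properties, Neighbors}) ->
    [Name || {neighbor, Name, _Direction} <- Neighbors].
```

Here country_terms:neighbors(country_terms:example()) returns ["Austria", "Switzerland"], with no parser library involved.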

Erlang: Writing Terms to a File for file:consult/1

I notice that there are a few little helper functions I seem to always wind up writing given different contexts. In Erlang one of these is an inverse function for file:consult/1, which I have to write any time I use a text file to store config data*.

Very simply:

write_terms(Filename, List) ->
    Format = fun(Term) -> io_lib:format("~tp.~n", [Term]) end,
    Text = lists:map(Format, List),
    file:write_file(Filename, Text).

[Note that this *should* return the atom 'ok'. If you want to check an assertion or crash on failure, do ok = write_terms(Filename, List) in your code.]

This separates each term in a list by a period in the text file, which causes file:consult/1 to return the same list back (in order — though this detail usually does not matter because most conf files are used as proplists and are keysearched anyway).
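A quick way to convince yourself of the round trip is something like the following sketch. The module name terms_roundtrip and the roundtrip/2 helper are hypothetical, and write_terms/2 is repeated only so the module stands on its own.

```erlang
-module(terms_roundtrip).
-export([write_terms/2, roundtrip/2]).

%% Same function as above: one term per line, each terminated by a period.
write_terms(Filename, List) ->
    Format = fun(Term) -> io_lib:format("~tp.~n", [Term]) end,
    Text = lists:map(Format, List),
    file:write_file(Filename, Text).

%% Write Terms, read them back with file:consult/1, and crash with a
%% badmatch if what comes back is not exactly what went in.
roundtrip(Filename, Terms) ->
    ok = write_terms(Filename, Terms),
    {ok, Terms} = file:consult(Filename),
    ok.
```

For example, terms_roundtrip:roundtrip("demo.conf", [{port, 8080}, {name, "demo"}, {tags, [a, b]}]) should return ok and leave three period-terminated terms in demo.conf.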

An annoyance with most APIs is a lack of inverse functions where they could easily be written. Even if the original authors of the library don’t conceive of a use for an inverse of some particular function, whenever there is an opportunity for this leaving it out just makes an API feel incomplete (and don’t get me started on “web APIs”… ugh). This is just one case of that. Why does Erlang have a file:consult/1 but not a file:write_terms/2 (or “file:deconsult/2” or whatever)? I don’t know. But this bugs me in most libs in most languages — this is the way I usually deal with this particular situation in Erlang.

[* term_to_binary/1 ←→ binary_to_term/1 is not an acceptable solution for config data!]