Category Archives: Computing

Republication of GNU’s Guile 2.0 Manual

I’ve republished the “one page per node” html version of the GNU Guile 2.0 Manual here. I’ve been referencing the manual quite a bit lately and noticed that, at least from my location, the site responds quite slowly at times. So I’m mirroring the manual where I know its fast and I can always find it.

I hope that the reason the GNU servers are responding slowly is legitimate load and not something more sinister. Attacks on community sites are one of the more stupid forms of tragedy that these commons experience.

Using the alternatives command

A quick demonstration of using alternatives:

[root@taco ~]# cat /opt/foo/this
#! /bin/bash
echo "I am this."
[root@taco ~]# cat /opt/foo/that
#! /bin/bash
echo "I am that."
[root@taco ~]# cat /opt/foo/somethingelse 
#! /bin/bash
echo "I am somethingelse."
[root@taco ~]# which thisorthat
/usr/bin/which: no thisorthat in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin)
[root@taco ~]# alternatives --install /usr/bin/thisorthat thisorthat /opt/foo/this 1
[root@taco ~]# alternatives --install /usr/bin/thisorthat thisorthat /opt/foo/that 2
[root@taco ~]# which thisorthat
/usr/bin/thisorthat
[root@taco ~]# ls -l /usr/bin/thisorthat 
lrwxrwxrwx. 1 root root 28 Mar 23 10:38 /usr/bin/thisorthat -> /etc/alternatives/thisorthat
[root@taco ~]# ls -l /etc/alternatives/thisorthat 
lrwxrwxrwx. 1 root root 13 Mar 23 10:38 /etc/alternatives/thisorthat -> /opt/foo/that
[root@taco ~]# thisorthat
I am that.
[root@taco ~]# alternatives --set thisorthat /opt/foo/that 
[root@taco ~]# thisorthat
I am that.
[root@taco ~]# alternatives --set thisorthat /opt/foo/somethingelse 
/opt/foo/somethingelse has not been configured as an alternative for thisorthat
[root@taco ~]# alternatives --display thisorthat
thisorthat - status is manual.
 link currently points to /opt/foo/that
/opt/foo/this - priority 1
/opt/foo/that - priority 2
Current `best' version is /opt/foo/that.

Once upon a time, in a time long forgotten, the alternatives command on Fedora-descended systems (RHEL, Scientific Linux, CentOS, etc.) seemed a magical thing. It permitted two versions of the same utility to exist on a system at the same time by automatically switching links from a canonical path to the actual versioned path when necessary. Very cool. This was so cool, in fact, that other system utilities came to rely on it, namely, the post-install section on quite a few RPMs calls alternatives to set up parallel versions of things like language runtimes.

The downside to this automation is use of alternatives itself seems to have become a lost art (or possibly its that the popularity of this distro family has simply diluted the knowledge pool as the user ranks have swelled with newcomers who simply take the automation for granted).

It should be noted that alternatives is a system-wide command, so when root sets an alternative, it affects everyone’s view of the system. It should be further noted that this problem has other, but very similar, solutions on other systems like Gentoo and Debian. Gentoo’s eselect system is a bit more sophisticated and can manage families of alternatives at once (links to a number of disparate language utilities which have to change in arbitrary ways based on which underlying runtime is selected, for example). Fortunately its not very difficult to write a wrapper for alternatives that can provide a similar experience, but unfortunately I don’t have the time this morning to get into that here.

Source or Satire?

From time to time I encounter openly discoverable code that is so wild in nature that I can’t help but wonder if the author was writing a machine function or a satirical statement.

Groovy source: ArrayUtil.java

After spending a few days plowing through Java code at the outset of a new Android project I found myself checking around for practical alternatives. In the course of that search (which netted Scala, Groovy and Clojure, in descending order of easy tooling for Android) I stumbled across this gem of a source file in the Groovy codebase. At first I couldn’t really tell if this was a joke about Java’s expressiveness or a functioning bit of code, but then I realized it is actually both — all the more funny because its expressing a cumbersome optimization that will execute on the same JVM either way:

ArrayUtil.java

Breach: A browser as practical satire

Someone from the Erlang world was kind enough to paste a link to Breach — a browser written in node.js. Its so full of meta fail and manifests the very essence of hipster circular logic that… I can only assume it is satire in the same vein as INTERCAL.

breach.cc

 

Update Dell PowerEdge LCD User String LCD Remotely, Without Rebooting, Using IPMI

The other day I noticed an(other) horribly annoying thing about Dell PowerEdge servers: nothing in the iDRAC settings that is useful to change can be changed without either a reboot or at least resetting the DRAC itself. Obviously rebooting is unacceptable for a production server, unnecessarily painful for virtualization hosts, and might totally clobber your ability to contact the DRAC after a reset if you cannot reach its default address subnet after a reset (192.68.0.20 or something like that is the default address).

It turns out that all this mess is totally unnecessary if one just ignores the (stupid) web interface to the DRAC and the RACADM interface and instead sends straight IPMI commands with a utility like ipmitool. I found a script written by Tanner Stokes that makes the “user string” setting easy to do on the fly, and abstracts away the mess of getting hex values for each string character (IPMI can do anything, but none of it is straight forward). Here is my update of that script:

#!/usr/bin/python3

# This script changes the LCD user string on Dell machines that conform to IPMI 2.0

from subprocess import call
from sys import stderr

target_host = input ('\nEnter name or IP of target host:\n');
user_string = input('Enter LCD string:\n')

hex_string = ' '.join([hex(ord(z)) for z in user_string])

print('\nTrying to change LCD string on {0}...'.format(target_host))

try:
    retcode = call('/usr/sbin/ipmitool -H {0} -I lan -U root raw 0x6 0x58 193 0 0 {1} {2}'.format(target_host, str(len(user_string)), hex_string), shell=True)
    if retcode == 0:
        print('Success!')
    elif retcode < 0:
        print('Terminated by signal', -retcode, file=stderr)
    else:
        print('Oops! Returned', retcode, file=stderr)
except OSError as e:
    print('Failed with error:', e, file=stderr)

# The following supposedly sets the user string to show on the LCD, but
# is still broken (probably wrong function number) -CRE

# retcode = call('/usr/sbin/ipmitool -H {0} -I lan -U root raw 0x6 0x58 194 0'.format(target_host))

##############################################################################
# Changelog:
#
# Tanner Stokes - tannr.com - 2010-02-26
#  * Original author
#
# Craig Everett <zxq9@zxq9.com> 2013-12-13
#  * Update to Python3
#  * Pythonification
#  * Cosmetic changes

It would be better if it were wrapped in an optional main(), accepted arguments for the target_host and user_string, and accepted the name of an input file to go through — but I’m not excited enough just now to do that. In any case, thanks, Tanner!

Mental Overhead

I haven’t had any reason to write assembly by hand for quite a while, but the other day a deep hardware geek friend of mine asked for an opinion on an instruction set for an architecture he is working on — and of course that means using his assembly instructions directly.

I had forgotten what that is like. It is common today to hear people who have never written anything in assembly put C in the same category as assembly, having themselves heard that C is “low level”. C is certainly lower level than, say, Python or Erlang, but its a far cry from assembly. Saying “low level” isn’t the same thing as saying “hardware level” or “lower than…”. Abstractions are always relative to the level of your problem of the moment.

Perhaps C is metaphorically comparable to assembly if you are programming in userland and your major concerns are stretching background images or including on-click sound effects or whatever. But that doesn’t compare to assembly. What is most interesting about C is that compilers can be written that abstract away the quirks of different hardware instruction sets and make programs magically portable.

Of course, the phrase “portable” is in the same boat as the phrase “low level” these days. If its not interpreted or compiled to bytecode (or in the case of the ultra ignorant, if its not Java) then folks who have no experience in compiled languages will think you must be mistaken for using the phrase “portable”. Some more Java-hype driven misconceptions that haven’t yet died out*.

After a long stint in the world of garbage collection and either extremely strong typing or completely dynamic typing it is funny to think of everything as an integer again. Assembly is not a bad way to deal with hardware-level problems (provided the instruction set doesn’t suck). A good set of machine instructions is precisely the sort of thing needed to solve the sort of problems you encounter when dealing with specific device behaviors or, say, bootstrapping a compiler for a simple language like C. An assembly with a decent syntax provides mnemonic devices in a way that makes dealing with those machine instructions enormously easier. (But I wouldn’t try writing a graphical MMORPG in pure assembly.)

C is excellent for providing abstractions at the level required for portable systems or limited-feature/limited-aspect general programming (and I still wouldn’t try writing a graphical MMORPG in pure C). Building abstractions up from there is a sensible thing to do, of course, and this permits us to stack abstractions up high enough that we can almost talk in human(ish) terms about human problems with computers and do neat things. Deciding whether a particular tool or form of expression is appropriate for a particular set of problems is all about the nature and level of the problems and is rarely a judgment on the tools themselves.

This is why I don’t like monolingual or even single-paradigm programmers. They have been tricked into thinking that mental overhead is a fixed quantity, a universal constant from which all other difficulty and complexity is derived. This is a handicap which can only be surmounted by forcing oneself to explore outside the comfort zone provided by FavLangX or FavParadigmY.

[* What is funny about that is C is far more portable than Java. "Compile once, run anywhere" doesn't actually solve a problem that anybody has in the real world. It is trivial to compile a program once per architecture it will run on, or even once per platform. After all, that is how the runtime environments for interpreted/bytecode languages get published in the first place.]

Keyboards, Machine Guns, and Other Daily Tools

It looks like I’ll be at least occasionally moving between my home in Japan and offices in the US where I may wind up setting up a system for myself to use while I’m there (I’m not a huge laptop fan when it comes to extended work). This brings up an annoying issue: keyboard layouts.

It is difficult to find US-layout keyboards out here, so even though I usually write only a few Japanese-language emails per day its just not practical to use anything but the local flavor. Even if I did have a bunch of US-layout keyboards it would be insanely annoying to have to switch between JP-layout on laptops, server crash carts and customer systems and then switch back to US-layout when I got back to one of my offices. So I’ve gotten accustomed to this layout and it works well for me.

The main keys that do letters and numbers are all in the same place, so it seems like this wouldn’t be a big deal. The problem is the crazy keys that do “top row” and wildcard stuff like bangs, hashes, quotes, backticks, at-marks, brackets, colons, semicolons, parens, etc. All the stuff that is rarely used on, say, a student or blogger’s keyboard is usually worn smooth on a programmer’s keyboard, especially if he switches languages all day. And they are all in radically different places on JP and US layouts.

So… naturally I’ll probably just get a decent one here and keep in the closet over in the US, and whip it out whenever I show up.

But that brings up a point about familiarity and how “good” tools are from the perspective of the one using them. I could easily take the position that US-layout is poo and that JP-layout is superior. Or I could get uber nerd and pretend that some statistical study on finger reach and frequency of blahblahblah matters whatsoever when it comes to programming. It doesn’t, really. That’s to imagine that input is the hard part of programming, and its not — its figuring out what to write in the first place. So its not speed of input, per se, but smoothness of operation. More to the point, its which layout prevents the wetware halting problem: where the human doing the work has to stop what he is doing to figure out something unrelated to the essential task at hand.

But it remains true that some layouts are probably actually worse than others. It follows that other sorts of tools can fall into the realm of “good enough that preference is a matter of taste or familiarity” or in the realm of “actual poorly designed garbage”.

The reminds me of guns. There are several excellent machine gun, rifle and pistol designs employed in militaries across the world. Many of them are decent enough that while some have a slight edge over others in some areas, I’d go to work with pretty much any of them. For instance, the M4 vs. the SCAR. The SCAR is actually better, but the M4 is familiar enough to me and I have enough confidence and skill with it that I just don’t really care which one I wind up getting stuck with.

I don’t have nearly as much faith in the AK-47 as a precision weapon, especially in an environment where quick on/off safety and partial reloading is critical. They are famously resistant to neglect (which is often mistaken for durability), but that’s really a key attribute for a rifle intended for the mindless Commie Horde sweeping en masse across the tundra or untrained insurgent/freedom-fighter/terrorist whose backers need cheap trashy guns with which to arm their cheap trashy goons. Indeed, the AK-47 is in real terms less good than the SCAR or M4 and there is a whole list of rifles and carbines I would consider before going to work with one (but still, its not absolutely awful, just so much less good than the alternatives that I’d avoid it if possible — sort of like Java).

Where this is really striking is with machine guns and pistols. On the pistol side there are a rather large number of designs that actually break down frequently during use. This never happens in a James Bond movie, of course, but in real life it happens at the most inconvenient times. Come to think of it, there is never a convenient time for a pistol to break. Once again, despite the absolute superiority in design of the semi-automatic over the revolver, familiary can overcome the technical deficiencies between the two (with lots of practice) and I would actually prefer to go to work with certain revolvers over certain semi-autos. (This is to say nothing, of course, of the issue of caliber…)

With machine guns, however, the differences in good vs. bad designs are vast. In the nearly any modern military you’re absolutely spoilt. A “bad” gun is one that doesn’t have a TV built into the stock to ease the passage of long turns on security. They are mindlessly easy to load, sight, barrel change, fire, strip, clean, maneuver with, etc. The links disintegrate and can be re-used on unlinked ammo, all sorts of cool toys fit around the thing (which can, sometimes, make them start to suck just from the “too much Star Wars” problem), runaways can have their belt broken, they will eat through just about any garbage that gets caught in the links or even fire bent ammo. They aren’t even unreasonably heavy (and its patently unfair to compare it to the uber lightness of an M4). Its amazing how well these things work. But when they are all you know you start complaining about them, wishing you had a 240 when you’ve been handed an M-60 (because its possible to jam it up if you accidentally load it bolt-forward, or probably lacks a rail system, or you’re an unsufferable weakling complaining because you didn’t get the lightweight bulldog version, or whatever). I’ve had the misfortune of having to go to work with old Soviet machine guns, though, and can attest that they are indeed of almost universally horrible design.

When we say “crew served weapon” in modern armies we mean “the weapon is the centerpiece of the crew” not “this weapon is absolutely unreasonable to assign to any less than three people”. It might have meant that operating the machinery actually took a crew back when tripods included full-sized chairs, ammo came on a horse-drawn cart, and vast amounts of oil and water were consumed in operation. But that was the early 1900′s. We still employ machine guns as crew served weapons beacuse its an advantage to have an AG and actually set up a tripod if you wind up facing off against a for-real infantry force, not because its difficult to wield one. Today a single person can easily maintain and operate a 240, M-60, MAG58, 249, MG42, MG3, or whatever. Not so with, say, the PKM (or heaven forbid the SG-43). An RP-46 is actually better if you come to the field with American-style assumptions that a single person is adequate to handle a machine gun.

The PKM is not really belt fed, its chain fed, and the chain doesn’t disintegrate. Its also extremely strong. Like you can support more than a single person’s weight from a belt and it won’t break. The longer the belt the more bullets, and this seems good, until you realize that it feeds from the wrong side (the right), which prevents a right-handed shooter from feeding the pig himself with his left hand and leaves the indestructible spent chain right in front of the shooter. This means its right underfoot when running after a bit of shooting — which has made be bust my face in the dirt on the top of the gun more than once (not so convenient at interesting moments, and absolutely detrimental to my Cool Point count).

But the failure of design doesn’t stop there. That stupid belt is nearly impossible to reload by hand without wearing gloves and using a lever (box top, table top wrench, whatever) to force the rounds into the thing (yeah you might load 50 rounds by hand, but how about 5000?). They also rust instantly, in accordance with the PKM Belt Rust Time Law — however long its been since you last packed the belt is precisely how long it takes to rust exactly enough to generate a vast amount of busy work without rusting so much that the belt should be discarded. If you try oiling them to prevent that they gum up or actually start growing hair instantly. Its a never ending cycle of trying to keep the belts from making your life suck without giving up and throwing them all away. Which is why the Soviets conveniently invented a reloading machine. Which itself sucks. I can’t even begin to explain the inadequacy of this stupid machine, but it actually is the only way to maintain even a marginally reasonable reload rate for belts — but there is no way you could do this under fire, or on Tuesday (the machine jams spectacularly on random days, Tuesday tending to be the worst day for this for some magical reason).

I haven’t even begun to mention the inadequacy of the ammo crates. The standard ammo crates are insanely stupid. Actually, this isn’t a gripe reserved just for 7.62 ammo, its true for all commie ammo I’ve ever seen. The ammo cans aren’t like the infinitely reusable, universally useful, hermetically sealed, flip-top boxes found in non-backward armies. They are actually cans. Like giant soup cans, but without a pull-tab — not even a sardine-key. They come with a can opener. A huge one (but only one per crate, not one per can). You read that right, a can opener. You know, the lever-kind where you hook the grabby part onto the crimp at the top edge of the can and pull to lever the pointy part down until it makes a tiny puncture, then slide over a touch and repeat until you’ve prized and ripped a gash large enough to do your business. Let that sink in. We’re talking about an ammo can. Like with bullets that people need to do their job, hopefully sometime this year. But once you’re inside the fun just doesn’t stop — no way. The thousand or so rounds inside are in boxes of 5 or 6 or so. The can that you worked so hard to open isn’t full of pre-loaded belts. That would deprive someone of a government job somewhere and that’s just not Progressive. So inside there are dozens and dozens of tiny, crappy, flimsy little cardboard boxes, each containing a few rounds. And the rounds are individually wrapped in tissue paper.

You just can’t make this trash up. Its amazing. How on earth could such a horrible, stupid, backward constellation of designs emerge from one of the two nations to reach the Moon before the end of the 20th century?

A guy I worked with a few years ago called Mule had a theory that this was, in fact, an excellent design for a machine gun system in a Socialist military. Nobody can use it alone, so you can’t get a wild hair up your ass and get all revolutionary — you need to convince at least a platoon to get crazy with you. You employ a gazillion people not only in the loving production and hand gift-wrapping of each one of the billions of rounds of machine gun ammunition throughout the nation, you employ another gazillion or so to open and load the belts. Its the ultimate low employment figure fixer — at least until the state digests enough of itself that this becomes suddenly unsustainable, of course.

Mule’s theory was that this machine gun design — from the actual shittiness of the gun itself to the complete circus of activity which necessarily surrounds its production, maintenance and use — is a brilliant design from the perspective of the State, not the soldier, and that the aims of the two are at odds is simple the natural result of a socialist system. Mule was one of the most insightful people I’ve ever met (and I’m not being rhetorical — he really was a hidden genius).

Thinking about what he said has made me re-evaluate some of my assumptions of bad designs. Perhaps the designs are excellent — not for the end user, but for whoever is in charge of the end user. And that brings me back to thinking about just why the Java programming language is so bad, yet so prolific. Java is the PKM of the programming world. Its everywhere, it sucks, it is good for (some, Stalin-like) bosses, and the whole circus surrounding its existence just won’t ever go away. And sometimes those of us who know in painstaking detail why a 240 (or nearly anything else in common use) is better are still stuck using it to get real work done.

A Technically Inaccurate Explanation of Monads

I’ve come across a multitude of technically inaccurate explanations of monads on the web, all intending to explain them in simple terms to FP newbies, most revolving around their use in Haskell. They all suck. This is mine. (It also sucks, but I hope it sucks less.)

Imagine using a purely functional language where side effects are verboten and we can’t do IO or anything else useful because of this. The host environment can clearly do IO just fine, otherwise our environment couldn’t do things like read memory locations (or our code) and our program would be useless. So it is obvious the host environment can do things externally that our program itself cannot do directly. This does not prevent us from describing what we wish it would do on behalf of our program, however.

In this way the host environment is sort of like a pantheon of gods and our programs are mere mortal blips of disembodied consciousness — capable of thought, desire and expression but powerless to touch the physical world in any way. All we can do is express our desires as regard the results of our thoughts and appeal to the gods to pity us and take action in accordance with our desires. Whether they take actions on our behalf is entirely up to them.*

With this in mind…
Imagine a function written in SomeLanguage:

int f(x): return x

It takes an integer and returns an integer. So what? The system doesn’t do anything with it so it is practically useless.

Now imagine a wrapper function:

tuple w(int x):
    tuple t = (f(x), "Please put this in STDOUT")
    return t

So w() has taken the output of f() and wrapped it in a tuple that contains the value returned by f() and a prayer to the system gods to “Please put this in STDOUT”. Pretty straight forward. Now the system gods can do whatever they want with it, but our language itself can’t — because talking to STDOUT would be a side effect and that’s strictly forbidden on Planet Functon. What we can do, however, is pray that the gods take mercy on us and act according to our wishes as regard the value that accompanies the prayer.

If this is the best model of a monad you can muster at the moment, then just stop here. This model is sufficient to make the concept of monads useful to you as you write programs that do useful work. (Which is the whole point. Unless you’re an academic, in which case your point is to write a lengthy article in a peer reviewed journal in which your explanation of monads is prefaced with a demonstration of your deep knowledge of set theory and type theory and esoteric discussion of things like monoids and thus make it appear that unfamiliarity with these subjects precludes the ability to use monads in a useful way in functional programs.)

But there are a few unresolved issues above, most of which I won’t address below, but the glaring issue of typing is relevant… (The other glaring issue of the undefined and imaginary syntax of SomeLanguage won’t be addressed at all.)

The type tuple is pretty ambiguous for this use (we could be talking about coordinates or relational values or immutable strings or whatever). How are the gods to know that we have a prayer for them? Should they inspect every tuple, or even every value we dream up, just checking for a “prayer” type? Of course not, they are busy doing immaculate things. We need a more clear signal.

One way to clear this up is to make a function that takes a function, its arguments and a request specification and returns a type prayer whose existence is itself a magical signal to get the attention of the gods. So ultimately a prayer is a type whose definition is a combined value composed of a thingy plus a prayer request regarding that thingy:

prayer p(function z, list l, request r):
    return (z(*l), r())

And then define an action of type request and rewrite w() in a way that returns a prayer:

request q = "Print to STDOUT"

prayer w(int x):
    return p(f, x, q)

Here we have defined a request for the system to perform, passed a reference/pointer/label to the function f() along with its arguments to the prayer maker function p(), which has executed f() along with its argument and returned a bundle of type prayer that contains the original integer return value along with a request to the system gods to do something regarding the bundled value.

Note that in Haskell it is common to define the underlying function (represented here by f()) and its monadic version (represented here as w()) separately. I suppose this isn’t strictly necessary — one could define monads in terms of lambdas — but this is the way things are done.

So we have specified a side effect, but we are only praying that the system takes an action based on our specification. Nothing more. Obviously I’m glossing over the details of how a request object itself is actually defined, but this is the general idea behind how monads are used to write programs in purely functional languages that have side effects.

An interesting consequence of this idea of a program specifying actions instead of actually taking them is that it is now trivial to write a host environment that deliberately ignores all prayers. In this way writing a host environment purely for the purpose of examining the behavior is also trivial. Its the ultimate sandboxing tool. It is equally easy to write a host environment that selectively observes requests, taking only some and ignoring all others, or ignoring some and taking all others.

I believe the abundance of confusing monad explanations to be a side effect of the Haskell community being disproportionately populated with mathematicians, type theorists and set theorists (“for real” or self-educated) and their explanations assume a knowledge base and vocabulary that is foreign and intimidating to most newcomers. The Clojure community doesn’t seem to have this problem, at least not as deeply.

[*It is easy to reverse this metaphor, of course, and imagine that we are uncorporeal gods, capable of thought and expression of our command to the slavish host to do whatever we specify. But I like the "program as mortals, host as gods" way better, because it extends to include the very real case of writing a test host that takes no action and merely observes (or even checks for the correctness of) the program's behavior.]

Binary Search: Random Windowing Over Large Sets

Yesterday I came across a blog post from 2010 that said less than 10% of programmers can write a binary search. At first I thought “ah, what nonsense” and then I realized I probably haven’t written one myself, at least not since BASIC and Pascal were what the cool kids were up to in the 80′s.

So, of course, I had a crack at it. There was an odd stipulation that made the challenge interesting — you couldn’t test the algorithm until you were confident it was correct. In other words, it had to work the first time.

I was wary of fencepost errors (perhaps being self-aware that spending more time in Python and Lisp(s) than C may have made me lazy with array indexes lately) so, on a whim, I decided to use a random window index to guarantee that I was in bounds each iteration. I also wrote it in a recursive style, because it just makes more sense to me that way.

Two things stuck out to me.

Though I was sure what I had written was an accurate representation of what I thought binary search was all about, I couldn’t actually recall ever seeing an implementation, having never taken a programming or algorithm class before (and still owning zero books on the subject, despite a personal resolution to remedy this last year…). So while I was confident that my algorithm would return the index of the target value, I wasn’t at all sure that I was implementing a “binary search” to the snob-standard.

The other thing that made me think twice was simply whether or not I would ever breach the recursion depth limit in Python on really huge sets. Obviously this is possible, but was it likely enough that it would occur in the span of a few thousand runs over large sets? Sometimes what seems statistically unlikely can pop up as a hamstring slicer in practical application. In particular, were the odds good that a random guess would lead the algorithm to follow a series of really bad guesses, and therefore occasionally blow up. On the other hand, were the odds better that random guesses would be occasionally so good that on average a random index is better than a halved one (of course, the target itself is always random, so how does this balance)?

I didn’t do any paperwork on this to figure out the probabilities, I just ran the code several thousand times and averaged the results — which were remarkably uniform.

binsearch

I split the process of assignment into two different procedures, one that narrows the window to be searched randomly, and another that does it by dividing by two. Then I made it iterate over ever larger random sets (converted to sorted lists) until I ran out of memory — turns out a list sort needs more than 6Gb at around 80,000,000 members or so.

I didn’t spend any time rewriting to clean things up to pursue larger lists (appending guaranteed larger members instead of sorting would probably permit astronomically huge lists to be searched within 6Gb of memory) but the results were pretty interesting when comparing the methods of binary search by window halving and binary search by random window narrowing. It turns out that halving is quite consistently better, but not by much, and the gap may possibly narrow at larger values (but I’m not going to write a super huge list generator to test this idea on just now).

It seems like something about these results are exploitable. But even if they were, the difference between iterating 24 instead of 34 times over a list of over 60,000,000 members to find a target item isn’t much difference in the grand scheme of things. That said, its mind boggling how not even close to Python’s recursion depth limit one will get, even when searching such a large list.

Here is the code (Python 2).

from __future__ import print_function
import random

def byhalf(r):
    return (r[0] + r[1]) / 2

def byrand(r):
    return random.randint(r[0], r[1])

def binsearch(t, l, r=None, z=0, op=byhalf):
    if r is None:
        r = (0, len(l) - 1)
    i = op(r)
    z += 1

    if t > l[i]:
        return binsearch(t, l, (i + 1, r[1]), z, op)
    elif t < l[i]:
        return binsearch(t, l, (r[0], i - 1), z, op)
    else:
        return z

def doit(z, x):
    l = list(set((int(z * random.random()) for i in xrange(x))))
    l.sort()

    res = {'half': [], 'rand': []}
    for i in range(1000):
        if x > 1:
            target = l[random.randrange(len(l) - 1)]
        elif x == 1:
            target = l[0]
        res['half'].append(binsearch(target, l, op=byhalf))
        res['rand'].append(binsearch(target, l, op=byrand))
    print('length: {0:>12} half:{1:>4} rand:{2:>4}'\
                    .format(len(l),
                            sum(res['half']) / len(res['half']),
                            sum(res['rand']) / len(res['rand'])))

for q in [2 ** x for x in range(27)]:
    doit(1000000000000, q)

Something just smells exploitable about these results, but I can’t put my finger on why just yet. And I don’t have time to think about it further. Anyway, it seems that the damage done by using a random index to make certain you stay within bounds doesn’t actually hurt performance as much as I thought it would. A perhaps useless discovery, but personally interesting nonetheless.

Don’t Get Class Happy

If you find yourself writing a class and you can’t explain what the difference between the class and an instance of that class is, just stop. You should be writing a function.

This antipattern — needless classes everywhere, for everything — is driving me crazy. Lately I see it in Python and Ruby a good bit, where it really shouldn’t occur. I feel like its a mental contagion that has jumped species from Java to other languages.

In particular I see classes used as inter-module namespaces, which is odd since its not like there is a tendency to run out of meaningful names within a single source file (of reasonable length). Importing the module from outside can either import * from foo,or import foo, or from foo import filename, or even import foo as bar (ah ha!) and make flexible use of namespacing where it is needed most — in the immediate vicinity of use.

So don’t write useless, meaningless, fluffy, non-state&behavior classes. Any time you can avoid writing classes, just don’t do it. Write a function. Its good discipline to see how far you can get writing zero classes in an implementation, and then write it with some classes and see which is actually less complex/more readable. In any case writing two solutions will give you a chance to examine the problem from multiple angles, and that’s nearly always an effort that results in better code since it forces you to value grokness over hacking together a few lines that sort of cover the need for the moment (and forgetting to ever come back and do it right).

Or maybe I should be admonishing people to seek flatness in namespaces, since this implies not writing classes that are stateless containers of a bunch of function definitions which incidentally get referred to as “methods” even though they don’t belong to a meaningful object in the first place.

Or maybe I should be railing against the oxymoronic concept of “stateless objects/classes”.

Or maybe I should be screaming about keeping our programming vocabulary straight so that we can finally say what we mean and mean what we say. (Sound advice in all walks of life.)

This last idea is perhaps the most indirect yet powerful of all. But it implies “standards” and so far in my experience “standards” and “computing” don’t tend to turn out very well once we get beyond, say, TCP/IP or ANSI and Unicode.

Development Speed VS Quality

I’ve been working under some pretty insane time constraints lately. Two things jump out at me upon review of my work:

  1. I can, when cornered, churn out thousands upon thousands of lines of functioning code in a flash.
  2. The code works, but it is not particularly insightful or brilliant — it merely works.

The first bit is sort of cool: Typo Monster and Syntax Bear are no longer a part of my life (at least not in my Big Five). Nice.

On the other hand, the second bit isn’t so nice. It means that insightful development requires far more time than people tend to imagine. This has been a point of discussion for decades in engineering and especially software development, but its hard to fully appreciate until you get a chance to compare your own code, written once under duress, with code written for a similar problem domain at a pace that allows more time for grokness.

Now that I think of it, its not the pace exactly which is the problem, it is the way that certain forms of deadline and environmental pressures can re-tune the time budget in a way that extends typing/coding/input time at the expense of grok time.

This is particularly pronounced when it comes to data modeling — no matter if this means in the sense of managed state, a relational database schema, a no-schema blobbulation, a serialization scheme of some sort, or an object/class model.

This is a particularly touchy thing, since rearranging a data model of any sort entails major surgery when more than one thing relies on the way data is stored, but refactoring functions is relatively lightweight (so long as you understand what the intended mapping was to begin with).

I suspect that the vast majority of production code is written under deadline and that most of it relies on expedient and trashy data models. I further suspect that most corporate settings urge programmers to “be programming” and officially establish that “programming” means “typing” instead of “understanding”.

It is shocking to revisit something written in a hurry and simplify it in a flash of insight, and then contemplate what just happened. The surprise is all the more resonant once you fully realize how much less time is required to input insightful code than rushed, verbose-but-functional trash.

Looking to my left and right, it is not particularly evident that much production code benefits from a culture of sacred Grok Time.