The Intellectual Wilderness There is nothing more useless than doing efficiently that which should not be done at all.

2015.02.28 15:29

Erlang: Maps, Comprehensions and Side-effecty Iteration

Filed under: Computing — Tags: , , , , — zxq9 @ 15:29

In Erlang it is fairly common to want to perform a side-effecty operation over a list of values, not because you want to collect an aggregate (fold), actually map the input list to the output list (map), or build a new list in some way (list comprehension) but because you just want the side-effects of the operation.

The typical idiom (especially for broadcasting messages to a list of processes) is to use a list comprehension for this, or sometimes lists:map/2:

%% Some procedural side-effect, like spawning list of processes or grabbing
%% a list of resources:
[side_effecty_procedure(X) || X <- ListOfThings],

%% Or, for broadcasting:
[Pid ! SomeMessage || Pid <- ListOfPids],

%% And some old farts still do this:
lists:map(fun side_effecty_procedure/1, ListOfThings),

%% Or even this (gasp!) which is actually made for this sort of thing:
lists:foreach(fun side_effecty_procedure/1, ListOfThings),
%% but lacks half of the semantics I describe below, so this part
%% of the function namespace is already taken... (q.q)

That works just fine, and is so common that list comprehensions have been optimized to handle this specific situation in a way that avoids creating a return list value if it is clearly not going to be assigned to anything. I remember thinking this was sort of ugly, or at least sort of hackish before I got accustomed to the idiom, though. “Hackish” in the sense that this is actually a syntax intended for the construction of lists and only incidentally a useful way to write a fast side-effect operation over a list of values, and “ugly” in the sense that it is one of the few places in Erlang you can’t force an assertion to check the outcome of a side-effecty operation.

For example, there is no equivalent to the assertive ok = some_procedure() idiom, or even the slightly more complex variation used in some other situations:

case foo(Bar) of
    ok    -> Continue();
    Other -> Other

a compromise could be to write an smap/2 function defined something like

smap(F, List) ->
    smap(F, 1, List).

smap(F, N, []) ->
smap(F, N, [H|T]) ->
    case F(H) of
        ok              -> smap(F, N + 1, T);
        {error, Reason} -> {error, {N, Reason}}

But this now has the problem of requiring that whatever is passed as F/1 have a return type of ok | {error, Reason} which is unrealistic without forcing a lot of folks to wrap existing side-effecty functions in something that coerces the return type to match this. Though that might not be bad practice ultimately, its still perhaps more trouble than its worth.

It isn’t like most Erlang code is written in a way where side-effecty iteration over a list is likely to fail, and if it actually does fail the error data from a crash will contain whatever was passed to the side-effecty function that failed. But this still just doesn’t quite sit right with me — that leaves the prospect of list iteration in the interest of achieving a series of side effects as at least a little bit risky to do in the error kernel (or “crash kernel”) of an application.

On the other hand, the specific case of broadcasting to a list of processes would certainly be nice to handle exactly the same way sending to a single process works:

%% To one message
Pid ! SomeMessage,
%% To a list of messages
Pids ! SomeMessage,

Which seems particularly obvious syntax, considering that the return of the ! form of send/2 is the message itself, meaning that the following two would be equivalent if the input to the ! form of send/2 also accepted lists:

%% The way it works now
Pid1 ! Pid2 ! SomeMessage,
%% Another way I wish it worked
[Pid1, Pid2] ! SomeMessage,

In any case, this is clearly not the sort of language wart that gets much attention, and its never been a handicap for me. It just seems a bit hackish and ugly to essentially overload the semantics of list comprehensions or (ab)use existing list operations like maps and folds to achieve the side effect of iterating over a list instead of having a function like smap/2 which is explicitly designed to achieve side-effecty iteration.

2013.04.20 20:12

Version String Comparison in Python

Filed under: Computing — Tags: , , , , — zxq9 @ 20:12

Comparing and sorting version strings in Python scripts has come up a few times lately. Here are some simple approaches. These examples assume that the version strings will be numeric representations and not include elements like “rc-1” or “alpha” or whatever. If your problem includes these kinds of elements, don’t worry — they are solvable by applying a touch of regex-fu to the processes below.

First, sorting a bunch of version number strings. The problem is that many version strings reported by various packages are just that: strings. Usually the only way to get version information is to ask for it (“[prog] –version” in a shell script, or “SELECT version();” from a database, or whatever) and interpret whatever gets sent to stdout. Those strings don’t mean the same thing to comparison operators that they mean to us so long as they remain strings. For example, the string ‘3.5.10’ is greater alphabetically than ‘3.15.1’ but is a higher version. So we need to convert them to tuples of integers to make comparison of them natural (again, all these examples assume integer-only version strings, but minor changes to the process can allow you to compare anything — and assigning a custom collation order can allow you to sort against any arbitrary order of arbitrary symbols, but that’s beyond the scope of the basic nature of the problem I’m addressing here):

>>> vers
['3.5.10', '3.15.1', '2.5.7', '0.20.0', '2.12.5', '10.4.3']
>>> s_vers = [tuple([int(x) for x in n.split('.')]) for n in vers]
>>> s_vers
[(3, 5, 10), (3, 15, 1), (2, 5, 7), (0, 20, 0), (2, 12, 5), (10, 4, 3)]
>>> vers[0] > vers[1]
>>> s_vers[0] > s_vers[1]
>>> cmp(vers[0], vers[1])
>>> cmp(s_vers[0], s_vers[1])
>>> s_vers.sort()
>>> s_vers
[(0, 20, 0), (2, 5, 7), (2, 12, 5), (3, 5, 10), (3, 15, 1), (10, 4, 3)]

The list comprehension (actually, two nested list comprehensions) assignment to s_vers is the important part of this. Once that is done you can compare whatever you want. If the version number is buried as an element in a dict or larger list (likely) you can do this conversion in place by adding a new element to the contained structures and then sort the greater list based on that element:

>>> packages
[{'version': '3.5.10', 'name': 'foo'}, {'version': '3.15.1', 'name': 'foo'}, {'version': '2.5.7', 'name': 'foo'}, {'version': '0.20.0', 'name': 'foo'}, {'version': '2.12.5', 'name': 'foo'}, {'version': '10.4.3', 'name': 'foo'}]

OK, that’s pretty ugly (uglier depending on how your browser renders <pre> type text), so I’ll print them in order so we can watch the list change more easily.

>>> for p in packages:
...   print p
{'version': '3.5.10', 'name': 'foo'}
{'version': '3.15.1', 'name': 'foo'}
{'version': '2.5.7', 'name': 'foo'}
{'version': '0.20.0', 'name': 'foo'}
{'version': '2.12.5', 'name': 'foo'}
{'version': '10.4.3', 'name': 'foo'}
>>> for p in packages:
...   p.update({'version_tuple': tuple([int(x) for x in p['version'].split('.')])})
>>> for p in packages:
...   print p
{'version_tuple': (3, 5, 10), 'version': '3.5.10', 'name': 'foo'}
{'version_tuple': (3, 15, 1), 'version': '3.15.1', 'name': 'foo'}
{'version_tuple': (2, 5, 7), 'version': '2.5.7', 'name': 'foo'}
{'version_tuple': (0, 20, 0), 'version': '0.20.0', 'name': 'foo'}
{'version_tuple': (2, 12, 5), 'version': '2.12.5', 'name': 'foo'}
{'version_tuple': (10, 4, 3), 'version': '10.4.3', 'name': 'foo'}
>>> packages.sort(key = lambda x:x['version_tuple'])
>>> for p in packages:
...   print p
{'version_tuple': (0, 20, 0), 'version': '0.20.0', 'name': 'foo'}
{'version_tuple': (2, 5, 7), 'version': '2.5.7', 'name': 'foo'}
{'version_tuple': (2, 12, 5), 'version': '2.12.5', 'name': 'foo'}
{'version_tuple': (3, 5, 10), 'version': '3.5.10', 'name': 'foo'}
{'version_tuple': (3, 15, 1), 'version': '3.15.1', 'name': 'foo'}
{'version_tuple': (10, 4, 3), 'version': '10.4.3', 'name': 'foo'}

We started out with a list of dictionaries, each containing a package name and a version string. The first loop updates each dictionary to include a version tuple, and the next orders the dictionaries within the list by the tuple values. Viola! We have a list of dictionaries sorted by version number. Of course, if there are more than one package name involved you will want to sort on the package name first, then the version tuple as a secondary criteria (so you don’t compare versions of package ‘foo’ against versions of package ‘bar’, or sort glibc against firefox, for example).

If lambdas are unfamiliar to you, don’t be scared off by the package.sort() line up there — lambdas are perfectly safe, reliable and quite concise once you understand the way they are used.

From here writing a sort function for lists of version strings should be pretty obvious. And… that means that writing a comparison function for two individual elements that works the same way the built-in cmp() function works is trivial:

>>> def ver_tuple(z):
...   return tuple([int(x) for x in z.split('.') if x.isdigit()])
>>> def ver_cmp(a, b):
...   return cmp(ver_tuple(a), ver_tuple(b))
>>> vers
['3.5.10', '3.15.1', '2.5.7', '0.20.0', '2.12.5', '10.4.3']
>>> ver_cmp(vers[0], vers[1])
>>> ver_cmp(vers[0], vers[0])
>>> ver_cmp(vers[3], vers[4])

Nice and easy.

Now I can’t figure out why comparison functions I’ve seen floating around occupy so much space and are hard to follow — full of class declarations and exec loops within exec loops (!!!) and other nonsense. At the most you will need to add some regular expression matching to extract/split on the correct substrings from the version string. That means you would have to import the re module and the list comprehension will grow by a few (maybe 10) characters.

Powered by WordPress