The Intellectual Wilderness There is nothing more useless than doing efficiently that which should not be done at all.

2013.04.21 14:35

2013.04.20 20:12

Version String Comparison in Python

Filed under: Computing — Tags: , , , , — zxq9 @ 20:12

Comparing and sorting version strings in Python scripts has come up a few times lately. Here are some simple approaches. These examples assume that the version strings will be numeric representations and not include elements like “rc-1” or “alpha” or whatever. If your problem includes these kinds of elements, don’t worry — they are solvable by applying a touch of regex-fu to the processes below.

First, sorting a bunch of version number strings. The problem is that many version strings reported by various packages are just that: strings. Usually the only way to get version information is to ask for it (“[prog] –version” in a shell script, or “SELECT version();” from a database, or whatever) and interpret whatever gets sent to stdout. Those strings don’t mean the same thing to comparison operators that they mean to us so long as they remain strings. For example, the string ‘3.5.10’ is greater alphabetically than ‘3.15.1’ but is a higher version. So we need to convert them to tuples of integers to make comparison of them natural (again, all these examples assume integer-only version strings, but minor changes to the process can allow you to compare anything — and assigning a custom collation order can allow you to sort against any arbitrary order of arbitrary symbols, but that’s beyond the scope of the basic nature of the problem I’m addressing here):

>>> vers
['3.5.10', '3.15.1', '2.5.7', '0.20.0', '2.12.5', '10.4.3']
>>> s_vers = [tuple([int(x) for x in n.split('.')]) for n in vers]
>>> s_vers
[(3, 5, 10), (3, 15, 1), (2, 5, 7), (0, 20, 0), (2, 12, 5), (10, 4, 3)]
>>> vers[0] > vers[1]
>>> s_vers[0] > s_vers[1]
>>> cmp(vers[0], vers[1])
>>> cmp(s_vers[0], s_vers[1])
>>> s_vers.sort()
>>> s_vers
[(0, 20, 0), (2, 5, 7), (2, 12, 5), (3, 5, 10), (3, 15, 1), (10, 4, 3)]

The list comprehension (actually, two nested list comprehensions) assignment to s_vers is the important part of this. Once that is done you can compare whatever you want. If the version number is buried as an element in a dict or larger list (likely) you can do this conversion in place by adding a new element to the contained structures and then sort the greater list based on that element:

>>> packages
[{'version': '3.5.10', 'name': 'foo'}, {'version': '3.15.1', 'name': 'foo'}, {'version': '2.5.7', 'name': 'foo'}, {'version': '0.20.0', 'name': 'foo'}, {'version': '2.12.5', 'name': 'foo'}, {'version': '10.4.3', 'name': 'foo'}]

OK, that’s pretty ugly (uglier depending on how your browser renders <pre> type text), so I’ll print them in order so we can watch the list change more easily.

>>> for p in packages:
...   print p
{'version': '3.5.10', 'name': 'foo'}
{'version': '3.15.1', 'name': 'foo'}
{'version': '2.5.7', 'name': 'foo'}
{'version': '0.20.0', 'name': 'foo'}
{'version': '2.12.5', 'name': 'foo'}
{'version': '10.4.3', 'name': 'foo'}
>>> for p in packages:
...   p.update({'version_tuple': tuple([int(x) for x in p['version'].split('.')])})
>>> for p in packages:
...   print p
{'version_tuple': (3, 5, 10), 'version': '3.5.10', 'name': 'foo'}
{'version_tuple': (3, 15, 1), 'version': '3.15.1', 'name': 'foo'}
{'version_tuple': (2, 5, 7), 'version': '2.5.7', 'name': 'foo'}
{'version_tuple': (0, 20, 0), 'version': '0.20.0', 'name': 'foo'}
{'version_tuple': (2, 12, 5), 'version': '2.12.5', 'name': 'foo'}
{'version_tuple': (10, 4, 3), 'version': '10.4.3', 'name': 'foo'}
>>> packages.sort(key = lambda x:x['version_tuple'])
>>> for p in packages:
...   print p
{'version_tuple': (0, 20, 0), 'version': '0.20.0', 'name': 'foo'}
{'version_tuple': (2, 5, 7), 'version': '2.5.7', 'name': 'foo'}
{'version_tuple': (2, 12, 5), 'version': '2.12.5', 'name': 'foo'}
{'version_tuple': (3, 5, 10), 'version': '3.5.10', 'name': 'foo'}
{'version_tuple': (3, 15, 1), 'version': '3.15.1', 'name': 'foo'}
{'version_tuple': (10, 4, 3), 'version': '10.4.3', 'name': 'foo'}

We started out with a list of dictionaries, each containing a package name and a version string. The first loop updates each dictionary to include a version tuple, and the next orders the dictionaries within the list by the tuple values. Viola! We have a list of dictionaries sorted by version number. Of course, if there are more than one package name involved you will want to sort on the package name first, then the version tuple as a secondary criteria (so you don’t compare versions of package ‘foo’ against versions of package ‘bar’, or sort glibc against firefox, for example).

If lambdas are unfamiliar to you, don’t be scared off by the package.sort() line up there — lambdas are perfectly safe, reliable and quite concise once you understand the way they are used.

From here writing a sort function for lists of version strings should be pretty obvious. And… that means that writing a comparison function for two individual elements that works the same way the built-in cmp() function works is trivial:

>>> def ver_tuple(z):
...   return tuple([int(x) for x in z.split('.') if x.isdigit()])
>>> def ver_cmp(a, b):
...   return cmp(ver_tuple(a), ver_tuple(b))
>>> vers
['3.5.10', '3.15.1', '2.5.7', '0.20.0', '2.12.5', '10.4.3']
>>> ver_cmp(vers[0], vers[1])
>>> ver_cmp(vers[0], vers[0])
>>> ver_cmp(vers[3], vers[4])

Nice and easy.

Now I can’t figure out why comparison functions I’ve seen floating around occupy so much space and are hard to follow — full of class declarations and exec loops within exec loops (!!!) and other nonsense. At the most you will need to add some regular expression matching to extract/split on the correct substrings from the version string. That means you would have to import the re module and the list comprehension will grow by a few (maybe 10) characters.

Most common Bash date commands for timestamping

Filed under: Computing — Tags: , , , , — zxq9 @ 13:46

From time to time I get asked how to use the date command to generate a timestamp. Here is an idiot-friendly script you can post for reference in your team’s bin/ if you get interrupted about timestamp questions or have an aversion to typing phrases like “man date” (with or without a space).

All but the first one and last three produce filename-friendly strings.

A big thanks to the following folks for pointing out mistakes and suggesting useful format inclusions:

  • 2013-05: Rich for the reminder to include UTC and timezoned stamps.
  • 2017-10: “Hamilton and Meg” (haha!) for pointing out I had my 4 year example formats messed up and for prodding me to include a 2-year example.
  • 2018-01: Autumn Gray for suggesting that I add examples of including short and long days of the week.
  • 2020-09: rade noticed that I had a “Zulu” notation for UTC listed as ISO8601, which it is not, and provided a correction. I’ve specified the Z-notation UTC timestamp separately from the (now corrected) ISO8601 timestamp. Thanks, rade!
#! /bin/bash

# An overly obvious reference for most commonly requested bash timestamps
# Now all you Mac fags can stop pestering me.

cat << EOD
        Format/result           |       Command              |          Output
YYYY-MM-DD_hh:mm:ss             | date +%F_%T                | $(date +%F_%T)
YYYYMMDD_hhmmss                 | date +%Y%m%d_%H%M%S        | $(date +%Y%m%d_%H%M%S)
YYYYMMDD_hhmmss (UTC version)   | date --utc +%Y%m%d_%H%M%SZ | $(date --utc +%Y%m%d_%H%M%SZ)
YYYYMMDD_hhmmss (with local TZ) | date +%Y%m%d_%H%M%S%Z      | $(date +%Y%m%d_%H%M%S%Z)
YYYYMMSShhmmss                  | date +%Y%m%d%H%M%S         | $(date +%Y%m%d%H%M%S)
YYYYMMSShhmmssnnnnnnnnn         | date +%Y%m%d%H%M%S%N       | $(date +%Y%m%d%H%M%S%N)
YYMMDD_hhmmss                   | date +%y%m%d_%H%M%S        | $(date +%y%m%d_%H%M%S)
Seconds since UNIX epoch:       | date +%s                   | $(date +%s)
Nanoseconds only:               | date +%N                   | $(date +%N)
Nanoseconds since UNIX epoch:   | date +%s%N                 | $(date +%s%N)
Z-notation UTC timestamp        | date --utc +%FT%TZ         | $(date --utc +%FT%TZ)
Z-notation UTC timestamp + ms   | date --utc +%FT%T.%3NZ     | $(date --utc +%FT%T.%3NZ)
ISO8601 UTC timestamp           | date --utc +%FT%T%Z        | $(date --utc +%FT%T%Z)
ISO8601 UTC timestamp + ms      | date --utc +%FT%T.%3N%Z    | $(date --utc +%FT%T.%3N%Z)
ISO8601 Local TZ timestamp      | date +%FT%T%Z              | $(date +%FT%T%Z)
YYYY-MM-DD (Short day)          | date +%F\(%a\)             | $(date +%F\(%a\))
YYYY-MM-DD (Long day)           | date +%F\(%A\)             | $(date +%F\(%A\))

If executed, it will produce the (obvious) output:

        Format/result           |       Command              |          Output
YYYY-MM-DD_hh:mm:ss             | date +%F_%T                | 2020-09-09_11:17:10
YYYYMMDD_hhmmss                 | date +%Y%m%d_%H%M%S        | 20200909_111710
YYYYMMDD_hhmmss (UTC version)   | date --utc +%Y%m%d_%H%M%SZ | 20200909_021710Z
YYYYMMDD_hhmmss (with local TZ) | date +%Y%m%d_%H%M%S%Z      | 20200909_111710JST
YYYYMMSShhmmss                  | date +%Y%m%d%H%M%S         | 20200909111710
YYYYMMSShhmmssnnnnnnnnn         | date +%Y%m%d%H%M%S%N       | 20200909111710866549239
YYMMDD_hhmmss                   | date +%y%m%d_%H%M%S        | 200909_111710
Seconds since UNIX epoch:       | date +%s                   | 1599617830
Nanoseconds only:               | date +%N                   | 873291281
Nanoseconds since UNIX epoch:   | date +%s%N                 | 1599617830874893393
Z-notation UTC timestamp        | date --utc +%FT%TZ         | 2020-09-09T02:17:10Z
Z-notation UTC timestamp + ms   | date --utc +%FT%T.%3NZ     | 2020-09-09T02:17:10.878Z
ISO8601 UTC timestamp           | date --utc +%FT%T%Z        | 2020-09-09T02:17:10UTC
ISO8601 UTC timestamp + ms      | date --utc +%FT%T.%3N%Z    | 2020-09-09T02:17:10.881UTC
ISO8601 Local TZ timestamp      | date +%FT%T%Z              | 2020-09-09T11:17:10JST
YYYY-MM-DD (Short day)          | date +%F\(%a\)             | 2020-09-09(水)
YYYY-MM-DD (Long day)           | date +%F\(%A\)             | 2020-09-09(水曜日)

Note that the last two, short and long day-of-week are dependent on the environment variable LANG. After setting LANG=en_US we wind up with the following:

YYYY-MM-DD (Short day)          | date +%F\(%a\)             | 2020-09-09(Wed)
YYYY-MM-DD (Long day)           | date +%F\(%A\)             | 2020-09-09(Wednesday)

Powered by WordPress