Daily Archives: 2012.07.1 22:26

Sane Version Numbering

Version numbers should have definite meanings. In particular, they should provide concrete semantics beyond simple collative comparison. In open source this is usually the case. There are certainly projects packed with wacky numbering, but the community usually discourages that strongly enough that it doesn’t happen unless external and well-monied parties foist it on the project in the interest of “marketing”. There is an understood (and there should be a natural and explicit) difference between major, minor and sub-minor version numbers, and what an increase to each represents.

A sub-minor version number increase (ver. 3.4.x to 3.4.x++) represents improvements to the current state of the software. This means bug fixes, more complete documentation, translation improvements, code refactoring that does not impact the API in any way (bug fix, optimization, etc.) or other changes that do not interfere with interface expectations (human, machine or construct) or anything currently accepted as input or expected as output by the program.

A minor version number increase (ver. 3.x to 3.x++) represents new functionality and improvements that do change the API definitions, but are backwards compatible. A minor version increase might also include interface changes, so long as they don’t invalidate user expectations (if human interfaces) or invalidate what was considered valid input or expected output. So adding a menu item is OK, but moving menus around or changing the semantics of existing menu items is not.

A major version number increase (ver. X to X++) represents new functionality, fixes, etc. but does break backwards compatibility. Anyone moving to a new major version should expect to deliberately re-read the docs to find out what is different. If the software is a library then other programs which rely on that library cannot be expected to automatically work with the new major version because some or even all semantics or syntax might now be different. Any program’s output might be subject to change, so a program that always output “[username] [size] [time]” might now output “[time] [username] [size]” by default. Other programs that rely on that output (like parsers or data importers) cannot assume it stays the same without checking first. It is always polite to include a “legacy output” switch if this is the case, and accordingly it is sound coding practice to explicitly declare output mode switches in programs whenever they are available, to reduce breakage like this later on.
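To make the three levels concrete, here is a minimal sketch that names which kind of increase separates two version strings. The helper names parse_version() and classify_bump() are made up for illustration, and the sketch assumes plain three-field numeric versions like “3.4.1” with no suffixes.

```python
def parse_version(text):
    """Split a 'major.minor.subminor' string into a tuple of three ints."""
    parts = text.split(".")
    if len(parts) != 3:
        raise ValueError("expected major.minor.subminor, got %r" % text)
    return tuple(int(part) for part in parts)


def classify_bump(old, new):
    """Name the most significant field that increased between two versions."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v <= old_v:
        raise ValueError("%s is not newer than %s" % (new, old))
    # Tuples compare field by field, so the first field that differs
    # is the one that went up.
    for name, before, after in zip(("major", "minor", "sub-minor"), old_v, new_v):
        if after != before:
            return name
```

Under the rules above, a “sub-minor” result should mean no interface change at all, “minor” should mean additions only, and “major” should mean go read the docs.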

[NOTE: The following examples are in Python and refer to a library example, but that doesn’t matter in the slightest. This concept applies to all projects of all sorts in all languages, even (or especially) projects written in assembler.]

Consider for example, that we had a library API that had originally defined a function find_prime() this way:

def find_prime(max):

This was fine, but has some specific problems. For one thing it couldn’t do negative primes and would crash or return garbage values (a crash is probably better than getting garbage, actually). So in version X.Y.Z a lot of code broke or was tricky to write well because programmers would have to remember to pass it only values that were tested for being positive before calling the function to be safe, and that is both hard to remember to do every time on an infrequent function call and annoying to do anyway because if it happens every call then the positives-only check should really be a part of the function. So in version X.Y.Z++ the definition changes to provide an “IsNegative” exception to be nice to programmers and make the function more general. So now calling find_prime(100) works like it always did, but find_prime(-100) produces a much more human-friendly exception case instead of randomly terrorizing your program by letting it continue to function with garbage values scattered about within it.
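A rough sketch of what the X.Y.Z++ fix might look like. The function body is an assumption (the example only gives the signature); here it simply returns every prime up to max. Making IsNegative a subclass of ValueError is also my choice for the sketch, not something the example dictates.

```python
class IsNegative(ValueError):
    """Raised when find_prime() is given a negative upper bound."""


def find_prime(max):  # 'max' shadows the built-in; the name is kept from the example
    """Return all primes less than or equal to max, in ascending order."""
    if max < 0:
        raise IsNegative("find_prime() needs a non-negative max, got %d" % max)
    primes = []
    for candidate in range(2, max + 1):
        # A candidate is prime if no smaller prime up to its square root divides it.
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
    return primes
```

Every call that worked before still works and returns the same thing, which is what keeps this inside a sub-minor increase. Only the formerly broken case changes.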

The original function suffered from another problem, though. In version X.Y the function could accept a “max” argument, but couldn’t define a “min” argument, so programmers would have to generate primes up to “max” and then test for and cut off all the ones below the “min” value they really wanted. This is a general problem, and so it should be part of the function. So in version X.Y++ the definition was expanded to:

def find_prime(max, min=False):

This change permits the old call of find_prime(100) just fine, but also allows it to be called as find_prime(100, 30) or even better, the more natural find_prime(min=30, max=100). So this change actually adds a feature, not just a bug fix, and so makes sense to include in a minor version increase.
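Sketched out, the X.Y++ version might look something like this. As before, the body is assumed; the sketch treats min=False as “no lower bound” and treats both bounds as inclusive, neither of which the example spells out.

```python
class IsNegative(ValueError):
    """Raised when find_prime() is given a negative upper bound."""


def find_prime(max, min=False):
    """Return primes p with min <= p <= max; min=False means no lower bound."""
    if max < 0:
        raise IsNegative("find_prime() needs a non-negative max, got %d" % max)
    primes = []
    for candidate in range(2, max + 1):
        # Trial division by the primes found so far, up to the square root.
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
    if min is False:
        return primes  # old single-argument behavior, unchanged
    return [p for p in primes if p >= min]
```

All three call styles mentioned above are accepted: find_prime(100), find_prime(100, 30) and find_prime(min=30, max=100). Code written against X.Y never notices the new argument.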

Eventually the time came around for a major version increase, so in version X++ nobody has to care about backwards compatibility anymore, and the project is clear to break everything and not look back. The old find_prime(100, 30) type call was really driving everyone nuts because positionally min should come before max, and having positional arguments seem out of order is annoying, and some people hate using keyword arguments, and religious wars break out over such things anyway. So it’s clear this needs to be changed. But this couldn’t have been changed in previous minor versions of the library because a lot of the old code written against that library relied on constructs like find_prime(100, 30) being correct, and breaking all that code wouldn’t have made sense over such a petty issue. But in a major version increase we don’t care. So this time around the definition was fixed to:

def find_prime(min, max, *args, **kwargs):

This is a considerably expanded definition, and this function actually includes a lot more functionality with finding primes than the old one did, which is great. It is easy to understand for people who used the original version, and easy to port forward to (or even write a parser to automatically check for and forward port calls in most circumstances).
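Forward porting can be as small as a shim. The sketch below is only an illustration: find_prime_legacy() is a hypothetical helper, the body of the new find_prime() is a stand-in, and since the example never says what *args and **kwargs are for, the stand-in accepts them and ignores them.

```python
def find_prime(min, max, *args, **kwargs):
    """Stand-in for the new major version: primes p with min <= p <= max.

    The extra *args and **kwargs are accepted but unused in this sketch.
    """
    primes = []
    for candidate in range(2, max + 1):
        # Trial division by the primes found so far, up to the square root.
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
    return [p for p in primes if p >= min]


def find_prime_legacy(max, min=False):
    """Old-style (max, min) call signature, forwarded to the new argument order."""
    lower = 2 if min is False else min
    return find_prime(lower, max)
```

Old call sites can then swap find_prime(100, 30) for find_prime_legacy(100, 30) mechanically, and get moved over to the new argument order whenever someone has the time.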

This toy example above may be simple, but it represents how version numbers and program changes should be in tune. There are some caveats, however. Usually in version 0.x of a program, or especially version 0.0.x of a program the “0” actually means “all bets are off”, meaning that the architecture and project semantics are still under development and should not be relied upon as stable to write code or scripts against. Essentially, a project’s release of version 1.0 should be an implicit promise of interface stability, whether that means the API, user or other construct interfaces — and whether that interface represents input or output. This is particularly important in database driven applications, as major-version increases are the only possible chance a project has to change the data schema around. This last point is something worth pondering.

Now folks might wonder “what if I want to make a trivial change in a forward minor version X.Y++ but aspects of that change make sense to backport into the original X.Y that is used by many?” Look at how the Python project itself handled this sort of situation for the answer. Non-destructive improvements in Python 3 have been backported to Python 2.6 and 2.7 as sub-minor version increases. Some changes only made it into Python 2.7, however, and those changes and a few others actually define the differences between 2.6 and 2.7, and this makes perfect sense since they represent backwards-compatible API changes.

There are no hard and fast rules, however. If your project is widely used, and users and dependent projects nearly always keep pace with the latest release, *and* the minor-version releases are far enough apart that porting isn’t an issue, *and* a change that breaks backwards compatibility doesn’t affect a major feature (and this is where the situation can fall under much debate), then a transitional set of minor releases can, over time, bridge the gap and eventually introduce changes that break backwards compatibility. But this should be the exception, not routine practice. The Django project has this to say about API stability and version numbers.

The whole point is that your version numbers, and the level at which a change shows up within the version number string, should have a distinct meaning to your users and downstream projects without you having to spell everything out every time. Sure you write documentation, but a developer should generally be safe to look at code written against version X.Y.Z and know that building against X.Y.Z++ will still work out OK. Maybe not in every case, but this should be generally true. Users should also expect real changes between major versions, not just a few pixels moved around on the exact same program.
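That expectation can be written down as a check. This is just a sketch of the rule of thumb described in this post, not a standard: the name should_still_work() is made up, versions are assumed to be three plain numeric fields, and 0.x is treated as “all bets are off”, so only the exact same version counts as safe there.

```python
def should_still_work(built_against, candidate):
    """Guess whether code written against one version should run on another."""
    built = tuple(int(part) for part in built_against.split("."))
    cand = tuple(int(part) for part in candidate.split("."))
    if len(built) != 3 or len(cand) != 3:
        raise ValueError("expected major.minor.subminor version strings")
    if built[0] == 0 or cand[0] == 0:
        # Pre-1.0: no stability promise, so only the identical version is safe.
        return built == cand
    # Same major version, and the candidate has at least the minor-level
    # features the code was written against. Sub-minor changes never
    # touch the interface, so that field is ignored.
    return cand[0] == built[0] and cand[1] >= built[1]
```

If your releases routinely make this function lie, the numbering is what is broken, not the check.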

Anyway, even in open source you sometimes see projects breaking this rule, and usually it’s a sign that things will go downhill later on. For example, Firefox took over 7 years to go from version 0.x to 3.6.x. Then they decided that didn’t look cool enough because Chrome goes through a “major” version every 3 months, so Firefox should as well. So in one year they went from version 3.6.x to version 8 (or 10, depending on how you’re accounting for their initial “rapid release” year). And now in that project major versions don’t mean shit and the code base is going to crap because of it. Chrome’s code base is full of security problems and major code cancer (it’s like Google hired a bunch of noobs, seriously… that project won’t be sustainable at this rate) as well, and I suppose we don’t even need to get started talking about Internet Explorer. In short, the browser market is full of nutbags, and I suppose in the browser world it’s OK to just be fucking stupid.

Contrast this with, say, the way just about any closed source company either just makes up version numbers, or goes as far as inventing incompatibility with its former self just to drive/force sales (and I’ll go ahead and point a finger directly at MS Office on this), and you’ll start wondering just how confusing it must be to work within a closed source project during its sustainment cycle. As a side note on closed source, though, the game industry usually has surprisingly well thought out version numbers. I suppose the split between game sequels and updates to a current release is what insulates them from that, and anyway the differences between sequels are a lot more obvious to a gamer than the difference between two versions of the same business software product in many cases.

Don’t screw up your versioning. It buys you nothing more than the ire of the community.