The Intellectual Wilderness

“There is nothing more useless than doing efficiently that which should not be done at all.”

2015.11.30 11:17

Almost Always Stupid: Parallel Quicksort

So you’ve been thinking about applying your awesome parallel programming skills to some problem that is just complex enough to impress the noobs, but just trivial enough to code in a few lines? Cool. Knock yourself out.

But stop trying to parallelize quicksort.

Posting articles, tutorials, SO questions, and blog posts with the term “parallel quicksort” in the title only serves to encourage excitable, well-meaning, naive people to copypasta whatever code they find there and screw up what would have otherwise been straightforward, easy-to-read, relatively performant beginner-style code.

A naive quicksort implementation is pretty darn fast in the common cases found in real programs. There are edge cases where an intelligently chosen pivot can be a significant optimization — but the act of choosing the pivot intelligently is not free, which means before committing to this you have to know a lot about the context in which that particular sort will be used so you can be confident that it won’t be an overall pessimization.
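For reference, here is roughly what "naive" means throughout this post, sketched in Python for brevity (the argument is language-agnostic, and the function name is mine):

```python
import random

def quicksort(xs):
    # Naive quicksort: first element as pivot, no cleverness anywhere.
    if len(xs) <= 1:
        return xs
    pivot, rest = xs[0], xs[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quicksort(smaller) + [pivot] + quicksort(larger)

data = [random.randint(0, 999) for _ in range(500)]
assert quicksort(data) == sorted(data)
```

A dozen lines, easy to read, and plenty fast for the lists most programs actually sort.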

A naive parallel quicksort implementation can only ever be slower than a naive serial quicksort. Why? Think through the serial algorithm: each call cannot produce its result until the results of its sub-branches have been combined. When we parallelize, the same dependencies exist, but we have added the overhead of spawning and messaging to the rollup of values at the end. Even in Erlang, where spawning and messaging (on a single node) are far lighter weight and faster than in other environments, spawning and messaging still cost more in time and memory than a simple comparison. So a naive parallelized quicksort, while perhaps impressive because you get to show that you’re spawning and messaging everywhere, is never going to be as fast as a normal naive quicksort.
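The dependency structure is easier to see in code. A sketch of the naive parallelization in Python, one thread per branch (an illustration, emphatically not a recommendation), pays a spawn and a join at every single recursive call while still doing all the same comparisons:

```python
import random
import threading

def parallel_quicksort(xs):
    # Naive parallel quicksort: spawn a worker for every branch.
    if len(xs) <= 1:
        return xs
    pivot, rest = xs[0], xs[1:]
    results = {}

    def branch(key, part):
        results[key] = parallel_quicksort(part)

    lo = threading.Thread(target=branch, args=("lo", [x for x in rest if x < pivot]))
    hi = threading.Thread(target=branch, args=("hi", [x for x in rest if x >= pivot]))
    lo.start(); hi.start()
    # The parent cannot combine anything until both children finish,
    # so the serial dependency chain is fully intact -- plus overhead.
    lo.join(); hi.join()
    return results["lo"] + [pivot] + results["hi"]

data = [random.randint(0, 999) for _ in range(80)]
assert parallel_quicksort(data) == sorted(data)
```

Even this tiny 80-element sort spawns on the order of a hundred threads, every one of them purely bookkeeping.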

What about an intelligent parallel quicksort? Ah. Now here we have something potentially interesting. This can indeed be a significant performance win, even bigger than the win from an intelligently implemented serial quicksort on guaranteed lopsided input where you know where the optimal pivot is located. (Consider for a moment how utterly tailored to the situation that pivot choice must be, and how familiar one must be with things about that specific use case that have nothing to do with quicksort, to know that any of this is true.)
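For concreteness, the cheapest of the classic "intelligent" pivot rules is median-of-three, and even it is not free; a sketch (names mine):

```python
def median_of_three(xs):
    # Three extra comparisons' worth of work per partition, paid up front
    # whether or not the input would ever have produced a bad pivot.
    a, b, c = xs[0], xs[len(xs) // 2], xs[-1]
    return sorted((a, b, c))[1]

def quicksort_mo3(xs):
    if len(xs) <= 2:
        return sorted(xs)
    pivot = median_of_three(xs)
    smaller = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    larger = [x for x in xs if x > pivot]
    return quicksort_mo3(smaller) + equal + quicksort_mo3(larger)

# Already-sorted or reversed input degrades a first-element pivot to
# O(n^2); median-of-three keeps the partitions balanced here.
assert quicksort_mo3(list(range(1000, 0, -1))) == list(range(1, 1001))
```

On input that was never going to be lopsided, those extra comparisons are pure loss, which is exactly the point about needing to know your data first.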

Let’s assume everything is going your way and you already have a lot of information about the list you need to sort — enough that the cost of finding smart pivots is outweighed by the savings in the overall running time of the sort. In that case it can indeed be possible to write a much faster parallel sort. But not by forking more workers than you have schedulers.

The naive parallel quicksort forks/spawns every branch. That’s insane. A smart parallel quicksort will only fork until it has occupied the available scheduling resources of the system (cores, threads, schedulers, whatever the environment has). Why? Because forking any further only incurs scheduling overhead, busywork the platform has to do just to keep up with multiprocess accounting. That means an intelligent parallel quicksort has to know not only how to pick an optimum pivot, it also has to know how to stop forking once it has maxed out the scheduling resources of the system, and it has to pull one more not-so-simple trick: it has to know enough about the data being sorted to continue picking smart pivots until it is finished forking (otherwise it will be no faster than a simply partitioned two-threaded implementation).
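The stop-forking discipline can be sketched as a depth-limited variant of the naive code above (Python again, names mine). The worker budget is passed in as a parameter precisely because discovering it is environment-specific — `erlang:system_info(schedulers_online)` in Erlang, `sysconf(_SC_NPROCESSORS_ONLN)` in C on Linux/BSD:

```python
import random
import threading

def bounded_parallel_quicksort(xs, workers):
    # `workers` is the scheduler/core budget for this call, queried from
    # the platform by whatever means it provides.
    if len(xs) <= 1:
        return xs
    pivot, rest = xs[0], xs[1:]
    lo_part = [x for x in rest if x < pivot]
    hi_part = [x for x in rest if x >= pivot]
    if workers <= 1:
        # Budget spent: recurse serially, no more spawning.
        return (bounded_parallel_quicksort(lo_part, 1) + [pivot]
                + bounded_parallel_quicksort(hi_part, 1))
    # Split the remaining worker budget between the two branches; the
    # current thread handles one branch itself rather than idling.
    results = {}

    def branch(part, budget):
        results["lo"] = bounded_parallel_quicksort(part, budget)

    t = threading.Thread(target=branch, args=(lo_part, workers // 2))
    t.start()
    results["hi"] = bounded_parallel_quicksort(hi_part, workers - workers // 2)
    t.join()
    return results["lo"] + [pivot] + results["hi"]

data = [random.randint(0, 999) for _ in range(600)]
assert bounded_parallel_quicksort(data, 4) == sorted(data)
```

Note what this sketch does not solve: unless the pivots stay smart all the way down to the last fork, one worker ends up with most of the data and the others finish early, which is how you end up no better off than a two-threaded split.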

That is a significant amount of arcane information needed about the input, and a not entirely obvious pile of code needed to continue picking pivots, communicate among threads, and stop forking once the optimum number of threads for the environment has been reached. I know how to find out how many schedulers are available in an Erlang node, and I know how to find out how many cores are available on a Linux or BSD system in C. I couldn’t tell you the equivalent in Go or Java off the top of my head (both have one), and in Python the reflex is to just make a subprocess call — but oops, see, once I do that the code isn’t really Python, it is “Python that works on platform X” (never mind that multiprocessing.cpu_count() has been sitting in the standard library since 2.6, and that most sorts will complete in less time than a call via subprocess will… defeating the point). That’s a programmer pessimization. Spending a huge amount of time to write code that by definition probably can’t be made generic and reused later is not a very intelligent use of time (neither is writing this blog post, for that matter).

What else happens, though, when we parallelize quicksort? When we parallelize anything? Nothing is free, so there must be a cost. What are some cases where I want to sort things, and of those what cases are good candidates for parallel quicksort? This list isn’t exhaustive (sorting is pretty common) but thinking through the tradeoffs in just a few common situations may be illustrative. Remember: optimizations are usually tradeoffs. Once a routine has been optimized for the general case the remaining options are about shifting burdens away from dear resources and onto cheap ones (or, in the happiest case, not doing anything).

  1. Sorting a list before passing it to a function that requires a sorted list.
  2. Before an efficient update/merge of two lists (a special case of the above).
  3. Ordering client-side data for presentation.
  4. Organizing log data before flush (can be thought of as a variation of the above).
  5. Building indexes.
  6. Organizing variable length search data (usually text search; a variant of the above).

In any of the cases above is it likely that I might have a set of data that is so huge that I might need to optimize the sort operation? Maybe in cases #1, #2, #4, #5, and very often in #6. So that’s entirely possible.

In any of the cases above is it likely that I will know enough to be able to build a customized parallel quicksort that is smart enough to pick smart pivots? Hrm… that is a lot trickier. Assuming you might already know something extra about the nature of the input, it is possible this could be the case in #4 or #5 — but that is not the general case. You’d have to know an awful lot to get this right. The most likely candidate here is actually just #6.

Of the remaining cases (and for the vast majority of actual code, that is really just going to be #6), are there any situations where I want to slow down the whole system just to make a single sort go faster? Is the system doing anything else that might be important? Will this degrade overall performance when run in a cron job? Is perhaps the whole point of having lots of cores that maybe we don’t want the entire system to be affected by a single long-running operation? Consider a database that needs to build a text search index and pass it somewhere else based on whatever has changed in the last 12 hours (or an adsearch index builder that needs to sort billions of items of campaign change data). Is it acceptable for the whole system to experience a performance degradation? I don’t know — only you can answer that question. It might be a big win, it might not. I’ve only encountered a single case* where this was true in actual production code.

Once. Ever.

Now go do a search for “parallel quicksort” and skim over the results with new eyes. You almost never need this. Most articles on parallel quicksort are exactly as useful as the articles on parallel Fibonacci functions (or any Fibonacci functions, really). Sure, some of the code examples are fun to read and muse about, but they don’t get you any closer to solving whatever your actual user problems are.

[* Actually, it was adsearch campaign data, which is why I mentioned it.]

2013.01.04 09:58

Freenode Year-End Weather Review and 2013 Forecast

Filed under: Computing — zxq9 @ 09:58

##c, ##c++, ##java, ##javascript and almost all other channels named after an Algol-descended language remained strong in n00b angst and help-vampire congestion (strong counter example is #bash, see below), rendering them useless for anything other than observing flame wars amongst programming newbies arguing over terms they’ve only just discovered on Wikipedia. Expect no change in temperature or inclination for 2013, and as always prepare for flurries of students hoping to get their homework done for them throughout August, October, December, March and May.

#bash took first place for overall 24-hour activity within its stated zone in 2012 — quite an achievement. This was enabled by nothing more than the militant purism of its main participants, who happen to actually know (most of) what they are talking about. The intensity of discussion in #bash is likely to increase over 2013 as realization dawns on more new *nix admins and even OS X users that their systems represent a complete programming environment. A corresponding increase in the volume of beginner reference links in-channel is likely — with an associated increase in RTFM calls directed at those who don’t read links, or delivered by the less patient/coddling of the regulars.

#fedora, #ubuntu, #centos, and other distribution-named channels fell into two categories in 2012:

  1. overrun with help-vampires asking the same 3 new-release migration questions
  2. overwhelmed with utter silence

The channels #ubuntu and #centos took the first-place poo cake for overall deafening off-topic, RTFM-worthy and amateur architectural astronautic clamor, while #archlinux, #gentoo and #fedora managed a much better signal-to-noise ratio, mostly due to a greater percentage of knowledgeable participants. Expect very little change in 2013 with the exception of #ubuntu and #fedora. The former may grow even worse as those who don’t know any better flock to Ubuntu as Steam picks up, er, steam, and the latter may grow gradually quieter as new changes implemented in Fedora 18 cause a probable nose-dive in that distribution’s popularity across the year.

#django was one of the strongest on-topic, 24-hour activity channels focused around getting actual work done, with the vast majority of interaction involving at least marginally researched questions and a great deal more courtesy than usual this millennium. This indicates that the Django project has likely reached its Goldilocks point as a project: just enough below the radar that the “new thing” from 1~2 years past is still soaking up the n00bs, b00bs and help-vampires (in this case, #RubyOnRails), yet noticed by enough srsly gentlemen to make it a usefully mainstream place to work. If no unexpected storms of blockbuster “Lern da Web wit Djago in 10 Dais!!1!” tutorials or books occur across YouTube and bookstores, expect #django to experience only a slight increase in temperature and no bumpkin brain blizzards or humility hurricanes. The status of Django on Python3 is the most likely leading indicator of trouble here (see below).

#django-dev was boring and dead for the most part, aside from the occasional thin mist of packager discussion and “why doesn’t the TLS setting for mail mean real TLS on the correct port?” talk (nonsense!). Some rumblings of the impending Python3 reckoning could be heard, but were still far enough in the distance to avoid a full-blown #fedora-style storm in 2012. Expect this to change in 2013, as Python3 will finally give Django devs enough to talk about to wake them up and kick them off of the ML and into IRC activity. The action is likely to be a bit below storm strength due to the project’s (general) adherence to its own release guidelines, but may from time to time bear watching.

#RubyOnRails and related channels were clogged with help-vampires and n00bs in similar fashion to the Algol-language and distro channels. This has remained fairly steady since 2009 or so, with the effect being bolstered by the presence of all those people who gave up on mobile programming just before they might have actually figured out how native applications work. Save a major drive to some other fascinating technical mistake (“Web 3.0?” “cloud vX”?) that goes viral, the Rails community will likely continue to experience idiot floods and hails of stupidity through 2013. For the serious who are in need of actual relevant discussion, forums, IRL meetings with Real People You Know and project-specific channels for projects that happen to be built around Rails will be the only places to find it.

#guile managed a slight edge over both #lisp and #scheme last year in Occasional Wizardry, but the overall volume of discussion was far lower than either #lisp or #scheme — giving #guile the best signal-to-noise ratio anywhere but also rendering it an incredibly boring place to hang out on an average day (as in, #guile remains a statistical outlier, though an interesting one). It is uncertain whether the effects of a new project, a new major version or a new implementation of Guile, Scheme or Common Lisp will have any effect or even be noticed by anyone, anywhere, so a prediction for 2013 is beyond me. I have a sneaking suspicion that someone might eventually catch on that guile2 includes a webserver ready-made for scripting in a functional language (among other features), but the population of paren-loving teens is so low at the moment and the current infatuation with the Web and the Java religion of Absolutely Everything Must Be An Object (Amen) still so strong with the sort of computer science faculty that thinks that every student should get a gold sticker for showing up that it is hard to see if anything short of a viral breakout video complete with tits, violence and gore would be noticed.

#haskell took first prize in 2012 for overall, unadulterated, near-constant uber geekness and Deep Black Magic. Three factors influenced this strongly: the near exclusive population of serious math nerds who like to flaunt their grokness, the tendency of such people to never admit they don’t grok a mind-melting snippet in channel and instead boil in silence until something makes sense to them, and the tendency for newcomers to either struggle unflaggingly until they earn their place among the immortals or simply give up and never, ever venture into #haskell again. In this, the uniqueness of Haskell as a language serves a positive filtration role in the community much the way that the old “be smart or go home” sort of freshman math classes did back when it was OK to admit that computer science wasn’t for everybody. Expect very little change to this trend in 2013, though by the end of the year commercial projects using Haskell may be revealed as actually using Haskell, and this may drive a slight, temporary increase in interest.

#erlang was a bit like #haskell, but more average in every aspect: less magic, more noise, fewer quitters, more eternal (but not really annoying) n00bs. This is mostly due to the revelation among high schoolers and college language hipsters that Facebook uses Erlang for a smattering of projects that can’t afford downtime, and the discovery of how Erlang copes with such requirements in a novel way. Other functional language channels generally fell into the same patterns as the lisps, Haskell and Erlang, but these last two deserved particular mention. In 2013 Erlang stands a very small chance of sucking brains away from other interesting languages such as Lua and anything matching .*ML.*. In that case expect #erlang to eventually grow more like #bash in nature over 2013, with a particular threshold being crossed if #erlang itself becomes a bothersome place to hang out due to an excess of help-vampires and alternative Erlang-based project channels become the alternative arteries of community brilliance. Barring such a spontaneous increase in notoriety, however, #erlang is likely to follow or return to the majority patterns of 2012.

This has been the Freenode Year-End Weather Review and 2013 Forecast. All other networks either suck or were set up with specific crowds in mind (such as botnets).
