The Intellectual Wilderness There is nothing more useless than doing efficiently that which should not be done at all.

2020.11.13 12:37

Comments on Dr. Shiva’s Election System Analysis

Filed under: Computing, Politics / Geopolitics, Society — zxq9 @ 12:37

UPDATE 2020-11-20: Scroll to the bottom for a follow-up.


Dr. Shiva performed a data analysis on the automated voting system results in the 2020 U.S. election and made a video presentation of it just a few days ago. I was asked to give my thoughts on it yesterday, so I watched it this morning and wrote some comments as I went along. The video link is below and my timestamped comments follow.

Votes are stored as “decimal fractions”? This would mean they are stored not as integer values, but as floating point values (unless there is a fixed-point library involved, but that seems extremely unlikely). This is insane. A vote can’t increment the tally by a fraction unless we’re trying to go back to the 3/5 rule or some other nonsense. Right off the bat this is pretty ridiculous.

“Provided SYSTEM cannot change the OUTPUT”.
A bit of elaboration… Of course the system changes the output (it generates the output), but given that the input has two parts, the ballot and the current count (vote(Ballot, Count) -> NewCount), the output value NewCount must be reproducible every time in a consistent way given the same initial inputs. Because we have an iterative counting system, this means the counts should be replayable transactions: the system must be able to fast-forward or rewind to any point in the sequence of transactions. As vote counting is distributed, this also means that each instance of a vote counter must record the sequence in which it counted its votes so that the aggregated total can be broken down and each part replayed, and that the aggregator must also log, and be capable of replaying, each aggregation of a new total update. His statement is correct, but it is useful to break down why it is both correct and also somewhat of an oversimplification.
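The transactional model described above can be sketched in a few lines. This is purely illustrative Python; the function names and log shape are my own, not taken from any real voting system:

```python
# Illustrative sketch of replayable, deterministic vote counting.
# Each ballot is an entry in an append-only transaction log; the tally
# at any point is a pure function of the log up to that point.

def vote(ballot, count):
    """Pure counting step: the same (ballot, count) inputs always
    produce the same NewCount output."""
    new_count = dict(count)
    new_count[ballot] = new_count.get(ballot, 0) + 1
    return new_count

def replay(log, upto=None):
    """Rebuild the tally by replaying the first `upto` transactions,
    which gives us the fast-forward/rewind property."""
    count = {}
    for ballot in (log if upto is None else log[:upto]):
        count = vote(ballot, count)
    return count

log = ["A", "B", "A", "A"]
assert replay(log) == {"A": 3, "B": 1}           # full count
assert replay(log, upto=2) == {"A": 1, "B": 1}   # rewound to any point
```

The same idea extends to aggregation: if each counter's log and each rollup event are recorded, any aggregate total can be decomposed and each part replayed independently.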

Ballots are converted to ballot images (a digital representation of the ballot generated by the scanning process). The ballot images are being saved in some places and destroyed in others. This could be OK if the physical ballots were available for replay/recount and there were never any need to conduct a forensic analysis of the operation of the system, but this is not the case and there appears to be no standard for what to do about them. Deleting the ballot images prevents any possibility of comparing the counted ballot images against the original ballots, and this prevents a full inspection of the system.

“Weighted votes?” There must be some explanation for this. Weighted voting is insane. It is supposed to be 1 vote 1 increment. What would be a legitimate explanation for having a weight system? This may be the reason why floats (of all insanity) are being used to store the vote tally.

The system DB is written in Access? Like MS Access? Is this accurate? Holy shit… No way that can be accurate. And yes, the numbers are indeed stored as floats (a double, specifically).
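For what it's worth, storing whole-number tallies in a double is not immediately lossy (integers are representable exactly up to 2^53), but the moment fractional weights enter the picture, rounding error does too. A quick illustration in Python:

```python
# Integer increments stored in a double are exact (up to 2**53)...
tally = 0.0
for _ in range(1000):
    tally += 1.0
assert tally == 1000.0

# ...but fractional "weighted" increments accumulate rounding error,
# because values like 0.1 have no exact binary representation.
weighted = 0.0
for _ in range(1000):
    weighted += 0.1
assert weighted != 100.0   # slightly off from the exact 100.0
```

So a float tally is not automatically wrong for 1-vote-1-increment counting, but the choice only makes sense if fractional increments were anticipated, which is exactly the disturbing part.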



WAAAAT? This would mean that higher % Republican straight party vote correlates to lower individual candidate vote support. That also means he gets much higher individual vote support in Democrat straight party vote districts. This is extraordinarily unlikely given that he has 98% support across Republicans.

The phrase “taken from Trump and given to Biden”. I’m not seeing the basis for this yet. It is possible, however extremely unlikely, that voters actually voted this way. So as a first-time watcher who does not know what else is coming later in the talk, I don’t agree with the language used in the discussion, but the graph is clearly too structured to be true without some sound explanation of mechanism (and to be fair, the explanation that voters are more likely to dislike Trump the more likely they are to like Republicans is almost impossible to believe, but I’m keeping an open mind for the moment…).

Ah. They mention the same thing: It is “too structured” and “too perfect”. Because it is.

The idea that Trump did overwhelmingly better in Democrat districts than in Republican districts is pretty hard to accept. The differences shown here are big enough that there should have been some general revolt in the Republican base, which would have been very noticeable. This election Trump won the support of a large percentage of the original “Never Trumper” faction, much of that support materializing after the riots started. These vote tallies are grading on a curve really hard.

“You could even make the argument that even if you want to believe that Republicans hated Trump so so much, the larger that the population size was, they still wouldn’t be able to hate him in such a straight line.”

Wow. That looks a lot more like natural data. Though there is one really straight line hidden (downward) in there.

OK. Here is where he postulates the existence of an algorithm. He’s hinted about it up to this point, but this is where he comes out and says that it appears there is an algorithm that is applied to districts with a high percentage of Republican votes. That does appear to be the case from what is presented so far, though I would like to see some comparisons with districts of various alignment that were counted by hand and did not use these machines. The difference should be painfully obvious (in the same way the chart at 38:10 was striking).

The “Weighted Race” is a feature of the system that is documented? WTF? That’s insane. And yes, this looks very much like a weighted redistribution. The claim here is that all of the major vote counting system vendors have implemented this feature. Was that in the original contract? What was the motivation for its inclusion? Who came up with this nonsense? There must be some explanation for it to exist at all.

This is the core question: “Is it possible that this voting pattern is normal?” If it is then we should see it recur the same way with hand-counted votes. The slope, though, really does not make sense from the explanation that “Trump pissed off some % of Republicans” — the line would be flat, not a slope, and definitely not such a steep slope.
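To make the flat-line-versus-slope point concrete, here is a toy model. The numbers and function names are entirely hypothetical, just to illustrate the shapes being argued about: a fixed defection rate produces the same deficit in every precinct, while a transfer weighted by the Republican share produces a downward slope.

```python
# Toy model (illustrative only). x is a precinct's straight-party
# Republican percentage; the return value is Trump's individual-vote
# deficit in that precinct.

def constant_defection(x, rate=5.0):
    """A fixed share of Republicans defect everywhere: same deficit
    in every precinct, i.e. a flat line."""
    return -rate

def weighted_transfer(x, k=0.1):
    """A transfer proportional to the Republican share: the deficit
    grows with x, i.e. a downward slope."""
    return -k * x

xs = [10, 30, 50, 70, 90]
flat = [constant_defection(x) for x in xs]
slope = [weighted_transfer(x) for x in xs]

assert len(set(flat)) == 1                           # flat across precincts
assert all(a > b for a, b in zip(slope, slope[1:]))  # strictly decreasing
```

The "some Republicans were pissed off" hypothesis predicts the first shape; the data in the video shows the second.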

This is a pretty crazy story. Bad stuff.

YES. The idea of an auditable system is very important. That gets back to my original comments at the top about being able to replay each vote as an individual transaction, and each aggregate rollup event as an individual transaction, in addition to being able to independently verify that the ballot images match the paper ballots and that an alternative method of counting matches the results given by the automatic system over any random sample of ballots (hand counting results should match automated counts among random samples).
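A simplified sketch of the random-sample check described above. This is a bare-bones illustration, not a real risk-limiting audit, and all names are made up:

```python
import random

def audit(ballots, machine_record, sample_size, hand_count):
    """Spot check (simplified): hand-count a random sample of paper
    ballots and compare each to the machine's recorded interpretation.
    Returns the indices of any mismatches."""
    sample = random.sample(range(len(ballots)), sample_size)
    return [i for i in sample if hand_count(ballots[i]) != machine_record[i]]

paper = ["A", "B", "A", "B", "A"]
machine = ["A", "B", "A", "B", "A"]   # machine's per-ballot record
mismatches = audit(paper, machine, 3, hand_count=lambda b: b)
assert mismatches == []   # agreement on the sample
```

In a real audit the sample size would be chosen statistically so that agreement on the sample bounds the probability that the full machine count is wrong.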

“Put a bunch of CPAs on this and they’ll tell you what the problems are.” Yes.


I was discussing with a mathematician the best way to verify the above claims independently. His suggestion was that attempting to debunk the claims would provide the fastest path to verification: if we couldn’t debunk them then the claims are probably strong.

To debunk the claims (or alternately, verify them) would require performing an analysis on data from all districts in all counties in the state so we could compare historical data from each for anomalies. The reason a historical analysis is the most interesting has to do with the way a “weighted” vote algorithm would look over time compared to a natural count. To me the slopes that Dr. Shiva shows are less compelling (though interesting) than the historical plots that inspired him to perform an analysis at all.

These two screenshots are interesting:

I don’t like that the second one is named only “Other Counties”. I have seen similar curves in other data, and it is indeed how weighted systems tend to plot unless the ballots are sorted before being counted. That is to say, a time sequence plot is interesting, a sorted count plot is not. There is not enough information here to know which one we are looking at, though this is extremely likely to be a time series. We would need extremely complete data to evaluate this.
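The difference between the two kinds of plots is easy to demonstrate with synthetic data: with randomly ordered ballots, a candidate's cumulative share wobbles early and then settles smoothly toward the final share, while with sorted ballots it traces sharp artificial segments. A toy demonstration:

```python
import random

def cumulative_share(ballots, candidate):
    """Candidate's running share of the count after each ballot,
    i.e. the y-values of a time-sequence plot."""
    shares, count = [], 0
    for i, b in enumerate(ballots, 1):
        count += (b == candidate)
        shares.append(count / i)
    return shares

random.seed(1)
time_order = ["A"] * 600 + ["B"] * 400
random.shuffle(time_order)            # ballots arrive in random order
sorted_order = sorted(time_order)     # all "A" ballots counted first

r = cumulative_share(time_order, "A")
s = cumulative_share(sorted_order, "A")

assert s[599] == 1.0                  # sorted: 100% "A" for 600 ballots straight
assert abs(r[-1] - 0.6) < 1e-9        # both converge to the true 60% share
```

A weighted algorithm operating during the count would leave its fingerprint on the time-sequence curve, which is exactly why the per-ballot-ordered data matters so much and why we can conclude little without it.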

As for the slope that the next 40 minutes or so of Dr. Shiva’s video focuses on, that data is indeed quite interesting, but the average of the offset turns out to be more compelling than the fact that there is a slope at all.

In fact, another math channel on YouTube, “Stand-up Maths”, has performed a comparative analysis on the Biden data and found the same slope though its magnitude is different (and he never commented on that, nor did he comment on the time plot, which I thought was particularly weird to leave out since it is the original information that prompted the analysis in the first place).

It is an interesting video (though I don’t particularly care for the cheap ad hominem shots occasionally taken) and that channel has a ton of great math videos that have nothing whatsoever to do with politics (yay!) so go watch it at the link above if you are interested.

Unfortunately, collecting data with the granularity necessary to perform the kind of analysis we want turns out to be possible, but quite time consuming. At the state level only aggregates are available, and those can’t tell us very much. Each county has the kind of data necessary to recreate the scatter plots (which turn out to be a bit less interesting than Dr. Shiva’s video makes them out to be), but the graphs of how the count tallies evolved over time are held by each district (and probably each county, but not available on the websites for each). The data is available publicly, but it is just very hard to get a hold of due to the sheer number of districts and counties in the state.


The details of the way the voting machine system is designed and implemented are clearly suspect. Verifying these details should be a priority, as should switching to an auditable and publicly visible system (hopefully something open source that lives on a public repo). The basic principles of its design and requirements seem antithetical to the way voting is supposed to work in the U.S.

As for the data analysis, there may be something weird in the data itself, but verifying it takes a lot of time just due to the level of detail required in the data (aggregates are useless to investigate the issues we really want to see), and whatever weirdness may exist doesn’t seem to have much to do with the bulk of both videos: the slopes shown.
