Dr. Shiva Ayyadurai & The Danger Of Data Charlatans
NOTE: On Nov. 16th, Ayyadurai doubled down on his misleading analyses.
On November 10th, Dr. Shiva Ayyadurai posted a video claiming that some simple analytics revealed election fraud in Michigan. The video received more than 200,000 views and claims that Joe Biden stole more than 60,000 votes in Michigan.
The main thrust of his analysis is a mathematical parlor trick. In a separate post, I play that parlor trick myself with Oakland County data to “prove” the opposite conclusion — showing that his analysis is bogus at its core.
I rarely pick on people for making analytical mistakes, because it happens to everyone and every instance is a lesson to learn from. But in this case it seems less like a mistake and more like an intentional deception — and you’ll see that the audacity of it is astounding. The man’s trying to fool us into believing in election fraud by taking advantage of how lines work. Bold move. Let’s see how it goes.
The dataset Ayyadurai works with contains:
- Precinct-level % of Republican voters among straight-ticket voters.
- Precinct-level % of Trump votes among split-ticket voters. These are what he calls “Individual Candidate Voters”, people who did not select a party’s “straight-ticket” option on their ballot.
The main quantity Ayyadurai is concerned with is: the % of Trump votes among split-ticket voters MINUS the % of Republicans among straight-ticket votes in a precinct.
He shows that the higher the percentage of Republican voters in a precinct, the more negative this difference is. He speciously says that this is an indicator of how much more popular Trump is among split-ticket voters than Republican candidates, and that it’s strange for it to be more negative in more Republican precincts.
According to the good doctor, the pattern you’d expect if there was just normal negative sentiment towards Trump is this flat line:
To Dr. Shiva, this discrepancy is evidence of algorithmic foul play. To him, the negatively sloped line we see is a clear sign that the state of Michigan used the “weighted race” voting feature of vote-tallying machines to steal votes from Trump and grant them to Biden.
The thing is that you’d always expect the negatively sloping line he describes as “suspicious”, by definition. Think about what he’s plotting. On the X-axis is the % of straight-ticket Republican voters. On the Y-axis is the % of split-ticket Trump voters MINUS the % of straight-ticket Republican voters.
If we were to draw the equation for that line, it’d be:
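One way to write that line, using shorthand introduced here for illustration (X for the % of straight-ticket Republican votes, T for the % of split-ticket Trump votes, Y for the plotted difference; these symbols are not Ayyadurai’s):

```latex
Y = T - X
```

The X term enters Y with a coefficient of -1, whatever T happens to be.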
If you remember your formulas for lines, you’ll know that this biases us, by construction, toward a line that slopes down 1:1, at 45 degrees. Regardless of the actual split-ticket vote data and how it’s distributed, the quantity Ayyadurai’s plotting is rigged to look like a negatively sloping line from the outset.
A thought exercise: assume that nobody on a split-ticket ever votes for the two major parties, and they all go third-party like true independents. What would you expect from Ayyadurai’s curve? It would just go down and to the right, by design.
And if a dude is artificially constructing a negatively sloped line in front of your eyes and telling you that this “beautiful, too-perfect line!” is evidence of election fraud, you should run the other way. Because he’s trying to take advantage of your good intentions.
Again, here’s the slide where he describes what he’s plotting. It is, from the beginning, set up to create a line of negative slope, since its Y-axis includes a negated term from the X-axis.
If you’re more comfortable with statistical simulations of split-ticket voter data than with this algebra, then let’s dig into some code and some details, as well as answers to some common challenges.
If you don’t like theory/simulations at all and instead want to look at real data, I do that here.
How Ayyadurai Is Misleading You, In Detail
We’re gonna keep this as simple as possible. Let’s say that each precinct has two main populations of people: people who vote with a straight-ticket ballot and people who do a split-ticket vote. Ayyadurai himself frames these as two distinct populations, so I’m rolling with that premise here.
We can focus on percentages of Republican voters among straight-ticket ballots, and percentages of Trump votes among split-ticket ballots (or “Individual Candidate” votes, as he describes). We’re dropping out other voter groups because Ayyadurai doesn’t address them in his analysis at all.
Now, some assumptions we’ll make before we start running some simulations:
- As Ayyadurai assumes: all Republicans who vote straight-ticket will vote for Trump.
- Split-ticket voters, on the other hand, just have some random, non-zero chance of voting for Trump. We will assume this is a flat probability for now (but I show how my conclusions hold for almost any distribution of split-ticket votes, flat or otherwise, later in the article).
We can now simulate the voting outcomes of several precincts. We can simulate the % of Republican straight-ticket voters with a simple uniform sampling.
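A minimal sketch of that uniform sampling step, using only Python’s standard library. The precinct count of 200, the seed, and the function name `sample_straight_rep_pct` are assumptions chosen for illustration, not values from the original analysis.

```python
import random

def sample_straight_rep_pct(n_precincts, seed=0):
    """Draw one straight-ticket Republican percentage (0-100) per precinct."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [rng.uniform(0.0, 100.0) for _ in range(n_precincts)]

# Assumed: 200 simulated precincts.
straight_rep_pct = sample_straight_rep_pct(n_precincts=200)
print(min(straight_rep_pct), max(straight_rep_pct))
```

Uniform sampling just guarantees a wide spread of precincts along the X-axis, from heavily Democratic to heavily Republican.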
We can sample what split ticket voters would do by doing a series of weighted coin-flips, using their fixed probability of voting for Trump.
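The coin-flip step could look something like the sketch below. The 45% chance of a Trump vote, the 500 split-ticket voters per precinct, and the name `sample_split_trump_pct` are all assumed for illustration.

```python
import random

def sample_split_trump_pct(n_precincts, voters_per_precinct, p_trump, seed=1):
    """For each precinct, flip one weighted coin per split-ticket voter and
    return the percentage (0-100) of flips that came up Trump."""
    rng = random.Random(seed)
    pcts = []
    for _ in range(n_precincts):
        # rng.random() < p_trump is True with probability p_trump
        trump_votes = sum(rng.random() < p_trump for _ in range(voters_per_precinct))
        pcts.append(100.0 * trump_votes / voters_per_precinct)
    return pcts

# Assumed values: 200 precincts, 500 split-ticket voters each, 45% Trump probability.
split_trump_pct = sample_split_trump_pct(
    n_precincts=200, voters_per_precinct=500, p_trump=0.45
)
print(sum(split_trump_pct) / len(split_trump_pct))
```

Note that the probability does not depend on the precinct at all, which is exactly the "flat" assumption stated above.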
And now we plot the % of votes for Trump among split-ticket voters minus the % of straight-ticket Republican voters, just as Dr. Ayyadurai does:
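Here is a self-contained sketch of the whole simulation, with the plot replaced by a least-squares slope so it runs without any plotting library. The sample sizes, the 45% Trump probability, and the helper `ols_slope` are assumptions for illustration.

```python
import random

def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(42)
n_precincts, voters, p_trump = 200, 500, 0.45  # assumed values

# X-axis: % straight-ticket Republican, sampled uniformly.
x = [rng.uniform(0.0, 100.0) for _ in range(n_precincts)]

# % split-ticket Trump: weighted coin flips with a flat probability.
split = [
    100.0 * sum(rng.random() < p_trump for _ in range(voters)) / voters
    for _ in range(n_precincts)
]

# Y-axis: the plotted quantity, split-ticket Trump % minus X.
y = [t - xi for t, xi in zip(split, x)]

slope = ols_slope(x, y)
print(round(slope, 2))  # expect a value close to -1
```

Scattering `x` against `y` with any plotting library should show the same downward 1:1 trend that the fitted slope reports.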
Bam. It’s a line with a negative slope. Just like we actually observe in real life, in counties where we have a nice spread of precincts with different fractions of Republican voters. No evidence of fraud here — just a simple result of the values we decided to plot.
So, what’s going on?
Let’s look closely at what we’re doing.
We’re plotting the line where the Y variable is: (% split-ticket Trump votes MINUS % straight-ticket Republican votes). This is plotted against the X variable, the % of straight-ticket Republican votes.
We can decompose this into two lines:
One line we have is the % of split-ticket Trump votes on the Y axis, vs. the % of straight-ticket Republican votes on the X-axis. If split-ticket voters have a fixed probability of voting Trump that is independent of their precinct’s % of Republicans, this line should be flat.
On the other hand, the other line we have is the % of straight-ticket Republican votes vs. % straight-ticket Republican votes. Otherwise known as the line of unity: a perfectly correlated line, since we’re plotting the same thing on both axes.
To recover Ayyadurai’s original line of (% split-ticket Trump votes MINUS % straight-ticket Republican votes) vs. % straight-ticket Republican votes, we can subtract our second line from our first line. Remember that subtracting a line is the same as negating that line and adding it. The line of unity, negated, is this:
You can see how it’s obvious that we’ll get a negatively sloping line after doing this. The operation we are performing will produce a downward sloping line by definition.
This dude must think that we were born yesterday. Wow. Again, the sheer audacity of this just kills me.
What if the probability of split-ticket votes for Trump isn’t constant?
Let’s present Ayyadurai’s argument in the strongest light. Here’s a common objection from respondents: why do we assume a flat probability of split-ticket vote percentages? What if the probability of split-ticket votes for Trump is linearly correlated with the % of straight-ticket Republican voters in a precinct?
Let’s indulge this. Let’s say that straight-ticket voters are the majority of a precinct, and so “Republicanism” or “Democraticness” leaks out into the split-ticket voting population and influences their decisions. In that case, we’re no longer subtracting a 1:1, 45-degree line from a constant, flat line.
However, we are subtracting the 1:1 line of unity from some other line.
In the simulation below, I assume that the linear correlation between the % of split-ticket Trump votes and the % of straight-ticket Republican votes has a slope of 0.9.
Here’s the component where I just plot % split-ticket Trump votes. This is no longer flat, because we’re saying there’s a correlation between split-ticket Trump votes and the “Republican-ness” of a precinct.
Plotting the line of unity is the same:
And when you subtract the second from the first, you again get a negatively sloping line!
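A rough sketch of this correlated case. The slope of 0.9 comes from the text; the intercept of 5 (so that the split-ticket percentage stays between 5% and 95%), the sample sizes, and the helper `ols_slope` are assumptions added for illustration.

```python
import random

def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(3)
n_precincts, voters = 200, 500      # assumed sample sizes
intercept, coupling = 5.0, 0.9      # assumed: split % = 5 + 0.9 * straight %

x = [rng.uniform(0.0, 100.0) for _ in range(n_precincts)]

split = []
for xi in x:
    # Probability of a Trump vote now rises with the precinct's Republican share.
    p = (intercept + coupling * xi) / 100.0
    votes = sum(rng.random() < p for _ in range(voters))
    split.append(100.0 * votes / voters)

# The plotted quantity: split-ticket Trump % minus straight-ticket Republican %.
y = [t - xi for t, xi in zip(split, x)]

slope_split = ols_slope(x, split)   # expect roughly 0.9
slope_diff = ols_slope(x, y)        # expect roughly 0.9 - 1 = -0.1
print(round(slope_split, 2), round(slope_diff, 2))
```

The fitted slope of the difference is always the fitted slope of the split-ticket line minus 1, so anything short of a perfect 1:1 relationship leaves a downward tilt.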
The only way you’d get the flat line — which Ayyadurai asserts is “expected” — is if the slope of the correlation between split-ticket Trump votes and straight-ticket Republican votes is exactly 1:1. No more, and no less. In every other case, you’d get a line with some slope.
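The same point as a short worked equation, with symbols introduced here for illustration: X is the % of straight-ticket Republican votes, T the % of split-ticket Trump votes, Y the plotted difference, and a and b the assumed intercept and slope of a linear relationship between T and X.

```latex
T = a + bX \quad\Longrightarrow\quad Y = T - X = a + (b - 1)X
```

The slope of Y is b - 1. It is zero only when b = 1; it is -0.1 for the simulated b = 0.9, and -1 for a flat split-ticket line (b = 0).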
In my work with real data from Oakland County, MI, I show that this slope is far from 1. It’s far lower than my simulated 0.9, too: it’s just 0.6. In this case, we’d always expect Ayyadurai’s “suspicious” plunge. Biden also experiences this plunge, though it’d have been totally acceptable to see an upward-sloping line in either case, too.
Again: not fraud. Just lines behaving like lines.
Invite to Dr. Ayyadurai
If this is truly what Dr. Ayyadurai meant to do, then it’s essentially just a mathematical trick. A lot of build-up and sleight of hand for an inevitable and mundane result. It’s kinda like if I asked you to think of a number, add 5, subtract 8, and tell me the result. You shouldn’t be impressed if I could guess your number: all I have to do is add 3 to your result and I’ll know what you started with. In the same vein, Dr. Ayyadurai is conjuring a “suspicious” negatively sloped line from any data that you would likely start with.
You could do the same thing with straight-ticket Democrat votes and split-ticket votes for Biden. In fact, I do so here. Is that evidence that Trump is stealing votes from Biden? Of course not. It is crazy to me that Ayyadurai can put on a straight face and pitch this to us as evidence for conspiracy-level election fraud — he must think very little of us.
I know people are very quick to trust something just because it’s got some numbers on it — especially if it agrees with them — but numbers don’t make an analysis bulletproof. Not inspecting even the quant-iest of presentations can leave you vulnerable to a new breed of snake-oil salesman. Case in point: Dr. Shiva Ayyadurai. U.S. Senate Candidate.
That goes double for me, too. Inspect my argument carefully. If you see trouble, send me a rebuttal right here. Dr. Ayyadurai, that goes for you especially. I’ve answered a few responses below:
Some good challenges from respondents so far:
What about Wayne County? It doesn’t display this negative linear trend.
Correct, I should have addressed this in the main body of the article. I think the issue with Wayne is that there isn’t an even spread of precincts with a wide variety of %-Republican voters. He also zooms in on the X-axis, so we’re only looking from 0–30%, where the value he’s plotting is likelier to be positive. I dig into that case in this Git Gist.
One thing I do want to point out is that Ayyadurai’s graph of Wayne County shows impossible data:
If you look, there are points at approximately (X: 4%, Y: -20%), or (X: 2%, Y: -10%), etc. Those data points would imply that Trump got negative split-ticket votes at those precincts. E.g., if -20 = SplitTrump - 4, then SplitTrump must be -16.
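That arithmetic can be turned into a quick sanity check. In the sketch below, the two points are the approximate readings quoted above, and the function names are made up for illustration.

```python
def implied_split_trump_pct(x_pct, y_pct):
    """Recover the split-ticket Trump % from a plotted point.

    Since Y = SplitTrump - X, it follows that SplitTrump = Y + X.
    """
    return y_pct + x_pct

def is_possible(x_pct, y_pct):
    """A vote share must lie between 0% and 100%."""
    return 0.0 <= implied_split_trump_pct(x_pct, y_pct) <= 100.0

# Approximate (X, Y) readings from the Wayne County graph, as quoted in the text.
points = [(4.0, -20.0), (2.0, -10.0)]

for x_pct, y_pct in points:
    print(x_pct, y_pct, implied_split_trump_pct(x_pct, y_pct), is_possible(x_pct, y_pct))
```

Any point that falls below the line Y = -X fails this check, because it would require a negative share of split-ticket votes.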
If these data were real you could argue that there was cheating in Wayne County, because negative votes should be impossible! Yet this is one of the few counties where Ayyadurai says “no algorithm detected”.
My sides. At this point we’re basically just watching a stand-up comedy special.
How do you explain the fact that Ayyadurai’s regression has two line-segments? That doesn’t agree with your thesis that there should be one negatively sloping line.
Dr. Ayyadurai plots this trend-line over his data:
But we could just as easily draw this line:
The choices are arbitrary. The only way to rigorously tell which fit is better is to run the regressions on the data ourselves and see which model minimizes the error more.
An important point: in model selection, the aim is always to follow Occam’s Razor and go with the most parsimonious model you can. If you just try and minimize error without controlling the complexity of your model, your “best” model will be a line wildly flitting from exact point to exact point. We call this overfitting.
Oftentimes you’ll compare competing models with something like the Akaike Information Criterion, which punishes models that are more complicated than they need to be (models that have more parameters). Without looking at the raw data itself, I’d say a single regression line would certainly be a simpler model fit than trying to place two linear regressions in a piecewise model like he draws.
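To make that comparison concrete, here is a sketch on synthetic data (not Ayyadurai’s, which I don’t have in raw form). It fits one line and a two-segment piecewise model, then scores both with the least-squares form of AIC, n·ln(RSS/n) + 2k. The data-generating line, the noise level, the brute-force breakpoint search, and the parameter counts (2 for one line; 5 for two lines plus a breakpoint) are all assumptions for illustration.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares; returns (intercept, slope, residual sum of squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, rss

def aic(rss, n, k):
    """AIC for a least-squares fit with Gaussian errors, up to an additive constant."""
    return n * math.log(rss / n) + 2 * k

def best_two_segment_rss(xs, ys, min_points=10):
    """Try every breakpoint; return the smallest combined RSS of two separate lines."""
    pairs = sorted(zip(xs, ys))
    sx = [p[0] for p in pairs]
    sy = [p[1] for p in pairs]
    best = float("inf")
    for i in range(min_points, len(pairs) - min_points + 1):
        rss = fit_line(sx[:i], sy[:i])[2] + fit_line(sx[i:], sy[i:])[2]
        best = min(best, rss)
    return best

rng = random.Random(7)
n = 200
x = [rng.uniform(0.0, 100.0) for _ in range(n)]
# Synthetic data that truly follows ONE line, plus noise (assumed values).
y = [45.0 - 1.0 * xi + rng.gauss(0.0, 3.0) for xi in x]

rss_one = fit_line(x, y)[2]
rss_two = best_two_segment_rss(x, y)

aic_one = aic(rss_one, n, k=2)  # intercept + slope
aic_two = aic(rss_two, n, k=5)  # two intercepts, two slopes, one breakpoint

print("RSS:", rss_one, rss_two)
print("AIC:", aic_one, aic_two)  # lower AIC is preferred
```

The two-segment fit can never have a larger residual error than the single line, even on data that is truly linear, which is why raw error alone can’t settle the question and a complexity penalty is needed.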
If the kink in the curve is real, it could also be due to a non-linear correlation of % straight-ticket Republican voters and % split-ticket Trump voters. A detailed treatment of how the line can be curvy is offered by Dr. Lum in her video here.
EDITED NOTE: Thanks to Carlos Larriba Andaluz and Sam Nim for pointing out a previous analysis’ axes that didn’t match the video’s exactly. Now they do. Generally: a big thank you to everyone who engaged via the responses and helped me sharpen my argument and get closer to the truth.