Thursday, April 28, 2016

Analysis of Slate Voting for the 2016 Hugos

Update: September 22, 2016: I was able to use the additional data supplied at the end of MidAmeriCon II to compute a much more precise estimate, and that's reported in Slate Voting Analysis Using EPH Data: 2014-2016.

Overview

I estimate there were about 205 “Rabid Puppies” this year, essentially identical to the estimated 204 Sad+Rabid puppies last year.

The reason they did so well despite a doubling of the number of "organic" votes is that they managed much better slate discipline this year; last year, not everyone voted for all five candidates nor in every category, but this year it seems they did.

In making this estimate, I concluded that the slate actually swept the Best Novelette, Best Editor (Long Form), Best Fanwriter, and Best Fan Artist categories, but the following either declined the nomination or were quietly disqualified for some reason:
  • Best Novelette: "Hyperspace Demons," Jonathan Moeller 
  • Best Editor (Long Form): Anne Sowards, Bryan Thomas Schmidt, and Mike Braff
  • Best Fanwriter: Zenopus
  • Best Fan Artist: Rgus
If EPH were in effect, it would have reduced the number of slate candidates, but not as much as some people seem to think. Only 5 categories would have had a single slate candidate, and 4 would have had three slate candidates--prior to withdrawals. In real life, the slate had 64 nominees; under EPH it would still have had at least 33.

Of course, like any estimates, these are only as good as their data, calculations, and assumptions. These results can be tested once we see the final stats in August and we can see how many people voted for Vox Day for Best Editor. And who withdrew. And what the actual results with EPH would have been.

Assumptions

  1. The Rabid Puppies were a disciplined bloc of voters, all of whom voted for every item on the slate in every category. I ignored the Sad Puppies entirely.
  2. “Organic” (i.e. “non-slate”) votes were distributed in the same “power-law” distribution they have historically followed.
  3. At least one Rabid Puppy nominee in each category was so bad that no one except a slate voter would vote for it. 

Quick Outline of the Method

There is a link below with more details about the math, but the quick summary is that I depended on the following:
  1. If we know the total number of organic votes in a category, the power law allows us to estimate how many organic votes there were at each rank.
  2. The best organic nominee squeezed out by the slate had to have fewer votes than the slate. 
  3. The worst organic nominee that made it despite the slate had to have more votes than the slate.
This lets us set up equations to solve for the minimum and maximum possible values for the number of slate voters for each category. The result looks like this:

     Original        With Declines
Minimum  Maximum   Minimum  Maximum   Category
  203      234       203      234     Best Novel
  191      258       191      258     Best Novella
  110      148       148       -      Best Novelette
  149       -        149       -      Best Short Story
  132       -        132       -      Best Related Work
  187       -        187       -      Best Graphic Story
  240      280        -       280     Best Dramatic Presentation, Long Form
  174      215       174      215     Best Dramatic Presentation, Short Form
  172       -        172       -      Best Editor, Short Form
   84       99       171       -      Best Editor, Long Form
  174       -        174       -      Best Professional Artist
  172      218       172      218     Best Semiprozine
  158       -        158       -      Best Fanzine
  135       -        135       -      Best Fancast
   80      111       111       -      Best Fanwriter
   67       88        88       -      Best Fan Artist
  158      208       158      208     John W. Campbell Award for Best New Writer
  240       88       203      208     Overall Result

(A dash marks a cell with no value.)

The columns labeled “original” are the results before making assumptions about people who were on the slate but declined their nominations.
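To make the per-category bounds a little more concrete, here is a toy Python sketch of the three points in the outline. It is not the calculation from the PDF: the post says the real bounds come from solving equations, whereas this sketch simply reads them off a fixed curve. The exponent, the number of ranks, the vote total, and the function names are all made-up values chosen only for illustration.

```python
# Toy illustration only: the exponent, the number of ranks, and the vote total
# are hypothetical, not the values used in the actual analysis.

def organic_votes_by_rank(total_votes, n_ranks=50, exponent=1.0):
    """Split total_votes across ranks 1..n_ranks in proportion to rank**-exponent."""
    weights = [rank ** -exponent for rank in range(1, n_ranks + 1)]
    scale = total_votes / sum(weights)
    return [scale * w for w in weights]


def slate_bounds(organic, slate_finalists, slots=5):
    """Bracket the slate size from how many of the `slots` finalists were slate picks.

    organic[0] is the estimated vote count of the top organic nominee.
    Returns (minimum, maximum); maximum is None when the slate swept the category.
    """
    organic_finalists = slots - slate_finalists
    # Best organic nominee squeezed out: the slate must have beaten it.
    minimum = organic[organic_finalists]
    # Worst organic nominee that still made it: the slate must have trailed it.
    maximum = organic[organic_finalists - 1] if organic_finalists > 0 else None
    return minimum, maximum


organic = organic_votes_by_rank(total_votes=1000)
low, high = slate_bounds(organic, slate_finalists=3)
print(f"slate voters between {low:.0f} and {high:.0f}")
```

Note that a swept category returns no maximum at all, which is why so many cells in the maximum columns are empty.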

Minima don’t tell you how small something is; they tell you it must be at least that big. So if you compute the maximum across all the minima (I know, that’s confusing), then it’ll give you the smallest size that’s big enough to satisfy every category. You do the reverse with the maxima to find the largest size that isn’t too large for any category.

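Unfortunately, when we do that with the original numbers, we find there needed to be at least 240 slate voters but no more than 88, and of course that’s absurd. We need to find a way to explain away some of these numbers. The problematic ones (highlighted in red in the original table) are the minimum of 240 for Best Dramatic Presentation (Long Form) and the low maxima for Best Novelette (148), Best Editor, Long Form (99), Best Fanwriter (111), and Best Fan Artist (88).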

The minimum for Best Dramatic Presentation (Long Form) is much higher than the other numbers. Since that category seriously breaks the assumption that the worst-ranked slate nominee got zero organic votes, we discard that number.

For the maximum column, we know from Chaos Horizon that “there were indeed withdrawals in several categories.” Assuming that’s what we’re seeing in the categories with low maxima and recomputing their minima, we get the “With Declines” columns. That is, when we assume the slate really swept a category, the maximum value disappears, but the minimum value goes up.

Now we have an excellent result; the biggest minimum is smaller than the smallest maximum. Any number of slate voters between 203 and 208 would produce exactly the results we got, given the original assumptions, so 205 is a fine estimate. When the final numbers are available in August, we’d expect the number of votes for Vox Day as Best Editor to be fairly close to that.
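The "biggest minimum, smallest maximum" step can be checked mechanically. The sketch below is a minimal Python illustration, not part of the original analysis. The dictionary copies the values from the table, with `None` standing for cells read as empty, and the names `BOUNDS` and `feasible_range` are invented here.

```python
# (original_min, original_max, declines_min, declines_max) per category,
# copied from the table; None marks a cell read as empty.
BOUNDS = {
    "Best Novel": (203, 234, 203, 234),
    "Best Novella": (191, 258, 191, 258),
    "Best Novelette": (110, 148, 148, None),
    "Best Short Story": (149, None, 149, None),
    "Best Related Work": (132, None, 132, None),
    "Best Graphic Story": (187, None, 187, None),
    "Best Dramatic Presentation, Long Form": (240, 280, None, 280),
    "Best Dramatic Presentation, Short Form": (174, 215, 174, 215),
    "Best Editor, Short Form": (172, None, 172, None),
    "Best Editor, Long Form": (84, 99, 171, None),
    "Best Professional Artist": (174, None, 174, None),
    "Best Semiprozine": (172, 218, 172, 218),
    "Best Fanzine": (158, None, 158, None),
    "Best Fancast": (135, None, 135, None),
    "Best Fanwriter": (80, 111, 111, None),
    "Best Fan Artist": (67, 88, 88, None),
    "John W. Campbell Award for Best New Writer": (158, 208, 158, 208),
}


def feasible_range(pairs):
    """Return (largest minimum, smallest maximum) over (min, max) pairs, skipping blanks."""
    minima = [lo for lo, _ in pairs if lo is not None]
    maxima = [hi for _, hi in pairs if hi is not None]
    return max(minima), min(maxima)


original = feasible_range([(row[0], row[1]) for row in BOUNDS.values()])
with_declines = feasible_range([(row[2], row[3]) for row in BOUNDS.values()])

print(original)       # (240, 88): impossible, the minimum exceeds the maximum
print(with_declines)  # (203, 208): a consistent range
```

A range is only usable when the first number is no larger than the second, which is exactly what fails for the original columns and holds for the "With Declines" columns.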

What Would E Pluribus Hugo Do?

To determine the best-case impact of EPH, you take the number of slate votes divided by 5 and look to see if that's still bigger than any of the 1st-rank organic estimates. That's because EPH effectively charges a penalty for keeping the whole slate together. Since 205/5 is 41, that means that if there were any category where the estimated first-rank organic vote were under 41, then the slate would have swept the category even with EPH. But there is none. Even Best Fan Artist had 96.

So next look to see if the slate could have swept all but one slot in a category. For this, divide by 4 and look at the estimated votes in the #2 slots. Again, nothing is under 51.25, although Best Artist is close.

How about a case where the slate takes 3 and organic voters only get 2? Divide 205/3 to get 68.33 and this time a few categories fail. Best Related Work, Best Fancast, Best Fanwriter, and Best Fan Artist would have all had three slate nominees (before declines and withdrawals) this year, even with EPH.

For cases where slates get 2 and organic results get 3, divide 205/2 to get 102.5. Eight categories would end up like that: Best Novelette, Best Short Story, Best Graphic Story, Best Editor (Long Form), Best Professional Artist, Best Semi-Prozine, Best Fanzine, and the Campbell Award.

Slates get 1 and organic votes get 4 for Best Novel, Best Novella, Best Dramatic Presentation (Long Form), Best Dramatic Presentation (Short Form), and Best Editor (Short Form). (Actually, this year the slate only had one nomination for Best Editor [Short Form] anyway.)

There are no categories where the slate doesn't get at least one nominee. They would score 33 nominees, even though they accounted for only 5% of the voters.
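The divide-and-compare steps above can be written as a short loop. This is a hedged Python sketch of the best-case reasoning only, not an implementation of EPH itself. The function name is invented here, and in the example list only the first-rank figure of 96 comes from the text; the other four organic estimates are hypothetical placeholders. Extending the comparison to the #3, #4, and #5 organic slots follows the pattern of the first two steps and is an assumption.

```python
def best_case_slate_finalists(slate_voters, organic, slots=5):
    """Count slate finalists under the best-case reasoning described above.

    organic[i] is the estimated vote count of the (i+1)-th ranked organic nominee.
    With k slate works still standing, each holds slate_voters / k points and is
    compared against the organic nominee in slot (slots - k + 1).
    """
    for k in range(slots, 0, -1):
        if slate_voters / k > organic[slots - k]:
            return k
    return 0


# Only the 96 is from the text (Best Fan Artist); ranks 2-5 are made up.
fan_artist_like = [96, 55, 40, 30, 25]
print(best_case_slate_finalists(205, fan_artist_like))  # 3

# Tally from the text: 4 categories with three slate finalists,
# 8 with two, and 5 with one.
category_counts = {3: 4, 2: 8, 1: 5}
total = sum(k * n for k, n in category_counts.items())
print(total)  # 33
```

The loop starts from a full sweep and works down because the slate's per-work points rise each time one of its own works is eliminated.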

Again, these are the best-case results for EPH. We're assuming a single slate, whose candidates are so bad that no one else votes for them, and that the organic candidates on the final list didn't overlap with each other (so EPH only penalized slates). In practice, the slate would likely have done better than this. EPH definitely would have made a bad situation better, but it's not the magic fix that many people seem to think it will be.

Comparison with 2015

In 2015 we know that there were 163 votes for Vox Day as Best Editor and 41 votes for “Adventure Time,” which conflicted with the Rabid slate. If we assume those identify the hard-core Rabid and Sad puppies, respectively, that would give a total of 204 slate voters last year. (Estimates of this vary wildly, though.)

The above algorithm doesn’t give such clean results for last year, probably because the two slates had much poorer slate discipline and so the number of voters dropped off beyond the top categories, but focusing just on the best novel numbers suggests a min of 168 and a max of 217, which is at least in the ballpark.

Details of the Analysis

Because Blogger has difficulty with embedded mathematics, I have put the details of the analysis in a PDF file. I welcome feedback and information about errors.

28 comments (may contain spoilers):

  1. Hi,
    Are you assuming Abyss & Apex withdrew from Semiprozine?

  2. No. The top organic vote for Semiprozine would have been between 172 and 217, so that just barely leaves room for one organic candidate to win. On the other hand, it would also be consistent with them having declined the nomination. This data doesn't really give enough info for us to know for sure.

  3. Hi Greg
    One quibble, though it doesn't affect the analysis: the Jonathan Moeller novella was replaced by Brandon Sanderson's "Perfect State" on the final version of the slate. The RP missing from that category is Nick Cole's "Fear and Self-Loathing in Hollywood". I suspect that he would not have declined the nomination.

    1. Thanks! Actually, the error was that I wrote "novella" when I meant to write "novelette." I've corrected it.

      The best estimate for the organic result for Novella was between 360 and 310, so you'd expect one to get on the list without needing to assume someone declined.

    2. Any chance of an "organically" nominated story replacing the Thomas Mays one?

    3. That should happen with 100% confidence. :-) Just give the admins a couple of days to contact the nominee and get him/her to accept the nomination.

  4. It seems like your EPH calculations assume that current non-slate voters make five nominations in every category and would continue to do so under EPH. If current non-slate voters who vote in cats like Best Fan Artist typically only nominate one or two artists on their ballots while slate voters are nominating five artists (assumption I know). And we assume this trend continues next year with EPH (huge assumption). Then slate voters would be completely blocked out of the nominations. I'm pretty sure the reality is somewhere between that hypothetical and your assessment.

    1. No, there's no assumption as to how many nominations each voter makes. The only error on that score is that it doesn't account for the fact that a voter may not vote for the same thing twice in the same category, but that's a very small source of error. Did you look at the detailed summary?

      It does assume that the slate voters vote for all the slate nominees (usually 5), but there really is no special assumption about how many things organic voters put on their ballots.

  5. If I understand EPH correctly. And 205 slate voters vote the same 5 nominees. Those nominees only receive 41 points each. If the top 5 non-slated nominees each get 42+ nominations (and they are the only noms in their category on their ballots). They will each have 42+ points, completely blocking the slated nominees.

  6. Correct. However, as soon as one of the slated nominees is eliminated, the other four each have 205/4=51.25 points. And when three remain, they have 205/3=68.33 points. At that point, they can break into some of the categories. Remember also that although points are used for ranking in EPH, comparisons are still done based on the actual number of votes, so if a slated work with 41 points (and 205 votes) is compared with an organic work with 42 points (and 42 votes) then the slated work will win. Slated works generally get eliminated because EPH forces them to compete with each other.

  7. I attended the WSFS meetings (8 hours worth) at Sasquan. I am guessing you and Eric were there too?

    They talked about E Pluribus Hugo, but not enough imo. E Pluribus Hugo was designed to act as a fair democratic voting system. It was never meant to eliminate slate-nominated works, just reduce the slate impact. Meaning the most popular of the slate-works would still get on the final ballot.

    Thanks for explaining the bit about E Pluribus Hugo that was not well-explained at WSFS. I understood as far as the allocation of the points went, but as to how that worked against slates, that was far-more complicated.

    One thing that should be pointed out - nominating only 1 item so that it gets all your points will not work. It will not improve the chances of your 1 item getting onto the final ballot.

    More fans nominating did help. For what it is worth, I used this site extensively to find works, read works and make nominations for all 3 of the Hugo Short Fiction categories. I also nominated Rocket Stack Rank in the Best Fanzine category.

    1. Thanks! And thanks for the kind words. It's been great seeing your comments on the site--we'd love to see more discussion here, and a few pioneers really help.

      The point of EPH is that it removes some of the advantage from slate voting (intentional or otherwise) and should put more variety into the results. What it doesn't do is prevent vandalism. An honest slate probably does contain a work or two that voters should get to evaluate, so it's okay to let some through to the ballot. But to deal with vandalism, you have to stop it all.

      I like Kevin Standlee's "3SV" proposal. Done right, I think EPH can deal with normal slating while 3SV will deal with vandalism.

    2. EPH is slightly more complicated than it needs to be to do what it is supposed to do. (As is the actual voting system's handling of the "No Award" option.)

      First I have heard of the 3SV proposal. Mathematically, I am a fan of introducing more stages in terms of tuning results, but I would worry about participation rates - it's generally true that voters tend to participate at lower rates the more times they have to vote.

  8. This is really great analysis! We'll know the number for sure when we get the vote totals for Vox Day in late August, right?

    As a mathematician I love to see Newton's Method applied to science fiction!

    1. And I'm very happy to see that someone actually read the mathematical analysis!

  9. How would this work with 4/6? Less slate penalty, but with more nominees and smaller slates. How many more non-slate finalists would we get? Personally I don't really care about 'vandalism', I just want plenty of good non-slated choices.

    In August we can actually amend the 4/6 numbers to whatever numbers we want. So we should still be thinking about what the ideal numbers are.

  10. It is possible a few of the organic nominees surpass the slate nominees with regard to their nomination numbers.

    Are you able to do a write-up at some point about EPH?
    I understood it enough to vote for it at the WSFS Business meeting but the vote was close.

    I suspect part of the problem is not understanding EPH at all. One really needs to be into maths to fully understand it.

  11. I'm doing an analysis of 4/6 as we speak. The preliminary numbers don't look very good. It appears that the slates could almost completely negate the effect of 4/6. That is, under the current system, the slates could have taken all but 15 out of 85 nominees. Under 4/6, it seems that they'd still take all but 17 out of 102 if they split it right. We'll see if that result stands up to further examination. (It's real easy to make mistakes with this stuff.)

    June: I'll think about an EPH writeup. So many other people have done them, I'm not sure how much I can improve on that, but maybe there's a different angle to take with it.

    1. What I require is a technical explanation of how the mechanics for EPH would work on slate nominations as compared with organic nominations.

      You have already partially done this in this write-up.

      As for 4/6 - very easy to get around and it will still require a large number of organic nominations to surpass slate numbers, but I too am very interested in a theoretical outcome for this option.

    2. My worry about doing that for 4/6 is it amounts to giving instructions on how to construct a winning slate. Perhaps I'm wrong in thinking they can't figure it out on their own.

    3. I voted against 4/6. My maths is good enough to work out that this is not the solution. Others may need to see the numbers presented before they are fully convinced.

  12. Vox is saying his estimate of his numbers is 750. Hard to see how that could work even if their discipline was weak.

    1. Note that I'm only trying to count the hard-core slate voters--the ones who voted all slate items in all categories. There must be more people who really did use the slate as a recommendation list. (Otherwise the slate would always get 5 or zero in every category owing to the rule for ties.) If you include both the hard core and the "casual" group, I think you could argue for as many as 400. I don't think 750 is realistic.

  13. I'm trying to remember if 4/6 is also up for a vote at Worldcon this fall. Was EPH the only rule change passed last year or did 4/6 also pass? If so, perhaps a combination of the two?

    1. They both passed. I'm playing with the combination now. At first glance, it doesn't make a lot of difference compared to EPH alone. You get about 35 slate items on the ballot. Of course you have a bigger ballot (102 vs. 85). This is the super-optimistic analysis of it, though. 4/6 would weaken EPH somewhat, but it's hard to quantify exactly how much.

  14. While the focus on these Hugo reforms is how they'd prevent slate voting from dominating the nomination process, I also think the reforms will make the process much fairer and open to more quality nominees. In the long run that will benefit the Hugo Awards even if slates continue to try to jam certain nominees into the final ballot.

  15. The minima column is based on the assumption that at least one slate candidate that made the ballot is of sufficiently low quality, or of sufficiently controversial nature, that no non-slate voter would vote for it. This is a very tricky assumption even outside of Drama-Long.

    I will actually point to the Sad Puppies list and say that if we see a nomination on the Sad Puppies list, it has support outside of Vox Day's directed slate, of at least a small number of voters. On the other side of the spectrum, if a work comes out of Castalia House, we might be able to assume nomination of it is sufficiently controversial that nobody outside of a slate would vote for that work.

    I do not think it applies to the novel category, for example. Both RP-nom are SP-echoed. In the novella category, all four RP-nom are SP-echoed (and the fifth nom is SP-nom, displacing someone unlikely to decline nomination). In spite of the sweep, I am not convinced it applies to Graphic Novel; these were "safer" picks by VD that were not, AFAIK, connected to Castalia.

    At the end of the day, I would include the categories of novelette, short story, related work, editor (long) with VD himself, and definitely fanzine (Castalia House blog) as being categories where the assumption should apply. The peak on this is 158 for fanzine, which leaves me slightly skeptical, as the 2015 nomination statistics show significantly higher numbers of ballots (and correspondingly higher thresholds) for novelettes and short stories than fanzines. (I understand including 2015 data in the model is difficult, since it isn't the "control" condition of slate-free voting.)

    That would also make sense of the novelette category maximum; I doubt that Moeller would have declined nomination, given that his work was published through Castalia House. That would bring us to an estimate of ~150 or so slate voters.

    Best Editor (Long) seems a very likely candidate for withdrawals. Fan Writer seems unusual given it's a Castalia House person who didn't make it; I think what may actually be the case is that the "slate voters" were not as disciplined down-ballot in the fan categories.

    1. These are all good points. They all tend to make the estimated number of slate voters smaller rather than larger. We'll get some clue if any of the people involved speaks up about having declined (or insists he/she did not decline).
