Photo credits: Epping Forest District Council

How to measure opinion poll inaccuracy in elections

How can we usefully summarise the accuracy of an election opinion poll compared to the real result of an election? In this blog, we describe a score we have devised to allow people to see how different polls compare in their reflection of the final election result, no matter how many parties or candidates are standing. This index, B, can be compared across time, polling company and even election to provide a simple demonstration of how the polls depicted public opinion in the run-up to polling-day.

The 2015 UK General Election has reminded us of the impact that inaccurate opinion polls can have on political science, media coverage of elections, and on the confidence that the general public has in election forecasting of any kind. As a salutary reminder that the polling failures of 1992 in the UK and 1948 in the US are not the preserve of history, the 2015 miscall has once more focused attention – some of it misplaced – on why and how polls produce the results they do.
The sources of opinion poll inaccuracy have been the subject of much research. Response bias (some voters are more likely to report their intended vote than others) and house effects (how polling companies (a) select their interviewees, (b) deal with their answers, and (c) weight these answers in anticipation of bias) have been identified as two of the key culprits. Much of this research has been highly sophisticated, using complex statistical models to look at the relative impact of such causes. Yet, except for genuine two-party or two-candidate races, where it is relatively easy to summarise the difference between the outcome and the poll, it has been difficult to find a way of summing up the overall level of difference between polls and the election result in multiparty systems.
Given that multiparty systems account for the vast majority of democratic electoral systems in the world, this seemed like a gap worth trying to fill.

In a paper written a couple of years ago, we put forward a measure we call ‘B’ (simply because it builds on a measure developed by other researchers called ‘A’), which summarises the overall accuracy of an opinion poll as compared with the election result in an election with any number of competitors. In the figure below, the graphs show B’s representation of the overall difference between the polls and the 2015 General Election result for the main polling companies. We can break the lines down by party if we want, but the overall summary clearly shows how polls tended to get closer to the result as the election approached – but not close enough.

“B index of accuracy across UK pollsters in the lead-up to the General Election”

The statistics behind B are relatively technical: a so-called multinomial logit estimator compares the (logged) relative odds of support expressed for each party in the opinion poll with those in the final election result, and the absolute values of these deviations are then averaged. But using B itself is very simple. A perfectly accurate opinion poll, identical to the final election result, will score 0 – no error. The more inaccurate the poll, the higher the index rises. Like many statistical coefficients, a single B score is not usually that informative. What is of more use is a comparison across time and across polls to see how accuracy evolves.
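The full estimator is set out in our paper; purely as an illustration, the logic can be sketched in a few lines of Python. This is our own rough approximation, not the surveybias implementation: it assumes B is the average of the absolute deviations in logged relative odds between poll and result, with the last party arbitrarily taken as the reference category, and all function and variable names here are invented for the example.

```python
from math import log

def b_index(poll, result):
    """Illustrative sketch of the B accuracy index.

    poll, result: vote shares (each summing to 1) for the same parties,
    in the same order; the last party serves as the reference category.
    """
    k = len(poll)
    assert k == len(result) and k >= 2
    ref_p, ref_v = poll[-1], result[-1]
    # b_i: deviation of the poll's log odds (party i vs the reference
    # party) from the election result's log odds
    b = [log(p / ref_p) - log(v / ref_v)
         for p, v in zip(poll[:-1], result[:-1])]
    # average of the absolute deviations across the k parties
    return sum(abs(x) for x in b) / k

# A poll identical to the result scores 0 (no error):
perfect = b_index([0.37, 0.30, 0.33], [0.37, 0.30, 0.33])  # 0.0

# A poll that understates party 1 and overstates party 2 scores above 0:
biased = b_index([0.33, 0.34, 0.33], [0.37, 0.30, 0.33])
```

As in the measure proper, a perfect poll scores exactly zero, and the score grows as the poll's relative odds drift away from the result's.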
As with any statistical coefficient, B also has confidence intervals (not shown in the figure). Being a sample of the electorate, a poll cannot be held ‘inaccurate’ if it is within the bounds of sampling error. (We’ll ignore here the very good point that the sorts of samples polling companies use imply error beyond that associated with a random sample.) This is one drawback of B – as any statistician knows, significance is related to sample size, so, perversely, a poll with a very small sample may not be ‘significantly’ inaccurate even though it is just as wrong as a poll with a much larger sample. Judicious monitoring of sample size is therefore recommended when using B.
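The sample-size caveat is easy to see with a toy goodness-of-fit calculation (again our own illustration, nothing to do with the surveybias package, and with made-up shares): the Pearson chi-square statistic scales linearly with the number of respondents, so an identical gap between poll and result clears the significance threshold only for the larger poll.

```python
def chi_square_stat(poll_shares, result_shares, n):
    """Pearson goodness-of-fit statistic for a poll of n respondents,
    taking the election result as the expected distribution."""
    return n * sum((p - v) ** 2 / v
                   for p, v in zip(poll_shares, result_shares))

poll = [0.34, 0.33, 0.33]    # hypothetical poll shares
result = [0.37, 0.31, 0.32]  # hypothetical election result
CRITICAL_5PCT_DF2 = 5.991    # chi-square critical value, df = 2, alpha = 0.05

small_n = chi_square_stat(poll, result, 500)     # ~2.0: within sampling error
large_n = chi_square_stat(poll, result, 10_000)  # ~40.4: 'significantly' wrong
```

The 500-respondent poll is exactly as far from the result as the 10,000-respondent one, yet only the latter would be branded significantly inaccurate – which is why B should always be read alongside sample size.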

We should also be careful in the use of the label ‘inaccurate’. As opinion pollsters go to great pains to underline, even polls asking explicitly how a voter would vote were the election tomorrow are not forecasts – they may be regarded as nowcasts, or a snapshot of opinion at a given point in time. To accuse a poll taken some six months before the election of being ‘inaccurate’ because it did not match the eventual election result is unreasonable – at least, more unreasonable than the same criticism of a poll published a day or two before the election.
B allows us to track methodically and simply how polls evolved in the lead-up to an election. Had opinion settled well before the election? Were there any ‘shocks’ which saw opinion shift substantially either towards or away from the eventual result? Such dynamics can be distilled out of individual party measures, but B provides a much more convenient set of snapshots to look at trends across time. Furthermore, we can compare accuracy not just across time and polling company, but also across election and across country. Its expected value does not depend on the number of parties that compete in a given election (you could call it an unbiased estimator of bias), and its calculation can take non-voters and pre-electoral alliances into account if desired. B is comparable, whatever the unit of observation.
B cannot work miracles. It can only be implemented once the election has been held. But then nothing can tell us how accurate a poll is until the election has been held. Similarly, B does not tell us why a poll was inaccurate, or why a particular polling company managed to do much better than others in reflecting public opinion just before an election. But it does give us a very useful dependent variable – a score which we can then use to test competing explanations of polling bias.

The B index is available to anyone who has the Stata statistics software. It can be downloaded as the package “surveybias” from the SSC archive. The commands within Stata are extremely simple to use, and entirely automated beyond entering the polling scores and election results (and even that can be done just by uploading a spreadsheet). The “surveybias” package includes real-world examples of polling data from France and Germany as well as example scripts that researchers can easily adapt to their own data.
In the future, we want to implement the B index in the R programming language. We also intend to create a web-based app that would allow anyone to look at polling accuracy, without needing to use statistical software.
The authors would like to acknowledge Dr Chris Hanretty for providing the polling data upon which the figure is based.

About Jocelyn Evans

Jocelyn Evans is Professor of Politics at the University of Leeds (UK). His research focuses on voting behaviour, particularly in France and for the Extreme Right.

Jocelyn Evans @ University of Leeds

About Kai Arzheimer

Kai Arzheimer is Professor of Politics at the University of Mainz (Germany). His research focuses on research methods, voting behaviour, and public opinion in Germany and Western Europe.

Kai Arzheimer @ University of Mainz (Germany)
