Social media in general, and Twitter in particular, brim with opinionated messages about diverse topics, from films and brands to services, products and politicians. Political tweets have been subject to a fair amount of research; it has been argued, for instance, that they can be used to forecast electoral results or that they can eventually render opinion polls pointless. Such claims have faced important criticism and extreme caution should, indeed, be advised. Political opinion mining in Twitter has a number of limitations due to both the nature of the data and the methods of analysis commonly employed. In this post I will discuss these issues and a few interesting lines of research with which to tackle them.
Tweets are not Habermasian public opinion
To start with, we must note that politically motivated tweets are a far cry from constituting public opinion in a Habermasian sense. Although Twitter users interact with each other on political matters, those exchanges rarely serve as examples of civil and reasonable deliberation. Moreover, the information about political issues available on Twitter lacks depth and is mostly poor in quality. Misinformation, disinformation and astroturfing are widespread.
While tweets generally do not conform to Habermas’ ideal of public opinion, they do nevertheless constitute opinions – of some sort. In fact, they almost perfectly match the features that psychologist Floyd H. Allport enumerated in his classic 1937 study about public opinion: politically motivated tweets are verbalizations produced by individuals on some issue that is important to them. Furthermore, those individuals are aware that others are reacting to the same issue, and that their behavior can be sufficiently effective to give them a chance to attain their goals.
It is because of such features that politically motivated tweets have attracted much attention. They hold the promise of gauging public opinion for any conceivable topic from extremely large samples of the population, with very little effort on the part of the pollster.
Gauging public opinion from Twitter
Virtually all attempts to gauge public opinion from Twitter follow a quantitative approach. That is, researchers aggregate figures they obtained from Twitter data and interpret them as a proxy for the metric in which they are interested; examples include vote share, consumer confidence and presidential job approval. Several Twitter features have been used as proxy metrics, such as the number of followers, the number of tweets including a given keyword or hashtag, or measures of aggregated sentiment (actually polarity) derived from the tweets.
The most common approach consists of two steps: analysts first determine a set of keywords and collect matching tweets for a given period. Next, they produce either an aggregate figure or a time series – in both cases, the values correspond to raw tweet volume or overall polarity (positive or negative).
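The two steps above can be sketched in a few lines of Python. Everything here is a toy illustration: the tweets are made up, and the per-tweet polarity labels are assumed to come from some upstream classifier rather than computed here.

```python
from collections import defaultdict
from datetime import date

# Hypothetical tweets already collected with a keyword filter.
# Each entry: (text, day, polarity); polarity labels are assumed
# to come from an upstream sentiment classifier.
tweets = [
    ("Great debate performance by #CandidateA", date(2024, 5, 1), +1),
    ("#CandidateA dodged every question", date(2024, 5, 1), -1),
    ("Another rally for #CandidateA today", date(2024, 5, 2), +1),
    ("#CandidateA again... no thanks", date(2024, 5, 2), -1),
    ("#CandidateA has my vote", date(2024, 5, 2), +1),
]

def daily_series(tweets):
    """Aggregate raw tweet volume and net polarity per day."""
    volume = defaultdict(int)
    net_polarity = defaultdict(int)
    for _text, day, polarity in tweets:
        volume[day] += 1
        net_polarity[day] += polarity
    return dict(volume), dict(net_polarity)

volume, polarity = daily_series(tweets)
```

Summing the daily values over the campaign yields the single aggregate figure; keeping them per day yields the time series.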
When forecasting electoral results (a major subtopic within Twitter-based political opinion mining), it is unusual to produce time series; it is much more common to report figures summarizing the Twitter “sentiment” over the whole campaign. For other kinds of public opinion, where longitudinal data are available, time series derived from Twitter data are the norm.
There have been a number of studies reporting promising and even widely successful results on this front, such as the work by Andranik Tumasjan and his colleagues on electoral forecasting by means of Twitter data. Such results, however, have not been free of criticism, and Twitter-based electoral forecasting is still controversial. This is because initial research tended to neglect or minimize the different sources of bias in Twitter data and failed to adequately recognize the limitations of the methods applied.
Sources of bias and known limitations
The main source of bias when mining opinion of any sort from Twitter is that Twitter users (or tweeps) are not a representative sample of the population. Young and urban users are overrepresented, and the rates of Twitter use within different ethnic groups vary greatly. To improve results, it is therefore imperative to know users’ demographics and to try to correct the data according to their actual weight within the population of interest.
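One common way to perform this kind of correction is post-stratification: estimate the quantity of interest within each demographic group, then reweight the group estimates by each group's share of the real population rather than its share of the Twitter sample. The sketch below uses entirely made-up numbers for three hypothetical age groups.

```python
# Post-stratification sketch. All shares and support rates below are
# hypothetical illustrations, not actual demographics.

# Observed on Twitter: group -> (share of Twitter sample, support rate)
twitter_sample = {
    "18-29": (0.45, 0.60),   # overrepresented on Twitter
    "30-49": (0.35, 0.50),
    "50+":   (0.20, 0.40),   # underrepresented on Twitter
}

# Each group's actual share of the population of interest (hypothetical)
population_share = {"18-29": 0.20, "30-49": 0.35, "50+": 0.45}

def raw_estimate(sample):
    """Naive estimate: weight each group by its share of the sample."""
    return sum(share * rate for share, rate in sample.values())

def weighted_estimate(sample, population):
    """Corrected estimate: weight each group by its population share."""
    return sum(population[g] * rate for g, (_share, rate) in sample.items())

raw = raw_estimate(twitter_sample)                        # 0.525
adjusted = weighted_estimate(twitter_sample, population_share)  # 0.475
```

The gap between the two figures is precisely the bias introduced by Twitter's skewed demographics; the correction is only as good as the demographic information available for the users in the sample.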
Another important source of bias is self-selection: Twitter users do not tweet about every conceivable topic. Indeed, non-responses can play an even more important role than collected data. If a lack of information mostly affects just one group, the results can show a significant departure from reality. A Pew Research report has suggested that Twitter opinion is not only at odds with public opinion at large – it also departs from the latter in unpredictable ways. In some instances, responses on Twitter are much more liberal; in others, they are much more conservative.
There are two further weaknesses of Twitter-based polling, both related to self-selection bias: vocal minorities and the spiral of silence. The first effect refers to the finding that most tweets on any topic (and particularly regarding politics) are produced by a small fraction of Twitter’s user base. This is problematic because it may give the impression that some ideas are much more popular among the population than they really are. The spiral of silence is an effect whereby vocal minorities deter ordinary Twitter users from publicly expressing opinions that differ from what they perceive to be the majority view.
Twitter, and social media more generally, can often be adversarial in nature. Some political users may try to manipulate other users in one way or another, for example by pretending that tweets are posted by ordinary individuals when they in fact are not.
Even worse, a non-negligible number of Twitter users are, in fact, not real humans but automated accounts used to spread messages that defame one candidate or paint another in a more favorable light. Twitter estimates that 5 percent of users are spammers. The widespread use of such tactics makes it difficult to draw inferences about actual public opinion.
Finally, it is usually assumed that opinion mining in automated ways is fairly straightforward and that precision is reasonably high. Unfortunately, opinion mining (also known as sentiment analysis) is not an easy task, and it is especially difficult for politically charged tweets. This kind of material is packed with double entendres, sarcasm and humor. Seemingly neutral tweets can actually be negative or positive opinions depending on the framing and choice of words. Therefore, commonly employed methods, such as the use of polarity lexicons, are prone to errors when inferring the presumed opinion of a tweet.
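A minimal lexicon-based scorer makes the failure mode concrete. The tiny lexicon below is purely illustrative; real lexicons contain thousands of entries, but they share the same blind spot: words are scored out of context, so sarcasm flips the sign.

```python
# Toy polarity lexicon (illustrative only; real lexicons are far larger).
POLARITY = {"great": 1, "win": 1, "love": 1,
            "terrible": -1, "lose": -1, "disaster": -1}

def lexicon_score(text):
    """Sum word polarities; words outside the lexicon count as neutral."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    return sum(POLARITY.get(w, 0) for w in words)

lexicon_score("terrible night for the senator, a real disaster")  # -2, plausible
# Sarcasm defeats the lexicon: this scores +1 despite the negative intent.
lexicon_score("oh great, another tax hike")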
To sum up, Twitter-based public opinion mining is a challenging field. Some foundational research in the area has been excessively optimistic, and reports about the death of the polling industry have been greatly exaggerated. This by no means implies that this kind of research is pointless – quite the contrary. Twitter is full of opinions by individuals, and any attempt to enhance the signal over the noise in order to obtain meaningful information is worthwhile.
Much relevant research has been conducted in this regard, such as the credibility evaluation of tweets, automated detection of misinformation and disinformation in Twitter, and demographic user profiling of tweeps. In a rather ironic turn of events, one of the most promising approaches to mining tweets for public opinion requires polling data for training purposes. Here tweets are not used to infer opinion but to nowcast opinion polls.
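In the nowcasting setting, the Twitter signal is treated as a feature and the poll readings as the target of a supervised model. The sketch below fits a simple linear regression of hypothetical poll numbers on a hypothetical daily net-polarity signal, then uses the fit to estimate approval on a poll-free day; all figures are made up for illustration.

```python
# Nowcasting sketch: fit a daily Twitter polarity signal to overlapping
# poll readings, then estimate approval on days without polls.
# All numbers below are hypothetical.

twitter_signal = [0.10, 0.15, 0.05, 0.20, 0.12]   # net polarity per day
poll_approval  = [42.0, 44.0, 40.0, 46.0, 43.0]   # matching poll numbers

def fit_ols(x, y):
    """Closed-form simple linear regression: y ~ a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return a, b

a, b = fit_ols(twitter_signal, poll_approval)
nowcast = a + b * 0.18   # estimate for a poll-free day with signal 0.18
```

The point is methodological: the polls anchor the model to the population of interest, and the Twitter signal merely interpolates between (and slightly ahead of) them, sidestepping the representativeness problem that purely Twitter-based estimates face.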
Ultimately, however, to use Twitter data in an appropriate and responsible way, we must be aware of the limitations and confront remaining methodological challenges.