Can social media predict X Factor voting? It’s a question that’s come up a lot in the Sofabet comments section recently, with debate about how much can be read into metrics such as numbers of Twitter followers (Frankie Cocozza way ahead, so evidently not much), Facebook likes (Janet Devlin way ahead), YouTube viewings and like/dislike ratios, iTunes ratings, and so on.
Regular readers of Sofabet during the 2010 series will have followed the travails of our commenter Nick in trying to identify social media metrics which might help him to predict the bottom two each week. This year we are especially looking forward to following commenter Toby’s new anlytk site, which tracks volume of tweet activity and positive/negative balance. This week it showed Nu Vibe and Rhythmix generating least activity, with Frankie and Kitty generating most negativity.
We’ll come back to this week in due course, but first some more general questions. Twitter sentiment analysis has recently been used with some success to predict box office sales and stock market movements. So what is the potential to use social media to predict X Factor voting – and what are the problems?
As we at Sofabet are not the most switched on when it comes to social media (we so far use our Twitter account only for automated notifications of new posts), we asked Toby from anlytk to explain his methods:
“What I have written is some code that connects to the twitter API and grabs everything tagged with certain keywords from the last week… Then it’s a case of going through each tweet, looking for words that reflect positive and negative emotion (for want of a better word) and then counting up each of them. It really is as simple as looking for “good” words: Like, love, happy, great etc, and bad words: hate, rubbish, shit etc. There are a few lists out there which rate them in terms of how positive or negative they are, but it’s all a bit subjective.”
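Toby’s description can be sketched in a few lines of Python. This is a minimal illustration of keyword-based sentiment counting, not his actual code – the word lists and sample tweets below are invented for the example:

```python
# Minimal sketch of keyword-based sentiment scoring as Toby describes it:
# count occurrences of "good" words and "bad" words across a batch of tweets.
# Word lists and sample tweets are invented for illustration.

POSITIVE = {"like", "love", "happy", "great", "good", "amazing"}
NEGATIVE = {"hate", "rubbish", "awful", "terrible", "boring"}

def score_tweet(text):
    """Count positive and negative words in a single tweet."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    pos = sum(1 for w in words if w in POSITIVE)
    neg = sum(1 for w in words if w in NEGATIVE)
    return pos, neg

def tally(tweets):
    """Sum positive/negative counts over a collection of tweets."""
    total_pos = total_neg = 0
    for t in tweets:
        p, n = score_tweet(t)
        total_pos += p
        total_neg += n
    return total_pos, total_neg

sample = [
    "I love Janet Devlin, great voice!",
    "That performance was rubbish, really boring",
]
print(tally(sample))  # (2, 2)
```

In practice the tweets would come from the Twitter API filtered by keyword, as Toby describes, rather than from a hard-coded list.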
Toby’s methods are similar to free tools such as Twitter Sentiment and Social Mention, which show volume and positivity/negativity of references, but, as he says, “I have access to the raw data so can be more selective in how I analyse it”. This helps to solve one problem with these free sites – filtering out non-X Factor references. For example, every tweet referring to “Janet Devlin” will be relevant, but most tweets containing the phrase “The Risk” will not.
Another problem is revealed by a quick search of Social Mention – the tracking of positivity and negativity using keywords is still very primitive. For example, “Frankie Cocozza is a twat and needs a good haircut” was counted as a positive tweet, presumably because the word “good” is on a positive words list (and whoever compiled the negative words list missed out “twat”); whereas “it’s a shame Marcus Collins is gay” was counted as a negative tweet, presumably because “shame” is on the negative words list. Toby also points out that such simple word lists can miss the meaning inversions that come with the word “not” (so, for instance, “not a good performance” would be counted as positive).
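One crude patch for the “not” problem – an illustrative approach, not something Toby says he uses – is to flip a sentiment word’s polarity whenever a negator appears within the couple of words before it:

```python
# Illustrative negation handling: flip the polarity of a sentiment word
# when a negator ("not", "never", "no") appears within the previous two words.
# Word lists are invented for illustration.

POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"hate", "rubbish", "bad", "awful"}
NEGATORS = {"not", "never", "no"}

def score_with_negation(text, window=2):
    words = [w.strip(".,!?") for w in text.lower().split()]
    pos = neg = 0
    for i, w in enumerate(words):
        negated = any(p in NEGATORS for p in words[max(0, i - window):i])
        if w in POSITIVE:
            pos, neg = (pos, neg + 1) if negated else (pos + 1, neg)
        elif w in NEGATIVE:
            pos, neg = (pos + 1, neg) if negated else (pos, neg + 1)
    return pos, neg

print(score_with_negation("not a good performance"))  # (0, 1)
```

A fixed window is still easily fooled (“not only good but great” would be mis-scored), which is why this kind of patching only gets you so far without software that genuinely understands language.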
Pending improvements in software that can understand human language, there is not much we can do about this. We can at least assume that it shouldn’t affect comparisons too much, as the problem should affect all acts roughly equally – although, as Toby says, it “might depend on the demographic a bit – different audience segments using different language for instance”.
Mention of demographics brings us to the core problem: there is far from a perfect overlap between X Factor voters and users of social media. As you would expect, analyses of Twitter user demographics, such as this one, show they skew towards the young, female, urban and educated; Facebook demographics also trend young, as do YouTube demographics.
We can discover something about X Factor voter demographics from a poll conducted by YouGov before the 2010 final (which proved to be pretty accurate in terms of predicting the result, though it underestimated both One Direction and Cher – perhaps because their predominantly young voters were more likely to multiple-vote). While the show’s voters also skew towards the female, they do not skew so young, nor so urban, nor so educated (if we can use education as a proxy for social class) as social media users.
However, knowing the differences in overall demographics is not much help, because to translate from Twitter activity to votes we would need breakdowns of voter demographics per act. And this is something we can only guess at.
Last year, for example, Nick found that One Direction consistently did much better on Twitter than they did in the votes, while Mary Byrne did much worse. These are obvious cases – we can be pretty confident that the former’s supporters skewed young and female and the latter’s more mature. But which social demographics were voting for Matt Cardle, say, or Rebecca Ferguson? This is much less obvious. That YouGov poll tells us some things we might easily have guessed (Matt’s lead over Rebecca was bigger with female than male voters), and some we might not (in age breakdown Matt’s biggest lead was with the 16-24s; in regional breakdown, Matt did disproportionately well in the North, the Midlands and Wales, and disproportionately badly in Scotland). This shows that the necessary judgement calls about which demographics are voting for which acts need to be made with some caution.
It is fair to say that Nick’s efforts last year yielded more frustrated theorising than profitable insight (Nick’s absence from the comments box this year is due to work commitments). In his comments about the semi-final, for example, Nick noted that One Direction were dominating Twitter with Rebecca coming last, and Cher was leading YouTube with Matt only fourth. The vote totals, when we eventually learned them, showed that neither metric was especially useful: Matt won with 35% followed by Rebecca on 20%, One Direction 17%, Mary 15% and Cher 11%.
This first week of stats from Toby was similarly frustrating, yet also promising: two acts stood out as having the lowest volume of activity (Nu Vibe and Rhythmix), two stood out as having the highest levels of negativity (Frankie and Kitty), and the bottom two comprised one from each pairing.
On the face of it, we would agree with the interpretation Toby offered when presenting the data before the results show – that the key metric ought to be lack of interest, rather than negativity. It is, after all, a vote to save not a vote to evict, and Frankie was comfortably midfield in terms of positive tweets. Why did these not translate into votes, and why were the votes that kept Rhythmix safe not reflected in positive tweets?
The answer may be demographic: Maybe Frankie-liking tweeters did vote, but were outweighed by Frankie-disliking non-tweeters. And maybe Rhythmix’s fan base are not on Twitter. (This would seem odd, though, as Tulisa’s rather bizarre comments – “they’re not saying we’re going to steal your boyfriend, they’re saying we’re women, let’s unite and stand strong” – suggest Rhythmix are squarely aimed at young females).
Or perhaps the neurons which fire when a desire to vote takes hold are different from those which fire up a desire to tweet. Perhaps Frankie’s fan base took to social media sites to defend themselves as much as him (as in the amusing Facebook feed posted by Martin F), but on some deeper level realised he was too awful to merit a vote. Maybe a lot of girls liked Rhythmix enough to vote, but did not feel cool enough to broadcast this fact to all and sundry. Being neither voters nor tweeters ourselves, nor indeed teenage girls, we at Sofabet are ill-equipped to judge such theories.
Do we then simply agree with Henry VIII’s comment: “I’d be wary of social media”? Or do we look for angles which could improve its predictive ability? Nick suggested last year that it might prove more useful to track whether an act is getting more or less attention week-on-week than they typically do, rather than comparing them to other acts. Toby is also thinking about analysing whether there is a difference in tweet activity during the Saturday show, after the show, and during the Sunday show.
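Nick’s week-on-week idea is simple to compute: compare each act’s latest tweet volume to that act’s own running average, rather than to the other acts. A hypothetical sketch, with invented figures:

```python
# Hypothetical sketch of Nick's suggestion: track each act's tweet volume
# relative to its own recent history, not relative to other acts.
# All figures below are invented for illustration.

def weekly_change(history):
    """Ratio of the latest week's volume to the mean of the previous weeks."""
    *previous, latest = history
    baseline = sum(previous) / len(previous)
    return latest / baseline

volumes = {
    "Act A": [1200, 1100, 1150, 2300],  # modest act, sudden spike in interest
    "Act B": [5000, 5200, 5100, 4900],  # big act, but flat week-on-week
}
for act, history in volumes.items():
    print(act, round(weekly_change(history), 2))
```

On raw volume Act B dwarfs Act A every week, but the within-act ratio tells the opposite story – Act A has doubled its usual attention while Act B has gone slightly backwards – which is exactly the signal Nick’s suggestion is designed to surface.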
We eagerly anticipate Toby’s refinements to his model as this season progresses, though we close for now with some words of caution from the man himself: “I’ll caveat this whole thing by saying up front, this is all a learning experience and a bit of an experiment for me… My gut feeling from reading the papers and looking at the traffic is that you can get some of the way to the right answer, but it’s going to be hard to say with confidence that the predictions from the graph will be right all of the time (or 80% or 60%) or anything else. I think they just become another piece of evidence that you can use as a punter (or a bookie) to inform your odds and your betting strategy.”
What’s your feeling about whether social media indicators – and which ones – can help us anticipate X Factor votes? Do please share your thoughts in the comments box below.