Voters or Robots? Evaluating the Twitter popularity of the presidential candidates
March 17, 2016
UPDATE 12:37 AM MARCH 29:
Spoiler alert! Before you read our blog post below, there are a few things we want to make very clear:
We did find tweets about the presidential candidates that we think were posted by bots, and most of these happened to be about one candidate in particular – Duterte. But this DOES NOT MEAN that these bot-posted tweets originated from the Duterte campaign or its supporters. In fact, most of these botty tweets did not even speak positively about Duterte, and were posted by accounts that appear mainly designed to broadcast #KathNiel tweets. Please read the article to enjoy the full story.
Just past midnight on November 25, 2015, a Twitter user named @chiminychurva posted a tweet mentioning Philippine presidential candidate Rodrigo Duterte.
There was nothing really unusual about the tweet (we'll come back later to what it actually said). It was neither retweeted, favorited, nor even replied to. The poster’s account, with only 329 followers, isn’t especially popular.
But within the next minute and a half after this tweet was posted, six more tweets mentioning Duterte were posted. The next minute, 38 more Duterte tweets were posted. The minute after that, 113 more. By 12:20 a.m., tweets about Duterte were being posted at a rate of 237 per minute. The surge would continue over the next two hours, peaking at 703 tweets per minute by 1:03 a.m. before eventually coming to an abrupt halt just past 2:00 a.m.
In the roughly two hours between 12:00 a.m. and 2:15 a.m. on November 25, 2015, a total of 30,195 tweets about Duterte were posted. This was more than the number of tweets about Duterte posted when the Davao City mayor announced his presidential bid four days earlier, on November 21. It was even more than all the tweets that had been posted about any presidential candidate over the previous 29 days.
These tweets were captured in a dataset of 116,071 tweets about the Philippine presidential candidates that our team collected over the month-long period between October 26 and November 25, 2015 (with some caveats*). We first built this Twitter pipeline to better understand how Filipinos are using Twitter to talk about the elections (read our previous blog post about the most-used emojis in tweets about the Philippine presidential candidates).
A common assumption is that celebrities or politicians with more buzz on Twitter are also more popular in real life. But as we dug into the data, we learned that tweet counts and trending hashtags are not as straightforward a measure of popularity as they seem.
Looking for bots
One reason to be skeptical about candidate tweet counts is that not all Twitter users are real people. Wikipedia defines a Twitterbot as “a program used to produce automated posts on the Twitter microblogging service, or to automatically follow Twitter users.” A quick Google search will lead you to many online vendors that sell followers, favorites, retweets, and mentions for as little as a few dollars.
We wanted to know how much of the Twitter conversation about the Philippine presidential elections might be artificially driven by Twitter bots.
We first tried using Indiana University’s existing “bot or not” application. This application and API analyzes over 1000 of a Twitter user’s features, like their networks of friends/followers and the content of their tweets, to assign that user a “bot or not” or BON score. The higher the score, the more likely that an account is a bot. Unfortunately, because it takes about one to two minutes to get the BON score of one Twitter account, it would take us at least 24 straight days to get BON scores of all 34,455 users in our sample. We decided to find other strategies for identifying bots using the data we had already scraped and stored.
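The "24 straight days" figure is easy to check with a back-of-the-envelope calculation. The snippet below is only an illustration of that arithmetic; the function name `days_needed` is ours, and it assumes the lookups run one at a time with no parallel requests.

```python
USERS = 34_455  # users in the sample, as reported above

def days_needed(users, minutes_per_user):
    """Days of non-stop, one-at-a-time lookups needed to score every user."""
    return users * minutes_per_user / (60 * 24)

# Lower and upper ends of the quoted one-to-two-minute lookup time.
print(f"{days_needed(USERS, 1):.1f} days at 1 minute per user")   # about 24 days
print(f"{days_needed(USERS, 2):.1f} days at 2 minutes per user")  # about 48 days
```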
So we researched other typical characteristics of Twitterbot accounts. Some say that Twitter users who post an insane number of tweets are likely to be bots. But what number qualifies as “insane?”
On average, users in our sample tweet around 22 times a day. So we figured it would be safe to assume that users who tweet more than a few hundred times a day are probably bots. We found that 3.4% of the users in our sample – around 1,216 – tweet more than 200 times a day. Let’s call them “high-frequency tweeters.” Here’s a quick look at some of the most prolific ones:
|User Name|Average Tweets/Day|
What could be spammier than accounts that tweet literally thousands of times per day? Some of these accounts’ usernames were also pretty bot-like. For example, take a look at users @lepitennicell10, @lepitennicell4, @lepitennicell2, @lepitennicell69, and @lepitennicell1. They have almost the same username, differentiated only by a few extra numbers or letters. The pattern suggests these names were generated by a computer script designed to quickly generate a lot of unique usernames – useful if you're trying to open a bunch of Twitter accounts at the same time.
About which presidential candidates were these high-frequency tweeters posting? About 48% of tweets about Duterte were posted by high-frequency tweeters, versus only 2 to 6% of the tweets about the other candidates.
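For the curious, the high-frequency check can be sketched in a few lines of Python. This is an illustrative sketch, not the pipeline's actual code: the function names, the input shapes (a dict of average tweets per day keyed by username, and a list of `(user, candidate)` pairs), and the toy data are all assumptions made for the example. Only the 200-tweets-a-day cutoff comes from the analysis above.

```python
from collections import Counter

THRESHOLD = 200  # tweets per day, the cutoff described above

def high_frequency_users(avg_tweets_per_day, threshold=THRESHOLD):
    """Return the set of users who tweet more than `threshold` times a day."""
    return {user for user, rate in avg_tweets_per_day.items() if rate > threshold}

def high_frequency_share(tweets, hf_users):
    """For each candidate, the fraction of tweets posted by high-frequency users.

    `tweets` is an iterable of (user, candidate) pairs.
    """
    total = Counter()
    flagged = Counter()
    for user, candidate in tweets:
        total[candidate] += 1
        if user in hf_users:
            flagged[candidate] += 1
    return {candidate: flagged[candidate] / total[candidate] for candidate in total}

# Toy data, invented purely for illustration.
rates = {"bot_a": 3000, "human_b": 22, "bot_c": 201, "human_d": 200}
sample = [
    ("bot_a", "Candidate X"), ("human_b", "Candidate X"),
    ("bot_c", "Candidate Y"), ("human_b", "Candidate Y"),
    ("human_b", "Candidate Y"), ("human_d", "Candidate Y"),
]
hf = high_frequency_users(rates)
print(high_frequency_share(sample, hf))
```

Note that a user sitting exactly at 200 tweets a day is not flagged here, since the rule is "more than 200."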
It's great publicity for a brand or personality to bag a fleeting spot on Twitter’s list of trending topics. But to make a topic trend, you need a lot of different users tweeting about your topic at the same time. Not enough real people willing to do this? Just hire or create an army of Twitterbots to make it happen.
To find more bot-created tweets, we looked for groups of five or more tweets that were posted at exactly the same time (down to the second) and at least 80% similar in content – let’s call these “mirror tweets.” We measured the similarity between simultaneously posted tweets using a natural language processing metric called “Levenshtein ratio,” which scores the similarity between two bits of text on a scale of 0 (totally different) to 1 (exactly the same). We figure that if two, five, or even 10 people coincidentally tweet at exactly the same time, their posts would at least be varied. But if 10 different users tweet almost exactly the same message at the exact same time, they’re probably bots.
We found that about 26% of the tweets mentioning Duterte were mirror tweets, a proportion far higher than for the other candidates:
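Here is a minimal sketch of how a mirror-tweet search along these lines might look. Several details are assumptions on our part rather than a record of the original code: the ratio is computed as 1 minus the edit distance divided by the longer text's length (libraries define "Levenshtein ratio" in slightly different ways), tweets are clustered greedily against the first member of each cluster, and the input is assumed to be `(tweet_id, timestamp, text)` tuples with timestamps already truncated to the second.

```python
from collections import defaultdict

def levenshtein(a, b):
    """Edit distance: fewest single-character insertions, deletions, or swaps."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]

def similarity(a, b):
    """Score from 0 (totally different) to 1 (exactly the same)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest

def mirror_tweets(tweets, min_group=5, min_similarity=0.8):
    """Return ids of tweets in same-second clusters of near-identical text."""
    by_second = defaultdict(list)
    for tweet_id, timestamp, text in tweets:
        by_second[timestamp].append((tweet_id, text))

    mirrors = set()
    for same_second in by_second.values():
        if len(same_second) < min_group:
            continue
        # Greedy clustering: a tweet joins the first cluster whose first
        # member it resembles closely enough, else it starts a new cluster.
        clusters = []
        for tweet_id, text in same_second:
            for cluster in clusters:
                if similarity(text, cluster[0][1]) >= min_similarity:
                    cluster.append((tweet_id, text))
                    break
            else:
                clusters.append([(tweet_id, text)])
        for cluster in clusters:
            if len(cluster) >= min_group:
                mirrors.update(tweet_id for tweet_id, _ in cluster)
    return mirrors
```

Grouping by timestamp first keeps the expensive text comparisons limited to tweets that share a second, which matters when the sample runs past 100,000 tweets.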
Who has the “bottiest” tweets?
So far, we’ve tried to identify bot-posted tweets two ways:
- Were the tweets posted by users who tweet more than 200 times a day?
- Did the tweets belong to clusters of very similar, simultaneously posted tweets?
There were about 12,629 tweets in our sample that fit both these criteria. Let’s call them “botty” tweets. We suspected that Duterte's high proportion of botty tweets could be attributed to the surge of Duterte-related tweets on November 25. To find out, we checked the daily distribution of botty tweets over our whole 30-day sample. Sure enough, almost all of the botty tweets were both about Duterte and posted on November 25!
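Combining the two criteria and counting by day is a simple filter-and-tally. The sketch below shows one way it could be done; the dictionary keys (`id`, `user`, `date`, `candidate`), the function names, and the toy records are hypothetical, and the sets of high-frequency users and mirror-tweet ids are assumed to have been computed beforehand.

```python
from collections import Counter

def botty_tweets(tweets, hf_users, mirror_ids):
    """Keep tweets that meet BOTH criteria: high-frequency poster and mirror tweet."""
    return [
        t for t in tweets
        if t["user"] in hf_users and t["id"] in mirror_ids
    ]

def daily_counts(tweets):
    """Tally tweets per (date, candidate) pair."""
    return Counter((t["date"], t["candidate"]) for t in tweets)

# Toy records, invented purely for illustration.
sample = [
    {"id": 1, "user": "bot_a", "date": "2015-11-25", "candidate": "Duterte"},
    {"id": 2, "user": "bot_a", "date": "2015-11-25", "candidate": "Duterte"},
    {"id": 3, "user": "human_b", "date": "2015-11-25", "candidate": "Duterte"},
    {"id": 4, "user": "bot_a", "date": "2015-11-02", "candidate": "Duterte"},
]
botty = botty_tweets(sample, hf_users={"bot_a"}, mirror_ids={1, 2, 3})
for (date, candidate), count in sorted(daily_counts(botty).items()):
    print(date, candidate, count)
```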
At this point, we were really curious about what these botty tweets were saying. Here’s a random sample of some of the tweets posted during that two-hour period:
We were surprised to find that the tweets were not only similar, but most of them were reposts of the exact same tweet! The chart below shows the frequency of tweets posted every three minutes that evening. Those containing the exact phrase “Kathniel Mode pala si Duterte” are in blue:
One tweet to rule them all
We tried to understand why this particular tweet got broadcast at this scale by taking a closer look at the content. The very first instance of this tweet was posted by user @chiminychurva at 12:14 a.m. that day. A few moments later, it was echoed by a few bot accounts similarly named "nightybutera." Then the floodgates opened.
It’s hard to see why this particular tweet was eventually reposted over 30,000 times in two hours. Heck, it’s hard to figure out what the tweet even means if you’re not familiar with Philippine showbiz or politics. For the benefit of those who aren’t, let’s break down its layers one by one.
In a nutshell, the tweet pokes fun at Duterte by comparing him to ABS-CBN’s on-screen love team of Kathryn Bernardo and Daniel Padilla, also known to their fans as “KathNiel.” But what does one have to do with the other?
First, it’s important to get that Duterte had just announced his presidential bid four days before this tweet was posted. Before that, Duterte had repeatedly insisted that he had no intention to run for president, despite intense lobbying from his enthusiastic supporters. After a months-long “will he or won’t he?” tease, he finally announced his candidacy on November 21.
The second important piece of the puzzle is KathNiel and the Push Awards. The KathNiel love team has a rabidly vocal fanbase on Twitter, who often work feverishly to make KathNiel-related hashtags trend. Most recently, KathNiel fans had been campaigning vigorously for the couple to win at the #PushAwards, ABS-CBN's first digital media awards, which were decided by online voting. KathNiel eventually won several categories at the #PushAwards awarding ceremony on November 11.
In summary, the tweet sarcastically compares Duterte's campaign for votes to KathNiel's own campaign to win at the #PushAwards.
Why did KathNiel Twitter bots latch onto this particular tweet? We doubt that Duterte’s camp or supporters were behind these bots. By using the hashtag #DuArte2016 and the word “trapo,” the tweet criticizes Duterte for being overly dramatic and a scheming, traditional politician. That’s not exactly a favorable view of Duterte. Then again, some would argue that campaign operators don’t distinguish between bad and good publicity.
Another possibility is that a bunch of previously programmed KathNiel Twitterbots latched onto @chiminychurva’s Duterte-related tweet just because it happened to contain keywords and hashtags related to KathNiel and the #PushAwards. Until as late as February, some of these bots were still rebroadcasting #PushAwardsKathNiels tweets. Looks like whoever made them forgot to turn them off. It does tickle us to think that there could be a bunch of teenaged girls out there who are KathNiel fans and also know how to code Twitterbots. We all start somewhere, girls! #WomenTechmakers
Bots and buzz
While trending topics are sometimes powered by real users, our analysis has revealed a clear example of a Twitter trend fueled almost entirely by a reserve army of bot-like accounts. In the future, it would be interesting to look deeper into the relationship between bots and trends. Do bots make topics trend, do trends cause bots to become active, or do bots and trends amplify one another in a kind of feedback loop? In any case, bots can clearly dilute the diversity and authenticity of online conversations. If there’s any useful takeaway from our analysis, it’s that any candidate's Twitter buzz should be taken with a grain of salt this election season.
* Caveat: Our Twitter pipeline experienced some technical glitches on November 1 and November 10-14, 2015, so we didn't collect any tweets on those days.