On November 8, 2016, the world was in shock: Donald Trump defied almost all predictions to become the 45th president of the United States. Indeed, up to the very last day of the campaign, pollsters and statisticians gave Hillary Clinton odds of between 75% and 99% of winning. How could Trump trump them all? How could this outsider beat one of the most qualified presidential candidates, a Clinton of all people?
One thing is certain: Trump changed the game, and analysts are still trying to understand how. Beyond his novel communication techniques and unique approach to truth and reality, a growing number of people believe he may have been helped by outside influence...
Since top intelligence officials of the Obama administration announced their belief that the Russian government was behind the hacking of the Democratic National Committee earlier in the year, Russia has been accused of meddling in the 2016 US elections. Its means of action are believed to have included not only hacking but also the transfer of information and the manipulation of public opinion. For example, to sway public opinion, the Kremlin used an army of online trolls through its Internet Research Agency (IRA), based in St. Petersburg. One of the trolls’ media was Trump’s favorite: Twitter.
The main dataset we use was released by FiveThirtyEight and contains close to 3 million tweets posted by the IRA. The tweets were posted in 55 different languages, although 72% of them were in English, and were classified by Darren Linvill and Patrick Warren of Clemson University into 8 categories. Since we are looking into the effect of the tweets on US politics, we focus on the politically labeled English tweets: the ‘Left Troll’ and ‘Right Troll’ categories. As the categories were reviewed by external experts, we take them as ground truth. This corresponds to approximately 1M tweets, 63% of which are right-leaning.
The trolls started as early as 2014 and continued well into 2018. The political situation in the US was assessed by analyzing the 2016 presidential polls from November 2015 to the election a year later, and Trump’s approval rate after the election. All datasets were obtained from FiveThirtyEight.
To understand how the trolling could have had any effect, we first need to understand who the trolls are and how they work. Then, we will dive into the tweets’ content and finally analyse whether a link with US politics exists.
How can we gain a clearer picture of who the trolls are? We want to know how they tweet, how many followers they have and how the subjects differ between right trolls and left trolls.
The plot below presents the number of right and left troll tweets per day, along with the top hashtag of each day. The distribution is not smooth but rather follows an interesting spiking pattern. Higher activity is observable during the final election period in October 2016 and in the summer of 2017. Summer 2017 was a very eventful period both internationally and nationally.
Internationally, Russia and the US were expelling each other’s government officials, clashing over the Syrian and Donbass conflicts and imposing new sanctions on each other. In Asia, a bellicose build-up between Trump and Kim Jong-un led to rockets flying over Japan while tensions in the South China Sea were growing. New sanctions were announced against Iran, and diplomatic relations were deteriorating with old allies (Mexico, Australia) and new ones (Cuba). Finally, the US withdrew from the Paris climate agreement.
Nationally, Trump was under pressure from multiple sides. The shock of the violent alt-right rally in Charlottesville and Trump’s poor handling of the situation further divided the country and led various officials to resign, as others had after the Paris Agreement withdrawal. Former FBI director James Comey was accusing Trump and his staff of lying to the FBI, while lawsuits were brought against the president following accusations of receiving emoluments from his foreign business dealings, leading to impeachment efforts. However, this pressure did not stop him from presenting a new immigration law, pardoning a controversial and racist sheriff and announcing a rollback of climate action.
All these events help explain the dramatic increase in tweet activity, which reached 14,000 posts in a single day! While the right trolling starts in early 2015 and decreases at the end of 2017, the left trolls concentrate their activity in 2016 until it suddenly stops in mid-2017, when fewer than fifty messages are posted per day. Unfortunately, we could not find any explanation for the dramatic drop in both right and left troll activity at the beginning of May 2017. While the right troll activity picks up again during the 2017 summer, the left trolls barely react to the craze.
The top hashtags per day exhibit a large variety of terms and no clear trend is observable, highlighting how reactive and short-lived Twitter topics are. Nonetheless, #youtube is often the most popular hashtag for the left trolls in early 2017.
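To illustrate how such a per-day view can be computed, here is a minimal Python sketch. It assumes each tweet is available as a (date, category, text) tuple; the function name `daily_activity`, the category labels and the toy tweets are ours for illustration and are not taken from the original analysis.

```python
import re
from collections import Counter, defaultdict

HASHTAG = re.compile(r"#(\w+)")

def daily_activity(tweets):
    """tweets: iterable of (date, category, text) tuples (assumed layout).

    Returns, for each day, the number of tweets per category and the
    most frequent hashtag of that day (None if no hashtag was used).
    """
    counts = defaultdict(Counter)    # date -> category -> number of tweets
    hashtags = defaultdict(Counter)  # date -> hashtag -> occurrences
    for day, category, text in tweets:
        counts[day][category] += 1
        hashtags[day].update(tag.lower() for tag in HASHTAG.findall(text))

    summary = {}
    for day in sorted(counts):
        top = hashtags[day].most_common(1)
        summary[day] = {
            "counts": dict(counts[day]),
            "top_hashtag": top[0][0] if top else None,
        }
    return summary

# Toy input, invented for the example.
sample = [
    ("2016-10-05", "Right", "Vote! #MAGA"),
    ("2016-10-05", "Left", "New track #music #maga"),
    ("2016-10-06", "Left", "no tags here"),
]
for day, info in daily_activity(sample).items():
    print(day, info)
```

Hashtags are lower-cased so that #MAGA and #maga count as the same tag; whether the original plot did the same is an assumption.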
The most popular expressions in the tweet content are analysed for each category. Note that the words are stemmed in order to count root words, focusing on subject rather than form. The color code represents the troll orientation of an expression:
There is a striking difference in focus between the right and left trolls.
The right trolls are more involved in politics and news: Trump, Hillary and Obama are the most popular words, with Clinton and President also appearing. Overall, the word Trump is used about 145,000 times, which corresponds to about 13% of the total tweets! There is a visible focus on news as well: both the words news and break make the top ten (“breaking news”).
On the other hand, the left trolls often use words related to music and to social movements concerning black people or discrimination against them (white, police). Surprisingly, the words Hillary, Clinton and Obama are used more by the right trolls than by the left trolls. This suggests a lack of interest in politics from the latter.
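The counting of stemmed words described above can be sketched roughly as follows. This is only an illustration: `crude_stem` is a deliberately naive suffix stripper standing in for a real stemmer (such as a Porter stemmer), and the stop-word list and function names are our own assumptions, not the ones used for the plots.

```python
import re
from collections import Counter

# Tiny, hypothetical stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "is", "to", "and", "of", "in", "for", "on"}

def crude_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def top_expressions(tweets, n=10):
    """Return the n most frequent stemmed words over a list of tweet texts."""
    counts = Counter()
    for text in tweets:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in STOPWORDS:
                counts[crude_stem(token)] += 1
    return counts.most_common(n)

# Toy input, invented for the example.
print(top_expressions(["Breaking news: Trump wins", "Trump breaks the news"]))
```

With stemming, "breaking" and "breaks" are both counted under the root "break", which is why the word break rather than breaking shows up in the ranking.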
Hashtags can better characterize tweets as they represent their key words. The expressions are clearly oriented and category-specific. The left troll tweets are confirmed to be related to the Black Lives Matter movement and music, whereas the right trolls mostly use hashtags related to Trump, conservatism, Islamophobia and news.
User tagging indicates to whom or to what organization the trolls react and address their tweets. The right trolls mainly react to news outlets and commentators, either relaying their information (in the case of FoxNews, Breitbart and Sean Hannity) or criticizing it (CNN). On the other hand, the left trolls mainly mention activists related to the Black Lives Matter movement.
A first conclusion about the right and left trolling strategies can be drawn from this information:
The most popular expressions and the top hashtags per day highlight some recurrent themes. It might be more interesting to directly investigate the topics rather than the type of trolling. To understand which aspects of society were targeted and to dig deeper into the tweets’ content, the tweets were labelled with selected topics.
What is the most popular troll topic? Which social debates are targeted by the trolls? Are tweet activity spikes correlated with events?
A set of 25 topics is created, inspired by the popular hashtags and the events that affected American politics. The list of keywords for each topic is enriched using a Word2Vec machine learning model. Word2Vec learns the linguistic context of each word over all tweets and represents words as vectors, or “concepts”. Hence, similar words share similar representations and can thus be gathered into topics. Starting from a few words, the lists are extended tenfold using our model trained on the tweets’ content.
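The keyword expansion step can be pictured as a nearest-neighbour search in the embedding space. The sketch below is a simplified stand-in, not our actual pipeline: the two-dimensional vectors are invented, whereas in practice they would come from a Word2Vec model trained on the tweets (for instance with a library such as gensim), and the helper `expand_keywords` is hypothetical. We also read "extended tenfold" as growing each list to about ten times its initial size.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_keywords(seeds, vectors, factor=10):
    """Grow a seed list to at most `factor` times its size.

    New words are the vocabulary entries closest (by cosine similarity)
    to the centroid of the seed vectors.
    """
    known = [w for w in seeds if w in vectors]
    if not known:
        return list(seeds)
    dim = len(vectors[known[0]])
    centroid = [sum(vectors[w][i] for w in known) / len(known) for i in range(dim)]
    candidates = [w for w in vectors if w not in seeds]
    candidates.sort(key=lambda w: cosine(vectors[w], centroid), reverse=True)
    n_new = len(seeds) * (factor - 1)
    return list(seeds) + candidates[:n_new]

# Toy 2-D embeddings, invented for the example.
toy_vectors = {
    "song": [0.8, 0.2],
    "album": [0.85, 0.15],
    "guitar": [0.9, 0.1],
    "vote": [0.2, 0.8],
    "senate": [0.1, 0.9],
}
print(expand_keywords(["song"], toy_vectors, factor=3))
```

In this toy space the music-related words sit close together, so a music seed pulls in other music words before any political ones.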
Topic related to musical content.
Topic related to movies/TV.
Topic related to sport content.
Topic related to the current president, Donald Trump.
Topic related to the Democratic candidate in the 2016 election, Hillary Clinton.
Topic related to hacking and data breaches, mostly the DNC hacking and leaked emails.
Topic related to Senator Bernie Sanders, who was Hillary Clinton's opponent for the Democratic primary nomination.
Topic related to the former US president Barack Obama.
Topic related to the 2016 US presidential election. This includes debates, voting and more.
Topic related to voter fraud, a topic Trump referenced multiple times without tangible proof.
Topic related to healthcare, specifically the Affordable Care Act (Obamacare), which Trump vowed to repeal.
Topic related to the US government in general.
Topic related to religions (especially Islam).
Topic related to conservative Republican ideology, anti-immigration and in favor of gun ownership.
Topic related to the liberal ideology.
Topic related to the right to bear arms.
Topic related to terrorism: attacks, shootings, bombings and others.
Topic related to geopolitical tension with other powers such as North Korea or Russia.
Topic related to the Black Lives Matter movement fighting discrimination against African Americans. This includes police brutality and bias.
Topic related to the movement against sexual harassment.
Topic related to fake news, and the associated media.
Topic related to news.
Topic related to various scandals, financial, political and other.
Topic related to the neo-Nazi rally in Charlottesville on 12 August 2017.
Topic related to the United States' economy in general.
The proportion of tweets in each topic provides an overview of the debates that were particularly targeted by the different trolls, and helps to better characterize the left and right trolls.
60.93% of tweets were successfully categorized. Right troll tweets are categorized in 66.99% of cases, against 50.60% for left troll tweets. Moreover, 25.85% of tweets appear in more than one category.
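As a rough illustration of how such coverage figures can be obtained, the sketch below labels each tweet with every topic whose keyword list it matches and then computes the share of tweets with at least one topic and with more than one. The simple keyword matching, the three tiny keyword lists and the function names are assumptions made for the example.

```python
import re

# Hypothetical, tiny keyword lists; the real lists are far longer.
TOPICS = {
    "trump": {"trump", "maga"},
    "election": {"election", "vote", "debate"},
    "music": {"music", "song", "album"},
}

def label_tweet(text, topics=TOPICS):
    """Return the set of topics whose keywords appear in the tweet."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {name for name, keywords in topics.items() if tokens & keywords}

def coverage(tweets, topics=TOPICS):
    """Return (share with at least one topic, share with several topics)."""
    labels = [label_tweet(t, topics) for t in tweets]
    total = len(labels)
    if total == 0:
        return 0.0, 0.0
    categorized = sum(1 for found in labels if found)
    multi = sum(1 for found in labels if len(found) > 1)
    return categorized / total, multi / total

# Toy input, invented for the example.
sample = [
    "Trump wins the election",
    "New song out now",
    "Good morning",
    "Vote for Trump #MAGA",
]
print(coverage(sample))
```

Because a tweet can match several keyword lists, the per-topic shares can add up to more than the overall share of categorized tweets.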
The most popular topics concur with the most popular expressions found previously. More than 15% of the tweets were related to Donald Trump, 10% to Black Lives Matter and 9% to the elections. Our model is able to detect 9.2% of tweets that do not mention Trump directly but are related to him in one way or another.
The distribution of topics among the troll categories is uneven. In general, right trolls dominate every category.
Overall, 60.16% of the tweets are categorized at least once while 21.60% are included in more than one topic.
Tweet topics are not uniformly distributed over time but rather follow a spiking pattern. To explore the potential causes of such behavior, the Wikipedia Current Events website is scraped. The website records the main events that happen each day, classified by category. When a topic spike is observed, the corresponding Wikipedia page is scraped to find events. If the content is related to the topic, an event is detected.
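The spike-then-match logic can be sketched as below. The spike criterion used here (a daily count above the mean plus two standard deviations) is an assumption chosen for illustration, since the exact rule is not spelled out in this story, and `events_by_day` stands for event summaries that would already have been scraped. All names and numbers are hypothetical.

```python
from statistics import mean, pstdev

def find_spikes(daily_counts, k=2.0):
    """Return the days whose count exceeds mean + k * std (assumed rule)."""
    values = list(daily_counts.values())
    threshold = mean(values) + k * pstdev(values)
    return [day for day, n in daily_counts.items() if n > threshold]

def match_events(spike_days, events_by_day, keywords):
    """Keep the spikes for which an event summary mentions a topic keyword."""
    matches = {}
    for day in spike_days:
        hits = [
            event for event in events_by_day.get(day, [])
            if any(kw in event.lower() for kw in keywords)
        ]
        if hits:
            matches[day] = hits
    return matches

# Toy daily counts for one topic and toy event summaries, invented here.
counts = {"d1": 100, "d2": 110, "d3": 900, "d4": 120,
          "d5": 90, "d6": 105, "d7": 95, "d8": 115}
events = {"d3": ["Bombing at an airport", "Football final"]}

spikes = find_spikes(counts)
print(spikes, match_events(spikes, events, {"bombing", "attack"}))
```

A plain substring match like this one is crude: it will occasionally pair a spike with an unrelated event that happens to share a keyword.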
However, some detected events might not be relevant. Three kinds of errors explain this:
Most of the tweets are related to relevant real events.
On October 5, 2016, the Election, Black Lives Matter, Music, Sport and Movie topics show a strong tweet peak (6000 tweets). On this day, two major events took place: the Paris climate agreement reached the threshold for its entry into force, and Vladimir Putin suspended the 2013 nuclear agreement with the United States. The trolls might have decided to limit the spread of this news by flooding Twitter with irrelevant content about music. The goal may have been, on the one hand, to divert attention from Russia’s withdrawal and, on the other hand, to keep undecided people from rallying to the ecological cause, since Trump’s views are anti-ecology.
The Hacking and Hillary topics peaked within the two months preceding the election.
On June 21, 2015, about 1800 messages about Obama were tweeted. However, no major event seems to have involved the former US president around that day.
On March 22, 2016, however, the spikes in Religion and Terrorism are caused by the Brussels bombings. Religion peaks because the attackers were radical Muslims and the right trolls spread hate speech against Muslims.
During the summer of 2017, a major peak corresponds to the Charlottesville events. The “Unite the Right” white supremacist demonstrations and the car attack against counter-protesters profoundly affected American politics. These events made the trolls react with tweets about Black Lives Matter, Terrorism and other topics. As discussed before, the summer of 2017 was eventful, and right trolls were very active during this period. As a result, almost every topic dominated by right trolls sees a surge in activity, the Election category included. The latter may be explained by the fact that its keyword list includes references to US political parties, which are used to anchor the debate.
Our findings on subject distribution, tweet activity and references lead us to believe that the trolls follow an overall divisive strategy. They seem to have tried to divide public opinion on Twitter by focusing on different subjects targeted at different audiences. Furthermore, we agree with Darren Linvill and Patrick Warren’s conclusion that right trolls aimed to mimic Trump supporters while the left trolls’ goal was to divert Democrats’ attention from the election. Indeed, while we see right trolls heavily supporting Trump and taking politically engaged positions linked to the Elections topic, the left trolls focus either on less relevant topics or on societal topics that are not directly linked to the elections (Black Lives Matter). In conclusion, the left troll behaviour can be seen as a strategy to redirect people’s attention from the elections to other events and subjects in order to favor the GOP candidate.
Now that the trolls’ strategies to shape public opinion are clearer, one question may be on your mind: is it effective?
The most natural way to assess the effectiveness of the trolls’ strategy is to look at the 2016 US presidential election polls. More precisely: is there any correlation between the troll tweeting activity and the poll ratings of both Clinton and Trump?
The FiveThirtyEight data aggregating polls from various pollsters throughout the presidential campaign is used.
The daily average over all pollsters shows a higher popularity for Hillary Clinton throughout the campaign. A week before election day, the mean difference was 2.2% in favor of Clinton, which is very accurate as she won the popular vote by 2.1%. However, she lost the election, losing the electoral vote by 13.7%. Were the results influenced by the trolls?
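The daily averaging of polls mentioned above can be sketched as follows. The row layout (date, Clinton percentage, Trump percentage), the function name and the numbers are made up for the example and do not come from the FiveThirtyEight file.

```python
from collections import defaultdict
from statistics import mean

def daily_poll_average(polls):
    """polls: iterable of (date, clinton_pct, trump_pct), one row per poll.

    Returns, for each day, the mean share of each candidate over all polls
    published that day and the resulting Clinton-minus-Trump margin.
    """
    by_day = defaultdict(list)
    for day, clinton, trump in polls:
        by_day[day].append((clinton, trump))

    averages = {}
    for day, rows in sorted(by_day.items()):
        clinton_avg = mean(r[0] for r in rows)
        trump_avg = mean(r[1] for r in rows)
        averages[day] = {
            "clinton": clinton_avg,
            "trump": trump_avg,
            "margin": clinton_avg - trump_avg,
        }
    return averages

# Illustrative numbers only, not real poll results.
polls = [
    ("2016-11-01", 46.0, 43.0),
    ("2016-11-01", 44.0, 43.0),
    ("2016-11-02", 45.0, 44.0),
]
print(daily_poll_average(polls))
```

This is an unweighted mean: every poll counts equally, regardless of sample size or pollster quality.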
To test this hypothesis, the correlation between the number of tweets and the polls is explored for each topic.
Apparently not. The Spearman coefficient measures the monotonic correlation between two variables: a coefficient of 1 or -1 indicates a perfect monotonic relationship, whereas a coefficient of 0 indicates a lack of correlation.
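To make the coefficient concrete, here is a small standard-library sketch of the Spearman correlation, computed as the Pearson correlation of the ranks (ties receive their average rank). It is meant as an illustration of the definition; the function names are ours, and a statistics library would normally be used instead.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

# A monotonic but non-linear relation still yields a coefficient close to 1.
print(spearman([1, 2, 3, 4], [1, 4, 9, 16]))
```

Because only the ranks matter, the coefficient captures whether two series rise and fall together, not whether the relationship is linear.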
As the coefficients are low for most topics, no dependency is found between the troll tweeting activity and the campaign polls. Albeit small, the correlations are higher for Trump’s polls, specifically for right troll tweets on subjects concerning Clinton (Hacking and Hillary). Interestingly, right troll tweets about the Black Lives Matter movement correlate positively with Trump’s polls. This difference supports the theory of more politically engaged right tweets. In any case, a correlation coefficient of 0.4 is too low to establish a dependency, as the distribution of the points confirms.
The higher correlation observed for right tweets does not mean that the trolls have an influence on the polls; it just shows that the two evolve roughly together.
Hence:
Maybe the election was too big an event for the trolls to have a significant and measurable impact. It might also be that most voters had already made up their minds.
Do the trolls affect Trump’s approval ratings?
(Built with data from various pollsters reporting Trump’s approval and disapproval rates.)
It does not look very stable (approval ranges between 45.3% and 36.7% from January 20 to October 31, 2017). The tweet activity displayed below shows a weakened activity of the left trolls, while the right trolls are still quite active! The plot shows how the eventful 2017 summer affects the president’s approval rates. The dramatic increase in right trolling while left trolling stagnates suggests that the IRA is trying to boost Trump’s popularity. Since the left trolls do not tweet much in this period, the tweeting activity is probably not caused by events. Hence, the increased activity can be interpreted as an attempt to boost Donald Trump’s popularity online.
The distribution of Spearman coefficients over topics looks different from that of the elections. The correlations between the left troll tweets and Trump’s approval rate are around 0.4 for all categories. Similarly, the correlation between the left troll tweets and the disapproval rate is around -0.4. But take a closer look at the number of tweets on the y-axis of the scatter plot: there are only around 20 tweets per day. Even though the Spearman correlation coefficients seem larger, it is intuitively unlikely that 20 tweets can shape public opinion on a national scale.
On the other hand, the right trolls are more active, with around 10 times more daily tweets, but no dependency appears to exist between the right tweeting activity and Trump’s (dis)approval rate as president.
In brief, focusing only on the English right and left troll accounts, it appears that the right trolls are more politically involved than the left trolls. Not only is #MAGA their favorite hashtag, but they also preferentially cite @realdonaldtrump and focus more on events and issues directly linked to the 2016 US presidential election. Moreover, they appear to support Trump: their tweeting activity increased drastically when the president’s approval rate went down in summer 2017, probably to push it back up.
On the other hand, the left trolls seem to have more dispersed interests and tweet more about seemingly irrelevant content like music or sport. Although their main subject (Black Lives Matter, 17.5% of their tweets) is a societal one, it is not directly related to the elections. Consequently, the left trolls were likely trying to distract Democrats from the elections. Indeed, they were absent during the 2017 summer, when they could have attacked Trump to weaken him further. Their lack of reaction suggests the IRA’s goal was to get Donald Trump elected and then to support him.
Unfortunately for the IRA, their tweeting efforts correlate neither with the 2016 election polls nor with Trump’s approval polls. Consequently, their effect on American politics seems limited and insignificant. One reason may be that the tweets did not reach enough people to weigh on national public opinion: at most, around 500,000 people were directly reached by the troll tweets, or 0.15% of the US population. Even though these estimates are very imprecise, since the number of followers of the retweeting users is unknown, a much wider reach would have been needed to alter national opinion.