Election result forecast based on social media data with the deep machine learning method

İbrahim Sabuncu
Assist. Prof. Dr., Yalova University, Yalova, Türkiye
Eda Şen
Student, Yalova University, Yalova, Türkiye

Published 2021-12-25

This study investigates whether social media data can be used to predict the daily variation in politicians' vote shares and the outcome of an election. To this end, 20,746,834 tweets about the candidates in the U.S. election held on November 3, 2020, posted between July 1, 2020 and November 3, 2020, were collected from Twitter using RapidMiner. Sentiment analysis was performed on the collected tweets with the VADER algorithm, and the tweets were grouped into positive, negative, N.P.S. (positive minus negative), and neutral sentiment categories. Six machine learning-based forecast models were then built to predict the daily vote shares and the election result from the number of tweets in each sentiment category. In these models, the independent variables are the daily tweet counts about each candidate, grouped by sentiment category, and the dependent variables are the candidates' daily vote share estimates derived from surveys and economic indicators. The models were trained on 109 days of data. The most accurate model, which used the deep learning algorithm, predicted the election result with a margin of error of 1.7%. The study shows that, despite the wide variety of manipulation on Twitter, the platform can still serve as a data source for monitoring political trends and predicting election results through machine learning.
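
The feature construction described above can be illustrated with a minimal Python sketch. It is not the authors' RapidMiner workflow. It assumes each tweet already carries a VADER compound score, that the commonly used cut-offs of ±0.05 separate positive, negative, and neutral tweets (the abstract does not state the thresholds), and that N.P.S. is the daily count of positive tweets minus the count of negative tweets. The function names and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

# Assumed cut-offs on the VADER compound score; the abstract does not give them.
POS_THRESHOLD = 0.05
NEG_THRESHOLD = -0.05


def categorize(compound):
    """Map a compound sentiment score in [-1, 1] to a sentiment category."""
    if compound >= POS_THRESHOLD:
        return "positive"
    if compound <= NEG_THRESHOLD:
        return "negative"
    return "neutral"


def daily_features(scored_tweets):
    """Count tweets per (day, candidate) and sentiment category.

    scored_tweets: iterable of (day, candidate, compound_score) tuples.
    Returns a dict mapping (day, candidate) to the four model inputs.
    """
    counts = defaultdict(Counter)
    for day, candidate, compound in scored_tweets:
        counts[(day, candidate)][categorize(compound)] += 1

    features = {}
    for key, c in counts.items():
        features[key] = {
            "positive": c["positive"],
            "negative": c["negative"],
            "neutral": c["neutral"],
            # Assumed definition: positive count minus negative count.
            "nps": c["positive"] - c["negative"],
        }
    return features


# Hypothetical sample: five scored tweets about two placeholder candidates.
sample = [
    ("2020-07-01", "A", 0.62),
    ("2020-07-01", "A", -0.40),
    ("2020-07-01", "A", 0.01),
    ("2020-07-01", "A", 0.30),
    ("2020-07-01", "B", -0.75),
]
features = daily_features(sample)
print(features[("2020-07-01", "A")])
```

Each resulting row (one per day and candidate) would then be paired with that day's vote share estimate to form a training example for a forecast model.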


