Comparative Study of TF-IDF and Word Embedding in the Classification of Hoax Political News

Authors : Aprilia Dwi Dayani; Hasbullah Hasbullah; Christofer Satria; Victoria Cynthia Rebecca; Anthony Anggrawan et al.
article cite 0 Year 2025
Abstract

Political hoaxes are a form of disinformation about government policies or strategies whose accuracy has not been confirmed and which often contradict the facts, making them a serious threat to the flow of information worldwide. The significant risk posed by the spread of false information highlights the need for effective mitigation strategies, including early detection and classification of news, and machine learning is a crucial technological solution for this task. This study compares the performance of two feature extraction methods, Term Frequency-Inverse Document Frequency (TF-IDF) and word embedding (Word2Vec), in classifying political hoax news. Three machine learning classifiers are employed: Random Forest (RF), Naïve Bayes (NB), and Support Vector Machine (SVM). The follow-up process in this study aims to enhance the performance of the machine learning algorithms used. The results show that TF-IDF provides more stable and accurate classification performance than Word2Vec. With TF-IDF, the RF model achieves the highest accuracy at 99%, followed by NB at 98% and SVM at 96%. Word2Vec also yields high accuracy, reaching 94% with RF and 93% with both SVM and NB. The study concludes that TF-IDF is better than the word embedding method at selecting relevant words with political themes.
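
To make the TF-IDF side of the comparison concrete, the following is a minimal sketch of how TF-IDF weights can be computed for tokenized documents. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not specify the TF-IDF variant, tokenization, or dataset, so the plain `tf * ln(N / df)` formula, the function name `tfidf`, and the toy corpus are all hypothetical. Library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed variant by default and would give different numbers.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumes the unsmoothed textbook formula:
    weight = (term count / document length) * ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy corpus for illustration only, not data from the study.
docs = [
    "minister confirms new budget policy".split(),
    "viral post claims secret budget policy".split(),
    "minister denies viral claims".split(),
]
vectors = tfidf(docs)
```

In this sketch, a term that occurs in only one document (such as "confirms") receives a higher weight than a term shared across documents (such as "budget"). This is the property the abstract refers to when it credits TF-IDF with selecting words relevant to the theme. The resulting weight vectors would then be passed to a classifier such as RF, NB, or SVM.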


Concepts :
Misinformation and Its Impacts
Educational Methods and Media Use
Information Retrieval and Data Mining
SDGs
Peace, Justice and Strong Institutions
Citations by Year
Year Count
2025 0