Comparative Study of TF-IDF and Word Embedding in the Classi...

Abstract

Political hoaxes are a form of disinformation about government policies or strategies that has not been verified and often contradicts the facts, which makes them a serious threat to the global flow of information. The significant risk posed by false information highlights the need for effective mitigation strategies, including early detection and classification of news, and machine learning offers a technological means to that end. This study compares the performance of two feature extraction methods, Term Frequency-Inverse Document Frequency (TF-IDF) and word embedding (Word2Vec), in classifying political hoax news. Three machine learning classifiers are employed: Random Forest (RF), Naïve Bayes (NB), and Support Vector Machine (SVM). A follow-up process is then applied to improve the performance of the machine learning algorithms used. The results show that TF-IDF yields more stable and more accurate classification than Word2Vec. With TF-IDF, the RF model achieves the highest accuracy at 99%, followed by NB at 98% and SVM at 96%. Word2Vec also achieves high accuracy, reaching 94% with RF and its lowest, 93%, with SVM and NB. The study concludes that TF-IDF is better than the word embedding method at selecting relevant words with political themes.
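
To make the two kinds of features concrete, the following is a minimal Python sketch, not the study's implementation. It assumes a whitespace tokenizer, the plain weighting tf(t, d) × log(N / df(t)), and that a document is embedded by averaging its word vectors; the abstract specifies none of these. The headlines and the two-dimensional word vectors are hypothetical toy values, not data or trained Word2Vec output from the study.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (vocabulary, one TF-IDF vector per document).

    tf(t, d) = occurrences of t in d / number of tokens in d
    idf(t)   = log(N / df(t)), df(t) = number of documents containing t
    This is one common unsmoothed variant, assumed here for illustration.
    """
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(tokenized)
    # Count each term once per document to get its document frequency.
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        if total == 0:
            vectors.append([0.0] * len(vocab))  # empty document
            continue
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


def mean_embedding(tokens, embeddings, dim):
    """Average the vectors of known tokens; zero vector if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Hypothetical toy headlines, not data from the study.
docs = [
    "government confirms new tax policy",
    "viral post claims government bans cash",
    "minister denies viral claims about tax",
]
vocab, vectors = tfidf_vectors(docs)

# Hypothetical 2-dimensional word vectors standing in for trained Word2Vec output.
toy_embeddings = {"tax": [1.0, 0.0], "policy": [0.0, 1.0]}
doc_vector = mean_embedding(tokenize(docs[0]), toy_embeddings, dim=2)
```

Each TF-IDF vector has one entry per vocabulary term, so a term that occurs in few documents keeps an individually visible weight, whereas the averaged embedding folds all words of a document into a single dense vector. Either representation could then be passed to a classifier such as RF, NB, or SVM.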

Concepts:

Misinformation and Its Impacts

Educational Methods and Media Use

Information Retrieval and Data Mining

Type: article. Year: 2025. Citations: 0.

Access to Document

10.1109/icoris67789.2025.11295994

SDGs

Peace, Justice and Strong Institutions

Citations by Year

Year	Count
2025	0