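Comparison of K-Means Distance Calculation Methods to Find the Best for Forming Foreign Tourist Visit Patterns (Komparasi Metode Perhitungan Jarak K-Means Paling Baik Terhadap Pembentukan Pola Kunjungan Wisatawan Mancanegara)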

Authors: Lalu Mutawalli; Supardianto Supardianto; Sofiansyah Fadli
Citations: 1; Year: 2023
Source: Journal of Information System Research (JOSH)
Abstract

Understanding the visiting patterns of foreign tourists is an urgent matter, because such patterns provide knowledge that supports better, data-driven decisions. The pattern examined here is the clustering of foreign tourist visits to tourist destinations in Jakarta. Data mining is an approach for extracting knowledge patterns from a dataset, and K-Means is one of its clustering algorithms: it groups data points by the similarity of their features and attributes. This study compares the Euclidean, Manhattan, and Haversine distance methods to find which yields the most representative clusters for the dataset. The dataset is not normally distributed because it contains outliers, so the DBSCAN algorithm is used to handle them rather than removing or trimming the data; removal would produce a large number of missing values and could lead to information that does not match empirical facts. Five clusters were formed, based on the result of the elbow method. K-Means with Euclidean distance yielded a Silhouette Score of 0.36, an Inertia of 0.86, and a Davies-Bouldin Index of 2.39. Manhattan distance yielded a Silhouette Score of 0.65, an Inertia of 1.46, and a Davies-Bouldin Index of 0.47. Haversine distance yielded a Silhouette Score of 0.36, an Inertia of 0.03, and a Davies-Bouldin Index of 2.39. On these figures, Manhattan distance has the highest Silhouette Score and the lowest Davies-Bouldin Index of the three.
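
To make the three compared distance measures concrete, the following is a minimal sketch of their standard definitions, together with a nearest-centroid assignment step of the kind K-Means performs. It is not the authors' code: the function names, the `radius_km` parameter, and the sample points are hypothetical, and the Haversine function assumes inputs are (latitude, longitude) pairs in degrees, which the abstract does not state.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def haversine(a, b, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees.

    Assumption: a spherical Earth with a mean radius of 6371 km.
    """
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, h)))


def nearest_centroid(point, centroids, distance):
    """Index of the centroid closest to `point` under `distance`."""
    return min(range(len(centroids)),
               key=lambda i: distance(point, centroids[i]))


# Hypothetical data: the same point is assigned to different centroids
# depending on the distance measure.
point = (0.0, 0.0)
centroids = [(3.0, 3.0), (0.0, 5.0)]
by_euclidean = nearest_centroid(point, centroids, euclidean)
by_manhattan = nearest_centroid(point, centroids, manhattan)
```

In this hypothetical example the Euclidean distances are about 4.24 and 5, while the Manhattan distances are 6 and 5, so the two measures pick different centroids. The sketch covers only the assignment step; how centroids are updated under each measure is not described in the abstract.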


Concepts:
Data Mining and Machine Learning Applications
Multimedia Learning Systems
Information Retrieval and Data Mining
SDGs
Decent work and economic growth
Citations by Year
Year    Count
2023    1