Abstract
As software systems grow more complex, detecting bugs in source code has become a critical challenge in software development and maintenance. Manual debugging is time-consuming and error-prone, which motivates automated bug detection. This study explores the use of two machine learning models, Random Forest and a Neural Network, for identifying bugs in Python source code. Features are extracted using Abstract Syntax Trees (ASTs), which enable structured parsing of syntactic elements such as functions, classes, variables, conditionals, and exception blocks. These features serve as input to train both models for binary classification: distinguishing buggy from non-buggy code files. The dataset contains 200 buggy and 200 non-buggy Python scripts. The models are evaluated using accuracy, confusion matrices, Receiver Operating Characteristic (ROC) curves, and classification reports across multiple training epochs. Experimental results show that the Random Forest model achieves stable performance, with an accuracy of 86.67% and an Area Under the Curve (AUC) score of 0.97 on the testing set, without significant improvement across epochs. In contrast, the Neural Network's accuracy improves gradually from 68.33% at epoch 5 to 85% at epoch 300, and it shows higher sensitivity in bug detection, although it requires longer training times. Both models are also used to predict the specific lines of code that contain potential bugs. Based on these findings, the choice of model depends on the application context: Random Forest offers faster deployment and consistent performance, while the Neural Network adapts better to complex patterns and improves in accuracy given sufficient training.
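The AST-based feature extraction described above can be illustrated with a minimal sketch. It assumes that features are simple counts of node types obtained with Python's standard `ast` module; the function name `extract_ast_features`, the feature names, and the sample snippet are hypothetical and are not taken from the study, whose actual feature set may differ.

```python
import ast
from collections import Counter


def extract_ast_features(source: str) -> dict:
    """Count a few syntactic elements in a Python source string.

    Hypothetical helper: the feature set here is an assumption,
    chosen to mirror the element types named in the abstract.
    """
    tree = ast.parse(source)
    # Tally every node in the tree by its class name, e.g. "If", "Try".
    counts = Counter(type(node).__name__ for node in ast.walk(tree))
    return {
        "functions": counts["FunctionDef"] + counts["AsyncFunctionDef"],
        "classes": counts["ClassDef"],
        "assignments": counts["Assign"] + counts["AugAssign"] + counts["AnnAssign"],
        "conditionals": counts["If"],
        "try_blocks": counts["Try"],
        "except_handlers": counts["ExceptHandler"],
    }


# Illustrative input only; not drawn from the study's dataset.
sample = '''
class Stack:
    def __init__(self):
        self.items = []

    def pop(self):
        try:
            return self.items.pop()
        except IndexError:
            return None

def is_empty(s):
    if s.items:
        return False
    return True
'''

features = extract_ast_features(sample)
print(features)
```

A dictionary like this, computed per file, could be flattened into a fixed-length numeric vector and passed to either classifier for the buggy versus non-buggy decision.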