A Robust Hybrid Deep Learning Model for Multiclass Depression Classification from Speech Audio

Authors: Neny Sulistianingsih; Galih Hendro Martono
Year: 2026
Source: International Journal of Image Graphics and Signal Processing
Abstract

Depression remains one of the most prevalent and underdiagnosed mental health disorders globally, necessitating scalable, objective, and non-invasive diagnostic tools. Speech, as a rich biomarker of emotional and psychological states, offers a promising avenue for automated depression detection. This study proposes a robust hybrid deep learning framework that integrates Convolutional Neural Networks (CNN), Gated Recurrent Units (GRU), Bidirectional Long Short-Term Memory (BiLSTM), and Transformer architectures to classify depression severity into three levels: normal, mild, and severe. Using a curated multimodal dataset comprising 400 labeled audio recordings, we extract comprehensive acoustic features, including MFCC, Chroma, Spectrogram, Contrast, and Tonnetz representations. Models are evaluated using precision, recall, F1-score, and accuracy. Experimental results show that the proposed hybrid models outperform traditional architectures, achieving up to 99% accuracy and strong generalization across all classes. This study demonstrates the potential of attention-enhanced hybrid architectures in mental health assessment and provides a foundation for future deployment in clinical and real-world settings. Future work includes multimodal fusion with EEG data and the implementation of explainable AI for clinical interpretability.
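The abstract names five acoustic feature families (MFCC, Chroma, Spectrogram, Contrast, Tonnetz) but does not publish extraction code. As a minimal sketch of one of them, the magnitude spectrogram can be computed with a Hann-windowed short-time Fourier transform; the frame and hop sizes below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def stft_spectrogram(signal, frame_len=512, hop=256):
    """Magnitude spectrogram via a Hann-windowed STFT.

    frame_len and hop are illustrative assumptions; the paper does
    not report its analysis parameters.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-redundant positive-frequency bins
    return np.abs(np.fft.rfft(frames, axis=1))

# Example: one second of a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
spec = stft_spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (n_frames, frame_len // 2 + 1)
```

In practice a library such as librosa offers ready-made extractors for all five feature types; the point here is only that each recording becomes a time-frequency matrix that a CNN front-end can consume.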


Concepts:
EEG and Brain-Computer Interfaces
Emotion and Mood Recognition
Mental Health via Writing