A Robust Hybrid Deep Learning Model for Multiclass Depression Classification from Speech Audio

Authors: Neny Sulistianingsih; Galih Hendro Martono
Year: 2026
Source: International Journal of Image Graphics and Signal Processing
Abstract

Depression remains one of the most prevalent and underdiagnosed mental health disorders globally, necessitating scalable, objective, and non-invasive diagnostic tools. Speech, as a rich biomarker of emotional and psychological states, offers a promising avenue for automated depression detection. This study proposes a robust hybrid deep learning framework that integrates Convolutional Neural Networks (CNN), Gated Recurrent Units (GRU), Bidirectional Long Short-Term Memory (BiLSTM), and Transformer architectures to classify depression severity into three levels: normal, mild, and severe. Using a curated multimodal dataset comprising 400 labeled audio recordings, we extract comprehensive acoustic features, including MFCC, Chroma, Spectrogram, Contrast, and Tonnetz representations. Models are evaluated using precision, recall, F1-score, and accuracy. Experimental results show that the proposed hybrid models outperform traditional architectures, achieving up to 99% accuracy and strong generalization across all classes. This study demonstrates the potential of attention-enhanced hybrid architectures in mental health assessment and provides a foundation for future deployment in clinical and real-world settings. Future work includes multimodal fusion with EEG data and the implementation of explainable AI for clinical interpretability.
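The abstract names five acoustic feature families (MFCC, Chroma, Spectrogram, Contrast, Tonnetz) but does not publish extraction code. As a minimal sketch of one of them, the magnitude spectrogram can be computed with a Hann-windowed short-time Fourier transform; the frame and hop sizes below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def stft_spectrogram(signal, frame_len=512, hop=256):
    """Magnitude spectrogram via a Hann-windowed STFT.

    frame_len and hop are illustrative assumptions; the paper does
    not report its analysis parameters.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-redundant positive-frequency bins
    return np.abs(np.fft.rfft(frames, axis=1))

# Example: one second of a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
spec = stft_spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (n_frames, frame_len // 2 + 1)
```

In practice a library such as librosa offers ready-made extractors for all five feature types; the point here is only that each recording becomes a time-frequency matrix that a CNN front-end can consume.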


Concepts:
EEG and Brain-Computer Interfaces
Emotion and Mood Recognition
Mental Health via Writing