96 / 2023-09-19 16:33:35
Research on Pathological Voice Recognition using Multi-Scale Convolutional Neural Network
pathological voice recognition, multi-scale convolutional neural network, multi-scale feature extraction, speech frames, spectrogram
Final version
ZhiYuan Dai / Soochow University
MingXuan Yan / Soochow University
YuYang Jiang / Soochow University
Xiaoping Pan / Soochow University
XiaoJun Zhang / Soochow University
Zhi Tao / Soochow University
Existing neural network models for pathological voice recognition no longer meet performance requirements because they cannot handle the diversity of pathological voice features, so a model with better performance and stronger robustness is urgently needed. The multi-scale convolutional neural network (MSCNN) uses multi-scale convolutional blocks to extract multi-scale representations for detection and has been applied successfully in various recognition tasks. This paper applies the MSCNN model to pathological voice recognition, using multi-scale convolutional blocks to extract features from spectrograms of voice signals and fully connected layers as the classifier. The MSCNN model is compared with three baseline methods: a one-dimensional convolutional neural network (1DCNN), a long short-term memory network (LSTM), and a two-dimensional convolutional neural network (2DCNN). The 1DCNN and LSTM methods take framed and windowed voice signals as input, while the 2DCNN and MSCNN methods take spectrograms of voice signals as input. Experiments are conducted on three databases: MEEI, SVD, and HUPA. Model performance is evaluated using five metrics: accuracy, precision, recall, F1 score, and Matthews correlation coefficient (MCC). On the MEEI, SVD, and HUPA databases, MSCNN achieves binary classification accuracies of 95.087%, 69.809%, and 75.455%, respectively, surpassing the other three methods. It also achieves favorable results in precision, recall, F1 score, and MCC.
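The five evaluation metrics named in the abstract can all be derived from the confusion matrix of a binary classifier. The following is a minimal sketch, not the authors' evaluation code: the function name `binary_metrics`, the convention that label 1 means pathological and 0 means healthy, and the choice to return 0.0 when a denominator is zero are all assumptions made for illustration.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, F1 and MCC for 0/1 labels.

    Assumption for this sketch: 1 = pathological (positive), 0 = healthy.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the paper.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
    for name, value in binary_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four confusion-matrix cells, so it is less easily inflated when the healthy and pathological classes are imbalanced.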
Important Dates
  • Conference dates: November 2, 2023 to November 4, 2023
  • December 15, 2023: Draft submission deadline
  • December 20, 2023: Registration deadline


Organizers
IEEE Instrumentation and Measurement Society
Xidian University