Abstract:
With continued improvements in computer hardware and advances in neural network technology, deep neural networks (DNNs) for speech recognition have made considerable headway and now outperform speech recognition models based on the hidden Markov model (HMM). The long short-term memory (LSTM) network is a variant of the recurrent neural network (RNN) and has been successful in semantic recognition and reading comprehension. In this paper, a bidirectional LSTM model trained with the connectionist temporal classification (CTC) loss function is proposed, and its speech recognition accuracy is verified on the open LibriSpeech corpus. The character error rate (CER) is reduced to 0.04 on the training set and to 0.19 on the test set. Building on the pre-trained speech model, the approach is extended to evaluate English pronunciation at the phonetic level, with pronunciation test results returned in real time.
Key Words: LSTM, CTC, speech certification, speech recognition
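
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a character error rate is commonly computed: the Levenshtein (edit) distance between the reference transcript and the recognized text, divided by the reference length. The function names `edit_distance` and `cer` are illustrative assumptions, and the abstract does not state the exact normalization or text preprocessing used in the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning ref into hyp."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h, or match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate, assuming normalization by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of five reference characters.
    print(cer("hello", "hallo"))
```

Under this definition, a CER of 0.19 corresponds to roughly 19 character-level edits per 100 reference characters.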