First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs. (arXiv:1408.2873v1 [cs.CL])
We present a method to perform first-pass large vocabulary continuous speech recognition using only a neural network and a language model. Deep neural network acoustic models are now commonplace in HMM-based speech recognition systems, but building such systems is a complex, domain-specific task. Recent work demonstrated the feasibility of discarding the HMM sequence modeling framework by directly predicting transcript text from audio. This paper extends this approach in two ways.