AAAI 1998
Speech Recognition with Dynamic Bayesian Networks
Abstract
Dynamic Bayesian networks (DBNs) are a useful tool for representing complex stochastic processes. Recent developments in inference and learning in DBNs allow their use in real-world applications. In this paper, we apply DBNs to the problem of speech recognition. The factored state representation enabled by DBNs allows us to explicitly represent long-term articulatory and acoustic context in addition to the phonetic-state information maintained by hidden Markov models (HMMs). Furthermore, it enables us to model the short-term correlations among multiple observation streams within single time-frames. Given a DBN structure capable of representing these long- and short-term correlations, we applied the EM algorithm to learn models with up to 500,000 parameters. The use of structured DBN models decreased the error rate by 12 to 29% on a large-vocabulary isolated-word recognition task, compared to a discrete HMM; it also improved significantly on other published results for the same task. This is the first successful application of DBNs to a large-scale speech recognition problem. Investigation of the learned models indicates that the hidden state variables are strongly correlated with acoustic properties of the speech signal.
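To make the idea of a factored state concrete, the following is a minimal sketch, not the paper's model. It assumes a toy DBN with two hidden variables per frame, a phonetic state `Q` and a single context variable `C`, which evolve independently and jointly emit one discrete observation. The structure, the variable sizes, the random parameters, and the names `make_toy_dbn` and `log_likelihood` are all illustrative assumptions. The sketch computes the log-likelihood of an observation sequence with a scaled forward pass that sums over each factor separately.

```python
import numpy as np

def normalize(a, axis=-1):
    """Scale non-negative entries so they sum to 1 along `axis`."""
    return a / a.sum(axis=axis, keepdims=True)

def make_toy_dbn(n_q=3, n_c=2, n_obs=4, seed=0):
    """Random toy parameters (hypothetical; not taken from the paper)."""
    rng = np.random.default_rng(seed)
    return {
        "prior_q": normalize(rng.random(n_q)),             # P(Q_1)
        "prior_c": normalize(rng.random(n_c)),             # P(C_1)
        "trans_q": normalize(rng.random((n_q, n_q))),      # P(Q_t | Q_{t-1})
        "trans_c": normalize(rng.random((n_c, n_c))),      # P(C_t | C_{t-1})
        "emit": normalize(rng.random((n_q, n_c, n_obs))),  # P(O_t | Q_t, C_t)
    }

def log_likelihood(model, obs):
    """Scaled forward pass over the joint state (Q_t, C_t).

    alpha[q, c] is proportional to P(o_1..o_t, Q_t=q, C_t=c).
    """
    alpha = np.outer(model["prior_q"], model["prior_c"])
    alpha = alpha * model["emit"][:, :, obs[0]]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Prediction step: sum out the previous Q, then the previous C,
            # using the two small transition tables instead of one joint table.
            alpha = model["trans_q"].T @ alpha @ model["trans_c"]
            alpha = alpha * model["emit"][:, :, obs[t]]
        scale = alpha.sum()      # P(o_t | o_1..o_{t-1})
        log_lik += np.log(scale)
        alpha = alpha / scale    # rescale to avoid numerical underflow
    return log_lik

if __name__ == "__main__":
    model = make_toy_dbn()
    print(log_likelihood(model, [0, 3, 1, 2]))
```

In this toy setting the factored transition model needs 3×3 + 2×2 = 13 table entries, whereas a flat HMM over the six joint states would need a 6×6 = 36-entry transition table. The gap widens quickly as more context variables are added, which is the kind of economy a factored representation offers.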
Authors
Context
- Venue
- AAAI Conference on Artificial Intelligence