PERFORMANCE ANALYSIS OF DIFFERENT ACOUSTIC FEATURES BASED ON LSTM FOR BANGLA SPEECH RECOGNITION

Main Author: Nahyan Al Mahmud
Format: Journal Article
Published: 2020
Online Access: https://zenodo.org/record/4035813
Contents:
  • ABSTRACT In this work, a new Bangla speech corpus with proper transcriptions has been developed. Various acoustic feature extraction methods have also been investigated with a Long Short-Term Memory (LSTM) neural network, in order to find how to integrate them effectively into a state-of-the-art Bangla speech recognition system. Acoustic features are usually a sequence of representative vectors extracted from the speech signal, and the classes are either words or sub-word units such as phonemes. The most commonly used feature extraction method, linear predictive coding (LPC), was applied first in this work. Two other popular methods, Mel frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP), were then applied. Both are based on models of the human auditory system. The work first gives a detailed review of how these methods are implemented. It then sets out the implementation steps for developing an automatic speech recognition (ASR) system for Bangla speech.
  • KEYWORDS Mel frequency cepstral coefficients, linear predictive coding, perceptual linear prediction, sentence correct rates, LSTM
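LPC, the first feature type named in the abstract, can be illustrated with a minimal sketch of the textbook autocorrelation method. This is not the authors' code. It computes the autocorrelation of one frame and solves the normal equations with the Levinson-Durbin recursion. It assumes the frame has already been pre-emphasised and windowed. The helper names (`hamming`, `autocorrelation`, `levinson_durbin`, `lpc`) are hypothetical, and the predictor convention x_hat[n] = sum of a[j] * x[n - j] is a stated assumption.

```python
import math

def hamming(n):
    """Hamming window of length n (n >= 2), applied to a frame before LPC."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for a[1..p], where the
    predictor is x_hat[n] = sum_{j=1..p} a[j] * x[n - j].
    Returns (coefficients, residual prediction energy)."""
    a = [0.0] * (order + 1)          # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:             # silent or degenerate frame: stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error              # reflection coefficient of stage i
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        error *= 1.0 - k * k         # residual energy shrinks at every stage
    return a[1:], error

def lpc(frame, order):
    """LPC coefficients of one (already windowed) frame."""
    return levinson_durbin(autocorrelation(frame, order), order)
```

In practice one would call `lpc([s * w for s, w in zip(frame, hamming(len(frame)))], 12)` on each frame. The order 12 is a common choice for 8 to 16 kHz speech, not a value taken from the paper.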
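The MFCC pipeline mentioned in the abstract can be sketched for a single frame as follows: a power spectrum, then triangular filters spaced evenly on the mel scale, then log filter energies, then a DCT-II. This is a minimal illustration under several assumptions. The mel formula 2595 * log10(1 + f / 700) is one common variant. The defaults (512-point transform, 26 filters, 13 coefficients) are typical values, not the paper's settings. A naive DFT replaces an FFT to keep the block dependency-free. Pre-emphasis, windowing, an orthonormal DCT scaling, and liftering are all omitted. Every function name here is hypothetical.

```python
import math

def hz_to_mel(f):
    # One common mel-scale formula; other variants exist
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame, n_fft):
    """|DFT|^2 / n_fft for bins 0..n_fft//2 (naive O(N^2) DFT, zero-padded)."""
    x = list(frame) + [0.0] * (n_fft - len(frame))
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(x[n] * math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft))
        im = -sum(x[n] * math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft))
        spec.append((re * re + im * im) / n_fft)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are spaced uniformly on the mel scale."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + (high - low) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mels]
    bank = []
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):          # rising edge
            filt[k] = (k - left) / (centre - left)
        for k in range(centre, right):         # falling edge
            filt[k] = (right - k) / (right - centre)
        bank.append(filt)
    return bank

def dct_ii(values, n_coeffs):
    """Unnormalised DCT-II; decorrelates the log filterbank energies."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_coeffs)]

def mfcc(frame, sample_rate, n_fft=512, n_filters=26, n_coeffs=13):
    spec = power_spectrum(frame, n_fft)
    energies = [sum(w * p for w, p in zip(filt, spec))
                for filt in mel_filterbank(n_filters, n_fft, sample_rate)]
    log_e = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)
    return dct_ii(log_e, n_coeffs)
```

A 25 ms frame at 16 kHz (400 samples) fits in the 512-point transform. Each frame yields one 13-dimensional vector, and the sequence of these vectors forms the input to the recogniser.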
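PLP differs from plain LPC mainly in the perceptual processing it applies before the all-pole fit. The following is a partial sketch of those steps, assuming the constants of Hermansky's original PLP formulation:

- Bark-scale warping, 6 * asinh(f / 600).
- An asymmetric critical-band masking curve.
- Equal-loudness weighting.
- Cube-root (exponent 0.33) intensity-to-loudness compression.

Spacing the band centres evenly in Bark is a simplifying assumption. The remaining PLP stages are omitted: an inverse DFT of the auditory spectrum to obtain autocorrelations, an all-pole fit (for example by Levinson-Durbin), and cepstral conversion. All names are hypothetical.

```python
import math

def hz_to_bark(f):
    # Bark warping used in PLP: 6 * asinh(f / 600)
    return 6.0 * math.asinh(f / 600.0)

def bark_to_hz(b):
    # Inverse of hz_to_bark
    return 600.0 * math.sinh(b / 6.0)

def equal_loudness(f):
    # Equal-loudness weighting E(w) with w = 2*pi*f; approaches 1 at high frequencies
    w2 = (2.0 * math.pi * f) ** 2
    return ((w2 + 56.8e6) * w2 * w2) / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))

def critical_band_weight(dz):
    # Asymmetric masking curve; dz is the Bark offset from the band centre
    if dz < -1.3 or dz > 2.5:
        return 0.0
    if dz < -0.5:
        return 10.0 ** (2.5 * (dz + 0.5))
    if dz <= 0.5:
        return 1.0
    return 10.0 ** (-(dz - 0.5))

def auditory_spectrum(power, sample_rate, n_bands=17):
    """power: power-spectrum bins 0..N/2 of one frame (at least 2 bins).
    Returns critical-band energies after equal-loudness weighting and
    cube-root compression (the PLP steps before all-pole modelling)."""
    n_bins = len(power)
    nyquist = sample_rate / 2.0
    bin_bark = [hz_to_bark(k * nyquist / (n_bins - 1)) for k in range(n_bins)]
    top = hz_to_bark(nyquist)
    bands = []
    for j in range(1, n_bands + 1):
        centre = top * j / (n_bands + 1)       # assumption: centres evenly spaced in Bark
        energy = sum(p * critical_band_weight(z - centre) for p, z in zip(power, bin_bark))
        loudness = equal_loudness(bark_to_hz(centre)) * energy
        bands.append(loudness ** 0.33)         # intensity-to-loudness power law
    return bands
```

The masking curve extends further above the band centre (2.5 Bark) than below it (1.3 Bark). This reflects the upward spread of masking that PLP is designed to model.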
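The abstract does not describe the network that consumes the feature sequences. For orientation, here is a minimal forward pass of a standard LSTM cell with input, forget, and output gates, run over a sequence of feature frames. The layer size, the 13-dimensional input (matching a typical MFCC vector), the random initialisation, and all names are assumptions for illustration. The output layer and training are omitted, and this is not the authors' architecture.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def lstm_step(x, h, c, params):
    """One LSTM time step. params maps each gate name ('i', 'f', 'o', 'g') to
    (W, U, b): input weights, recurrent weights, bias. Returns (h_new, c_new)."""
    def pre(gate):
        W, U, b = params[gate]
        return [a + r + bb for a, r, bb in zip(matvec(W, x), matvec(U, h), b)]
    i = [sigmoid(z) for z in pre("i")]      # input gate
    f = [sigmoid(z) for z in pre("f")]      # forget gate
    o = [sigmoid(z) for z in pre("o")]      # output gate
    g = [math.tanh(z) for z in pre("g")]    # candidate cell values
    c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
    h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
    return h_new, c_new

def init_params(n_in, n_hidden, seed=0):
    # Small random weights and zero biases (illustrative only)
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {gate: (mat(n_hidden, n_in), mat(n_hidden, n_hidden), [0.0] * n_hidden)
            for gate in "ifog"}

def run_lstm(frames, params, n_hidden):
    """Feed a sequence of feature vectors through the cell; return the final hidden state."""
    h, c = [0.0] * n_hidden, [0.0] * n_hidden
    for x in frames:
        h, c = lstm_step(x, h, c, params)
    return h
```

The additive cell update `f * c + i * g` is what lets an LSTM carry information across many frames. This property makes it a natural fit for frame sequences produced by LPC, MFCC, or PLP front ends.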