Marathi Speech Database Standardization: A Review and Work
Main Authors: | Sonal A. Tiwari, Rajashri G. Kanke, Maheshwari A. Ambewadikar, Manasi R. Baheti |
---|---|
Format: | Article Journal |
Language: | eng |
Published: | 2021 |
Subjects: | |
Online Access: |
https://zenodo.org/record/5501910 |
Table of Contents:
- Abstract: An Automatic Speech Recognition (ASR) system supports interaction between humans and machines. It lets users operate computers and mobile phones through speech alone, without extra effort. The term corpus refers to a standardized database that contains a collection of audio recordings of spoken language together with their annotations and documentation. A review of the existing literature shows that much has been published on how to create speech databases, but little on their standardization. Such work has been done for languages other than Indian languages; for Hindi, Marathi, and similar languages, the standardization of speech datasets is not up to the mark. The main problem in designing a speech database is dealing with the variability of speech. In recent years, there has been a growing need for speech corpora as training and testing material for a wide range of speech technology applications, such as Linguistic Consortium, speech interface development, and language models. Corpora standardized for regional languages would contribute to many applications and to research. In future work, we would like to find a standard way to standardize speech databases so that data can be retrieved easily and more efficiently. Keywords: ASR, Corpus, Speech Database, Standardization, Annotation