Evaluation of Automatic Speech Recognition Approaches

Main Authors: Vasconcelos, Daniel J R, Silva, Ticiana Linhares Coelho da, Cruz, Lívia Almada, Magalhães, Regis Pires, Fernandes, Guilherme Sales, Sampaio, Matheus Xavier
Format: Journal Article
Language: por
Published: 2022
Subjects:
ASR
API
Online Access: https://zenodo.org/record/6077607
Table of Contents:
  • Automatic Speech Recognition (ASR) is an essential task for many applications, such as automatic caption generation for videos, voice search, voice commands for smart homes, and chatbots. Given the increasing popularity of these applications and the advances in deep learning models for transcribing speech into text, this work evaluates the performance of ASR solutions that use deep learning models, such as Facebook Wit.ai, Microsoft Azure Speech, Google Cloud Speech-to-Text, Wav2Vec, and AWS Transcribe. We performed the experiments on two real, public datasets: Mozilla Common Voice and Voxforge. The results show that the evaluated solutions differ only slightly; however, Facebook Wit.ai outperforms the other analyzed approaches on the collected quality metrics (WER, BLEU, and METEOR). We also fine-tune the Jasper neural network for ASR on four different datasets that have no overlap with the datasets on which we collect the quality metrics. We study the performance of the Jasper model on the two public datasets and compare its results with those of the other pre-trained models.
  • In this version we remove overlapping files that were knowingly used in training.
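
For readers unfamiliar with the WER metric named in the abstract, the following is a minimal sketch of how word error rate is commonly computed: the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the record does not state which implementation or text normalization the authors used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: splits on whitespace and applies no case or
    punctuation normalization, which real evaluations usually add.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Lower values are better, and the score can exceed 1.0 when the hypothesis contains many inserted words.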