Applications of Support Vector Machines on Smart Phone Systems for Emotional Speech Recognition

Main Authors: Wernhuar Tarng, Yuan-Yuan Chen, Chien-Lung Li, Kun-Rong Hsie, Mingteh Chen
Format: Journal Article
Language: English
Published: 2010
Subjects: Smart phones; emotional speech recognition; social networks; support vector machines; time-frequency parameter; Mel-scale frequency cepstral coefficients (MFCC)
Online Access: https://zenodo.org/record/1072525
ctrlnum 1072525
doi 10.5281/zenodo.1072525
oai_id oai:zenodo.org:1072525
date 2010-12-24
relation doi:10.5281/zenodo.1072524
relation url:https://zenodo.org/communities/waset
rights info:eu-repo/semantics/openAccess
rights https://creativecommons.org/licenses/by/4.0/legalcode
language eng
format Journal:Article
author Wernhuar Tarng
Yuan-Yuan Chen
Chien-Lung Li
Kun-Rong Hsie
Mingteh Chen
title Applications of Support Vector Machines on Smart Phone Systems for Emotional Speech Recognition
publishDate 2010
topic Smart phones
emotional speech recognition
social networks
support vector machines
time-frequency parameter
Mel-scale frequency cepstral coefficients (MFCC)
url https://zenodo.org/record/1072525
contents An emotional speech recognition system for smart phones was proposed in this study, combining 3G mobile communications and social networks to provide users and their groups with more interaction and care. The study developed a mechanism based on support vector machines (SVM) to recognize speech emotions such as happiness, anger, sadness, and a neutral state. The mechanism uses a hierarchical classifier that adjusts the weights of acoustic features, dividing the parameters into energy and frequency categories for training. Twenty-eight commonly used acoustic features, including pitch and volume, were selected for training. In addition, a time-frequency parameter obtained by the continuous wavelet transform was used to identify accent and intonation within a sentence during recognition. The Berlin Database of Emotional Speech was split into male and female data sets for training. Experimental results show that, after the time-frequency parameter was added to distinguish happy from angry emotions, the accuracies on the male and female test sets increased by 4.6% and 5.2%, respectively. For the classification of all emotions, the average accuracy over the male and female data was 63.5% on the test set and 90.9% on the whole data set.
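The abstract describes a concrete pipeline: energy- and frequency-category acoustic features plus a wavelet time-frequency parameter feeding a hierarchical SVM. The sketch below is a minimal illustration of that idea under stated assumptions, not the authors' implementation: the particular features, the Morlet wavelet and its scales, the F0 search range, and the arousal-based two-stage split are all placeholders standing in for the paper's 28 features and weighted hierarchy. It assumes librosa, PyWavelets (pywt), scikit-learn, and numpy are available.

```python
import numpy as np
import librosa
import pywt
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def extract_features(path, sr=16000):
    """A small stand-in for the paper's 28 acoustic features."""
    y, sr = librosa.load(path, sr=sr)

    # Energy-category features: statistics of frame-level RMS energy.
    rms = librosa.feature.rms(y=y)[0]
    energy = [rms.mean(), rms.std(), rms.max()]

    # Frequency-category features: pitch (F0) statistics and mean MFCCs.
    f0, _, _ = librosa.pyin(y, fmin=75, fmax=500, sr=sr)
    f0 = f0[~np.isnan(f0)]
    if f0.size == 0:
        f0 = np.zeros(1)  # fully unvoiced clip: fall back to zeros
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    frequency = [f0.mean(), f0.std()] + mfcc.mean(axis=1).tolist()

    # Time-frequency feature: mean absolute CWT coefficient per scale,
    # a rough proxy for the accent/intonation cue in the abstract.
    # Downsampling by 16 keeps the transform cheap; "morl" is assumed.
    coeffs, _ = pywt.cwt(y[::16], scales=np.arange(1, 17), wavelet="morl")
    timefreq = np.abs(coeffs).mean(axis=1).tolist()

    return np.concatenate([energy, frequency, timefreq])


class HierarchicalEmotionSVM:
    """Two-stage SVM: stage 1 separates high-arousal (happy/angry) from
    low-arousal (sad/neutral) utterances; stage 2 resolves the emotion
    within each branch. The arousal-based split is an assumption made
    for this sketch, not a detail taken from the paper."""

    HIGH = {"happy", "angry"}

    def __init__(self):
        self.stage1 = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
        self.high = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
        self.low = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

    def fit(self, X, y):
        # Assumes each branch sees at least two emotion classes.
        X, y = np.asarray(X), np.asarray(y)
        mask = np.isin(y, list(self.HIGH))
        self.stage1.fit(X, mask)          # arousal classifier
        self.high.fit(X[mask], y[mask])   # happy vs. angry
        self.low.fit(X[~mask], y[~mask])  # sad vs. neutral
        return self

    def predict(self, X):
        X = np.asarray(X)
        mask = self.stage1.predict(X).astype(bool)
        out = np.empty(len(X), dtype=object)
        if mask.any():
            out[mask] = self.high.predict(X[mask])
        if (~mask).any():
            out[~mask] = self.low.predict(X[~mask])
        return out
```

Given labeled audio files, training would reduce to clf = HierarchicalEmotionSVM().fit([extract_features(p) for p in paths], labels). The two-stage design mirrors the intuition behind the abstract's energy/frequency split: energy cues coarsely separate high-arousal from low-arousal emotions, and frequency cues then resolve the exact emotion within each branch.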
id IOS16997.1072525
institution ZAIN Publications
institution_id 7213
institution_type library:special
library Cognizance Journal of Multidisciplinary Studies
library_id 5267
collection Cognizance Journal of Multidisciplinary Studies
repository_id 16997
subject_area Multidisciplinary
city Stockholm
province INTERNATIONAL
shared_to_ipusnas_str 1
repoId IOS16997
first_indexed 2022-06-06T06:40:15Z
last_indexed 2022-06-06T06:40:15Z
recordtype dc