Applications of Support Vector Machines on Smart Phone Systems for Emotional Speech Recognition
Main Authors: | Wernhuar Tarng, Yuan-Yuan Chen, Chien-Lung Li, Kun-Rong Hsie, Mingteh Chen |
---|---|
Format: | Journal Article |
Language: | eng |
Published: | 2010 |
Subjects: | Smart phones; emotional speech recognition; social networks; support vector machines; time-frequency parameter; Mel-scale frequency cepstral coefficients (MFCC) |
Online Access: | https://zenodo.org/record/1072525 |
ctrlnum | 1072525 |
contents |
An emotional speech recognition system for applications on smart phones was proposed in this study, combining 3G mobile communications and social networks to provide users and their groups with more interaction and care. The study developed a mechanism using support vector machines (SVM) to recognize four emotional states in speech: happiness, anger, sadness, and normal. The mechanism uses a hierarchical classifier to adjust the weights of acoustic features, dividing the parameters into energy and frequency categories for training. Twenty-eight commonly used acoustic features, including pitch and volume, were proposed for training. In addition, a time-frequency parameter obtained by the continuous wavelet transform was used to identify accent and intonation within a sentence during recognition. The Berlin Database of Emotional Speech was used, with the speech divided into male and female data sets for training. According to the experimental results, the accuracies of the male and female test sets increased by 4.6% and 5.2%, respectively, after the time-frequency parameter was used to classify happy and angry emotions. For the classification of all emotions, the average accuracy over the male and female data was 63.5% on the test set and 90.9% on the whole data set. |
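The abstract outlines a concrete pipeline: energy- and frequency-category acoustic features (pitch, volume, MFCCs), a wavelet-derived time-frequency parameter, and SVM training on the Berlin Database of Emotional Speech. Below is a minimal Python sketch of that kind of pipeline using librosa, PyWavelets, and scikit-learn. The specific feature set, file paths, and the flat (non-hierarchical) SVM are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
import librosa                       # audio loading and acoustic features
import pywt                          # continuous wavelet transform
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# The four classes named in the abstract.
EMOTIONS = ["happiness", "anger", "sadness", "normal"]

def extract_features(path: str) -> np.ndarray:
    """Fixed-length feature vector for one utterance (illustrative feature set)."""
    y, sr = librosa.load(path, sr=16000)

    # Frequency-category features: MFCCs and fundamental frequency (pitch).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    f0 = librosa.yin(y, fmin=50, fmax=500, sr=sr)

    # Energy-category feature: frame-wise RMS volume.
    rms = librosa.feature.rms(y=y)

    # A crude time-frequency statistic from a continuous wavelet transform,
    # standing in for the paper's accent/intonation parameter (assumption).
    coeffs, _ = pywt.cwt(y[::16], np.arange(1, 65), "morl")
    cwt_energy = np.abs(coeffs).mean(axis=1)          # mean energy per scale

    # Summarize frame-level features with global statistics.
    stats = np.array([f0.mean(), f0.std(), rms.mean(), rms.std()])
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1), stats, cwt_energy])

def train_svm(paths: list[str], labels: list[str]) -> Pipeline:
    """Train an RBF-kernel SVM on the extracted features."""
    X = np.stack([extract_features(p) for p in paths])
    y = np.array(labels)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y,
                                              random_state=0)
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
    clf.fit(X_tr, y_tr)
    print(f"held-out accuracy: {clf.score(X_te, y_te):.3f}")
    return clf

# Usage (hypothetical file list; labels drawn from EMOTIONS):
# model = train_svm(wav_paths, wav_labels)
```

A hierarchical variant along the paper's lines would train separate SVM stages on the energy and frequency feature groups and combine their decisions; the flat classifier above is enough to show the data flow from audio to emotion label. |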
id | IOS16997.1072525 |
institution | ZAIN Publications |
institution_id | 7213 |
institution_type | library:special library |
library | Cognizance Journal of Multidisciplinary Studies |
library_id | 5267 |
collection | Cognizance Journal of Multidisciplinary Studies |
repository_id | 16997 |
subject_area | Multidisciplinary |
city | Stockholm |
province | INTERNATIONAL |
repoId | IOS16997 |