Authors
Tianye Jian, Yizhun Peng*, Wanlong Peng, Zhou Yang
College of Electronic Information and Automation, Tianjin University of
Science and Technology, Tianjin 300222, China
*Corresponding author. Email: [email protected]
Corresponding Author
Yizhun Peng
Received 16 November 2020, Accepted 28 July 2021, Available Online 9 October
2021.
DOI
https://doi.org/10.2991/jrnal.k.210922.013
Keywords
Children’s emotion; time–sequence relationship; frame-level speech feature;
deep attention gate; long short-term memory (LSTM)
Abstract
To address the different emotional needs of infants, this paper acquires
frame-level speech features and builds an infant speech emotion recognition
model based on an improved Long Short-Term Memory (LSTM) network. Frame-level
speech features are used instead of traditional statistical features to
preserve the temporal relationships in the original speech. The traditional
forget and input gates are replaced by attention gates through an attention
mechanism, and, to further improve speech emotion recognition performance, a
deep attention gate is computed according to a self-defined depth strategy.
On the FAU Aibo children’s emotion corpus and a baby-crying emotional needs
database, the recall and F1-score of the proposed model are 3.14%, 5.50%,
1.84%, and 5.49% higher, respectively, than those of the traditional
LSTM-based model. Compared with traditional models based on LSTM and the
Gated Recurrent Unit (GRU), the proposed model also trains faster and
achieves a higher infant speech emotion recognition rate.
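The contrast between frame-level and statistical features can be illustrated
with a minimal sketch. It is not the authors' feature pipeline: the two
descriptors (log-energy and zero-crossing rate), the 400-sample frame, the
160-sample hop, the synthetic signal, and all function names are assumptions
chosen for illustration. The sketch shows that utterance-level statistics stay
the same when the frames are shuffled in time, whereas the frame-level sequence
that a recurrent model such as an LSTM consumes does not.

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames, shape (num_frames, frame_len)."""
    num_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(num_frames)[:, None]
    return signal[idx]

def frame_level_features(signal, frame_len=400, hop=160):
    """Per-frame descriptors (illustrative choice): log-energy and zero-crossing rate."""
    frames = frame_signal(signal, frame_len, hop)
    log_energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    signs = np.sign(frames)
    # Fraction of adjacent sample pairs whose sign differs.
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    return np.stack([log_energy, zcr], axis=1)  # shape (num_frames, 2)

def statistical_features(frame_feats):
    """Collapse the time axis into utterance-level mean and standard deviation."""
    return np.concatenate([frame_feats.mean(axis=0), frame_feats.std(axis=0)])

# Synthetic 1-second signal at 16 kHz whose loudness rises over time
# (a stand-in for real speech; an assumption for this sketch).
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
signal = t * np.sin(2 * np.pi * 220 * t) + 0.01 * rng.standard_normal(sr)

seq = frame_level_features(signal)           # time-ordered sequence for an LSTM
shuffled = seq[rng.permutation(len(seq))]    # same frames, temporal order destroyed

same_stats = np.allclose(statistical_features(seq), statistical_features(shuffled))
same_sequence = np.array_equal(seq, shuffled)
print(seq.shape, same_stats, same_sequence)
```

Because the mean and standard deviation ignore frame order, a classifier fed
only those statistics cannot tell the rising-loudness signal from its shuffled
version, while a sequence model fed `seq` can.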
Copyright
© 2021 The Authors. Published by ALife Robotics Corp. Ltd.
Open Access
This is an open access article distributed under the CC BY-NC 4.0 license
(http://creativecommons.org/licenses/by-nc/4.0/).