

Journal introduction
Bimonthly, started in 1957
Administrator: Shanxi Provincial Education Department
Sponsor: Taiyuan University of Technology
Publisher: Ed. Office of Journal of TYUT
Editor-in-Chief: SUN Hongbin
ISSN: 1007-9432
CN: 14-1220/N


Cite this article:

LI Baoyun, ZHANG Xueying, LI Juan, HUANG Lixia, CHEN Guijun, SUN Ying. Speech Emotion Recognition Based on Multi-task Deep Feature Extraction and MKPCA Feature Fusion[J]. Journal of Taiyuan University of Technology, 2023, 54(05): 782-788.


Speech Emotion Recognition Based on Multi-task Deep Feature Extraction and MKPCA Feature Fusion

DOI: 10.16355/j.tyut.1007-9432.2023.05.004
Received: 2022-03-04
Accepted: 2022-04-10

Corresponding author: ZHANG Xueying, College of Information and Computer, Taiyuan University of Technology

Abstract:

【Purposes】 Speech emotion recognition enables computers to understand the emotional information contained in human speech and is an important part of intelligent human-computer interaction. Feature extraction and fusion are key stages of a speech emotion recognition system and strongly affect the recognition results. To address the problem that traditional acoustic features carry insufficient emotional information, this paper proposes a deep feature extraction method based on multi-task learning that optimizes the acoustic features. 【Methods】 The proposed acoustic depth features are more representative and contain more emotional information than the original acoustic features. Next, exploiting the complementarity between acoustic features and spectrogram features, spectrogram features are extracted with a convolutional neural network (CNN). Multi-kernel principal component analysis (MKPCA) is then used to fuse these two feature sets and reduce their dimensionality, and the resulting fusion features effectively improve the recognition performance of the system. 【Findings】 Experiments were carried out on the EMODB and CASIA speech databases. With a DNN classifier, the multi-kernel fusion of the acoustic depth features and the spectrogram features achieved the highest recognition rates, 92.71% on EMODB and 88.25% on CASIA. Compared with direct feature splicing (concatenation), the method increased the recognition rate by 2.43% and 2.83%, respectively.
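The abstract does not state which kernels, kernel weights, or parameters the MKPCA step uses. As a rough illustration only, the Python sketch below assumes the combined kernel is a weighted sum of one Gaussian (RBF) kernel per feature set, followed by standard kernel PCA (centering, eigendecomposition, projection). The function names `rbf_kernel` and `mkpca_fuse`, the gamma values, the equal weights, the feature dimensions and the random data are all hypothetical and are not taken from the paper. For comparison, the direct splicing baseline is shown as plain concatenation with `np.hstack`.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Clip tiny negative distances caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def mkpca_fuse(feature_sets, gammas, weights, n_components):
    """Fuse several feature sets describing the same n samples via multi-kernel PCA.

    Assumption (illustrative, not the paper's stated choice): the combined
    kernel is a weighted sum of one RBF kernel per feature set.
    """
    n = feature_sets[0].shape[0]
    K = sum(w * rbf_kernel(X, g) for X, g, w in zip(feature_sets, gammas, weights))
    # Center the combined kernel in feature space: Kc = H K H, H = I - (1/n) 11^T.
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    Kc = H @ K @ H
    # eigh returns eigenvalues of the symmetric matrix in ascending order.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[idx], eigvecs[:, idx]
    # Projection of each training sample onto component k: sqrt(lambda_k) * v_k.
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))


rng = np.random.default_rng(0)
acoustic = rng.normal(size=(20, 88))      # hypothetical acoustic depth features
spectro = rng.normal(size=(20, 128))      # hypothetical CNN spectrogram features
spliced = np.hstack([acoustic, spectro])  # direct splicing baseline
fused = mkpca_fuse([acoustic, spectro], gammas=[1 / 88, 1 / 128],
                   weights=[0.5, 0.5], n_components=10)
print(spliced.shape, fused.shape)  # (20, 216) (20, 10)
```

Unlike splicing, whose output dimension is the sum of the input dimensions, the kernel fusion returns only as many components as requested, which is where the dimension reduction mentioned in the abstract comes from. A real system would also have to project unseen test utterances using the kernel between test and training samples; that step is omitted from this sketch.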

Keywords: speech emotion recognition; multi-task learning; acoustic depth features; spectrogram features; multi-kernel principal component analysis