基于多任务深度特征提取及MKPCA特征融合的语音情感识别

Title

Speech Emotion Recognition Based on Multi-task Deep Feature Extraction and MKPCA Feature Fusion
作者

李宝芸;张雪英;李娟;黄丽霞;陈桂军;孙颖
Author

LI Baoyun; ZHANG Xueying; LI Juan; HUANG Lixia; CHEN Guijun; SUN Ying
单位

太原理工大学信息与计算机学院
Organization

College of Information and Computer, Taiyuan University of Technology
摘要

【目的】针对传统声学特征所含情感信息不足的问题,提出一种基于多任务学习的深度特征提取模型优化声学特征,所提声学深度特征既能更好表征自身又拥有更多情感信息。【方法】基于声学特征与语谱图特征之间的互补性,首先通过卷积神经网络提取语谱图特征,然后使用多核主成分分析方法对这两个特征进行特征融合降维,所得融合特征可有效提升系统识别性能。【结果】在EMODB语音库与CASIA语音库上进行实验验证,当采用DNN分类器时,声学深度特征与语谱图特征的多核融合特征取得最高识别率为92.71%、88.25%,相比直接拼接特征,识别率分别提升2.43%、2.83%.
Abstract

【Purposes】 Speech emotion recognition allows computers to understand the emotional information contained in human speech and is an important part of intelligent human-computer interaction. Feature extraction and fusion are key components of speech emotion recognition systems and strongly affect recognition results. To address the insufficient emotional information contained in traditional acoustic features, this paper proposes a deep feature extraction model based on multi-task learning that optimizes the acoustic features. The proposed acoustic depth feature both characterizes the acoustic feature itself better and carries more emotional information. 【Methods】 Exploiting the complementarity between acoustic features and spectrogram features, spectrogram features are first extracted with a convolutional neural network. Multi-kernel principal component analysis (MKPCA) is then used to fuse the two features and reduce their dimensionality, and the resulting fused features effectively improve recognition performance. 【Findings】 Experiments were carried out on the EMODB and CASIA speech databases. With a DNN classifier, the multi-kernel fusion of the acoustic depth feature and the spectrogram feature achieved the highest recognition rates, 92.71% and 88.25%, respectively. Compared with direct feature concatenation, the recognition rates increased by 2.43% and 2.83%, respectively.
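The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the kernels, their weights, or the number of retained components, so everything below is an assumption for illustration only: one RBF kernel per feature set, a fixed weighted sum of the two kernels, and the hypothetical names `rbf_kernel`, `center_kernel`, and `mkpca_fuse`. It is not the authors' implementation.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Pairwise RBF (Gaussian) kernel matrix for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def center_kernel(K):
    """Center a kernel matrix in feature space."""
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    return K - one @ K - K @ one + one @ K @ one


def mkpca_fuse(acoustic, spectrogram, n_components, weights=(0.5, 0.5)):
    """Fuse two feature sets via kernel PCA on a weighted sum of kernels.

    Assumption: one RBF kernel per feature set with gamma = 1 / dimension,
    combined with fixed weights. The paper may use different kernels or
    learn the weights.
    """
    k_a = rbf_kernel(acoustic, 1.0 / acoustic.shape[1])
    k_s = rbf_kernel(spectrogram, 1.0 / spectrogram.shape[1])
    K = center_kernel(weights[0] * k_a + weights[1] * k_s)

    # Eigendecomposition of the symmetric centered kernel matrix.
    eigvals, eigvecs = np.linalg.eigh(K)
    top = np.argsort(eigvals)[::-1][:n_components]
    eigvals = np.maximum(eigvals[top], 1e-12)
    eigvecs = eigvecs[:, top]

    # Projection of the training samples onto the leading components.
    return eigvecs * np.sqrt(eigvals)


# Synthetic stand-ins for per-utterance feature vectors (not real data).
rng = np.random.default_rng(0)
acoustic = rng.normal(size=(40, 16))
spectrogram = rng.normal(size=(40, 32))
fused = mkpca_fuse(acoustic, spectrogram, n_components=8)
```

In this sketch `fused` holds one 8-dimensional fused vector per utterance, which would then be passed to a classifier such as the DNN mentioned in the abstract. Summing kernels lets each feature set keep its own similarity measure, in contrast to direct concatenation, which treats all dimensions alike.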
关键词

语音情感识别;多任务学习;声学深度特征;语谱图特征;多核主成分分析
KeyWords

speech emotion recognition; multi-task learning; acoustic depth features; spectrogram features; multi-kernel principal component analysis
基金项目(Foundation)

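国家自然科学基金资助项目(61371193);山西省回国留学人员科研资助项目(HGKY2019025)
National Natural Science Foundation of China (61371193); Shanxi Province Research Funding Project for Returned Overseas Scholars (HGKY2019025)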
DOI

10.16355/j.tyut.1007-9432.2023.05.004
引用格式

李宝芸,张雪英,李娟,等.基于多任务深度特征提取及 MKPCA 特征融合的语音情感识别[J].太原理工大学学报,2023,54(5):782-788.
Citation

LI Baoyun, ZHANG Xueying, LI Juan, et al. Speech emotion recognition based on multi-task deep feature extraction and MKPCA feature fusion[J]. Journal of Taiyuan University of Technology, 2023, 54(5): 782-788.