Bi-modal music genre classification model MGTN based on convolutional attention mechanism

Computer Engineering & Science ›› 2023, Vol. 45 ›› Issue (12): 2226-2236.

• Artificial Intelligence and Data Mining •

JIAO Jia-hui(1,2), MA Si-yuan(1,2), SONG Yu(2), SONG Wei(1)

(1. Henan Academy of Big Data, Zhengzhou University, Zhengzhou 450052;
2. School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China)

Received:2022-08-12 Revised:2022-11-14 Accepted:2023-12-25 Online:2023-12-25 Published:2023-12-14

Abstract

In music information retrieval (MIR), classifying music by genre is a challenging task. Traditional audio feature engineering requires manually selecting and extracting features from the music signal, which makes feature extraction complex and leaves models unstable and poorly generalizing. Methods that combine deep learning with spectrograms have their own problems: the model may be ill-suited to some data, and global features are difficult to extract. This paper proposes MGTN, a music genre classification model based on a convolutional attention mechanism. MGTN combines two approaches to music genre classification, spectrogram input and audio signal feature extraction, to construct audio time series data; this greatly improves the model's feature extraction and generalization ability and offers a new approach to music genre classification. Experimental results on the GTZAN and Ballroom datasets show that MGTN can effectively fuse input data from the two modalities and holds a strong advantage over dozens of benchmark models.
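To make the idea of bi-modal audio time series data concrete, the following is a minimal sketch, not the MGTN pipeline itself. The abstract does not specify which hand-crafted features are used or how the two modalities are fused, so the three descriptors chosen here (zero-crossing rate, RMS energy, spectral centroid), the per-frame concatenation, and all function names and parameters (`frame_len`, `hop`, `bimodal_sequence`) are assumptions made purely for illustration.

```python
import numpy as np

def frame_signal(x, frame_len=1024, hop=512):
    """Slice a 1-D signal into overlapping frames, shape (n_frames, frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def log_spectrogram(frames):
    """Spectrogram modality: log-magnitude spectrum, one row per frame."""
    window = np.hanning(frames.shape[1])
    mag = np.abs(np.fft.rfft(frames * window, axis=1))
    return np.log1p(mag)

def handcrafted_features(frames, sr):
    """Feature-engineering modality: three illustrative per-frame descriptors."""
    # Zero-crossing rate: fraction of adjacent sample pairs whose sign differs.
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    # Root-mean-square energy of each frame.
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Spectral centroid in Hz: magnitude-weighted mean frequency.
    window = np.hanning(frames.shape[1])
    mag = np.abs(np.fft.rfft(frames * window, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    centroid = (mag @ freqs) / (mag.sum(axis=1) + 1e-10)
    return np.stack([zcr, rms, centroid], axis=1)

def bimodal_sequence(x, sr, frame_len=1024, hop=512):
    """Concatenate both modalities frame by frame (an assumed fusion scheme)."""
    frames = frame_signal(x, frame_len, hop)
    spec = log_spectrogram(frames)            # (n_frames, frame_len // 2 + 1)
    feats = handcrafted_features(frames, sr)  # (n_frames, 3)
    return np.concatenate([spec, feats], axis=1)

# One second of a synthetic 440 Hz tone stands in for a real music clip.
sr = 22050
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)
seq = bimodal_sequence(x, sr)
print(seq.shape)  # (42, 516): 42 time steps, 513 spectrum bins + 3 features
```

Each row of `seq` is one time step, so the array could be passed to any sequence model, for example a convolutional front end followed by attention layers. In practice the hand-crafted columns would likely need normalization, since their scales differ widely from the log-spectrum values.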

Key words: music genre classification, Transformer model, spectrogram, audio feature engineering, attention mechanism

JIAO Jia-hui, MA Si-yuan, SONG Yu, SONG Wei. Bi-modal music genre classification model MGTN based on convolutional attention mechanism[J]. Computer Engineering & Science, 2023, 45(12): 2226-2236.

