• Sponsored journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2023, Vol. 45 ›› Issue (12): 2226-2236.

• Artificial Intelligence and Data Mining •

Bi-modal music genre classification model MGTN based on convolutional attention mechanism

JIAO Jia-hui1,2, MA Si-yuan1,2, SONG Yu2, SONG Wei1

  (1. Henan Academy of Big Data, Zhengzhou University, Zhengzhou 450052, China;
   2. School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China)
  • Received: 2022-08-12  Revised: 2022-11-14  Accepted: 2023-12-25  Online: 2023-12-25  Published: 2023-12-14

Abstract: In the field of music information retrieval (MIR), classifying music by genre is a challenging task. Traditional audio feature engineering methods require manually selecting and extracting features from the music signal, which leads to a complex feature extraction process, unstable model performance, and poor generalization. Methods that combine deep learning with spectrograms also have drawbacks, such as poor model fit on some data and difficulty in extracting global features. This paper proposes a music genre classification model based on a convolutional attention mechanism, called MGTN. MGTN combines two approaches to music genre classification, spectrogram input and audio signal feature extraction, to construct audio time series data, which greatly improves the model's feature extraction ability and generalization, and offers a new approach to music genre classification. Experimental results on the GTZAN and Ballroom datasets show that MGTN effectively fuses input data from the two modalities and holds clear advantages over dozens of baseline models.
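The abstract does not specify MGTN's exact architecture, so purely as an illustration, the sketch below shows one way a bi-modal classifier along these lines could be assembled in PyTorch: a small convolutional branch turns a mel spectrogram into a sequence of frame embeddings, a linear branch projects a handcrafted feature vector (e.g. MFCC statistics) into the same space as a single token, and a Transformer encoder attends over the fused sequence. Every module name, dimension, and hyperparameter here is a hypothetical choice, not the authors' design.

# Hypothetical bi-modal genre classifier in the spirit of MGTN.
# All names, dimensions, and design choices are assumptions,
# not the architecture from the paper.
import torch
import torch.nn as nn

class BiModalGenreNet(nn.Module):
    def __init__(self, n_features=57, d_model=128, n_classes=10):
        super().__init__()
        # Convolutional branch: maps a (1, n_mels, T) spectrogram to a
        # sequence of T frame embeddings of size d_model.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),                     # pool frequency axis only
            nn.Conv2d(32, d_model, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),          # collapse frequency axis
        )
        # Feature branch: projects handcrafted audio features to one token.
        self.feat_proj = nn.Linear(n_features, d_model)
        # Transformer encoder attends over [feature token; spectrogram frames].
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, spec, feats):
        # spec: (B, 1, n_mels, T); feats: (B, n_features)
        x = self.conv(spec)                        # (B, d_model, 1, T)
        x = x.squeeze(2).transpose(1, 2)           # (B, T, d_model)
        tok = self.feat_proj(feats).unsqueeze(1)   # (B, 1, d_model)
        seq = torch.cat([tok, x], dim=1)           # prepend feature token
        h = self.encoder(seq)
        return self.head(h[:, 0])                  # classify from fused token

# Smoke test on random tensors shaped like GTZAN-style inputs
# (10 genre classes; feature count is an assumed placeholder).
model = BiModalGenreNet()
logits = model(torch.randn(2, 1, 64, 256), torch.randn(2, 57))
print(logits.shape)  # torch.Size([2, 10])

Prepending the feature vector as a single token and classifying from its output position is one simple way to let attention fuse the two modalities; the paper may well fuse them differently.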


Key words: music genre classification, Transformer model, spectrogram, audio feature engineering, attention mechanism