• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (09): 1554-1565.

• High Performance Computing •

Research on key technologies of distributed training for Level2 market quotation factor mining

ZHAO Xin-bo (1,2,3), LU Zhong-hua (2)

  (1. School of Public Security Information Technology and Intelligence,
    Criminal Investigation Police University of China, Shenyang 110854;
    2. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083;
    3. University of Chinese Academy of Sciences, Beijing 100049, China)
  • Received: 2023-10-30 Revised: 2023-12-12 Accepted: 2024-09-25 Online: 2024-09-25 Published: 2024-09-19

Abstract: Level2 market quotation data is the new generation of real-time market data products from the Shanghai and Shenzhen Stock Exchanges. As an enhanced version of basic market data, it currently offers the highest information density and the largest volume of information among market data in China, yet it remains the least thoroughly mined. The data is of significant value for identifying potential risks in the securities market, but existing research lacks risk measurement and analysis based on it. Moreover, full-market Level2 quotation data is large in scale, and the deep learning models used to extract information from it are becoming increasingly complex. Although hardware computing power keeps improving, it alone cannot resolve problems such as long training times and low efficiency. Therefore, based on Level2 market quotation data of the CSI 300, this paper uses deep learning and other methods to mine high-frequency volatility factors and builds a high-frequency volatility prediction model based on TabNet and LightGBM. In addition, a distributed training algorithm based on parallel differential evolution, Parallel_DE, is proposed for parameter calculation during distributed model training, and its scene mapping scheme and overall process design are elaborated. Both contributions are verified on the proposed distributed training platform. The experimental results show that the high-frequency volatility prediction model predicts realized volatility with high precision and has certain advantages over other methods, and that the Parallel_DE algorithm effectively reduces the error of local parameters on the test set while retaining parameter diversity to a certain extent, so that a deep learning model with excellent performance can be trained efficiently in a distributed manner.
This paper provides valuable technologies and methodologies for leveraging Level2 market quotation data in risk identification within the securities market.
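
For readers unfamiliar with differential evolution, the following is a minimal single-process sketch of the classic DE/rand/1/bin scheme, written in Python as an assumption for illustration. It is not the Parallel_DE algorithm of the paper, whose scene mapping and process design are not given in this abstract. The function names, the hyperparameters (`f`, `cr`, `pop_size`), and the toy `sphere` loss are all hypothetical stand-ins; in a training setting the loss would be a model's error on held-out data.

```python
import random


def differential_evolution(loss, dim, pop_size=20, f=0.5, cr=0.9,
                           generations=100, bounds=(-5.0, 5.0), seed=0):
    """Classic DE/rand/1/bin with greedy selection (illustrative only)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Each individual is one candidate parameter vector.
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [loss(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)  # clip to the search bounds
                else:
                    v = pop[i][j]  # binomial crossover keeps the parent gene
                trial.append(v)
            # Greedy selection: the trial replaces its parent only if it is
            # no worse, so each slot's loss never increases. The population
            # is updated in place within a generation (asynchronous variant).
            s = loss(trial)
            if s <= scores[i]:
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def sphere(x):
    """Toy loss standing in for a model's validation error."""
    return sum(v * v for v in x)


best_params, best_loss = differential_evolution(sphere, dim=3)
```

Because each population slot only ever accepts a trial vector that is no worse than its parent, the population keeps several distinct candidates alive while the per-slot loss is non-increasing, which is the general property behind the abstract's remark about reducing error while retaining parameter diversity.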


Key words: Level2 market quotation, realized volatility, distributed training, differential evolution