• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

• Artificial Intelligence and Data Mining

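Research on a click-through rate prediction model based on ensemble learning methods

HE Xiao-juan¹, PAN Wen-jie¹, CHENG Hong²

  (1. School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai 201620; 2. School of Statistics and Mathematics, Shanghai Lixin University of Accounting and Finance, Shanghai 201209)
  • Received: 2019-03-27  Revised: 2019-08-16  Online: 2019-12-25  Published: 2019-12-25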
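  • Funding:

    Shanghai Sailing Program for Young Science and Technology Talents, 2016 (16YF1415900); First-Level Discipline Construction Project in Statistics, Shanghai Lixin University of Accounting and Finance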

An advertisement click-through rate prediction model based on ensemble learning

HE Xiao-juan¹, PAN Wen-jie¹, CHENG Hong²

  (1. School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai 201620;
    2. School of Statistics and Mathematics, Shanghai Lixin University of Accounting and Finance, Shanghai 201209, China)
  • Received: 2019-03-27  Revised: 2019-08-16  Online: 2019-12-25  Published: 2019-12-25

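Abstract:

Advertisement logs accumulated on the Internet suffer from sparse data, a large number of features, and an extremely uneven distribution of positive and negative samples. This makes manual feature extraction time-consuming and laborious, and makes it hard for a single prediction model to achieve good prediction performance. To address these problems, a click-through rate prediction model named GBDT-Stacking is proposed, which fuses the gradient boosted decision tree (GBDT) with Stacking. GBDT is introduced to extract and construct features automatically, and a Stacking ensemble model is then used to predict the click-through rate of online advertisements, effectively improving on the performance of a single prediction model. Experimental results on a real advertising data set show that the GBDT-Stacking ensemble model improves the AUC by at least 4% over the comparison models.
 

Keywords: gradient boosted decision tree, Stacking ensemble learning, SMOTE, advertisement click-through rate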

Abstract:

Advertisement logs accumulated on the Internet are sparse, contain a large number of features, and have an extremely unbalanced distribution of positive and negative samples. As a result, manual feature extraction is time-consuming and laborious, and a single prediction model struggles to achieve good prediction performance. To address these problems, this paper proposes GBDT-Stacking, a click-through rate prediction model that combines the gradient boosted decision tree (GBDT) with Stacking. The model uses GBDT to extract and construct features automatically and a Stacking ensemble to predict the click-through rate, which effectively improves on the performance of a single prediction model. Experiments on real advertising data sets show that the GBDT-Stacking ensemble method increases the AUC by at least 4% compared with the comparison models.
 

Key words: GBDT (gradient boosted decision tree), Stacking ensemble learning, SMOTE (synthetic minority oversampling technique), click-through rate
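
The pipeline summarized in the abstract (GBDT for automatic feature construction, followed by a Stacking ensemble) might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: it uses scikit-learn, a synthetic imbalanced data set in place of real advertising logs, one-hot encoded GBDT leaf indices as the constructed features, and logistic regression plus a random forest as base learners. The abstract does not specify any of these choices, and the SMOTE resampling step named in the keywords is omitted here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Assumption: synthetic data with roughly 10% positives stands in for click logs.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 1: fit a GBDT and use the leaf each sample reaches in every tree
# as a new categorical feature.
n_trees = 30
gbdt = GradientBoostingClassifier(n_estimators=n_trees, max_depth=3, random_state=0)
gbdt.fit(X_train, y_train)

# apply() returns (n_samples, n_estimators, n_classes); the last axis has
# length 1 for binary classification, so it is dropped here.
leaves_train = gbdt.apply(X_train)[:, :, 0]
leaves_test = gbdt.apply(X_test)[:, :, 0]

# One-hot encode the leaf indices; leaves unseen during fit are ignored.
encoder = OneHotEncoder(handle_unknown="ignore")
Z_train = encoder.fit_transform(leaves_train)
Z_test = encoder.transform(leaves_test)

# Step 2: a Stacking ensemble on the GBDT-derived features.
# The base learners and meta-learner are illustrative choices.
stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)
stack.fit(Z_train, y_train)

# Evaluate with AUC, the metric reported in the abstract.
auc = roc_auc_score(y_test, stack.predict_proba(Z_test)[:, 1])
print(f"AUC on held-out data: {auc:.3f}")
```

In this sketch the GBDT is fitted on the same training split that feeds the stacked model. A more careful setup would fit the GBDT on a separate split so that the leaf features do not leak label information into the second stage.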