A user churn prediction method based on multi-model fusion

Computer Engineering & Science

YE Cheng, ZHENG Hong, CHENG Yun-hui

(School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China)

Received: 2019-06-15 Revised: 2019-08-11 Online: 2019-11-25 Published: 2019-11-25
Funding:
National Natural Science Foundation of China (61103115, 61103172); Science and Technology Innovation Action Plan of the Science and Technology Commission of Shanghai Municipality, High-Tech Field Project (16511101000)

Abstract:

Accurate user churn prediction helps enterprises improve user retention, grow their user base, and increase profit. Most existing churn prediction models are either a single model or a simple fusion of several models, and so do not fully exploit the advantages of multi-model ensembles. Drawing on the Bootstrap Sampling idea used in random forests, this paper proposes an improved Stacking ensemble method and applies it to a real data set to predict user churn. Experimental comparison on the validation set shows that the proposed method outperforms every classical Stacking ensemble of the same structure on three metrics: the F1-score, recall, and prediction accuracy for churned users. When an appropriate ensemble structure is adopted, its performance can exceed the best performance achieved by the base classifiers.

Key words: Stacking ensemble learning, user churn prediction, Bootstrap Sampling, machine learning
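
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one possible reading: a hold-out form of Stacking in which each base learner is fit on its own bootstrap resample, followed by the three kinds of metric the abstract reports. Everything named here (`bootstrap_sample`, `fit_stump`, `fit_stacking`, `churn_metrics`, the decision-stump base learners, the lookup-table meta-learner, the hold-out split, and the synthetic data) is an illustrative assumption and not the authors' implementation or data set.

```python
import random
from collections import Counter, defaultdict


def bootstrap_sample(X, y, rng):
    """Resample (X, y) with replacement, keeping the original size."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]


def fit_stump(X, y, feature):
    """Toy base learner: the single-feature threshold rule with the best training accuracy."""
    best = None
    for t in sorted({row[feature] for row in X}):
        for sign in (1, -1):
            hits = sum((sign * (row[feature] - t) > 0) == bool(label)
                       for row, label in zip(X, y))
            if best is None or hits > best[0]:
                best = (hits, t, sign)
    _, t, sign = best
    return lambda row: int(sign * (row[feature] - t) > 0)


def fit_stacking(X, y, n_features, rng, holdout_frac=0.5):
    """Level 0: one stump per feature, each fit on its own bootstrap sample.
    Level 1: a lookup table from base-prediction patterns to the majority label,
    fit on a hold-out split that the base learners never saw (an assumption)."""
    order = list(range(len(X)))
    rng.shuffle(order)
    cut = int(len(order) * (1 - holdout_frac))
    base_idx, meta_idx = order[:cut], order[cut:]
    Xb, yb = [X[i] for i in base_idx], [y[i] for i in base_idx]

    bases = []
    for f in range(n_features):
        Xs, ys = bootstrap_sample(Xb, yb, rng)
        bases.append(fit_stump(Xs, ys, f))

    votes = defaultdict(Counter)
    for i in meta_idx:
        pattern = tuple(b(X[i]) for b in bases)
        votes[pattern][y[i]] += 1
    table = {p: c.most_common(1)[0][0] for p, c in votes.items()}

    def predict(row):
        pattern = tuple(b(row) for b in bases)
        if pattern in table:
            return table[pattern]
        # Unseen pattern: fall back to a plain majority vote of the base learners.
        return int(2 * sum(pattern) > len(pattern))

    return predict


def churn_metrics(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the churn class, plus overall accuracy."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Synthetic demo data (not the paper's data set): feature 0 carries the signal, feature 1 is noise.
rng = random.Random(0)
y_all = [i % 2 for i in range(200)]
X_all = [[label + rng.gauss(0, 0.4), rng.random()] for label in y_all]
X_train, y_train, X_test, y_test = X_all[:150], y_all[:150], X_all[150:], y_all[150:]

model = fit_stacking(X_train, y_train, n_features=2, rng=rng)
y_pred = [model(row) for row in X_test]
print(churn_metrics(y_test, y_pred))
```

Fitting the meta-learner on a hold-out split keeps its training inputs free of predictions the base learners made on their own training rows. A classical Stacking setup would typically use K-fold out-of-fold predictions for the same purpose, and the paper may well combine the two ideas differently.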

YE Cheng, ZHENG Hong, CHENG Yun-hui. A user churn prediction method based on multi-model fusion[J]. Computer Engineering & Science.
