• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (10): 1730-1737.

• High Performance Computing •

An online game user churn prediction method based on the Spark platform

HU Yan-fang, XIONG Wen, GAO Wei

  1. (School of Information, Yunnan Normal University, Kunming 650500, China)
  • Received: 2022-01-25 Revised: 2022-05-18 Accepted: 2022-10-25 Online: 2022-10-25 Published: 2022-10-28
  • Supported by:
    National Natural Science Foundation of China (61862066)

Abstract: With the widespread adoption of the mobile Internet, the domestic online game market has become increasingly saturated, and the cost for game companies of acquiring new users keeps rising. Preventing the churn of existing users has therefore become the focus of marketing. This paper proposes an online game user churn prediction method based on the Spark platform and applies it to predict user churn on a real game log dataset. First, user features are extracted and computed from the log data. Second, a set of important features is selected by weight. Finally, a binary classification model is built with the features as input and churn or non-churn as output. Six common classification algorithms, including random forest, support vector machine, multi-layer perceptron, gradient boosting decision tree, and logistic regression, are compared. The experimental results show that the random forest algorithm performs best, with a model prediction accuracy of 91%.

Key words: user churn prediction, Spark, binary classification, machine learning, random forest
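
The first step of the method, turning raw logs into per-user features with a churn label, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the log layout `(user_id, day_index, session_minutes)`, the three feature names, the 7-day observation window, and the rule that a user with no activity in the following 7 days counts as churned. Plain Python is used so the sketch runs without a cluster.

```python
from collections import defaultdict

# Hypothetical log records: (user_id, day_index, session_minutes).
LOGS = [
    ("u1", 1, 30), ("u1", 2, 45), ("u1", 9, 20),
    ("u2", 1, 10), ("u2", 3, 5),
    ("u3", 6, 60), ("u3", 7, 50), ("u3", 12, 40),
]

OBSERVE_END = 7   # assumed: days 1..7 are used to compute features
LABEL_END = 14    # assumed: no activity in days 8..14 counts as churn


def build_dataset(logs):
    """Return {user_id: (features, label)} from raw log records."""
    days = defaultdict(set)      # distinct active days in the observation window
    minutes = defaultdict(int)   # total play time in the observation window
    returned = set()             # users seen again in the label window
    for user, day, mins in logs:
        if day <= OBSERVE_END:
            days[user].add(day)
            minutes[user] += mins
        elif day <= LABEL_END:
            returned.add(user)
    dataset = {}
    for user in days:
        features = {
            "active_days": len(days[user]),
            "total_minutes": minutes[user],
            "last_active_day": max(days[user]),
        }
        label = 0 if user in returned else 1  # 1 = churned, 0 = retained
        dataset[user] = (features, label)
    return dataset


DATASET = build_dataset(LOGS)
for user, (features, label) in sorted(DATASET.items()):
    print(user, features, label)
```

On a Spark cluster the same per-user aggregation would typically be expressed as a group-by on the user ID. The weight-based feature selection and the classifier comparison described in the abstract are not shown here.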