
Computer Engineering & Science, 2025, Vol. 47, Issue (7): 1226-1236.

• Software Engineering •

BotChecker: A Transformer-based GitHub bot detection model

ZHANG Jin1,3, WU Xingjin1, ZHANG Yang2, XU Shunyu1

  1. College of Information Science and Engineering, Hunan Normal University, Changsha 410081, China
  2. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China
  3. School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410114, China

  • Received: 2023-12-15 Revised: 2024-03-25 Online: 2025-07-25 Published: 2025-08-25
  • Funding:
    National Natural Science Foundation of China (62141209, 61972055); Natural Science Foundation of Hunan Province (2021JJ30456, 2021JJ30734); National Defense Science and Technology Key Laboratory Fund (2021-KJWPDL-06, 2021-KJWPDL-17)

Abstract: In open-source software, accurately distinguishing software development assistant robots (bots) from human contributors is crucial for understanding and evaluating contribution activities. Motivated by the strong performance of deep learning models in natural language processing and software engineering-related fields, this paper proposes BotChecker, a Transformer-based automated bot detection model. By incorporating enhanced fully connected layers and a dedicated binary classifier structure into the Transformer, the model learns from the comment text of bot and human accounts in order to detect bots. Experiments validate the effectiveness of BotChecker on the bot detection task, where it achieves an accuracy, recall, and F1-score of 0.941, 0.894, and 0.938, respectively. The paper also analyzes the impact of model hyperparameters and zero-shot learning on BotChecker's performance. The proposed model can provide technical support for bot account identification in open-source communities and serve as a methodological benchmark for future research.

Key words: open-source platform, bot detection technique, empirical analysis, text processing
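
For readers unfamiliar with the metrics reported in the abstract, the following is a minimal sketch of how accuracy, recall, and F1-score are conventionally computed for a binary bot/human classifier. It is not taken from the paper: the function name `binary_metrics`, the `BinaryMetrics` container, and the toy labels are all hypothetical, and the paper may use a different averaging scheme or choice of positive class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return BinaryMetrics(accuracy, precision, recall, f1)


# Hypothetical toy labels (1 = bot, 0 = human); NOT data from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
m = binary_metrics(y_true, y_pred)
print(m)
```

Because F1 combines precision and recall, it can exceed recall when precision is high, which is consistent with the pattern of the reported figures (recall 0.894, F1 0.938).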