• Official Journal of the China Computer Federation (CCF)
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science, 2023, Vol. 45, Issue (09): 1670-1678.

• Artificial Intelligence and Data Mining

Blended MOOC video viewing pattern mining based on an improved self-adaptive DBSCAN

WANG Ruo-bin¹, GENG Fang-dong¹, ZHANG Yong-mei¹, SONG Wei¹, WANG Wei-feng¹, XU Lin²

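  (1. School of Information Science and Technology, North China University of Technology, Beijing 100144, China; 2. STEM, University of South Australia, Adelaide 5095, Australia)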

  • Received: 2022-02-07; Revised: 2022-04-01; Accepted: 2023-09-25; Online: 2023-09-25; Published: 2023-09-12
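  • Funding:
    National Natural Science Foundation of China (61977001); Key Project of the Beijing Association of Higher Education (ZD202127); Project of the Association of Fundamental Computing Education in Chinese Universities (2023-AFCEC-134); Joint Project of the Ministry of Education Teaching Advisory Committee for University Computer Courses and Higher Education Press (2022)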


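Abstract: The density-based clustering algorithm DBSCAN (Density-Based Spatial Clustering of Applications with Noise) groups data automatically according to its features and is widely used to cluster complex, noisy data sets. However, its parameters are difficult to determine and require heavy manual intervention, which limits its use for automatic, high-accuracy mining. To address this, this paper proposes KSSA-DBSCAN, a self-adaptive DBSCAN algorithm based on the slope of the k-dist graph. The algorithm uses the slope of the k-dist graph to automatically select a suitable inflection point of the graph as the optimal neighborhood radius (Eps). During the clustering iterations, it then determines the optimal density threshold (MinPts) automatically from changes in the number of clusters. Together, these steps overcome the difficulty of parameter selection and the reliance on manual intervention. KSSA-DBSCAN was compared with DBSCAN and KANN-DBSCAN on six data sets. It achieved higher accuracy than both algorithms on four of them, and its accuracy improved by up to 25% over DBSCAN. When applied to pattern mining of video viewing behavior data from a blended MOOC, the algorithm mined video viewing patterns effectively and automatically, further confirming its effectiveness.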

Key words: density-based clustering, self-adaptive, k-dist graph, blended MOOC, video viewing pattern
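The abstract describes the two automatic parameter steps of KSSA-DBSCAN only at a high level. The Python sketch below illustrates the general idea under our own assumptions and is not the authors' implementation. The knee rule in `choose_eps` takes the first point at which the sorted k-dist curve becomes steeper than its average slope. The stability rule in `choose_min_pts` takes the first MinPts whose cluster count stays unchanged over `patience` consecutive candidates. Both rules are hypothetical stand-ins, and all function names are illustrative. `dbscan` is a plain textbook DBSCAN.

```python
import math


def k_dist(points, k):
    """Distance from each point to its k-th nearest neighbour (self excluded), sorted ascending."""
    kd = []
    for i, p in enumerate(points):
        ds = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        kd.append(ds[k - 1])
    return sorted(kd)


def choose_eps(kd):
    """Hypothetical slope rule: Eps is the first k-dist value after which the
    curve rises more steeply than its average slope (needs >= 2 points)."""
    mean_slope = (kd[-1] - kd[0]) / (len(kd) - 1)
    for i in range(len(kd) - 1):
        if kd[i + 1] - kd[i] > mean_slope:
            return kd[i]
    return kd[-1]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN: returns -1 for noise, 0, 1, ... for clusters.
    A point's neighbourhood includes the point itself."""
    n = len(points)
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1  # unvisited core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # non-core point reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # only core points expand the cluster
    return labels


def n_clusters(labels):
    """Number of clusters, ignoring noise (-1)."""
    return len({lab for lab in labels if lab >= 0})


def choose_min_pts(points, eps, candidates, patience=3):
    """Hypothetical stability rule: first MinPts whose non-zero cluster count
    stays the same for `patience` consecutive candidate values."""
    candidates = list(candidates)
    counts = [n_clusters(dbscan(points, eps, m)) for m in candidates]
    for i in range(len(candidates) - patience + 1):
        window = counts[i:i + patience]
        if window[0] > 0 and len(set(window)) == 1:
            return candidates[i]
    return candidates[-1]


if __name__ == "__main__":
    blob_a = [(x, y) for x in range(5) for y in range(5)]
    blob_b = [(x + 20, y + 20) for x in range(5) for y in range(5)]
    points = blob_a + blob_b + [(50, 0)]  # two dense 5x5 grids plus one outlier
    eps = choose_eps(k_dist(points, k=4))
    min_pts = choose_min_pts(points, eps, range(5, 16))
    labels = dbscan(points, eps, min_pts)
    print(eps, min_pts, n_clusters(labels), labels[-1])  # 2.0 5 2 -1
```

On this toy data set, the rules pick Eps = 2.0 and MinPts = 5. The two grids are recovered as clusters, and the outlier is labelled as noise. The MinPts candidates start at k + 1 because, in this sketch, k-dist excludes the point itself while the DBSCAN neighbourhood includes it. Taking the first steep point rather than the steepest one is meant to keep Eps from being pulled into the noise tail of the k-dist graph, where the largest jumps usually occur.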
