• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

计算机工程与科学 ›› 2023, Vol. 45 ›› Issue (01): 17-27.

• 高性能计算 • 上一篇    下一篇

基于Flink的k-支配skyline体并行求解算法

孙国璋1,2,3,黄山1,2,3,艾力卡木·再比布拉1,2,3,徐浩桐1,2,3,段晓东1,2,3   

  1. (1.大连民族大学计算机科学与工程学院,辽宁 大连 116600;
    2.大数据应用技术国家民委重点实验室(大连民族大学),辽宁 大连116600;
    3.大连市民族文化数字技术重点实验室(大连民族大学),辽宁 大连116600)

  • 收稿日期:2022-09-05 修回日期:2022-10-21 接受日期:2023-01-25 出版日期:2023-01-25 发布日期:2023-01-25
  • 基金资助:
    国家重点研发计划(2018YFB1004402)

A k-dominant skyline body parallel solving algorithm based on Flink

SUN Guo-zhang1,2,3,HUANG Shan1,2,3 ,ALKAM Zabibul1,2,3,XU Hao-tong1,2,3,DUAN Xiao-dong1,2,3   

  1. (1.College of Computer Science and Engineering,Dalian Minzu University,Dalian 116600;
    2.State Ethnic Affairs Commission Key Laboratory of Big Data Applied Technology (Dalian Minzu University),Dalian 116600;
    3.Dalian Key Laboratory of Digital Technology for National Culture (Dalian Minzu University),Dalian 116600,China)
  • Received:2022-09-05 Revised:2022-10-21 Accepted:2023-01-25 Online:2023-01-25 Published:2023-01-25

摘要: k-支配skyline算法弱化了数据点之间的支配关系,更适合高维数据。k-支配skyline体适应于多名用户使用k-支配skyline算法查询,而现有的求解算法在时间效率和代码扩展性方面都有待提高。因此,提出了面向多用户的k-支配skyline体求解优化算法MKSSOA,该算法对每名用户的候选集和中间集分别进行存储,同时在k-支配检查过程中利用2集合中数据点出现的先后次序将候选集中的非k-支配skyline点存储到对应用户的中间集中,以便下一名用户筛选使用,这样可以减少数据点之间的比较次数,避免重复计算,从而提升查询效率。同时,提出了面向多用户的k-支配skyline体并行求解算法MKSPSA,通过Apache Flink并行处理框架有效减少了数据点的比较时间。理论研究和实验结果显示,提出的算法具有较高的效率,能很好地处理多用户k-支配skyline问题。

关键词: k-支配, skyline查询, 多用户, Apache Flink, 并行查询

Abstract: The k-dominated skyline algorithm weakens the domination relationship between data points and is more suitable for high-dimensional data. k-dominated skyline bodies are suitable for multiple users to query with the k-dominated skyline algorithm, but the existing solution algorithms need to be improved in terms of time efficiency and code scalability. Therefore, this paper proposes an optimization algorithm for solving k-dominated skyline bodies. This algorithm stores the candidate set and the intermediate set for each user separately, and stores the non-k-dominated skyline points in the candidate set to the intermediate set of the corresponding user in the order of appearance of data points in the two sets during the k-domination checking process, so that the next user can filter and use them, which can reduce the number of comparisons between data points, avoids double counting, and improve query efficiency. A multi-user k-dominated skyline body parallel solving algorithm is also proposed, which effectively reduces the comparison time of data points through the Apache Flink parallel processing framework. The theoretical study and experimental data show that the proposed algorithm is highly efficient and can handle the multi-user k-dominated skyline problem well.

Key words: k-dominant, skyline query, multi-user, Apache Flink, parallel query