• Journal of the China Computer Federation
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science, 2023, Vol. 45, Issue (01): 17-27.

• High Performance Computing

A k-dominant skyline body parallel solving algorithm based on Flink

SUN Guo-zhang (1,2,3), HUANG Shan (1,2,3), ALKAM Zabibul (1,2,3), XU Hao-tong (1,2,3), DUAN Xiao-dong (1,2,3)

  1. College of Computer Science and Engineering, Dalian Minzu University, Dalian 116600, China;
  2. State Ethnic Affairs Commission Key Laboratory of Big Data Applied Technology (Dalian Minzu University), Dalian 116600, China;
  3. Dalian Key Laboratory of Digital Technology for National Culture (Dalian Minzu University), Dalian 116600, China
  • Received: 2022-09-05; Revised: 2022-10-21; Accepted: 2023-01-25; Online: 2023-01-25; Published: 2023-01-25

Abstract: The k-dominant skyline relaxes the dominance relationship between data points, which makes it better suited to high-dimensional data. The k-dominant skyline body suits scenarios in which multiple users issue k-dominant skyline queries, but existing algorithms for computing it leave room for improvement in time efficiency and code scalability. This paper therefore proposes an optimized algorithm for computing k-dominant skyline bodies. The algorithm maintains a separate candidate set and intermediate set for each user. During k-dominance checking, it stores the points in the candidate set that are not k-dominant skyline points in the corresponding user's intermediate set, following the order in which data points appear in the two sets, so that the next user can filter against and reuse them. This reduces the number of comparisons between data points, avoids redundant computation, and improves query efficiency. The paper also proposes a parallel algorithm for computing multi-user k-dominant skyline bodies, which uses the Apache Flink parallel processing framework to reduce the time spent comparing data points. Theoretical analysis and experimental results show that the proposed algorithms are efficient and handle the multi-user k-dominant skyline problem well.
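To make the k-dominance relation concrete, here is a minimal Python sketch, assuming that smaller values are better in every dimension. It shows a k-dominance test, a naive k-dominant skyline baseline, a data-partitioned parallel variant, and one query per user with that user's own k. All function names (`k_dominates`, `k_dominant_skyline`, `parallel_k_dominant_skyline`, `multi_user_skylines`) are hypothetical. This is not the paper's candidate-set/intermediate-set algorithm, and a Python thread pool stands in for Apache Flink only to illustrate how the comparison work splits into independent chunks.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def k_dominates(p: Sequence[float], q: Sequence[float], k: int) -> bool:
    """True if p k-dominates q (assumption: smaller values are better).

    p must be no worse than q in at least k dimensions and strictly better
    in at least one of them. A strictly-better dimension is also a no-worse
    one, so it can always be among the k chosen dimensions.
    """
    no_worse = sum(1 for a, b in zip(p, q) if a <= b)
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse >= k and strictly_better


def _survivors(points: List[Point], k: int, indices: Sequence[int]) -> List[int]:
    # Test each assigned point against the WHOLE data set: k-dominance is not
    # transitive when k < d, so a point already excluded from the result may
    # still be the only point that k-dominates another one.
    return [
        i for i in indices
        if not any(k_dominates(q, points[i], k)
                   for j, q in enumerate(points) if j != i)
    ]


def k_dominant_skyline(points: List[Point], k: int) -> List[Point]:
    """Naive O(n^2 * d) baseline: keep every point no other point k-dominates."""
    return [points[i] for i in _survivors(points, k, range(len(points)))]


def parallel_k_dominant_skyline(points: List[Point], k: int,
                                workers: int = 4) -> List[Point]:
    """Split the points under test round-robin into independent chunks.

    Threads only illustrate the partitioning; a framework such as Flink
    would run the chunks on separate task slots instead.
    """
    chunks = [range(s, len(points), workers) for s in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda idx: _survivors(points, k, idx), chunks)
        keep = sorted(i for part in parts for i in part)
    return [points[i] for i in keep]


def multi_user_skylines(points: List[Point],
                        user_ks: Dict[str, int]) -> Dict[str, List[Point]]:
    """Answer one k-dominant skyline query per user, each with its own k."""
    return {user: parallel_k_dominant_skyline(points, k)
            for user, k in user_ks.items()}


if __name__ == "__main__":
    pts = [(1, 1, 5), (2, 3, 1), (4, 4, 4), (5, 2, 3)]
    print(multi_user_skylines(pts, {"alice": 2, "bob": 3}))
    # {'alice': [(1, 1, 5)], 'bob': [(1, 1, 5), (2, 3, 1), (5, 2, 3)]}
```

With d = 3, the user with k = 3 receives the ordinary skyline, and the user with k = 2 receives a subset of it. A smaller k makes k-dominance easier to satisfy, so a point that survives for k also survives for k + 1. Because each point is tested against the full data set, the chunks never need to exchange intermediate results. The price is that the total number of comparisons stays quadratic, which is the cost the paper's filtering scheme aims to reduce.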

Key words: k-dominant, skyline query, multi-user, Apache Flink, parallel query