• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2023, Vol. 45 ›› Issue (08): 1383-1392.

• High Performance Computing •

A massive subway passenger trajectory similarity connection method: A case study of Shenzhen metro

WANG Xing-su1, XIONG Wen1, ZHANG Rui2

  (1. School of Information Science and Technology, Yunnan Normal University, Kunming 650500;
    2. Shenzhen Institute of Beidou Applied Technology, Shenzhen 518000, China)
  • Received:2022-06-09 Revised:2022-09-23 Accepted:2023-08-25 Online:2023-08-25 Published:2023-08-18

Abstract: Current mainstream trajectory similarity connection (join) methods target GPS trajectories, and optimizations designed for GPS trajectories cannot be directly applied to connecting subway passenger trajectories. By exploiting the spatiotemporal characteristics of subway passenger trajectories, in particular their repetitiveness and symmetry, each trajectory is transformed from a point sequence into an OD (origin-destination) sequence, which shortens the trajectory and saves storage space. This paper focuses on the design and implementation of a PPJoin+-based trajectory connection algorithm on the Spark platform. The method is validated on a 13-node Spark cluster with a large-scale dataset containing 5 million passenger trajectories (560 million tap records collected over two consecutive months). Experimental results show that the OD-sequence-based PPJoin+ algorithm takes only 14.0 minutes, reducing execution time by 62.5% compared with the default point-sequence trajectory connection algorithm and by 78.2% compared with the Dima connection algorithm, while exhibiting good scalability.
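The point-sequence-to-OD-sequence idea can be illustrated with a minimal single-machine Python sketch. It is not the paper's Spark implementation. The tap-record layout `(station, kind)`, the station labels, the choice to treat A→B and B→A as the same token, and the use of Jaccard similarity are all assumptions made for illustration. The prefix-length formula is the standard one used by prefix-filtering joins such as PPJoin+ under a Jaccard threshold.

```python
from math import ceil


def to_od_sequence(taps):
    """Pair each tap-in with the next tap-out to form (origin, destination) trips.

    `taps` is a time-ordered list of (station, kind) tuples, kind in {"in", "out"}.
    This record layout is a hypothetical one chosen for the example.
    """
    trips = []
    origin = None
    for station, kind in taps:
        if kind == "in":
            origin = station
        elif kind == "out" and origin is not None:
            trips.append((origin, station))
            origin = None
    return trips


def od_tokens(trips, symmetric=True):
    """Collapse repeated trips into a token set.

    With symmetric=True, A->B and B->A map to one token (an assumed reading
    of the "repetitiveness and symmetry" mentioned in the abstract).
    """
    if symmetric:
        return {tuple(sorted(trip)) for trip in trips}
    return set(trips)


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def prefix_length(size, threshold):
    """Prefix length for a Jaccard threshold t: |x| - ceil(t * |x|) + 1."""
    return size - ceil(threshold * size) + 1


# Two hypothetical passengers with abstract station labels.
p1 = [("A", "in"), ("B", "out"), ("B", "in"), ("A", "out"), ("A", "in"), ("B", "out")]
p2 = [("A", "in"), ("B", "out"), ("B", "in"), ("C", "out")]

t1 = od_tokens(to_od_sequence(p1))
t2 = od_tokens(to_od_sequence(p2))
similarity = jaccard(t1, t2)
```

In this sketch the six tap records of the first passenger reduce to a single OD token, which shows why the OD representation is shorter than the point sequence. In a prefix-filtering join, only the first `prefix_length(len(tokens), t)` tokens of each globally ordered token list need to be indexed to generate candidate pairs.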

Key words: trajectory similarity, metro system, PPJoin+, Spark