• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (02): 244-250.

• Computer Network and Information Security •

  • Supported by:
    Youth Innovation Promotion Association of the Chinese Academy of Sciences project (2018YFC0809300)

GPS trajectory de-anonymization based on deep learning

BU Guan-hua1,2, ZHOU Li-liang3, LI Hao1, ZHANG Min1

  (1. Trusted Computing and Information Assurance Laboratory,
    Institute of Software, Chinese Academy of Sciences, Beijing 100089;

    2. University of Chinese Academy of Sciences, Beijing 100089;

    3. CETC Key Laboratory of Avionic Information System Technology, Chengdu 610036, China)

  • Received: 2020-07-20 Revised: 2020-10-30 Accepted: 2022-02-25 Online: 2022-02-25 Published: 2022-02-17



Abstract: The rapid development of the mobile Internet and location-based service (LBS) technology allows location service providers to easily collect large volumes of user location trajectory data. Recent studies have shown that deep learning methods can extract private information such as user identity from trajectory datasets. However, existing work mainly targets check-in trajectories collected from social networks, and research on de-anonymizing GPS trajectories is relatively scarce. This paper therefore studies deep-learning-based de-anonymization of GPS trajectories. First, a pre-training method for GPS trajectory data is proposed. Through sub-trajectory division, location point conversion and location point embedding, the spatial distance and context information in the original GPS trajectories are embedded into fixed-length vectors, so that GPS trajectory data can be used as neural network input. Second, a GPS trajectory de-anonymization method based on deep neural network training is proposed. Taking the vector sequences obtained from pre-training as input, neural networks such as LSTM and GRU are used as encoders and trained to fit user identifiers, linking anonymous trajectories to users. Finally, the methods are validated on the Geolife trajectory dataset. In the experiments, the accuracy and Top-5 accuracy of trajectory de-anonymization reach 56.73% and 73.48%, respectively. The results show that the deep-learning-based method can identify users from anonymous trajectory data fairly accurately.

Key words: deep learning, recurrent neural network, trajectory de-anonymization, GPS trajectory, data pre-training
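
The abstract names the pipeline stages (sub-trajectory division, location point conversion, Top-5 evaluation) without giving their details. The following is a minimal Python sketch of what such steps could look like, not the authors' implementation. The time-gap rule for splitting, the fixed-size latitude/longitude grid, and all function names and default parameters (`max_gap_s`, `cell_deg`) are assumptions introduced for illustration only; the embedding and the LSTM/GRU encoder are omitted.

```python
import math
from typing import List, Sequence, Tuple

# A GPS fix as (latitude, longitude, unix_time_seconds). Hypothetical layout.
Point = Tuple[float, float, float]


def split_subtrajectories(points: Sequence[Point],
                          max_gap_s: float = 1800.0) -> List[List[Point]]:
    """Assumed splitting rule: start a new sub-trajectory whenever two
    consecutive fixes are more than max_gap_s seconds apart."""
    subs: List[List[Point]] = []
    for p in points:
        if subs and p[2] - subs[-1][-1][2] <= max_gap_s:
            subs[-1].append(p)
        else:
            subs.append([p])
    return subs


def to_cell_tokens(sub: Sequence[Point],
                   cell_deg: float = 0.001) -> List[Tuple[int, int]]:
    """Assumed conversion: map each fix to a grid-cell id and collapse
    consecutive repeats, giving a discrete token sequence that an
    embedding layer could consume."""
    tokens: List[Tuple[int, int]] = []
    for lat, lon, _ in sub:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        if not tokens or tokens[-1] != cell:
            tokens.append(cell)
    return tokens


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int) -> float:
    """Fraction of trajectories whose true user index is among the k
    highest-scoring users (k=1 is plain accuracy, k=5 is Top-5)."""
    hits = 0
    for row, label in zip(scores, labels):
        top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top
    return hits / len(labels)
```

In this sketch, each token sequence would then be embedded and fed to a recurrent classifier over user identifiers, and `top_k_accuracy` would be applied to the classifier's per-user scores on held-out trajectories.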