• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学)

• Computer Networks and Information Security

异构复杂信息网络敏感数据流动态挖掘

熊菊霞1,2,3,吴尽昭1,2,3   

  (1.中国科学院成都计算机应用研究所,四川 成都 610041;2.中国科学院大学,北京 100049;
    3.广西民族大学广西混杂计算与集成电路设计分析重点实验室,广西 南宁 530006)
  • 收稿日期:2019-06-13 修回日期:2019-10-09 出版日期:2020-04-25 发布日期:2020-04-25
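  • Funding:

    National Natural Science Foundation of China (61772006); Guangxi Science and Technology Major Project (AA17204096); Guangxi Science and Technology Base and Talent Special Project (2016AD05050);
    Guangxi "Bagui Scholars" Special Fund; Basic Ability Improvement Project for Young and Middle-aged Teachers in Guangxi Universities (2017KY0174)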

Dynamic mining of sensitive data streams in
heterogeneous complex information networks

XIONG Ju-xia (1,2,3), WU Jin-zhao (1,2,3)
  

  (1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu 610041;
    2. University of Chinese Academy of Sciences, Beijing 100049;
    3. Guangxi Key Laboratory of Hybrid Computational and IC Design Analysis,
    Guangxi University for Nationalities, Nanning 530006, China)
  • Received: 2019-06-13 Revised: 2019-10-09 Online: 2020-04-25 Published: 2020-04-25

摘要:

针对异构复杂信息网络中存在高维冗余的敏感数据流,可挖掘数据特征形成概率较低,导致需要多次挖掘、挖掘内存占用高、挖掘精度低、时间长的问题,提出基于最大类间散度的网络敏感数据流动态挖掘方法。将敏感数据的差异最大化间隔作为分类基础,得到网络敏感数据的最大类间散度,在遗传迭代状态下确定最优散度迭代函数,对迭代函数进行挖掘特征优选,得出动态可挖掘特征。对可挖掘特征进行聚类分析,挖掘得到数据隐藏信息模式,并对其进行评价,将合理的信息模式进行知识表示,从而实现异构复杂信息网络敏感数据流动态挖掘。实验结果表明,所提方法可挖掘特征形成概率高达98%,labels标记与实际值较为接近。所提方法挖掘精度高,且运行时间较短、内存占用率低。

关键词: 异构复杂信息网络, 敏感数据流, 动态挖掘, 散度迭代函数, 聚类分析

Abstract:

Sensitive data streams in heterogeneous complex information networks are high-dimensional and redundant, so the probability of forming mineable data features is low. As a result, mining must be repeated several times, memory usage is high, accuracy is low, and running time is long. To address these problems, a dynamic mining method for network sensitive data streams based on the maximum inter-class divergence is proposed. The difference-maximizing margin between sensitive data is used as the basis for classification, which yields the maximum inter-class divergence of the network sensitive data. The optimal divergence iterative function is determined under genetic iteration, and mining features are selected on this function to obtain the dynamic mineable features. Clustering analysis is then performed on the mineable features to mine the hidden information patterns in the data. These patterns are evaluated, and the reasonable ones are expressed through knowledge representation, which realizes dynamic mining of sensitive data streams in heterogeneous complex information networks. Experimental results show that the mineable feature formation probability of the proposed method reaches up to 98% and that the assigned labels are close to the actual values. The method achieves high mining accuracy with short running time and low memory usage.
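
As a rough illustration of the feature selection idea in the abstract, the sketch below ranks features by their between-class scatter, one common way to quantify inter-class divergence, and keeps the top k. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `between_class_scatter` and `select_features`, the toy samples, and the labels are all hypothetical, the per-feature scatter formula is an assumed stand-in for the paper's divergence measure, and the genetic iteration and clustering stages are not reproduced.

```python
from collections import defaultdict


def between_class_scatter(samples, labels):
    """Per-feature between-class scatter.

    For each feature j this is the sum over classes c of
    n_c * (class_mean[c][j] - overall_mean[j]) ** 2.
    Assumed stand-in for the paper's inter-class divergence.
    """
    n_total = len(samples)
    n_features = len(samples[0])
    overall_mean = [
        sum(row[j] for row in samples) / n_total for j in range(n_features)
    ]

    # Group the rows by their class label.
    rows_by_class = defaultdict(list)
    for row, label in zip(samples, labels):
        rows_by_class[label].append(row)

    scatter = [0.0] * n_features
    for rows in rows_by_class.values():
        n_c = len(rows)
        for j in range(n_features):
            class_mean = sum(r[j] for r in rows) / n_c
            scatter[j] += n_c * (class_mean - overall_mean[j]) ** 2
    return scatter


def select_features(samples, labels, k):
    """Return the sorted indices of the k features with the largest scatter."""
    scatter = between_class_scatter(samples, labels)
    ranked = sorted(range(len(scatter)), key=lambda j: scatter[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy data: feature 0 separates the classes, feature 1 barely
# does, and feature 2 is constant (fully redundant).
toy_samples = [
    [0.0, 5.0, 1.0],
    [0.2, 5.1, 1.0],
    [4.0, 5.0, 1.0],
    [4.2, 4.9, 1.0],
]
toy_labels = ["normal", "normal", "sensitive", "sensitive"]

selected = select_features(toy_samples, toy_labels, k=1)
```

On this toy data the constant feature receives zero scatter and the class-separating feature ranks first, which is the behavior a redundancy-reducing selection step would be expected to show. A fuller version would normalize by within-class scatter and search feature subsets jointly rather than scoring each feature independently.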
 

Key words: heterogeneous complex information network, sensitive data stream, dynamic mining, divergence iterative function, clustering analysis