• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (09): 1546-1557.


Truth discovery based on neural network encoding

CAO Jian-jun¹, CHANG Chen¹,², WENG Nian-feng¹, TAO Jia-qing¹,³, JIANG Chun¹

  (1. The Sixty-third Research Institute, National University of Defense Technology, Nanjing 210007;

    2. Institute of Command and Control Engineering, Army Engineering University, Nanjing 210007;

    3. Department of Industrial Engineering, Nanjing University of Technology, Nanjing 210009, China)


  • Received: 2020-09-10 Revised: 2020-11-20 Accepted: 2021-09-25 Online: 2021-09-25 Published: 2021-09-26

Abstract: Because the Internet is open and diverse, different platforms provide information of differing quality, and descriptions of the same object can conflict with one another. Truth discovery is an important technique for resolving semantic conflicts and improving data quality. Traditional truth discovery methods usually assume that the relationship between source reliability and claim credibility can be represented by a simple function, and they design iterative rules or probabilistic models to find trustworthy claims and reliable sources. However, manually defined factors often fail to reflect the real underlying distribution of the data, which leads to unsatisfactory truth discovery results. To solve this problem, a truth discovery method based on neural network encoding is proposed. First, the method constructs a double-loss deep neural network that captures the “source-source” and “source-claim” relationships. Second, it embeds the sources and claims into a low-dimensional space that represents source reliability and claim credibility. After optimization, reliable sources and trustworthy claims lie close together in the embedding space, as do unreliable sources and untrustworthy claims. Finally, truth discovery is performed in the embedding space. Unlike traditional methods, the proposed method does not require iterative rules or data distributions to be defined manually before truth discovery; instead, it uses the neural network to learn the complex relationships among sources and claims automatically and then embeds them into a low-dimensional space. Experimental results on a real dataset show that the proposed model improves precision by 2%~25% over iteration-based methods such as Accu, by 2%~4% over probabilistic graphical model based methods such as 3-Estimate, by 2%~5% over optimization-based methods such as CRH, and by 1%~2% over the neural network based method FFMN.

Key words: data quality, data cleaning, conflict resolution, truth discovery, neural network
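
The embedding idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' network: a plain embedding table trained by gradient descent stands in for the deep neural network, and the function name `discover_truths`, the margin-based "source-claim" loss, the agreement-weighted "source-source" loss, and all hyperparameters are assumptions made for illustration. The final selection rule (pick, for each object, the claim nearest the centroid of the source embeddings) is also an assumption, and it only makes sense when most sources are reliable.

```python
import numpy as np


def discover_truths(observations, dim=2, margin=1.0, lam=1.0,
                    lr=0.02, steps=2000, seed=0):
    """Embed sources and claims with two losses, then pick one value per object.

    observations: list of (source, object, value) triples, with at most one
    value per source and object (an assumption of this sketch).
    """
    sources = sorted({s for s, _, _ in observations})
    claims = sorted({(o, v) for _, o, v in observations})
    s_idx = {s: i for i, s in enumerate(sources)}
    c_idx = {c: j for j, c in enumerate(claims)}
    said = {s: {} for s in sources}
    for s, o, v in observations:
        said[s][o] = v

    # Source-claim pairs: "pos" = the source asserted the claim,
    # "neg" = the source asserted a rival value for the same object.
    pos, neg = [], []
    for s in sources:
        for (o, v), j in c_idx.items():
            if o in said[s]:
                (pos if said[s][o] == v else neg).append((s_idx[s], j))

    # Source-source weights: fraction of shared objects on which two sources agree.
    agree = np.zeros((len(sources), len(sources)))
    for a in sources:
        for b in sources:
            shared = said[a].keys() & said[b].keys()
            if a != b and shared:
                same = sum(said[a][o] == said[b][o] for o in shared)
                agree[s_idx[a], s_idx[b]] = same / len(shared)

    rng = np.random.default_rng(seed)
    S = rng.normal(scale=0.1, size=(len(sources), dim))  # source embeddings
    C = rng.normal(scale=0.1, size=(len(claims), dim))   # claim embeddings

    for _ in range(steps):
        gS, gC = np.zeros_like(S), np.zeros_like(C)
        # Loss 1 (source-claim): pull asserted pairs together ...
        for i, j in pos:
            diff = S[i] - C[j]
            gS[i] += 2 * diff
            gC[j] -= 2 * diff
        # ... and push rival pairs apart until they are `margin` away.
        for i, j in neg:
            diff = S[i] - C[j]
            dist = np.linalg.norm(diff) + 1e-9
            if dist < margin:
                g = -2 * (margin - dist) * diff / dist
                gS[i] += g
                gC[j] -= g
        # Loss 2 (source-source): pull agreeing sources together.
        gS += 2 * lam * (agree.sum(axis=1)[:, None] * S - agree @ S)
        S -= lr * gS
        C -= lr * gC

    # Assumed selection rule: the claim nearest the source centroid wins.
    center = S.mean(axis=0)
    best = {}
    for (o, v), j in c_idx.items():
        d = np.linalg.norm(C[j] - center)
        if o not in best or d < best[o][1]:
            best[o] = (v, d)
    return {o: v for o, (v, _) in best.items()}
```

Calling `discover_truths` with a list of `(source, object, value)` triples returns a dictionary that maps each object to the selected value. The two loss terms are kept separate in the loop so that the "source-claim" and "source-source" contributions are easy to tell apart; a real implementation would replace the embedding table with a trained encoder.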