• A journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


A new proof of the equivalence between random walk
sequences and sentences in representation learning

SUN Yan1,3,4,5, SUN Mao-song1,2, ZHAO Hai-xing1,3,4, YE Zhong-lin1,3,4

  (1. School of Computer, Qinghai Normal University, Xining, Qinghai 810016;
    2. Department of Computer Science and Technology, Tsinghua University, Beijing 100084;
    3. Key Laboratory of Tibetan Information Processing and Machine Translation in Qinghai, Xining 810008;
    4. Key Laboratory of the Education Ministry for Tibetan Information Processing, Xining 810008;
    5. School of Computer, Qinghai Nationalities University, Xining 810007, China)

     
  • Received:2019-07-11 Revised:2019-09-16 Online:2020-02-25 Published:2020-02-25

Abstract:

In machine learning, representation learning maps information with association relationships into a low-dimensional vector space through a shallow neural network. The goal of word representation learning is to map the relationships between words and their context words into low-dimensional vectors, while the goal of network representation learning is to map the relationships between network nodes and their context nodes into low-dimensional vectors. Word vectors are the output of word representation learning, and node representation vectors are the output of network representation learning. DeepWalk obtains walk sequences in a network through a random walk strategy and treats them as the sentences of the word2vec model; node pairs are then trained in the neural network through a sliding window. word2vec and DeepWalk share the same underlying model and optimization method: the Skip-Gram model with negative sampling, referred to as SGNS. Existing research shows that both word representation learning and network representation learning algorithms based on the SGNS model implicitly factorize a target feature matrix. Perozzi et al. observed that word frequency obeys Zipf's law and that node degree in a network obeys a power-law distribution, and on this basis considered the random walk sequences of a network to be equivalent to the sentences of a language model. However, judging the equivalence between sentences and random walk sequences from the power-law distribution alone is not sufficient. Therefore, based on the theory that SGNS implicitly factorizes the target feature matrix, this paper designs two comparative experiments. The experiments use singular value decomposition and a matrix completion method to perform node classification tasks on three public datasets, and confirm the equivalence between sentences and random walk sequences.
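As a minimal sketch of the pipeline the abstract describes: random walk sequences are generated and treated as sentences, the shifted positive PMI (SPPMI) matrix that SGNS implicitly factorizes is built from their co-occurrences, and node embeddings are obtained by truncated SVD. The toy graph, window size, and shift k below are illustrative assumptions, not the paper's actual datasets or settings.

```python
import numpy as np

# Toy undirected graph as an adjacency list (an assumption for
# illustration; the paper's experiments use three public datasets).
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

def random_walks(graph, walks_per_node=20, walk_len=10, seed=0):
    """DeepWalk-style random walk sequences, the 'sentences' fed to SGNS."""
    rng = np.random.default_rng(seed)
    walks = []
    for start in graph:
        for _ in range(walks_per_node):
            walk = [start]
            for _ in range(walk_len - 1):
                walk.append(int(rng.choice(graph[walk[-1]])))
            walks.append(walk)
    return walks

def sppmi_matrix(walks, n_nodes, window=2, k=1):
    """Shifted positive PMI matrix: SGNS with k negative samples
    implicitly factorizes max(PMI - log k, 0)."""
    counts = np.zeros((n_nodes, n_nodes))
    for walk in walks:
        for i, u in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:                      # sliding-window co-occurrence
                    counts[u, walk[j]] += 1.0
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)     # node marginals
    col = counts.sum(axis=0, keepdims=True)     # context marginals
    with np.errstate(divide="ignore"):
        pmi = np.log(counts * total / (row * col))  # -inf where counts == 0
    return np.maximum(pmi - np.log(k), 0.0)         # shift and clip at zero

def svd_embed(m, dim=2):
    """Embed nodes by truncated SVD of the target feature matrix."""
    u, s, _ = np.linalg.svd(m)
    return u[:, :dim] * np.sqrt(s[:dim])

walks = random_walks(graph)
m = sppmi_matrix(walks, n_nodes=4)
emb = svd_embed(m)    # one low-dimensional vector per node
```

In the paper's setting, embeddings recovered this way from sentences and from random walk sequences are then compared on node classification tasks; comparable performance is what supports the claimed equivalence.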


Key words: word vector, shifted positive pointwise mutual information, sentence, random walk sequence