J4 ›› 2015, Vol. 37 ›› Issue (9): 1761-1767.
• Articles •
HUANG Xianying,CHEN Hongyang,LIU Yingtao,XIONG Liyuan
Abstract:
Valid features in microblog short texts are sparse and difficult to extract, which reduces the accuracy of text representation, classification, and clustering. We propose a novel feature selection algorithm for microblog short texts based on statistical and semantic information. An evaluation function is constructed from term frequency-inverse document frequency (TF-IDF), part of speech (POS), and term length; combined with the semantic relevance between each term and the microblog short text, it drives feature selection so that the selected terms represent the meaning of the text more accurately. The new feature selection algorithm is integrated with the Naive Bayesian classification algorithm, and experiments on an open microblog corpus show that it achieves higher text classification precision than traditional strategies, indicating that the terms it selects represent the topic of microblog short texts more accurately.
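The abstract names the ingredients of the evaluation function (TF-IDF, POS, and term length) but not how they are combined, so the following is only a minimal sketch under stated assumptions, not the authors' formula. The multiplicative combination, the `POS_WEIGHT` values, the `max_len` cap, and the function names `tf_idf`, `score`, and `select_features` are all hypothetical. The semantic relevance term is left out because the abstract does not say how it is computed.

```python
import math

# Hypothetical POS weights: assumed values for illustration, not from the paper.
POS_WEIGHT = {"noun": 1.0, "verb": 0.6, "adj": 0.4}
DEFAULT_POS_WEIGHT = 0.2


def tf_idf(term, doc, corpus):
    """TF-IDF with a smoothed IDF; each doc is a list of (term, pos) pairs."""
    terms = [t for t, _ in doc]
    tf = terms.count(term) / len(terms)
    # Document frequency: number of texts in the corpus that contain the term.
    df = sum(1 for d in corpus if any(t == term for t, _ in d))
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0
    return tf * idf


def score(term, pos, doc, corpus, max_len=4):
    """Assumed evaluation function: TF-IDF x POS weight x capped length factor."""
    length_factor = min(len(term), max_len) / max_len
    pos_weight = POS_WEIGHT.get(pos, DEFAULT_POS_WEIGHT)
    return tf_idf(term, doc, corpus) * pos_weight * length_factor


def select_features(doc, corpus, k=2):
    """Return the k highest-scoring distinct terms of one short text."""
    scored = {term: score(term, pos, doc, corpus) for term, pos in doc}
    return sorted(scored, key=scored.get, reverse=True)[:k]
```

In a pipeline like the one the abstract describes, the selected terms would then serve as the features passed to a Naive Bayesian classifier. A multiplicative combination is used here only because it keeps every factor on a comparable scale; a weighted sum would be an equally plausible reading of the abstract.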
Key words: micro-blog short text; feature selection; statistics and semantic information; POS grouping; Naive Bayesian classification algorithm
HUANG Xianying,CHEN Hongyang,LIU Yingtao,XIONG Liyuan. A novel algorithm for feature selection on micro-blog short texts [J]. J4, 2015, 37(9): 1761-1767.
URL: http://joces.nudt.edu.cn/EN/
http://joces.nudt.edu.cn/EN/Y2015/V37/I9/1761