J4 ›› 2013, Vol. 35 ›› Issue (4): 136-143.
• Papers •
YU Long, WANG Jinlong
Abstract:
Treating graphic-text content as the core of web page information extraction, this paper formally proposes a theoretical model for analyzing page elements. By defining basic elements and transformation rules, the graphic-text page model uses a tree structure to represent the text and graphic features within page elements. Because elements in the graphic-text page model carry many features, the paper defines a similarity measure for element classification, proposes a method for obtaining the best classification feature set and the recognition threshold, and gives an algorithm that implements it. Experimental results show that the graphic-text page model reduces the scale of page elements, and that the induced feature set achieves the desired classification accuracy at a low learning cost.
Key words: web extraction; web page element; picture-text model; feature induction
YU Long, WANG Jinlong. Picture-text webpage model and page element feature induction [J]. J4, 2013, 35(4): 136-143.
URL: http://joces.nudt.edu.cn/EN/
http://joces.nudt.edu.cn/EN/Y2013/V35/I4/136