• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2012, Vol. 34 ›› Issue (4): 162-166.

• Articles •

Imbalanced Classification Approaches to Automatic Single-Document Summarization

NI Weijian,LIU Tong,ZENG Qingtian,ZHAO Hua,TANG Jianyu   

  1. (School of Information Science and Engineering,
    Shandong University of Science and Technology, Qingdao 266510, China)
  • Received: 2011-11-05 Revised: 2012-02-10 Online: 2012-04-26 Published: 2012-04-25

Abstract:

Machine-learning-based approaches to automatic document summarization have drawn increasing attention in the natural language processing literature. However, none of them takes into account the imbalanced class distribution inherent in the task: the sentences selected for a summary are far fewer than the sentences in the whole document. Such a highly imbalanced data distribution degrades the effectiveness of conventional machine learning algorithms. This paper addresses automatic document summarization from the perspective of imbalanced classification and proposes two learning strategies for dealing effectively with the highly imbalanced data in automatic single-document summarization. Experimental results on the DUC 2001 data set show that the proposed approaches yield significant performance improvements in terms of F1 and ROUGE-2.

Key words: imbalanced classification; automatic document summarization; SVM; margin; bagging