An improvement of random forests algorithm based on 
comprehensive sampling without replacement

J4 ›› 2015, Vol. 37 ›› Issue (7): 1233-1238.


LI Hui, LI Zheng, SHE Kun

(School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China)

Received: 2014-08-05 Revised: 2015-04-03 Online: 2015-07-25 Published: 2015-07-25

Abstract

Data mining is an important method in big data and service computing. As a typical data mining method, random forest is widely used because of its low error rate. To process big data more accurately and efficiently, we further improve both the accuracy and the efficiency of random forest. We show, both theoretically and in practice, that our method decreases the generalization error by about 12% to 20% when the number chosen for replacement exceeds the number of samples. Moreover, we replace repeated sampling with a simpler method that is proved to be equivalent to it. This reduces the time needed to build the forest and, used alone, improves efficiency by about 10% to 40%. This second improvement also compensates for the efficiency loss caused by the first. Combining the two methods, we improve efficiency on unbalanced data by 10% and improve accuracy on balanced data by over 12% without any impact on efficiency. The proposed method is therefore more suitable than the original method for big data analysis and processing in service computing.

Key words: random forest; balanced data; unbalanced data; sampling without replacement
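
The abstract contrasts repeated sampling with sampling without replacement but does not spell out the authors' procedure. The sketch below is therefore only a generic illustration of the two sampling styles, not the method proposed in the paper. The function names, the sample count `n = 1000`, and the subsample size `k = 632` are all hypothetical choices made for this example.

```python
import random

def bootstrap_indices(n, rng):
    """Draw n indices WITH replacement, as in classic bagging."""
    return [rng.randrange(n) for _ in range(n)]

def subsample_indices(n, k, rng):
    """Draw k distinct indices WITHOUT replacement (requires k <= n)."""
    return rng.sample(range(n), k)

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 1000                 # hypothetical training-set size
k = 632                  # hypothetical subsample size, chosen for illustration

boot = bootstrap_indices(n, rng)
sub = subsample_indices(n, k, rng)

# With replacement, duplicates appear: the expected fraction of distinct
# samples is 1 - (1 - 1/n)^n, which approaches 1 - 1/e (about 0.632).
unique_fraction = len(set(boot)) / n
print("distinct fraction in bootstrap sample:", unique_fraction)

# Without replacement, every drawn index is distinct by construction.
print("subsample has no duplicates:", len(set(sub)) == len(sub))
```

Each tree in a forest would be trained on the rows selected by one such index list. Drawing without replacement avoids duplicate rows within a tree's training set, which is the general property the keywords point to.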

LI Hui, LI Zheng, SHE Kun. An improvement of random forests algorithm based on comprehensive sampling without replacement [J]. J4, 2015, 37(7): 1233-1238.
