J4 ›› 2015, Vol. 37 ›› Issue (07): 1233-1238.
• Articles •
LI Hui, LI Zheng, SHE Kun
Abstract:
Data mining is an important method in big data and service computing. As a typical data mining method, random forest is widely used because of its low error rate. To handle big data more accurately and efficiently, we further improve both the accuracy and the efficiency of random forest. We show, both theoretically and experimentally, that our method decreases the generalization error by about 12%~20% when the number chosen for replacement exceeds the number of samples. Moreover, we replace repeated sampling with a simpler method that proves equivalent to repeated sampling. This shortens the time needed to build the forest and, when used alone, improves efficiency by about 10%~40%. It also offsets the efficiency loss introduced by the first improvement. Combining the two methods, we improve efficiency on unbalanced data by 10%, and improve accuracy on balanced data by over 12% without any impact on efficiency. The proposed method is therefore better suited than the original method to big data analysis and processing in service computing.
Key words: random forest; balanced data; unbalanced data; sampling without replacement
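The abstract does not spell out the authors' sampling procedure, so the following is only a generic sketch of the distinction it relies on: repeated (bootstrap) sampling versus sampling without replacement when choosing the training rows for one tree. The function names, the sample size of 1000, and the draw size of 632 are illustrative assumptions, not values taken from the paper.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Repeated sampling: draw n_samples row indices WITH replacement."""
    return rng.choices(range(n_samples), k=n_samples)


def without_replacement_indices(n_samples, n_draw, rng):
    """Draw n_draw distinct row indices (sampling WITHOUT replacement)."""
    if n_draw > n_samples:
        raise ValueError("cannot draw more distinct rows than exist")
    return rng.sample(range(n_samples), n_draw)


# Hypothetical sizes chosen only for illustration.
rng = random.Random(0)
n = 1000
boot = bootstrap_indices(n, rng)
sub = without_replacement_indices(n, 632, rng)

# A bootstrap sample contains duplicates; in expectation only about
# 1 - 1/e (roughly 63%) of the rows are distinct.
distinct_fraction = len(set(boot)) / n

# A without-replacement sample never repeats a row.
all_distinct = len(set(sub)) == len(sub)
```

In a forest, each tree would be trained on the rows selected by one such draw. How the paper sizes the draw and why the simpler scheme is equivalent to repeated sampling is not stated in the abstract.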
LI Hui, LI Zheng, SHE Kun. An improvement of random forests algorithm based on comprehensive sampling without replacement [J]. J4, 2015, 37(07): 1233-1238.
URL: http://joces.nudt.edu.cn/EN/
http://joces.nudt.edu.cn/EN/Y2015/V37/I07/1233