Random forest (RF) is a competitive machine learning method, but imbalanced real-world data remains one of its major challenges. This paper presents a two-dimensional-reduction RF (2DRRF), which builds on the traditional RF with three contributions. First, a two-dimensional-reduction approach is introduced to improve RF performance on imbalanced data. Second, a modified T-link is proposed that focuses on detecting and reducing safe samples. Third, a biased sampling scheme is employed to build optimal training datasets. Experiments on 13 imbalanced datasets from the KEEL-dataset repository, with imbalance ratios ranging from 6.38 to 129.44, indicate that 2DRRF consistently outperforms two other relevant RF implementations in terms of accuracy, recall, precision, and F-value.
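For readers unfamiliar with the terms used above, the following is a minimal sketch of how an imbalance ratio is computed and how a standard Tomek link (a pair of mutual nearest neighbours with different class labels) is detected. It illustrates the classical definition only, not the modified T-link proposed in the paper; the function names and the toy data are hypothetical and chosen for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return max(counts.values()) / min(counts.values())


def nearest_neighbor(points, i):
    """Index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: dist(points[i], points[j]),
    )


def tomek_links(points, labels):
    """Pairs (i, j), i < j, that are mutual nearest neighbours
    and carry different class labels (the classical definition)."""
    nn = [nearest_neighbor(points, i) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


# Hypothetical toy data: five majority samples (0) and one minority sample (1).
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.2), (9.0, 9.0), (9.0, 9.3)]
labels = [0, 1, 0, 0, 0, 0]

ratio = imbalance_ratio(labels)
links = tomek_links(points, labels)
```

In this toy set, only the first two points form a Tomek link, since they are each other's nearest neighbour and belong to different classes. The brute-force neighbour search is quadratic in the number of samples and is meant for illustration rather than for datasets of realistic size.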
The slides are not publicly available at the moment. Drop me an email if you are interested.
More information can be found at this link.