Effects of Data Augmentation by Replicating Instances: Classification Performance by Ensembles of Decision Trees
Keywords:
Resampling data, Diabetes, Bagging, REPTree, Random Tree, Accuracy
Abstract
Resampling addresses the problem of classifying imbalanced instances by modifying the training data, which makes prediction easier. Among the machine learning techniques available for imbalanced classification, resampling is a useful one: it balances the majority and minority classes using under-sampling and over-sampling methods. However, despite its widespread use, sampling has issues in the efficient evaluation of small-sized data. This study analyzes sampling with ensembles of decision tree classifiers at different split percentages using a diabetes dataset, which varies the degree of imbalance and produces better accuracy. The evaluation measure for each replication percentage is calculated for the REPTree and Random Tree classifiers and interpreted in the discussion.
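As a rough illustration of the two resampling directions mentioned above, the following minimal sketch balances a labeled dataset either by replicating minority-class instances (over-sampling) or by discarding majority-class instances (under-sampling). It is not the procedure used in this study; the function names `oversample_by_replication` and `undersample`, the `seed` parameter, and the list-based data layout are assumptions made for illustration only.

```python
import random
from collections import Counter


def oversample_by_replication(rows, labels, seed=0):
    """Replicate randomly chosen instances of each smaller class
    until every class has as many instances as the largest class.
    (Hypothetical helper, for illustration only.)"""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Pool of original instances belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw with replacement, so the same instance may be copied repeatedly.
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class.
    (Hypothetical helper, for illustration only.)"""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw without replacement, so no instance is duplicated.
        for r in rng.sample(pool, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels
```

Over-sampling by replication keeps every original instance, which matters for small datasets, but the duplicated rows carry no new information; under-sampling avoids duplicates at the cost of discarding data.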
