Effects of Data Augmentation by Replicating Instances: Classification Performance by Ensembles of Decision Trees

Kalaiselvi B; Venkateshan

Authors

Kalaiselvi B, Associate Professor, Department of Electronics and Communication Engineering, Bharath Institute of Higher Education and Research, Chennai, India
Venkateshan, Assistant Professor, Department of Electronics and Communication Engineering, Bharath Institute of Higher Education and Research, Chennai, India

Keywords:

Resampling data, Diabetes, Bagging, REPTree, Random Tree, Accuracy

Abstract

Classification on imbalanced data can be improved by resampling, which modifies the training data to make prediction easier. Several machine learning approaches exist to address imbalanced classification. Among them, resampling is a useful technique that balances instances between the majority and minority classes using under-sampling and over-sampling methods. However, despite its widespread use, sampling is difficult to evaluate efficiently on small datasets. This study analyzes resampling with ensembles of decision tree classifiers at different split percentages on a diabetes dataset, an approach that varies the degree of imbalance and produces better accuracy. Evaluation measures are calculated for the REPTree and Random Tree classifiers at each replication percentage and interpreted in the discussion.
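
The over-sampling by replication mentioned above can be sketched as follows. This is a minimal illustration only: the abstract does not state the exact replication procedure or tooling used in the study, so the function name `replicate_minority`, the `percent` parameter, and the toy data are all assumptions made for this example.

```python
import random
from collections import Counter


def replicate_minority(rows, labels, percent, seed=0):
    """Append replicated copies of minority-class rows (hypothetical helper).

    percent: number of replicas to add, expressed as a percentage of the
    current minority-class count (100 doubles the minority class).
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    # The minority class is the label with the fewest instances.
    minority = min(counts, key=counts.get)
    pool = [i for i, y in enumerate(labels) if y == minority]
    n_extra = round(len(pool) * percent / 100)
    # Sample with replacement from the minority instances.
    picks = [rng.choice(pool) for _ in range(n_extra)]
    new_rows = list(rows) + [rows[i] for i in picks]
    new_labels = list(labels) + [labels[i] for i in picks]
    return new_rows, new_labels


# Toy data (not from the study): four majority and two minority instances.
rows = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
labels = ["neg", "neg", "neg", "neg", "pos", "pos"]

aug_rows, aug_labels = replicate_minority(rows, labels, percent=100)
print(Counter(aug_labels))
```

When evaluating with a train/test split, replication of this kind is normally applied to the training portion only, so that copies of a test instance do not leak into training.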
