Effects of Data Augmentation by Replicating Instances: Classification Performance by Ensembles of Decision Trees

Authors

  • Kalaiselvi B, Associate Professor, Department of Electronics and Communication Engineering, Bharath Institute of Higher Education and Research, Chennai, India
  • Venkateshan, Assistant Professor, Department of Electronics and Communication Engineering, Bharath Institute of Higher Education and Research, Chennai, India

Keywords:

Resampling data, Diabetes, Bagging, REPTree, Random Tree, Accuracy

Abstract

Classification with imbalanced classes can be addressed by resampling, which modifies the training data to make prediction easier. Several machine learning techniques exist for imbalanced classification. Among them, resampling is a useful technique that balances the majority and minority classes through under-sampling and over-sampling. However, despite its widespread use, resampling is difficult to evaluate efficiently on small datasets. This study analyzes resampling with ensembles of decision tree classifiers at different split percentages on a diabetes dataset, varying the degree of imbalance, and reports better accuracy. The evaluation measures for each replication percentage are calculated for the REPTree and Random Tree classifiers and interpreted in the discussion.
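As an illustration of over-sampling by replicating instances, the following is a minimal Python sketch, not the authors' implementation. The function name `replicate_minority`, the `percent` parameter (extra copies as a percentage of the original minority-class count), and the toy data are all assumptions introduced here; the abstract does not specify how the replication percentage is defined.

```python
import random
from collections import Counter


def replicate_minority(rows, labels, percent, seed=0):
    """Return new lists with extra duplicated minority-class instances.

    `percent` is a hypothetical parameter: the number of replicated
    instances to add, as a percentage of the original minority count.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    # The minority class is the label with the fewest instances.
    minority = min(counts, key=counts.get)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    n_extra = round(len(minority_idx) * percent / 100)
    # Sample with replacement, so one instance may be copied several times.
    picks = [rng.choice(minority_idx) for _ in range(n_extra)]
    new_rows = list(rows) + [rows[i] for i in picks]
    new_labels = list(labels) + [labels[i] for i in picks]
    return new_rows, new_labels


# Toy data (illustrative only): four majority and two minority instances.
rows = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
labels = ["neg", "neg", "neg", "neg", "pos", "pos"]
aug_rows, aug_labels = replicate_minority(rows, labels, percent=100)
print(Counter(aug_labels))
```

In a sketch like this, replication would normally be applied only to the training portion of a split, since duplicating instances before splitting can place copies of the same instance in both the training and test sets.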

Published

2025-06-12

Section

Papers

How to Cite

Kalaiselvi B, Venkateshan. “Effects of Data Augmentation by Replicating Instances: Classification Performance by Ensembles of Decision Trees”. International Journal of Knowledge Exploration in Computational Intelligence. Vol. 1, Issue 1, pp. 22–31, Jun. 2025. DOI: To be applied