KEYWORDS: Machine learning, Data modeling, Data mining, Performance modeling, Software engineering, Data centers, Precision measurement, Facial recognition systems, Medical diagnostics, Detection and tracking algorithms
The class imbalance problem is one of the key challenges in machine learning and data mining. Imbalanced data can lead to sub-optimal performance of classification models. To address the problem, a variety of data sampling methods have been proposed in previous studies. However, there is no universal solution, and it is worth exploring which kind of data sampling technique balances the class distribution most effectively for a given type of data and classifier. In this work, we present an experimental study based on a number of real-world data sets obtained from different disciplines. The goal is to investigate how effectively different sampling techniques improve classification performance on imbalanced data sets. In particular, we study ten sampling methods of different types, including random sampling, cluster-based sampling, and ensemble sampling. In addition, the C4.5 decision tree algorithm is used to train the base classifiers, and performance is measured using precision, G-Measure, and Cohen's Kappa statistic.
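As an illustration of the simplest sampling family and the three evaluation measures named above, the following is a minimal sketch, not the study's actual pipeline. The function names `random_oversample` and `binary_metrics` are hypothetical. The abstract does not define G-Measure, so the sketch assumes the common definition as the geometric mean of precision and recall. Training a C4.5 tree is omitted.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lbl in zip(X, y) if lbl == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def binary_metrics(y_true, y_pred, positive=1):
    """Precision, G-Measure, and Cohen's Kappa for a binary problem.

    Assumption: G-Measure = sqrt(precision * recall).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    n = len(pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    g_measure = math.sqrt(precision * recall)

    # Cohen's Kappa: observed agreement corrected for chance agreement.
    p_observed = (tp + tn) / n
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (p_observed - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    return {"precision": precision, "g_measure": g_measure, "kappa": kappa}
```

In an experiment of this kind, resampling would be applied to the training split only, and the metrics would be computed on an untouched test split so that the reported scores reflect the original class distribution.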