Classification in Imbalanced Datasets

This thesis studies well-known approaches to classification on class-imbalanced data, such as Cost-Sensitivity, Bagging for Imbalanced Datasets, MetaCost and SMOTE. Its main contribution is a new generative approach to the problem that we call Naive Bayes Sampling, which generates new instances of the minority class by bootstrapping values of each feature present in the training data. Experiments on four UCI datasets and a medical dataset provided by KULeuven show the superiority of our approach.
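As a rough illustration of the sampling idea described above, the following minimal sketch draws each feature of a synthetic minority instance independently, with replacement, from the values observed for that feature. It is not the thesis implementation: the function name `naive_bayes_sample`, its parameters, and the choice to draw only from minority-class rows with features treated as independent are assumptions made for this example.

```python
import random


def naive_bayes_sample(minority_rows, n_new, seed=None):
    """Generate n_new synthetic rows from a list of minority-class rows.

    Hypothetical helper: each feature value is bootstrapped (drawn with
    replacement) from the values seen for that feature, independently of
    the other features. This independence is an assumption of the sketch.
    """
    rng = random.Random(seed)
    n_features = len(minority_rows[0])

    # Build one pool of observed values per feature (column).
    columns = [[row[j] for row in minority_rows] for j in range(n_features)]

    # Assemble each synthetic row by picking one value from every pool.
    return [[rng.choice(col) for col in columns] for _ in range(n_new)]


if __name__ == "__main__":
    minority = [[1.0, "a"], [2.0, "b"], [3.0, "c"]]
    for row in naive_bayes_sample(minority, n_new=5, seed=0):
        print(row)
```

Because the features are sampled separately, a synthetic row may combine values that never occurred together in the training data, which is what distinguishes this kind of sampling from simply duplicating existing minority instances.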
Copyright 2009 Thomas Debray