Classification in Imbalanced Datasets

Introduction

This thesis studies well-known approaches to classification on class-imbalanced data, such as Cost-Sensitivity, Bagging for Imbalanced Datasets, MetaCost, and SMOTE. Its main contribution is a new generative approach to the problem, which we call Naive Bayes Sampling: it generates new instances of the minority class by bootstrapping the values of each feature present in the training data. Experiments on four UCI datasets and a medical dataset provided by KULeuven show the superiority of our approach.
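
To make the idea of bootstrapping feature values concrete, here is a minimal sketch in Python. It rests on an assumption that this page does not spell out: each feature of a new minority instance is drawn independently, with replacement, from the values that feature takes among the minority-class training instances (the independence assumption that the name Naive Bayes suggests). The function name `naive_bayes_sample` and its parameters are hypothetical and are not taken from the thesis software.

```python
import random


def naive_bayes_sample(minority_rows, n_new, seed=None):
    """Generate n_new synthetic minority instances.

    Assumption (not confirmed by the source): every feature is resampled
    independently, with replacement, from the values observed for that
    feature in the minority class.
    """
    rng = random.Random(seed)
    # Transpose the rows so that each tuple holds all observed values of one feature.
    columns = list(zip(*minority_rows))
    # Build each new instance by drawing one observed value per feature.
    return [[rng.choice(col) for col in columns] for _ in range(n_new)]


# Hypothetical toy data: three minority instances with two features each.
minority = [[1.0, "a"], [2.0, "b"], [3.0, "c"]]
synthetic = naive_bayes_sample(minority, n_new=5, seed=0)
```

Because features are drawn independently, a synthetic instance may combine values that never occurred together in the training data, for example `[1.0, "c"]`. Under the assumption above, this is what distinguishes the sketch from simply duplicating existing minority instances.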

  1. Introduction
  2. Classification
  3. Classification in Imbalanced Datasets
  4. Report & Articles
  5. Software
  6. Bibliography

Copyright 2009 Thomas Debray