mlresearch.datasets.ImbalancedBinaryDatasets

class mlresearch.datasets.ImbalancedBinaryDatasets(names: str | list = 'all', data_home: str = None, download_if_missing: bool = True)

Class to download, transform and save binary-class imbalanced datasets.


MULTIPLICATION_FACTORS = [2, 3]

download()

Download the datasets and append undersampled versions of them.

fetch_breast_tissue()

Download and transform the Breast Tissue Data Set. The minority class is identified as the car and fad labels and the majority class as the rest of the labels.

http://archive.ics.uci.edu/ml/datasets/breast+tissue
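The fetch methods all describe the same transformation: one or more original labels become the minority (positive) class and the remaining labels become the majority class. A minimal sketch of that mapping follows. It is not the library's code; `binarize_labels` and the toy label list are hypothetical names and data used only for illustration.

```python
def binarize_labels(labels, minority_labels):
    """Map each label to 1 (minority class) or 0 (majority class)."""
    minority = set(minority_labels)
    return [1 if label in minority else 0 for label in labels]


# Toy label column (hypothetical data). "car" and "fad" form the minority
# class, as in the Breast Tissue description; the other values are placeholders.
labels = ["car", "adi", "fad", "con", "adi", "gla", "con", "adi"]
binary = binarize_labels(labels, minority_labels=["car", "fad"])
print(binary)  # [1, 0, 1, 0, 0, 0, 0, 0]
```

The same helper covers the one-vs-rest cases (for example, a single minority label such as pp for Ecoli) by passing a one-element list.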

fetch_cleveland()

Download and transform the Heart Disease Cleveland Data Set. The minority class is identified as the positive label and the majority class as the negative label.

https://archive.ics.uci.edu/ml/datasets/heart+disease

fetch_dermatology()

Download and transform the Dermatology Data Set. The minority class is identified as the positive label and the majority class as the negative label.

https://archive.ics.uci.edu/ml/datasets/Dermatology

fetch_ecoli()

Download and transform the Ecoli Data Set. The minority class is identified as the pp label and the majority class as the rest of the labels.

https://archive.ics.uci.edu/ml/datasets/ecoli

fetch_eucalyptus()

Download and transform the Eucalyptus Data Set. The minority class is identified as the best label and the majority class as the rest of the labels.

https://www.openml.org/d/188

fetch_glass()

Download and transform the Glass Identification Data Set. The minority class is identified as the 1 label and the majority class as the rest of the labels.

https://archive.ics.uci.edu/ml/datasets/glass+identification

fetch_haberman()

Download and transform the Haberman’s Survival Data Set. The minority class is identified as the 1 label and the majority class as the 0 label.

https://archive.ics.uci.edu/ml/datasets/Haberman’s+Survival

fetch_heart()

Download and transform the Heart Data Set. The minority class is identified as the 2 label and the majority class as the 1 label.

http://archive.ics.uci.edu/ml/datasets/statlog+(heart)

fetch_iris()

Download and transform the Iris Data Set. The minority class is identified as the 1 label and the majority class as the rest of the labels.

https://archive.ics.uci.edu/ml/datasets/iris

fetch_led()

Download and transform the LED Display Domain Data Set. The minority class is identified as the positive label and the majority class as the negative label.

https://www.openml.org/d/40496

fetch_libras()

Download and transform the Libras Movement Data Set. The minority class is identified as the 1 label and the majority class as the rest of the labels.

https://archive.ics.uci.edu/ml/datasets/Libras+Movement

fetch_liver()

Download and transform the Liver Disorders Data Set. The minority class is identified as the 1 label and the majority class as the 2 label.

https://archive.ics.uci.edu/ml/datasets/liver+disorders

fetch_new_thyroid_1()

Download and transform the Thyroid Disease Data Set. The minority class is identified as the positive label and the majority class as the negative label.

Note

The positive class was originally label 2.

https://archive.ics.uci.edu/ml/datasets/Thyroid+Disease

fetch_new_thyroid_2()

Download and transform the Thyroid Disease Data Set. The minority class is identified as the positive label and the majority class as the negative label.

Note

The positive class was originally label 3.

https://archive.ics.uci.edu/ml/datasets/Thyroid+Disease

fetch_page_blocks()

Download and transform the Page Blocks Data Set. The minority class is identified as the positive label and the majority class as the negative label.

https://www.openml.org/d/1021

fetch_pima()

Download and transform the Pima Indians Diabetes Data Set. The minority class is identified as the 1 label and the majority class as the 0 label.

https://www.kaggle.com/uciml/pima-indians-diabetes-database

fetch_vehicle()

Download and transform the Vehicle Silhouettes Data Set. The minority class is identified as the 1 label and the majority class as the rest of the labels.

https://archive.ics.uci.edu/ml/datasets/Statlog+(Vehicle+Silhouettes)

fetch_vowel()

Download and transform the Vowel Recognition Data Set. The minority class is identified as the positive label and the majority class as the negative label.

https://www.openml.org/d/375

fetch_wine()

Download and transform the Wine Data Set. The minority class is identified as the 2 label and the majority class as the rest of the labels.

https://archive.ics.uci.edu/ml/datasets/wine

fetch_yeast()

Download and transform the Yeast Data Set. The minority class is identified as the positive label and the majority class as the negative label.

https://archive.ics.uci.edu/ml/datasets/Yeast

imbalance_datasets(imbalance_ratio: float, random_state: int = None)

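Append imbalanced versions of the datasets, with the requested imbalance ratio, to self.content_. The imbalance ratio (IR) is the number of majority-class samples divided by the number of minority-class samples: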

\[IR = \frac{|C_{maj}|}{|C_{min}|}\]

Parameters:

imbalance_ratio : float

Final imbalance ratio expected in the datasets.

random_state : int, RandomState instance, default=None

Control the randomization of the algorithm.

  • If int, random_state is the seed used by the random number generator;

  • If RandomState instance, random_state is the random number generator;

  • If None, the random number generator is the RandomState instance used by np.random.

Returns:

self : Datasets
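To make the imbalance ratio concrete, the sketch below computes IR for a binary label list and then randomly drops minority samples until a target IR is reached. It is an illustration only, not the library's implementation: `imbalance_ratio`, `undersample_minority`, and the `seed` argument are hypothetical names, and the choice to undersample the minority class is an assumption about how a higher IR would be obtained.

```python
import random
from collections import Counter


def imbalance_ratio(y):
    """Return |C_maj| / |C_min| for a binary label sequence."""
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    return max(counts.values()) / min(counts.values())


def undersample_minority(X, y, target_ir, minority_label=1, seed=None):
    """Randomly keep only enough minority samples to reach about target_ir.

    Assumption: the imbalance is increased by removing minority samples.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority_label]
    maj_idx = [i for i, label in enumerate(y) if label != minority_label]
    # Number of minority samples to keep so that |C_maj| / |C_min| ~ target_ir.
    n_min = max(1, int(len(maj_idx) / target_ir))
    if n_min > len(min_idx):
        raise ValueError("dataset is already more imbalanced than target_ir")
    kept = sorted(maj_idx + rng.sample(min_idx, n_min))
    return [X[i] for i in kept], [y[i] for i in kept]


# Toy data: 10 minority samples (label 1) and 20 majority samples (label 0).
X = [[i] for i in range(30)]
y = [1] * 10 + [0] * 20
print(imbalance_ratio(y))  # 2.0

X_new, y_new = undersample_minority(X, y, target_ir=5, seed=0)
print(imbalance_ratio(y_new))  # 5.0
```

Fixing `seed` plays the role that an integer random_state plays in the description above: the same samples are kept on every run.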

items()

keys()

save(path, db_name)

Save datasets.

summarize_datasets()

Create a summary of the downloaded datasets.

Returns:

datasets_summary : pd.DataFrame

DataFrame with summary statistics of all datasets.
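The source does not list which statistics the summary contains. As a rough sketch of what a per-dataset summary could look like, the example below builds one row per dataset with sample counts and the imbalance ratio. The function `summarize`, the row keys, and the `(name, (X, y))` layout are all assumptions made for illustration, and plain dictionaries stand in for the DataFrame.

```python
from collections import Counter


def summarize(datasets):
    """Build one summary row (a dict) per (name, (X, y)) pair."""
    rows = []
    for name, (X, y) in datasets:
        counts = Counter(y)
        n_min, n_maj = min(counts.values()), max(counts.values())
        rows.append(
            {
                "dataset": name,  # hypothetical column names
                "features": len(X[0]),
                "instances": len(X),
                "minority_instances": n_min,
                "majority_instances": n_maj,
                "imbalance_ratio": round(n_maj / n_min, 2),
            }
        )
    return rows


# Two toy datasets (hypothetical data).
toy = [
    ("A", ([[0.1, 0.2]] * 12, [1] * 3 + [0] * 9)),
    ("B", ([[1.0]] * 10, [1] * 4 + [0] * 6)),
]
summary = summarize(toy)
for row in summary:
    print(row)
```

With pandas installed, `pd.DataFrame(summary)` would turn these rows into a table.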

values()