mlresearch.datasets.ImbalancedBinaryDatasets¶
- class mlresearch.datasets.ImbalancedBinaryDatasets(names: str | list = 'all', data_home: str = None, download_if_missing: bool = True)[source]¶
Class to download, transform and save binary class imbalanced datasets.
- MULTIPLICATION_FACTORS = [2, 3]¶
- fetch_breast_tissue()[source]¶
Download and transform the Breast Tissue Data Set. The minority class is identified as the car and fad labels and the majority class as the rest of the labels.
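The relabeling described above (a set of original labels becomes the minority class, everything else the majority class) can be sketched in plain Python. This is a minimal illustration and not the library's implementation: the helper name `binarize_labels`, the 1/0 encoding and the toy label list are assumptions made for the example.

```python
def binarize_labels(labels, minority_labels):
    """Map each label to 1 if it is in the minority group, else 0.

    Hypothetical helper for illustration; not part of mlresearch.
    """
    minority = set(minority_labels)
    return [1 if label in minority else 0 for label in labels]


# Toy label list (made up) using class names from the Breast Tissue data.
labels = ["car", "adi", "fad", "gla", "mas", "con", "car"]

# "car" and "fad" form the minority class; the rest form the majority class.
binary = binarize_labels(labels, minority_labels=["car", "fad"])
print(binary)  # [1, 0, 1, 0, 0, 0, 1]
```

The same pattern covers the one-label cases in the other fetchers, for example passing a single-element list as `minority_labels`.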
- fetch_cleveland()[source]¶
Download and transform the Heart Disease Cleveland Data Set. The minority class is identified as the positive label and the majority class as the negative label.
- fetch_dermatology()[source]¶
Download and transform the Dermatology Data Set. The minority class is identified as the positive label and the majority class as the negative label.
- fetch_ecoli()[source]¶
Download and transform the Ecoli Data Set. The minority class is identified as the pp label and the majority class as the rest of the labels.
- fetch_eucalyptus()[source]¶
Download and transform the Eucalyptus Data Set. The minority class is identified as the best label and the majority class as the rest of the labels.
- fetch_glass()[source]¶
Download and transform the Glass Identification Data Set. The minority class is identified as the 1 label and the majority class as the rest of the labels.
https://archive.ics.uci.edu/ml/datasets/glass+identification
- fetch_haberman()[source]¶
Download and transform the Haberman’s Survival Data Set. The minority class is identified as the 1 label and the majority class as the 0 label.
- fetch_heart()[source]¶
Download and transform the Heart Data Set. The minority class is identified as the 2 label and the majority class as the 1 label.
- fetch_iris()[source]¶
Download and transform the Iris Data Set. The minority class is identified as the 1 label and the majority class as the rest of the labels.
- fetch_led()[source]¶
Download and transform the LED Display Domain Data Set. The minority class is identified as the positive label and the majority class as the negative label.
- fetch_libras()[source]¶
Download and transform the Libras Movement Data Set. The minority class is identified as the 1 label and the majority class as the rest of the labels.
- fetch_liver()[source]¶
Download and transform the Liver Disorders Data Set. The minority class is identified as the 1 label and the majority class as the 2 label.
- fetch_new_thyroid_1()[source]¶
Download and transform the Thyroid Disease Data Set. The minority class is identified as the positive label and the majority class as the negative label.
Note
The positive class was originally label 2.
- fetch_new_thyroid_2()[source]¶
Download and transform the Thyroid Disease Data Set. The minority class is identified as the positive label and the majority class as the negative label.
Note
The positive class was originally label 3.
- fetch_page_blocks()[source]¶
Download and transform the Page Blocks Data Set. The minority class is identified as the positive label and the majority class as the negative label.
- fetch_pima()[source]¶
Download and transform the Pima Indians Diabetes Data Set. The minority class is identified as the 1 label and the majority class as the 0 label.
- fetch_vehicle()[source]¶
Download and transform the Vehicle Silhouettes Data Set. The minority class is identified as the 1 label and the majority class as the rest of the labels.
https://archive.ics.uci.edu/ml/datasets/Statlog+(Vehicle+Silhouettes)
- fetch_vowel()[source]¶
Download and transform the Vowel Recognition Data Set. The minority class is identified as the positive label and the majority class as the negative label.
- fetch_wine()[source]¶
Download and transform the Wine Data Set. The minority class is identified as the 2 label and the majority class as the rest of the labels.
- fetch_yeast()[source]¶
Download and transform the Yeast Data Set. The minority class is identified as the positive label and the majority class as the negative label.
- imbalance_datasets(imbalance_ratio: float, random_state: int = None)¶
Appends imbalanced versions of datasets with predefined imbalance ratios to self.content_.
\[IR = \frac{|C_{maj}|}{|C_{min}|}\]
- Parameters:
- imbalance_ratio : float
Final Imbalance Ratio expected in the datasets.
- random_state : int, RandomState instance, default=None
Control the randomization of the algorithm.
If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
- Returns:
- self : Datasets
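To make the imbalance ratio concrete, the sketch below computes IR for a binary label list and randomly drops minority samples until a target IR is reached. It assumes that the target is reached by undersampling the minority class, which the text above does not state. The function names, the use of the standard-library `random.Random` in place of a NumPy `RandomState`, and the toy data are likewise assumptions for illustration.

```python
import random
from collections import Counter


def imbalance_ratio(y):
    """IR = |C_maj| / |C_min| for a list of class labels."""
    counts = Counter(y)
    return max(counts.values()) / min(counts.values())


def undersample_minority(X, y, target_ir, seed=None):
    """Drop random minority samples so that IR is approximately target_ir.

    Hypothetical helper; expects exactly two classes in y.
    """
    # most_common() sorts by count, so the majority class comes first.
    (maj_label, n_maj), (min_label, n_min) = Counter(y).most_common()
    n_keep = max(1, int(n_maj / target_ir))
    if n_keep >= n_min:
        # The data are already at least as imbalanced as requested.
        return list(X), list(y)
    rng = random.Random(seed)  # seeded generator for reproducibility
    min_idx = [i for i, label in enumerate(y) if label == min_label]
    keep = set(rng.sample(min_idx, n_keep))
    idx = [i for i, label in enumerate(y) if label == maj_label or i in keep]
    return [X[i] for i in idx], [y[i] for i in idx]


# Toy data: 60 majority samples (label 0) and 40 minority samples (label 1).
X = [[float(i)] for i in range(100)]
y = [0] * 60 + [1] * 40

X_res, y_res = undersample_minority(X, y, target_ir=3.0, seed=0)
print(imbalance_ratio(y), imbalance_ratio(y_res))  # 1.5 3.0
```

Passing the same `seed` twice yields the same subset, which mirrors the role the integer `random_state` plays in the parameter description.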
- items()¶
- keys()¶
- save(path, db_name)¶
Save datasets.
- summarize_datasets()¶
Create a summary of the downloaded datasets.
- Returns:
- datasets_summary : pd.DataFrame
DataFrame with summary statistics of all datasets.
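As a rough idea of what such a summary might contain, the sketch below builds one row of statistics per dataset using plain dictionaries. The column names, the `(name, (X, y))` layout of the input and the helper name `summarize` are assumptions; the actual method returns a pd.DataFrame whose columns are not listed here.

```python
from collections import Counter


def summarize(datasets):
    """Return one summary row (a dict) per (name, (X, y)) pair.

    Hypothetical helper; column names are illustrative only.
    """
    rows = []
    for name, (X, y) in datasets:
        counts = Counter(y)
        n_maj, n_min = max(counts.values()), min(counts.values())
        rows.append(
            {
                "name": name,
                "n_features": len(X[0]),
                "n_instances": len(y),
                "n_minority": n_min,
                "n_majority": n_maj,
                "imbalance_ratio": round(n_maj / n_min, 2),
            }
        )
    return rows


# Two made-up datasets: feature rows paired with binary labels.
toy = [
    ("toy_a", ([[0.1, 0.2]] * 6, [0, 0, 0, 0, 1, 1])),
    ("toy_b", ([[1.0, 2.0, 3.0]] * 4, [0, 0, 0, 1])),
]

summary = summarize(toy)
for row in summary:
    print(row["name"], row["n_instances"], row["imbalance_ratio"])
```

Each row can be passed to `pandas.DataFrame` as a list of records if a tabular view is preferred.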
- values()¶