
`mlresearch.datasets`.BinaryDatasets

class mlresearch.datasets.BinaryDatasets(names: str | list = 'all', data_home: str = None, download_if_missing: bool = True)

Class to download, transform, and save binary classification datasets.

download(keep_index=False)

Download the datasets.

fetch_arcene()

Download and transform the Arcene Data Set.

https://archive.ics.uci.edu/ml/datasets/Arcene

fetch_audit()

Download and transform the Audit Data Set.

https://archive.ics.uci.edu/ml/datasets/Audit+Data

fetch_banknote_authentication()

Download and transform the Banknote Authentication Data Set.

https://archive.ics.uci.edu/ml/datasets/banknote+authentication

fetch_breast_cancer()

Download and transform the Breast Cancer Wisconsin Data Set.

https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)

fetch_ionosphere()

Download and transform the Ionosphere Data Set.

https://archive.ics.uci.edu/ml/datasets/ionosphere

fetch_parkinsons()

Download and transform the Parkinsons Data Set.

https://archive.ics.uci.edu/ml/datasets/parkinsons

fetch_spambase()

Download and transform the Spambase Data Set.

https://archive.ics.uci.edu/ml/datasets/Spambase

imbalance_datasets(imbalance_ratio: float, random_state: int = None)

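Append imbalanced versions of the datasets, resampled to the requested imbalance ratio, to self.content_. The imbalance ratio (IR) is the size of the majority class divided by the size of the minority class: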

\[IR = \frac{|C_{maj}|}{|C_{min}|}\]
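The definition above can be illustrated with a minimal, standard-library sketch. The source does not say how the library resamples a dataset, so `undersample_minority` is a hypothetical helper that assumes minority-class rows are randomly dropped until the target ratio is reached; it is not the library's implementation.

```python
from collections import Counter
import random


def imbalance_ratio(y):
    """Return IR = |C_maj| / |C_min| for a sequence of binary labels."""
    counts = Counter(y)
    return max(counts.values()) / min(counts.values())


def undersample_minority(X, y, target_ir, seed=None):
    """Hypothetical helper (assumption): drop minority rows until IR is about target_ir."""
    counts = Counter(y)
    (maj_label, n_maj), (min_label, n_min) = counts.most_common(2)
    # Number of minority rows to keep; never more than are available, never zero.
    n_keep = min(n_min, max(1, round(n_maj / target_ir)))
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == min_label]
    keep = set(rng.sample(min_idx, n_keep))
    # Keep every majority row plus the sampled minority rows, in original order.
    idx = [i for i, label in enumerate(y) if label == maj_label or i in keep]
    return [X[i] for i in idx], [y[i] for i in idx]


# Toy data: 80 majority samples and 20 minority samples (IR = 4).
X = [[i] for i in range(100)]
y = [0] * 80 + [1] * 20
X_imb, y_imb = undersample_minority(X, y, target_ir=10, seed=0)
```

With this toy data, the original labels have an IR of 4, and the resampled labels keep all 80 majority rows and 8 minority rows, giving an IR of 10.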

Parameters:

imbalance_ratio : float

Target imbalance ratio of the resulting datasets.

random_state : int, RandomState instance, default=None

Control the randomization of the algorithm.

If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used by np.random.
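The three cases above can be sketched as follows. `resolve_random_state` is a hypothetical helper written for illustration, similar in spirit to scikit-learn's `check_random_state`; whether the library resolves `random_state` in exactly this way is an assumption.

```python
import numpy as np


def resolve_random_state(random_state=None):
    """Hypothetical helper: turn the three accepted inputs into a RandomState."""
    if random_state is None:
        # The global RandomState instance behind the np.random.* functions
        # (a private NumPy attribute, used here only for illustration).
        return np.random.mtrand._rand
    if isinstance(random_state, (int, np.integer)):
        # An int is used as the seed of a fresh generator.
        return np.random.RandomState(random_state)
    if isinstance(random_state, np.random.RandomState):
        # An existing generator is used as-is.
        return random_state
    raise ValueError(f"{random_state!r} cannot be used to seed a RandomState")


# The same integer seed yields the same sequence of draws.
draws_a = resolve_random_state(42).randint(0, 100, size=5)
draws_b = resolve_random_state(42).randint(0, 100, size=5)
```

Passing an integer is the simplest way to make repeated calls reproducible; passing `None` makes the outcome depend on the global NumPy random state.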

Returns:

self : Datasets

items()

keys()

save(path, db_name)

Save datasets.

summarize_datasets()

Create a summary of the downloaded datasets.

Returns:

datasets_summary : pd.DataFrame

DataFrame with summary statistics of all datasets.

values()