mlresearch.datasets.ContinuousCategoricalDatasets
- class mlresearch.datasets.ContinuousCategoricalDatasets(names: str | list = 'all', data_home: str = None, download_if_missing: bool = True)
Class to download, transform and save datasets with both continuous and categorical features.
- fetch_contraceptive()
Download and transform the Contraceptive Method Choice Data Set.
https://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice
- fetch_german_credit()
Download and transform the German Credit Data Set.
https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29
- fetch_thyroid()
Download and transform the Thyroid Disease Data Set. Label 0 corresponds to no disease found. Label 1 corresponds to one or multiple diseases found.
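- imbalance_datasets(imbalance_ratio: float, random_state: int = None)
Appends imbalanced versions of the datasets, with the predefined imbalance ratio, to self.content_. The imbalance ratio is the number of majority-class samples divided by the number of minority-class samples:
\[IR = \frac{|C_{maj}|}{|C_{min}|}\]
- Parameters: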
- imbalance_ratio : float
Final Imbalance Ratio expected in the datasets.
- random_state : int, RandomState instance, default=None
Control the randomization of the algorithm.
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used by np.random.
- Returns:
- self : Datasets
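The imbalance ratio and the role of a seed can be illustrated with a minimal, standard-library-only sketch. It is not the implementation used by imbalance_datasets(): the helper names imbalance_ratio and undersample_minority are hypothetical, the sketch assumes binary labels, and it reaches the target ratio by randomly dropping minority-class samples, which is only one possible strategy.

```python
import random
from collections import Counter


def imbalance_ratio(y):
    """Return IR = |C_maj| / |C_min| for a sequence of class labels."""
    counts = Counter(y)
    return max(counts.values()) / min(counts.values())


def undersample_minority(X, y, target_ir, random_state=None):
    """Randomly drop minority samples until IR is close to target_ir.

    Hypothetical helper for illustration; assumes exactly two classes.
    """
    rng = random.Random(random_state)  # an int seed makes the draw repeatable
    (maj_label, n_maj), (min_label, n_min) = Counter(y).most_common(2)
    # Minority samples to keep so that n_maj / n_keep is about target_ir.
    n_keep = min(n_min, max(1, round(n_maj / target_ir)))
    min_idx = [i for i, label in enumerate(y) if label == min_label]
    keep = set(rng.sample(min_idx, n_keep))
    idx = [i for i, label in enumerate(y) if label != min_label or i in keep]
    return [X[i] for i in idx], [y[i] for i in idx]


X = [[i] for i in range(120)]
y = [0] * 90 + [1] * 30  # 90 majority vs 30 minority samples
X_res, y_res = undersample_minority(X, y, target_ir=9, random_state=0)
print(imbalance_ratio(y), imbalance_ratio(y_res))  # 3.0 9.0
```

Passing the same integer as random_state selects the same minority samples on every call, which is the reproducibility behaviour the parameter description above refers to.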
- items()
- keys()
- save(path, db_name)
Save datasets.
- summarize_datasets()
Create a summary of the downloaded datasets.
- Returns:
- datasets_summary : pd.DataFrame
DataFrame with summary statistics of all datasets.
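The columns of the returned DataFrame are not listed here. As a rough illustration of the kind of per-dataset statistics such a summary might contain, the sketch below builds one row per dataset in plain Python. The function name summarize, the (name, X, y) input layout, and the chosen column names are all assumptions made for this example, not the library's actual output.

```python
from collections import Counter


def summarize(datasets):
    """Build one summary row (a dict) per (name, X, y) dataset."""
    rows = []
    for name, X, y in datasets:
        counts = Counter(y)
        n_maj, n_min = max(counts.values()), min(counts.values())
        rows.append({
            "dataset": name,
            "n_observations": len(X),
            "n_features": len(X[0]) if X else 0,
            "n_classes": len(counts),
            "imbalance_ratio": n_maj / n_min,  # IR = |C_maj| / |C_min|
        })
    return rows


# Toy data standing in for downloaded datasets (hypothetical values).
toy = [
    ("balanced", [[0, 1], [1, 0], [1, 1], [0, 0]], [0, 1, 0, 1]),
    ("skewed", [[0], [1], [2], [3], [4]], [0, 0, 0, 0, 1]),
]
summary = summarize(toy)
for row in summary:
    print(row)
```

A list of dicts like this can be passed directly to pd.DataFrame if a tabular view is wanted.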
- values()