research.data_augmentation.OverSamplingAugmentation
- class research.data_augmentation.OverSamplingAugmentation(oversampler=None, augmentation_strategy='oversampling', value=None, random_state=None)
A wrapper to facilitate the use of imblearn.over_sampling objects for data augmentation.
- Parameters
- oversampler : oversampler estimator, default=None
Over-sampler to be used for data augmentation.
- augmentation_strategy : float, dict or {'oversampling', 'constant', 'proportional'}, default='oversampling'
Specifies how the data augmentation is done.
When float or int, each class's frequency is augmented according to the specified ratio (which is equivalent to the 'proportional' strategy).
When 'oversampling', the data augmentation is done according to the sampling strategy passed in the oversampler object. If value is not None, then the number of samples generated for each class equals the number of samples in the majority class multiplied by value.
When 'constant', each class frequency is augmented to match the value passed in the parameter value.
When 'proportional', relative class frequencies are preserved and the number of samples in the dataset is matched with the value passed in the parameter value.
- value : int, float, default=None
Value to be used as the new frequency of each class. It is ignored unless the augmentation strategy is set to 'constant' or 'oversampling'.
- random_state : int, RandomState instance, default=None
Control the randomization of the algorithm.
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used by np.random.
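The arithmetic behind the 'constant' and 'proportional' strategies described above can be sketched in plain Python. This is only an illustration of the documented behaviour, not the library's implementation: the helper name target_class_counts is hypothetical, and the handling of classes that already exceed value, as well as the rounding, are assumptions.

```python
from collections import Counter


def target_class_counts(y, strategy, value):
    """Return a hypothetical {label: target_count} mapping for two strategies."""
    counts = Counter(y)
    n_samples = sum(counts.values())
    if strategy == "constant":
        # Every class is raised to `value`. Classes already above `value` are
        # left unchanged (assumption: over-sampling only adds samples).
        return {label: max(count, int(value)) for label, count in counts.items()}
    if strategy == "proportional":
        # Class ratios are kept while the dataset size is scaled to `value`.
        # Rounding means the total may differ slightly from `value` (assumption).
        return {label: round(count * value / n_samples) for label, count in counts.items()}
    raise ValueError(f"Unsupported strategy: {strategy!r}")


y = [0] * 90 + [1] * 10
constant_targets = target_class_counts(y, "constant", 100)
proportional_targets = target_class_counts(y, "proportional", 200)
print(constant_targets)      # {0: 100, 1: 100}
print(proportional_targets)  # {0: 180, 1: 20}
```

With 90 and 10 samples, 'constant' with value=100 balances both classes at 100, while 'proportional' with value=200 keeps the 9:1 ratio and doubles the dataset size.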
Methods
fit(X, y): Check inputs and statistics of the sampler.
fit_resample(X, y, **fit_params): Resample the dataset.
get_params([deep]): Get parameters for this estimator.
set_params(**params): Set the parameters of this estimator.
- fit(X, y)
Check inputs and statistics of the sampler.
You should use fit_resample in all cases.
- Parameters
- X : {array-like, dataframe, sparse matrix} of shape (n_samples, n_features)
Data array.
- y : array-like of shape (n_samples,)
Target array.
- Returns
- self : object
Return the instance itself.
- fit_resample(X, y, **fit_params)
Resample the dataset.
- Parameters
- X : {array-like, dataframe, sparse matrix} of shape (n_samples, n_features)
Matrix containing the data to be resampled.
- y : array-like of shape (n_samples,)
Corresponding label for each sample in X.
- Returns
- X_resampled : {array-like, dataframe, sparse matrix} of shape (n_samples_new, n_features)
The array containing the resampled data.
- y_resampled : array-like of shape (n_samples_new,)
The corresponding label of X_resampled.
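The input/output contract of fit_resample can be illustrated with a toy stand-in written in plain Python. ToyRandomOverSampler is a hypothetical class, not part of research or imblearn. It duplicates randomly chosen minority rows until every class matches the majority, which is just one simple over-sampling scheme chosen for this sketch.

```python
import random
from collections import Counter


class ToyRandomOverSampler:
    """Hypothetical stand-in that follows the fit_resample(X, y) contract."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit_resample(self, X, y):
        rng = random.Random(self.random_state)
        counts = Counter(y)
        majority = max(counts.values())
        # The original samples are kept, and new rows are appended after them.
        X_resampled, y_resampled = list(X), list(y)
        for label, count in counts.items():
            rows = [row for row, target in zip(X, y) if target == label]
            # Duplicate randomly chosen rows until the class matches the majority.
            for _ in range(majority - count):
                X_resampled.append(rng.choice(rows))
                y_resampled.append(label)
        return X_resampled, y_resampled


X = [[0.0, 1.0], [0.1, 0.9], [0.2, 0.8], [5.0, 5.0]]
y = [0, 0, 0, 1]
X_res, y_res = ToyRandomOverSampler(random_state=0).fit_resample(X, y)
print(len(X_res), Counter(y_res))
```

The returned pair has n_samples_new rows (here 6 instead of 4), and y_res stays aligned with X_res row by row.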
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- params : dict
Parameter names mapped to their values.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters
- **params : dict
Estimator parameters.
- Returns
- self : estimator instance
Estimator instance.
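The <component>__<parameter> naming used by get_params and set_params can be sketched with a minimal pure-Python class. ToyEstimator is hypothetical and only mimics the convention. The real methods are inherited from the estimator base class and also validate parameter names, which this sketch omits.

```python
class ToyEstimator:
    """Hypothetical minimal estimator with scikit-learn-style parameter access."""

    def __init__(self, oversampler=None, value=None):
        self.oversampler = oversampler
        self.value = value

    def get_params(self, deep=True):
        params = {"oversampler": self.oversampler, "value": self.value}
        if deep:
            # Expose parameters of nested objects as <component>__<parameter>.
            for name, obj in list(params.items()):
                if hasattr(obj, "get_params"):
                    for sub, val in obj.get_params(deep=True).items():
                        params[f"{name}__{sub}"] = val
        return params

    def set_params(self, **params):
        for key, val in params.items():
            name, sep, sub = key.partition("__")
            if sep:
                # Route <component>__<parameter> to the nested object.
                getattr(self, name).set_params(**{sub: val})
            else:
                setattr(self, name, val)
        return self


inner = ToyEstimator(value=1)
outer = ToyEstimator(oversampler=inner, value=100)
result = outer.set_params(value=200, oversampler__value=2)
print(outer.get_params())
```

After the call, outer.value is 200 and the nested inner.value is 2. set_params returns the estimator itself, so calls can be chained.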