`research.data_augmentation`.OverSamplingAugmentation

class research.data_augmentation.OverSamplingAugmentation(oversampler=None, augmentation_strategy='oversampling', value=None, random_state=None)

A wrapper to facilitate the use of imblearn.over_sampling objects for data augmentation.

Parameters

oversampler : oversampler estimator, default=None

Over-sampler to be used for data augmentation.

augmentation_strategy : float, dict or {'oversampling', 'constant', 'proportional'}, default='oversampling'

Specifies how the data augmentation is done.

When float or int, each class's frequency is augmented according to the specified ratio (which is equivalent to the 'proportional' strategy).
When 'oversampling', the data augmentation follows the sampling strategy passed in the oversampler object. If value is not None, the number of samples generated for each class equals the number of samples in the majority class multiplied by value.
When 'constant', each class's frequency is augmented to match the number passed in the parameter value.
When 'proportional', relative class frequencies are preserved and the total number of samples in the dataset is augmented to match the number passed in the parameter value.
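The arithmetic behind the 'constant' and 'proportional' strategies can be sketched as follows. This is a minimal illustration of the descriptions above, not the library's implementation: `target_counts` is a hypothetical helper, and rounding to the nearest integer is an assumption.

```python
from collections import Counter


def target_counts(y, strategy, value):
    """Hypothetical helper: per-class target counts for two strategies."""
    counts = Counter(y)
    if strategy == "constant":
        # Every class is brought to the same frequency, given by value.
        return {label: value for label in counts}
    if strategy == "proportional":
        # Class ratios are kept; the dataset size is scaled to value.
        # Rounding to the nearest integer is an assumption of this sketch.
        total = sum(counts.values())
        return {label: round(value * n / total) for label, n in counts.items()}
    raise ValueError(f"unsupported strategy: {strategy!r}")


# Toy labels: 60 / 30 / 10 samples per class.
y = ["a"] * 60 + ["b"] * 30 + ["c"] * 10
constant = target_counts(y, "constant", 100)
proportional = target_counts(y, "proportional", 200)
```

With these toy labels, 'constant' asks for 100 samples in every class, while 'proportional' keeps the 6:3:1 ratio and doubles the dataset to 200 samples.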

value : int or float, default=None

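Value used by the augmentation strategy: the new frequency of each class when the strategy is 'constant', the new total number of samples when it is 'proportional', and the multiplier applied to the majority class count when it is 'oversampling'. It is ignored otherwise.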

random_state : int, RandomState instance, default=None

Control the randomization of the algorithm.

If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used by np.random.
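The three cases above could be resolved as in the sketch below. `resolve_random_state` is a hypothetical helper written for illustration; it follows the same logic as scikit-learn's `check_random_state`, and it is an assumption that the class handles `random_state` this way internally.

```python
import numpy as np


def resolve_random_state(random_state):
    """Hypothetical helper mirroring the three documented cases."""
    if random_state is None:
        # NumPy's global RandomState instance, the one behind np.random.* calls.
        return np.random.mtrand._rand
    if isinstance(random_state, np.random.RandomState):
        # An existing generator is used as-is.
        return random_state
    # Anything else is treated as an integer seed.
    return np.random.RandomState(random_state)


# Two generators built from the same seed produce the same stream.
rng_a = resolve_random_state(42)
rng_b = resolve_random_state(42)
draws_a = rng_a.randint(0, 100, size=5)
draws_b = rng_b.randint(0, 100, size=5)
same_stream = bool((draws_a == draws_b).all())

# A RandomState instance is passed through unchanged.
shared = np.random.RandomState(0)
is_same_object = resolve_random_state(shared) is shared
```

Passing an int is the usual choice when a resampling run has to be reproducible.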

Methods

`fit`(X, y)	Check inputs and statistics of the sampler.
`fit_resample`(X, y, **fit_params)	Resample the dataset.
`get_params`([deep])	Get parameters for this estimator.
`set_params`(**params)	Set the parameters of this estimator.

fit(X, y)

Check inputs and statistics of the sampler.

You should use fit_resample in all cases.

Parameters

X : {array-like, dataframe, sparse matrix} of shape (n_samples, n_features)

Data array.

y : array-like of shape (n_samples,)

Target array.

Returns

self : object

Return the instance itself.

fit_resample(X, y, **fit_params)

Resample the dataset.

Parameters

X : {array-like, dataframe, sparse matrix} of shape (n_samples, n_features)

Matrix containing the data to be sampled.

y : array-like of shape (n_samples,)

Corresponding label for each sample in X.

Returns

X_resampled : {array-like, dataframe, sparse matrix} of shape (n_samples_new, n_features)

The array containing the resampled data.

y_resampled : array-like of shape (n_samples_new,)

The corresponding label of X_resampled.
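The input and output contract of fit_resample can be illustrated with a stand-in resampler. The sketch below does not call OverSamplingAugmentation or imblearn: `random_oversample` is a hypothetical pure-Python function that duplicates randomly chosen rows, used here only to show how X_resampled and y_resampled relate to X and y.

```python
import random
from collections import Counter


def random_oversample(X, y, target, seed=None):
    """Stand-in resampler: duplicate random rows until each class has `target` rows."""
    rng = random.Random(seed)
    X_resampled, y_resampled = list(X), list(y)
    for label, n in Counter(y).items():
        # Rows of the original data that belong to this class.
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Classes already at or above the target are left untouched.
        for _ in range(max(0, target - n)):
            X_resampled.append(rng.choice(rows))
            y_resampled.append(label)
    return X_resampled, y_resampled


# 4 samples, 2 features; class 1 is the minority.
X = [[0.0, 1.0], [0.2, 0.9], [0.1, 1.1], [5.0, 5.0]]
y = [0, 0, 0, 1]
X_res, y_res = random_oversample(X, y, target=3, seed=0)
```

Here n_samples is 4 and n_samples_new is 6: the number of features is unchanged, and y_res stays aligned row by row with X_res.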

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params : dict

Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
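The `<component>__<parameter>` convention can be sketched with two toy classes. `Component` and `Wrapper` are hypothetical and written only for this illustration; they are not scikit-learn or `research` classes, and real estimators also validate parameter names.

```python
class Component:
    """Toy estimator-like object with one parameter."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


class Wrapper:
    """Toy container that forwards <component>__<parameter> keys."""

    def __init__(self, step):
        self.step = step

    def set_params(self, **params):
        for key, value in params.items():
            # Split only on the first double underscore.
            name, sep, sub_key = key.partition("__")
            if sep:
                # Nested key: delegate to the named component.
                getattr(self, name).set_params(**{sub_key: value})
            else:
                # Plain key: set the attribute on the wrapper itself.
                setattr(self, name, value)
        return self  # returning self allows call chaining


wrapper = Wrapper(step=Component())
returned = wrapper.set_params(step__alpha=0.5)
```

Because set_params returns the instance itself, the call can be chained with other methods.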

Parameters

**params : dict

Estimator parameters.

Returns

self : estimator instance

Estimator instance.
