research.data_augmentation.OverSamplingAugmentation

class research.data_augmentation.OverSamplingAugmentation(oversampler=None, augmentation_strategy='oversampling', value=None, random_state=None)

A wrapper to facilitate the use of imblearn.over_sampling objects for data augmentation.

Parameters
oversampler : oversampler estimator, default=None

Over-sampler to be used for data augmentation.

augmentation_strategy : float, dict or {‘oversampling’, ‘constant’, ‘proportional’}, default=‘oversampling’

Specifies how the data augmentation is done.

  • When float or int, each class’s frequency is augmented according to the specified ratio.

  • When ‘oversampling’, the data augmentation is done according to the sampling strategy passed in the oversampler object.

  • When ‘constant’, each class frequency is augmented to match the value passed in the parameter value.

  • When ‘proportional’, relative class frequencies are preserved and the total number of samples in the dataset is matched with the value passed in the parameter value.
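As a rough illustration of the strategies listed above, the following sketch computes per-class target counts from a list of labels. It is not the library's implementation: the helper name target_counts, the rounding rule, and the reading of a numeric strategy as a multiplier on each class count are assumptions made for this example.

```python
from collections import Counter


def target_counts(y, augmentation_strategy, value=None):
    """Hypothetical helper: map a strategy to per-class target counts."""
    counts = Counter(y)
    total = sum(counts.values())

    if isinstance(augmentation_strategy, (int, float)):
        # Assumed reading: scale every class count by the given ratio.
        return {c: round(n * augmentation_strategy) for c, n in counts.items()}
    if augmentation_strategy == "oversampling":
        # Targets are left to the wrapped oversampler's own sampling strategy.
        return None
    if augmentation_strategy == "constant":
        # Every class is brought to the same absolute frequency.
        return {c: int(value) for c in counts}
    if augmentation_strategy == "proportional":
        # Class proportions are kept; the dataset size becomes `value`.
        return {c: round(n / total * value) for c, n in counts.items()}
    raise ValueError(f"Unsupported strategy: {augmentation_strategy!r}")


# Toy labels: 6 samples of "a", 3 of "b", 1 of "c".
y = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
print(target_counts(y, 2.0))
print(target_counts(y, "constant", value=10))
print(target_counts(y, "proportional", value=50))
```

With the toy labels, the ‘constant’ call gives every class the same target, while the ‘proportional’ call keeps the 6:3:1 ratio and scales the total to 50.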

value : int, float, default=None

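Value to be used as the new absolute frequency of each class when the augmentation strategy is ‘constant’, or as the total number of samples in the augmented dataset when it is ‘proportional’. It is ignored for any other augmentation strategy.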

random_state : int, RandomState instance, default=None

Control the randomization of the algorithm.

  • If int, random_state is the seed used by the random number generator;

  • If RandomState instance, random_state is the random number generator;

  • If None, the random number generator is the RandomState instance used by np.random.
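The three random_state cases above can be sketched with NumPy alone. The helper name resolve_random_state is hypothetical, and the use of np.random.mtrand._rand to reach NumPy's global RandomState is an assumption about how the None case might be resolved, not a statement about this class's internals.

```python
import numpy as np


def resolve_random_state(random_state=None):
    """Hypothetical helper mirroring the three cases listed above."""
    if random_state is None:
        # Assumed: fall back to the global RandomState used by np.random.
        return np.random.mtrand._rand
    if isinstance(random_state, (int, np.integer)):
        # An int is treated as the seed of a new generator.
        return np.random.RandomState(random_state)
    if isinstance(random_state, np.random.RandomState):
        # An existing generator is used as-is.
        return random_state
    raise ValueError(f"Cannot build a RandomState from {random_state!r}")


# Two generators built from the same seed produce the same draws.
rng_a = resolve_random_state(0)
rng_b = resolve_random_state(0)
draws_a = rng_a.randint(0, 100, size=5).tolist()
draws_b = rng_b.randint(0, 100, size=5).tolist()
print(draws_a == draws_b)
```

Passing an int is the usual way to make repeated runs reproducible; passing None ties the results to whatever state np.random currently has.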

Methods

fit(X, y)

Check inputs and statistics of the sampler.

fit_resample(X, y, **fit_params)

Resample the dataset.

get_params([deep])

Get parameters for this estimator.

set_params(**params)

Set the parameters of this estimator.


fit(X, y)

Check inputs and statistics of the sampler.

You should use fit_resample in all cases.

Parameters
X : {array-like, dataframe, sparse matrix} of shape (n_samples, n_features)

Data array.

y : array-like of shape (n_samples,)

Target array.

Returns
self : object

Return the instance itself.

fit_resample(X, y, **fit_params)

Resample the dataset.

Parameters
X : {array-like, dataframe, sparse matrix} of shape (n_samples, n_features)

Matrix containing the data to be resampled.

y : array-like of shape (n_samples,)

Corresponding label for each sample in X.

Returns
X_resampled : {array-like, dataframe, sparse matrix} of shape (n_samples_new, n_features)

The array containing the resampled data.

y_resampled : array-like of shape (n_samples_new,)

The corresponding label of X_resampled.
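To make the input and output shapes of fit_resample concrete, here is a minimal stand-in that uses plain random duplication (sampling existing rows with replacement). It is a swapped-in technique for illustration only and does not reflect what the wrapped oversampler does; the function name naive_fit_resample and its target and seed arguments are hypothetical.

```python
import random
from collections import Counter


def naive_fit_resample(X, y, target, seed=None):
    """Hypothetical stand-in: duplicate random rows until each class has `target` samples."""
    rng = random.Random(seed)
    # Original samples are kept; new rows are appended after them.
    X_resampled, y_resampled = list(X), list(y)
    for label, count in Counter(y).items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(max(target - count, 0)):
            X_resampled.append(rng.choice(rows))
            y_resampled.append(label)
    return X_resampled, y_resampled


X = [[0.0, 1.0], [0.2, 0.9], [1.0, 0.1], [0.1, 1.1]]
y = [0, 0, 1, 0]
X_res, y_res = naive_fit_resample(X, y, target=5, seed=0)
print(len(X_res), len(y_res), Counter(y_res))
```

The resampled outputs have the same number of features as X and one label per row, with n_samples_new at least as large as n_samples.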

get_params(deep=True)

Get parameters for this estimator.

Parameters
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params : dict

Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters
**params : dict

Estimator parameters.

Returns
self : estimator instance

Estimator instance.
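The <component>__<parameter> naming described for set_params can be illustrated with a small, library-free sketch that separates top-level names from nested ones. The helper split_nested_params is hypothetical, and the nested keys (k_neighbors, sampling_strategy) are illustrative names that assume the wrapped oversampler exposes such parameters.

```python
def split_nested_params(params):
    """Hypothetical helper: separate top-level from nested parameter names."""
    top_level, nested = {}, {}
    for key, val in params.items():
        # Split on the first double underscore only.
        component, delim, sub_key = key.partition("__")
        if delim:
            nested.setdefault(component, {})[sub_key] = val
        else:
            top_level[key] = val
    return top_level, nested


top, nested = split_nested_params(
    {
        "value": 100,
        "oversampler__k_neighbors": 3,
        "oversampler__sampling_strategy": "auto",
    }
)
print(top)
print(nested)
```

Here value would be set on the outer estimator, while the two oversampler__ entries would be forwarded to the contained oversampler object.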