mlresearch.synthetic_data.OverSamplingAugmentation

class mlresearch.synthetic_data.OverSamplingAugmentation(oversampler=None, augmentation_strategy='oversampling', value=None, random_state=None)

A wrapper to facilitate the use of imblearn.over_sampling objects for data augmentation.

Parameters:
oversampler : oversampler estimator, default=None

Over-sampler to be used for data augmentation.

augmentation_strategy : int, float, dict or {‘oversampling’, ‘constant’, ‘proportional’}, default=‘oversampling’

Specifies how the data augmentation is done.

  • When float or int, each class’ frequency is augmented according to the specified ratio (which is equivalent to the proportional strategy).

  • When oversampling, the data augmentation is done according to the sampling strategy passed in the oversampler object. If value is not None, then the number of samples generated for each class equals the number of samples in the majority class multiplied by value.

  • When constant, each class frequency is augmented to match the value passed in the parameter value.

  • When proportional, relative class frequencies are preserved and the number of samples in the dataset is matched with the value passed in the parameter value.

value : int or float, default=None

Value used by the constant, proportional and oversampling strategies, as described under augmentation_strategy. With constant, it is the new frequency of each class. It is ignored for the other forms of augmentation_strategy.
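As a rough illustration of the constant, proportional and numeric strategies described above, the sketch below computes target class counts from a list of labels. The function name target_class_counts is hypothetical, the rounding rule is an assumption, and a numeric strategy is read here as a multiplier on each class count. This is not the library's implementation; the oversampling strategy is left out because it depends on the wrapped over-sampler.

```python
from collections import Counter


def target_class_counts(y, augmentation_strategy, value=None):
    """Hypothetical helper: map each class label to a target frequency."""
    counts = Counter(y)
    if isinstance(augmentation_strategy, (int, float)):
        # Assumption: a numeric strategy scales every class by the same ratio.
        return {label: round(n * augmentation_strategy) for label, n in counts.items()}
    if augmentation_strategy == "constant":
        # Every class is brought to the same frequency, given by `value`.
        return {label: int(value) for label in counts}
    if augmentation_strategy == "proportional":
        # Relative frequencies are kept; the dataset size is scaled to `value`.
        ratio = value / sum(counts.values())
        return {label: round(n * ratio) for label, n in counts.items()}
    raise ValueError(f"Strategy not covered by this sketch: {augmentation_strategy!r}")


labels = [0] * 90 + [1] * 10
constant_targets = target_class_counts(labels, "constant", value=200)
proportional_targets = target_class_counts(labels, "proportional", value=300)
```

With 90 samples of class 0 and 10 of class 1, the constant strategy asks for 200 samples of each class, while the proportional strategy keeps the 9:1 ratio and scales the dataset to 300 samples.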

random_state : int, RandomState instance, default=None

Control the randomization of the algorithm.

  • If int, random_state is the seed used by the random number generator;

  • If RandomState instance, random_state is the random number generator;

  • If None, the random number generator is the RandomState instance used by np.random.
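The three random_state cases listed above can be sketched with NumPy alone. The helper name resolve_random_state is hypothetical; it mirrors the convention described in the list and is only an assumption about how the value is resolved internally.

```python
import numpy as np


def resolve_random_state(random_state=None):
    """Hypothetical helper: turn a random_state argument into a RandomState."""
    if random_state is None:
        # The global RandomState instance that backs the np.random functions.
        return np.random.mtrand._rand
    if isinstance(random_state, (int, np.integer)):
        # An integer is used as the seed of a fresh generator.
        return np.random.RandomState(random_state)
    if isinstance(random_state, np.random.RandomState):
        # An existing generator is used as-is.
        return random_state
    raise ValueError(f"Cannot use {random_state!r} to seed a RandomState")


rng_a = resolve_random_state(42)
rng_b = resolve_random_state(42)
```

Two generators built from the same integer seed produce the same sequence, which is what makes an integer random_state reproducible across runs.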

Attributes:
sampling_strategy_ : dict

Dictionary containing the information to sample the dataset. The keys correspond to the class labels from which to sample and the values are the number of samples to draw.

n_features_in_ : int

Number of features in the input dataset.


fit(X, y)

Check inputs and statistics of the sampler.

You should use fit_resample in all cases.

Parameters:
X : {array-like, dataframe, sparse matrix} of shape (n_samples, n_features)

Data array.

y : array-like of shape (n_samples,)

Target array.

Returns:
self : object

Return the instance itself.

fit_resample(X, y, **fit_params)

Resample the dataset.

Parameters:
X : {array-like, dataframe, sparse matrix} of shape (n_samples, n_features)

Matrix containing the data to be resampled.

y : array-like of shape (n_samples,)

Corresponding label for each sample in X.

Returns:
X_resampled : {array-like, dataframe, sparse matrix} of shape (n_samples_new, n_features)

The array containing the resampled data.

y_resampled : array-like of shape (n_samples_new,)

The corresponding label of X_resampled.
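To make the input and output shapes of fit_resample concrete, here is a minimal stand-in that duplicates rows at random until each class reaches a target count. The function random_oversample and its target_counts argument are hypothetical and are not part of mlresearch or imblearn; the sketch assumes the targets are final class frequencies and only illustrates the (X, y) in, (X_resampled, y_resampled) out contract.

```python
import numpy as np


def random_oversample(X, y, target_counts, random_state=None):
    """Hypothetical stand-in for a resampler: pad each class by random duplication."""
    rng = np.random.RandomState(random_state)
    X, y = np.asarray(X), np.asarray(y)
    X_parts, y_parts = [X], [y]
    for label, target in target_counts.items():
        idx = np.flatnonzero(y == label)
        n_new = target - idx.size
        if n_new > 0:
            # Draw existing rows of this class with replacement.
            picked = rng.choice(idx, size=n_new, replace=True)
            X_parts.append(X[picked])
            y_parts.append(y[picked])
    return np.vstack(X_parts), np.concatenate(y_parts)


X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)
X_res, y_res = random_oversample(X, y, {0: 12, 1: 12}, random_state=0)
```

Here n_samples is 10 and n_samples_new is 24, while n_features stays at 2. The original rows are kept and the new rows are appended after them.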

get_feature_names_out(input_features=None)

Get output feature names for transformation.

Parameters:
input_features : array-like of str or None, default=None

Input features.

  • If input_features is None, then feature_names_in_ is used as feature names in. If feature_names_in_ is not defined, then the following input feature names are generated: [“x0”, “x1”, …, “x(n_features_in_ - 1)”].

  • If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.

Returns:
feature_names_out : ndarray of str objects

Same as input features.
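The naming rules above can be sketched in plain Python. The helper resolve_feature_names is hypothetical and returns a list rather than an ndarray; it only restates the two documented cases and is not the actual implementation.

```python
def resolve_feature_names(n_features_in, input_features=None, feature_names_in=None):
    """Hypothetical helper restating the documented feature-name rules."""
    if input_features is None:
        if feature_names_in is not None:
            # Names seen during fit take precedence.
            return list(feature_names_in)
        # Otherwise generate x0, x1, ..., x(n_features_in - 1).
        return [f"x{i}" for i in range(n_features_in)]
    input_features = list(input_features)
    if feature_names_in is not None and input_features != list(feature_names_in):
        raise ValueError("input_features does not match feature_names_in_")
    return input_features


generated = resolve_feature_names(3)
```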

get_metadata_routing()

Get metadata routing of this object.

See the scikit-learn User Guide for details on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.
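The <component>__<parameter> naming convention used by set_params can be illustrated with a small sketch that separates top-level parameters from nested ones. The helper split_nested_params is hypothetical, and the key oversampler__k_neighbors is only an example of what a nested parameter name could look like when the wrapped over-sampler exposes a k_neighbors parameter.

```python
def split_nested_params(params):
    """Hypothetical helper: separate own parameters from nested ones."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split on the first double underscore only.
        name, delim, sub_key = key.partition("__")
        if delim:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own, nested = split_nested_params({"value": 500, "oversampler__k_neighbors": 3})
```

In this example, value would be set on the estimator itself, while k_neighbors would be forwarded to the object stored in its oversampler parameter.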