mlresearch.active_learning.AugmentationAL
- class mlresearch.active_learning.AugmentationAL(classifier: BaseEstimator | ClassifierMixin = None, generator: BaseOverSampler = None, param_grid: dict = None, cv=None, acquisition_func=None, n_init: int | float = None, budget: int | float = None, max_iter: int = None, evaluation_metric=None, continue_training: bool = False, random_state: int = None)
Active Learning with pipelined Data Augmentation. This method is implemented and analysed in a working paper.
- Parameters:
- classifier : classifier object, default=None
Classifier or pipeline to be trained in the iterative process. If None, defaults to sklearn’s RandomForestClassifier with default parameters and uses the random_state passed to the Active Learning model.
- generator : generator estimator, default=None
Generator to be used for artificial data generation within Active Learning iterations.
- param_grid : dict or list of dictionaries, default=None
Used to optimize the classifier and generator hyperparameters at each iteration via cross-validated grid search. If None, parameter tuning is skipped. Dictionary with parameter names (str) as keys and lists of parameter settings to try as values, or a list of such dictionaries, in which case the grids spanned by each dictionary in the list are explored. This enables searching over any sequence of parameter settings.
- cv : int, cross-validation generator or an iterable, default=None
Determines the cross-validation splitting strategy. Used to optimize the classifier and generator hyperparameters at each iteration. Possible inputs for cv are:
None, to use the default 5-fold cross validation,
integer, to specify the number of folds in a (Stratified)KFold,
CV splitter.
For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used. These splitters are instantiated with shuffle=False so the splits will be the same across calls.
- acquisition_func : function or {‘entropy’, ‘breaking_ties’, ‘random’}, default=None
Method used to quantify the prediction’s uncertainty level. All predefined functions are set up so that a higher value means higher uncertainty (higher likelihood of selection) and vice versa. The uncertainty estimate is used to select the instances to be added to the labeled/training dataset. Acquisition functions may be added or changed in the UNCERTAINTY_FUNCTIONS dictionary. If None, defaults to “random”.
- n_init : int or float, default=None
Number of observations to include in the initial training dataset. If n_init < 1, then the corresponding percentage of the original dataset will be used as the initial training set. If None, defaults to 2% of the size of the original dataset.
- budget : int or float, default=None
Number of observations to be added to the training dataset at each iteration. If budget < 1, then the corresponding percentage of the original dataset will be added to the training set at each iteration. If None, defaults to 2% of the size of the original dataset.
- max_iter : int, default=None
Maximum number of iterations allowed. If None, the experiment will run until 100% of the dataset is added to the training set.
- evaluation_metric : string, default=’accuracy’
Metric used to calculate the test scores. See mlresearch.metrics for info on available performance metrics.
- continue_training : bool, default=False
If False, fit a new classifier at each iteration. If True, the classifier fitted in the previous iteration is used for further training in subsequent iterations.
- random_state : int, RandomState instance, default=None
Control the randomization of the algorithm.
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used by np.random.
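To illustrate the convention that a higher acquisition score means higher uncertainty, here is a minimal standalone sketch of entropy and breaking-ties scores computed from predicted class probabilities. The function names, the sample probabilities and the exact breaking-ties formula (one minus the gap between the two largest probabilities) are assumptions made for illustration; they are not taken from the UNCERTAINTY_FUNCTIONS dictionary.

```python
import math


def entropy_score(probabilities):
    """Shannon entropy of one predicted class distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def breaking_ties_score(probabilities):
    """One minus the gap between the two most likely classes (higher = more uncertain).

    Assumed formulation: the raw gap is small for uncertain predictions, so it
    is flipped to follow the "higher means more uncertain" convention.
    """
    top_two = sorted(probabilities, reverse=True)[:2]
    return 1.0 - (top_two[0] - top_two[1])


def select_most_uncertain(scores, n_select):
    """Return the indices of the n_select highest scores, most uncertain first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_select]


# Predicted probabilities for three unlabeled observations (made-up values).
predicted = [[0.98, 0.01, 0.01], [0.40, 0.35, 0.25], [0.70, 0.20, 0.10]]
entropy_scores = [entropy_score(p) for p in predicted]
ties_scores = [breaking_ties_score(p) for p in predicted]

# With a budget of 2, both criteria pick observations 1 and 2 here.
chosen = select_most_uncertain(entropy_scores, 2)
```

Both scores rank the near-uniform prediction first and the confident one last, which is the ordering an uncertainty-based acquisition function is expected to produce.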
- Attributes:
- acquisition_func_ : function
Method used to calculate the classification uncertainty at each iteration.
- evaluation_metric_ : scorer
Metric used to estimate the performance of the AL classifier at each iteration.
- classifier_ : estimator object
The classifier used in the iterative process. It is the classifier fitted in the last iteration.
- metadata_ : dict
Contains the performance estimations, classifiers, labeled pool mask and original dataset.
- n_init_ : int
Number of observations included in the initial training dataset.
- budget_ : int
Number of observations to be added to the training set per iteration.
- max_iter_ : int
Maximum number of iterations allowed.
- labeled_pool_ : array-like of shape (n_samples,)
Mask that filters the labeled observations from the original dataset.
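The integer attributes n_init_ and budget_ are derived from the n_init and budget parameters, which may be counts, values below 1, or None (2% of the dataset). The sketch below shows one way such a resolution could work, and how many iterations are needed to label the whole dataset when max_iter is None. The helper names and the rounding rule are assumptions for illustration, not the library's actual behaviour.

```python
import math


def resolve_size(value, n_samples, default_fraction=0.02):
    """Turn a count, a fraction below 1, or None into an observation count."""
    if value is None:
        value = default_fraction  # documented default: 2% of the dataset
    if value < 1:
        # Assumed rounding: nearest integer, but never zero observations.
        return max(1, round(value * n_samples))
    return int(value)


def iterations_to_exhaust(n_samples, n_init, budget):
    """Iterations needed after initialization to label every observation."""
    remaining = max(0, n_samples - n_init)
    return math.ceil(remaining / budget)


n_samples = 1000
n_init_ = resolve_size(None, n_samples)   # default of 2% of the dataset
budget_ = resolve_size(0.05, n_samples)   # 5% of the dataset per iteration
n_iterations = iterations_to_exhaust(n_samples, n_init_, budget_)
```

Under these assumptions, a dataset of 1000 observations starts with 20 labeled observations and gains 50 per iteration, so the last iteration adds fewer than a full budget.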
- fit(X, y, **kwargs)
Fit an Active Learning model from training set (X, y).
- Parameters:
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
The target values (class labels) as integers or strings.
- Returns:
- self : Active Learning Classifier
Fitted Active Learning model.
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- initialization(X, y, initial_selection=None, **kwargs)
- iteration(X, y, **kwargs)
- predict(X)
Predict class or regression value for X.
For a classification model, the predicted class for each sample in X is returned. For a regression model, the predicted value based on X is returned.
- Parameters:
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
The test input samples.
- Returns:
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
The predicted classes, or the predicted values.
- score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Test samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- score : float
Mean accuracy of self.predict(X) w.r.t. y.
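The difference between mean accuracy and multi-label subset accuracy can be seen in a small standalone sketch. It does not call the score method; the helper name mean_accuracy and the sample labels are made up for illustration, and sample weights are left out for brevity.

```python
def mean_accuracy(y_true, y_pred):
    """Fraction of samples whose prediction matches the true label exactly.

    With multi-label targets each element is a full label set, so a sample
    only counts as correct when every label in its set is right.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(1 for true, pred in zip(y_true, y_pred) if true == pred)
    return matches / len(y_true)


# Single-label case: 3 of 4 predictions are correct.
single_label = mean_accuracy(["a", "b", "a", "c"], ["a", "b", "b", "c"])

# Multi-label case: the second sample has one wrong label out of three,
# yet it counts as fully wrong under subset accuracy.
multi_label = mean_accuracy(
    [(1, 0, 1), (0, 1, 1)],
    [(1, 0, 1), (0, 1, 0)],
)
```

In the multi-label case five of the six individual labels are right, but the score is 0.5 because only one of the two label sets matches in full.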
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
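The <component>__<parameter> naming can be illustrated with a short standalone sketch that separates top-level parameters from nested ones. This is not scikit-learn's implementation; the helper split_nested_params and the example keys (budget, classifier__n_estimators, classifier__max_depth) are hypothetical and chosen only to show the pattern.

```python
def split_nested_params(params):
    """Separate top-level parameters from '<component>__<parameter>' ones."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split on the first double underscore only, so deeper nesting such as
        # 'classifier__step__param' is left for the component to resolve.
        component, delimiter, sub_key = key.partition("__")
        if delimiter:
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own_params, nested_params = split_nested_params(
    {"budget": 10, "classifier__n_estimators": 50, "classifier__max_depth": 3}
)
```

Here budget would be set on the estimator itself, while the two classifier entries would be forwarded to the component named classifier.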
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → AugmentationAL
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
- Returns:
- self : object
The updated object.