mlresearch.active_learning.StandardAL

class mlresearch.active_learning.StandardAL(classifier: BaseEstimator | ClassifierMixin | None = None, acquisition_func=None, n_init: int | float | None = None, budget: int | float | None = None, max_iter: int | None = None, evaluation_metric=None, continue_training: bool = False, random_state: int | None = None)[source]

Standard Active Learning model with a random initial data selection

Parameters:
classifierclassifier object, default=None

Classifier or pipeline to be trained in the iterative process. If None, defaults to sklearn’s RandomForestClassifier with default parameters and uses the random_state passed in the Active Learning model.

acquisition_funcfunction or {‘entropy’, ‘breaking_ties’, ‘random’}, default=None

Method used to quantify the prediction’s uncertainty level. All predefined functions are set up so that a higher value means higher uncertainty (higher likelihood of selection) and vice-versa. The uncertainty estimate is used to select the instances to be added to the labeled/training dataset. Acquisition functions may be added or changed in the UNCERTAINTY_FUNCTIONS dictionary. If None, defaults to “random”.

n_initint or float, default=None

Number of observations to include in the initial training dataset. If n_init < 1, then the corresponding percentage of the original dataset will be used as the initial training set. If None, defaults to 2% of the size of the original dataset.

budgetint or float, default=None

Number of observations to be added to the training dataset at each iteration. If budget < 1, then the corresponding percentage of the original dataset will be used as the initial training set. If None, defaults to 2% of the size of the original dataset.

max_iterint, default=None

Maximum number of iterations allowed. If None, the experiment will run until 100% of the dataset is added to the training set.

evaluation_metricstring, default=’accuracy’

Metric used to calculate the test scores. See research.metrics for info on available performance metrics.

continue_trainingbool, default=False

If False, fit a new classifier at each iteration. If True, the classifier fitted in the previous iteration is used for further training in subsequent iterations.

random_stateint, RandomState instance, default=None

Control the randomization of the algorithm.

  • If int, random_state is the seed used by the random number generator;

  • If RandomState instance, random_state is the random number generator;

  • If None, the random number generator is the RandomState instance used by np.random.

Attributes:
acquisition_func_function

Method used to calculate the classification uncertainty at each iteration.

evaluation_metric_scorer

Metric used to estimate the performance of the AL classifier at each iteration.

classifier_estimator object

The classifier used in the iterative process. It is the classifier fitted in the last iteration.

metadata_dict

Contains the performance estimations, classifiers, labeled pool mask and original dataset.

n_init_int

Number of observations included in the initial training dataset.

budget_int

Number of observations to be added to the training set per iteration. Also known as budget.

max_iter_int

Maximum number of iterations allowed.

labeled_pool_array-like of shape (n_samples,)

Mask that filters the labeled observations from the original dataset.


fit(X, y, **kwargs)

Fit an Active Learning model from training set (X, y).

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples.

yarray-like of shape (n_samples,) or (n_samples, n_outputs)

The target values (class labels) as integers or strings.

Returns:
selfActive Learning Classifier

Fitted Active Learning model.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsdict

Parameter names mapped to their values.

initialization(X, y, initial_selection=None, **kwargs)
iteration(X, y, **kwargs)
predict(X)

Predict class or regression value for X.

For a classification model, the predicted class for each sample in X is returned. For a regression model, the predicted value based on X is returned.

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The test input samples.

Returns:
yarray-like of shape (n_samples,) or (n_samples, n_outputs)

The predicted classes, or the predict values.

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
Xarray-like of shape (n_samples, n_features)

Test samples.

yarray-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weightarray-like of shape (n_samples,), default=None

Sample weights.

Returns:
scorefloat

Mean accuracy of self.predict(X) wrt. y.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.