research.active_learning.ALWrapper

class research.active_learning.ALWrapper(classifier=None, generator=None, init_clusterer=None, init_strategy='random', selection_strategy='entropy', max_iter=None, n_initial=100, increment=50, save_classifiers=False, save_test_scores=True, auto_load=True, test_size=None, evaluation_metric='accuracy', random_state=None)

Class to perform Active Learning experiments.

This class implements the Active Learning framework presented in [1]. The initialization strategy is a work in progress (WIP).

Parameters
classifier : classifier object, default=None

Classifier to be used as Chooser and Predictor.

generator : generator estimator, default=None

Generator to be used for artificial data generation within Active Learning iterations.

init_clusterer : clusterer estimator, default=None

WIP

init_strategy : WIP, default='random'

WIP

selection_strategy : function or {'entropy', 'breaking_ties', 'random'}, default='entropy'

Method used to quantify the chooser’s uncertainty level and select the instances to be added to the labeled/training dataset.

max_iter : int, default=None

Maximum number of iterations allowed.

n_initial : int, default=100

Number of observations to include in the initial training dataset.

increment : int, default=50

Number of observations to be added to the training dataset at each iteration.

save_classifiers : bool, default=False

If True, the classifiers fit at each iteration are saved in the list self.classifiers_.

save_test_scores : bool, default=True

If True, test scores are saved in the list self.test_scores_. Size of the test set is defined with the test_size parameter.

auto_load : bool, default=True

If True, the classifier with the best training score is stored in the attribute self.classifier_. This is the classifier object used by the predict method.

test_size : float or int, default=None

If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to 0.25.

evaluation_metric : string, default='accuracy'

Metric used to calculate the test scores. See research.metrics for info on available performance metrics.

random_state : int, RandomState instance, default=None

Control the randomization of the algorithm.

  • If int, random_state is the seed used by the random number generator;

  • If RandomState instance, random_state is the random number generator;

  • If None, the random number generator is the RandomState instance used by np.random.
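As an illustration of the selection_strategy options above, the following sketch scores uncertainty from predicted class probabilities. It assumes the common textbook definitions (Shannon entropy of the probability vector for 'entropy', and the gap between the two largest probabilities for 'breaking_ties'); it is not taken from the ALWrapper source, and all function names are hypothetical.

```python
import math


def entropy_score(probs):
    """Shannon entropy of one probability vector; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def breaking_ties_score(probs):
    """Gap between the two largest probabilities; smaller means more uncertain."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def select_most_uncertain(prob_rows, increment, strategy="entropy"):
    """Return the indices of the `increment` most uncertain rows (hypothetical helper)."""
    if strategy == "entropy":
        # Negate so that sorting in ascending order puts high entropy first.
        keyed = [(-entropy_score(p), i) for i, p in enumerate(prob_rows)]
    elif strategy == "breaking_ties":
        keyed = [(breaking_ties_score(p), i) for i, p in enumerate(prob_rows)]
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [i for _, i in sorted(keyed)[:increment]]


# Made-up predicted probabilities for three unlabeled observations.
probabilities = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.40, 0.35, 0.25],  # spread over all classes
    [0.50, 0.49, 0.01],  # near tie between two classes
]

by_entropy = select_most_uncertain(probabilities, 2, "entropy")
by_ties = select_most_uncertain(probabilities, 2, "breaking_ties")
```

The two criteria can rank observations differently: entropy favors the row spread across all classes, while breaking ties favors the row with the closest top two classes.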

References

[1] Fonseca, J., Douzas, G., Bacao, F. (2021). Increasing the Effectiveness of Active Learning: Introducing Artificial Data Generation in Active Learning for Land Use/Land Cover Classification. Remote Sensing, 13(13), 2619. https://doi.org/10.3390/rs13132619

Methods

fit(X, y)

Run an Active Learning procedure from training set (X, y).

get_params([deep])

Get parameters for this estimator.

load_best_classifier(X, y)

Loads the best classifier in the self.classifiers_ list.

predict(X)

Predict class or regression value for X.

score(X, y[, sample_weight])

Return the mean accuracy on the given test data and labels.

set_params(**params)

Set the parameters of this estimator.


fit(X, y)

Run an Active Learning procedure from training set (X, y).

Parameters
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

The target values (class labels) as integers or strings.

Returns
self : ALWrapper

Completed Active Learning procedure
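A rough sketch of how n_initial, increment, and max_iter could interact during fit. It assumes that one batch of increment observations is labeled per iteration and that the loop stops when max_iter is reached or the unlabeled pool is empty; the actual stopping rule of ALWrapper may differ. labeled_set_sizes is a hypothetical helper, not part of the library.

```python
def labeled_set_sizes(n_samples, n_initial=100, increment=50, max_iter=None):
    """Size of the labeled set at the start and after each iteration (assumed rule)."""
    sizes = [min(n_initial, n_samples)]
    iteration = 0
    # Keep adding batches until the pool is exhausted or max_iter is reached.
    while sizes[-1] < n_samples and (max_iter is None or iteration < max_iter):
        sizes.append(min(sizes[-1] + increment, n_samples))
        iteration += 1
    return sizes


unbounded = labeled_set_sizes(270)
capped = labeled_set_sizes(270, max_iter=2)
print(unbounded)  # [100, 150, 200, 250, 270]
print(capped)     # [100, 150, 200]
```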

get_params(deep=True)

Get parameters for this estimator.

Parameters
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params : dict

Parameter names mapped to their values.

load_best_classifier(X, y)

Loads the best classifier in the self.classifiers_ list.

The best classifier, as ranked by the chosen performance metric, is the one used by the predict method.

Parameters
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The test input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

The target values (class labels) as integers or strings.

Returns
self : ALWrapper

Completed Active Learning procedure
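The idea behind load_best_classifier can be sketched as picking the saved classifier with the highest score. This assumes one score per saved classifier and that higher is better; how ALWrapper computes the scores and breaks ties is not stated here. The scores and names below are made up for illustration.

```python
def best_index(scores):
    """Index of the highest score; ties resolve to the earliest iteration."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical per-iteration scores, one per saved classifier.
scores = [0.71, 0.78, 0.83, 0.81]
# Stand-ins for fitted classifier objects.
classifiers = ["clf_iter_0", "clf_iter_1", "clf_iter_2", "clf_iter_3"]

best = classifiers[best_index(scores)]
```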

predict(X)

Predict class or regression value for X.

For a classification model, the predicted class for each sample in X is returned. For a regression model, the predicted value based on X is returned.

Parameters
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The test input samples.

Returns
y : array-like of shape (n_samples,) or (n_samples, n_outputs)

The predicted classes, or the predicted values.

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns
score : float

Mean accuracy of self.predict(X) with respect to y.
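To make the multi-label remark concrete, the sketch below contrasts subset accuracy with per-label accuracy on made-up data. It is a standalone illustration in plain Python, not the code path used by score; both function names are hypothetical.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose entire label set is predicted correctly."""
    matches = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


def per_label_accuracy(y_true, y_pred):
    """Fraction of individual labels predicted correctly, for comparison."""
    pairs = [(a, b) for t, p in zip(y_true, y_pred) for a, b in zip(t, p)]
    return sum(a == b for a, b in pairs) / len(pairs)


# Three samples with three binary labels each (illustrative values).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]  # one wrong label in the second sample

strict = subset_accuracy(y_true, y_pred)     # 2 of 3 samples fully correct
lenient = per_label_accuracy(y_true, y_pred)  # 8 of 9 labels correct
```

A single wrong label makes the whole sample count as an error under subset accuracy, which is why it is the harsher of the two.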

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters
**params : dict

Estimator parameters.

Returns
self : estimator instance

Estimator instance.
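The <component>__<parameter> naming can be illustrated with a small parsing sketch. It assumes the name is split at the first double underscore, with the remainder passed on to the nested component; split_param is a hypothetical helper, and n_estimators and max_depth are example parameter names that depend on the classifier in use.

```python
def split_param(name):
    """Split 'component__parameter' into (component, parameter)."""
    component, delimiter, parameter = name.partition("__")
    if not delimiter:
        # No double underscore: a simple parameter on the estimator itself.
        return None, name
    return component, parameter


simple = split_param("increment")
nested = split_param("classifier__n_estimators")
deeper = split_param("classifier__base__max_depth")

print(simple)  # (None, 'increment')
print(nested)  # ('classifier', 'n_estimators')
print(deeper)  # ('classifier', 'base__max_depth')
```

In the last case the remainder still contains a double underscore, so the nested component would resolve it in turn.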