research.active_learning.ALWrapper
- class research.active_learning.ALWrapper(classifier=None, generator=None, init_clusterer=None, init_strategy='random', selection_strategy='entropy', max_iter=None, n_initial=100, increment=50, save_classifiers=False, save_test_scores=True, auto_load=True, test_size=None, evaluation_metric='accuracy', random_state=None)
Class to perform Active Learning experiments.
This class implements the Active Learning framework presented in [1]. The initialization strategy (the init_clusterer and init_strategy parameters) is still a work in progress.
- Parameters
- classifierclassifier object, default=None
Classifier to be used as Chooser and Predictor.
- generatorgenerator estimator, default=None
Generator to be used for artificial data generation within Active Learning iterations.
- init_clustererclusterer estimator, default=None
WIP
- init_strategyWIP, default=’random’
WIP
- selection_strategyfunction or {‘entropy’, ‘breaking_ties’, ‘random’}, default=’entropy’
Method used to quantify the chooser’s uncertainty level and select the instances to be added to the labeled/training dataset.
- max_iterint, default=None
Maximum number of iterations allowed.
- n_initialint, default=100
Number of observations to include in the initial training dataset.
- incrementint, default=50
Number of observations to be added to the training dataset at each iteration.
- save_classifiersbool, default=False
Save the classifiers fit at each iteration. These classifiers are stored in the list self.classifiers_.
- save_test_scoresbool, default=True
If True, test scores are saved in the list self.test_scores_. The size of the test set is defined with the test_size parameter.
- auto_loadbool, default=True
If True, the classifier with the best training score is saved in the attribute self.classifier_. It’s the classifier object used in the predict method.
- test_sizefloat or int, default=None
If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to 0.25.
- evaluation_metricstring, default=’accuracy’
Metric used to calculate the test scores. See research.metrics for info on available performance metrics.
- random_stateint, RandomState instance, default=None
Control the randomization of the algorithm.
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used by np.random.
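The selection_strategy options can be illustrated with a minimal sketch of how entropy and breaking-ties uncertainty are commonly defined. This is not confirmed to match the library's internals: the function names, the sign convention, and the probability values below are all assumptions made for illustration.

```python
import math


def entropy_score(probs):
    """Shannon entropy of one class-probability vector (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def breaking_ties_score(probs):
    """Negated gap between the two most likely classes (higher = more uncertain)."""
    top, second = sorted(probs, reverse=True)[:2]
    return -(top - second)


def select_most_uncertain(probabilities, increment, score=entropy_score):
    """Return the indices of the `increment` most uncertain rows, most uncertain first."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: score(probabilities[i]),
        reverse=True,
    )
    return ranked[:increment]


# Hypothetical predicted probabilities for four unlabeled samples and three classes.
probabilities = [
    [0.98, 0.01, 0.01],
    [0.34, 0.33, 0.33],
    [0.50, 0.48, 0.02],
    [0.70, 0.20, 0.10],
]

picked_entropy = select_most_uncertain(probabilities, increment=2)
picked_ties = select_most_uncertain(
    probabilities, increment=2, score=breaking_ties_score
)
print(picked_entropy, picked_ties)
```

The two criteria can disagree: entropy looks at the whole distribution, while breaking ties only compares the top two classes, so the third sample ranks higher under breaking ties than under entropy.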
References
- 1
Fonseca, J., Douzas, G., Bacao, F. (2021). Increasing the Effectiveness of Active Learning: Introducing Artificial Data Generation in Active Learning for Land Use/Land Cover Classification. Remote Sensing, 13(13), 2619. https://doi.org/10.3390/rs13132619
Methods
fit(X, y): Run an Active Learning procedure from training set (X, y).
get_params([deep]): Get parameters for this estimator.
load_best_classifier(X, y): Loads the best classifier in the self.classifiers_ list.
predict(X): Predict class or regression value for X.
score(X, y[, sample_weight]): Return the mean accuracy on the given test data and labels.
set_params(**params): Set the parameters of this estimator.
- fit(X, y)
Run an Active Learning procedure from training set (X, y).
- Parameters
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
The target values (class labels) as integers or strings.
- Returns
- selfALWrapper
Completed Active Learning procedure
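How n_initial, increment, and max_iter interact can be sketched with index bookkeeping alone. The loop below is an assumption about the general shape of such a procedure, not the library's implementation: run_active_learning and random_selection are hypothetical helpers, and whether the library counts the initial fit as an iteration is not stated in this page.

```python
import random


def run_active_learning(n_samples, select, n_initial=100, increment=50,
                        max_iter=None, random_state=None):
    """Grow a labeled index set from a pool, `increment` samples per iteration."""
    rng = random.Random(random_state)
    pool = list(range(n_samples))
    rng.shuffle(pool)

    # Random initialization: the first n_initial shuffled indices start labeled.
    labeled, unlabeled = pool[:n_initial], pool[n_initial:]
    history = [len(labeled)]

    iteration = 0
    while unlabeled and (max_iter is None or iteration < max_iter):
        # A real run would fit the classifier on `labeled` here and use its
        # predictions on `unlabeled` to decide which samples to query.
        chosen = select(unlabeled, increment, rng)
        chosen_set = set(chosen)
        labeled.extend(chosen)
        unlabeled = [i for i in unlabeled if i not in chosen_set]
        history.append(len(labeled))
        iteration += 1
    return labeled, history


def random_selection(unlabeled, increment, rng):
    """Stand-in for a selection strategy: pick samples uniformly at random."""
    return rng.sample(unlabeled, min(increment, len(unlabeled)))


labeled, history = run_active_learning(
    n_samples=300, select=random_selection,
    n_initial=100, increment=50, max_iter=3, random_state=0,
)
print(history)
```

With these settings the labeled set grows from 100 to 250 samples over three iterations; with max_iter=None the loop would instead continue until the pool is exhausted.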
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- paramsdict
Parameter names mapped to their values.
- load_best_classifier(X, y)
Loads the best classifier in the self.classifiers_ list.
The best classifier, chosen according to the performance metric passed, is the one used by the predict method.
- Parameters
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The test input samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
The target values (class labels) as integers or strings.
- Returns
- selfALWrapper
Completed Active Learning procedure
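Picking the best entry from a list of per-iteration classifiers reduces to an argmax over their scores. The sketch below assumes a higher score is better and uses placeholder strings in place of fitted classifiers; pick_best is a hypothetical helper, not part of the library.

```python
def pick_best(classifiers, scores):
    """Return (index, classifier) for the highest score; the first one wins ties."""
    if len(classifiers) != len(scores) or not scores:
        raise ValueError("classifiers and scores must be non-empty and equal length")
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, classifiers[best_index]


# Placeholder names stand in for the classifiers saved at each iteration.
saved_classifiers = ["clf_iter_0", "clf_iter_1", "clf_iter_2"]
saved_scores = [0.71, 0.83, 0.80]

best_index, best_classifier = pick_best(saved_classifiers, saved_scores)
print(best_index, best_classifier)
```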
- predict(X)
Predict class or regression value for X.
For a classification model, the predicted class for each sample in X is returned. For a regression model, the predicted value based on X is returned.
- Parameters
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The test input samples.
- Returns
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
The predicted classes, or the predicted values.
- score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters
- Xarray-like of shape (n_samples, n_features)
Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns
- scorefloat
Mean accuracy of self.predict(X) with respect to y.
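The difference between plain accuracy and subset accuracy can be seen in a small unweighted sketch. The helper mean_accuracy and the label values are illustrative assumptions; sample_weight is omitted here for brevity.

```python
def mean_accuracy(y_true, y_pred):
    """Fraction of samples whose prediction matches the true label exactly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return matches / len(y_true)


# Single-label case: three of four predictions are correct.
single = mean_accuracy(["a", "b", "a", "c"], ["a", "b", "c", "c"])

# Multi-label case: a sample counts only if its whole label set matches,
# so the second sample is wrong even though one of its two labels is right.
multi = mean_accuracy([[1, 0], [0, 1], [1, 1]], [[1, 0], [1, 1], [1, 1]])

print(single, multi)
```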
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
- **paramsdict
Estimator parameters.
- Returns
- selfestimator instance
Estimator instance.
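The <component>__<parameter> naming can be illustrated by splitting keys on the first double underscore. This is a simplified sketch of the routing idea rather than the estimator's actual code: split_nested_params is a hypothetical helper, and max_depth and n_estimators are assumed parameter names for whatever classifier is wrapped.

```python
def split_nested_params(params):
    """Separate top-level parameters from <component>__<parameter> ones."""
    own, nested = {}, {}
    for key, value in params.items():
        # partition splits on the first "__" only, so deeper nesting
        # stays intact in sub_key for the component to resolve.
        name, delim, sub_key = key.partition("__")
        if delim:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own, nested = split_nested_params({
    "increment": 25,
    "classifier__max_depth": 5,        # hypothetical classifier parameter
    "classifier__n_estimators": 200,   # hypothetical classifier parameter
})
print(own, nested)
```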