mlresearch.active_learning.StandardAL
- class mlresearch.active_learning.StandardAL(classifier: BaseEstimator | ClassifierMixin | None = None, acquisition_func=None, n_init: int | float | None = None, budget: int | float | None = None, max_iter: int | None = None, evaluation_metric=None, continue_training: bool = False, random_state: int | None = None)
Standard Active Learning model with a random initial data selection.
- Parameters:
- classifier : classifier object, default=None
Classifier or pipeline to be trained in the iterative process. If None, defaults to sklearn’s RandomForestClassifier with default parameters and uses the random_state passed in the Active Learning model.
- acquisition_func : function or {‘entropy’, ‘breaking_ties’, ‘random’}, default=None
Method used to quantify the prediction’s uncertainty level. All predefined functions are set up so that a higher value means higher uncertainty (higher likelihood of selection) and vice versa. The uncertainty estimate is used to select the instances to be added to the labeled/training dataset. Acquisition functions may be added or changed in the UNCERTAINTY_FUNCTIONS dictionary. If None, defaults to “random”.
- n_init : int or float, default=None
Number of observations to include in the initial training dataset. If n_init < 1, then the corresponding percentage of the original dataset will be used as the initial training set. If None, defaults to 2% of the size of the original dataset.
- budget : int or float, default=None
Number of observations to be added to the training dataset at each iteration. If budget < 1, then the corresponding percentage of the original dataset will be added to the training set at each iteration. If None, defaults to 2% of the size of the original dataset.
- max_iter : int, default=None
Maximum number of iterations allowed. If None, the experiment will run until 100% of the dataset is added to the training set.
- evaluation_metric : string, default=‘accuracy’
Metric used to calculate the test scores. See research.metrics for info on available performance metrics.
- continue_training : bool, default=False
If False, fit a new classifier at each iteration. If True, the classifier fitted in the previous iteration is used for further training in subsequent iterations.
- random_state : int, RandomState instance, default=None
Control the randomization of the algorithm.
If int, random_state is the seed used by the random number generator.
If RandomState instance, random_state is the random number generator.
If None, the random number generator is the RandomState instance used by np.random.
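The acquisition_func description above can be made concrete with a small sketch. The two functions below are common textbook formulations of entropy and breaking-ties uncertainty, written so that a higher value means higher uncertainty. They are illustrative assumptions, not the library's own implementations, and the helper name select_most_uncertain is hypothetical.

```python
import math


def entropy(probabilities):
    """Shannon entropy of one predicted class distribution (assumed formulation)."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def breaking_ties(probabilities):
    """One minus the margin between the two most probable classes (assumed formulation)."""
    top, second = sorted(probabilities, reverse=True)[:2]
    return 1.0 - (top - second)


def select_most_uncertain(probability_rows, acquisition, budget):
    """Hypothetical helper: indices of the `budget` rows with the highest uncertainty."""
    scores = [acquisition(row) for row in probability_rows]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


predicted = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.40, 0.35, 0.25],  # ambiguous prediction
    [0.70, 0.20, 0.10],  # in between
]
chosen_entropy = select_most_uncertain(predicted, entropy, budget=1)
chosen_ties = select_most_uncertain(predicted, breaking_ties, budget=1)
```

Both criteria rank the ambiguous second row first here, although in general they can disagree because breaking ties only looks at the two largest probabilities.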
- Attributes:
- acquisition_func_ : function
Method used to calculate the classification uncertainty at each iteration.
- evaluation_metric_ : scorer
Metric used to estimate the performance of the AL classifier at each iteration.
- classifier_ : estimator object
The classifier used in the iterative process. It is the classifier fitted in the last iteration.
- metadata_ : dict
Contains the performance estimations, classifiers, labeled pool mask and original dataset.
- n_init_ : int
Number of observations included in the initial training dataset.
- budget_ : int
Number of observations to be added to the training set per iteration. Also known as budget.
- max_iter_ : int
Maximum number of iterations allowed.
- labeled_pool_ : array-like of shape (n_samples,)
Mask that filters the labeled observations from the original dataset.
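As a rough illustration of how the n_init and budget parameters could map to the integer attributes n_init_ and budget_, the sketch below resolves int, float and None settings to absolute counts. The function names, the rounding rule and the minimum of one observation are assumptions made for the example. The documentation only states that values below 1 are read as a share of the dataset and that None means 2%.

```python
import math


def resolve_size(value, n_samples, default_fraction=0.02):
    """Hypothetical helper: turn an int/float/None setting into a number of observations."""
    if value is None:
        value = default_fraction  # documented default: 2% of the dataset
    if value < 1:
        # Fractions are read as a share of the dataset.
        # Rounding and the floor of one observation are assumptions.
        return max(1, round(value * n_samples))
    return int(value)


def iterations_to_exhaust(n_samples, n_init, budget):
    """Iterations needed to label the whole dataset when no limit is given."""
    return math.ceil(max(0, n_samples - n_init) / budget)


n_samples = 1000
n_init_ = resolve_size(None, n_samples)   # 2% of 1000 observations
budget_ = resolve_size(0.05, n_samples)   # 5% of 1000 observations
max_iter_ = iterations_to_exhaust(n_samples, n_init_, budget_)
```

With these settings the initial set holds 20 observations, each iteration adds 50, and 20 iterations are needed before the whole dataset is labeled.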
- fit(X, y, **kwargs)
Fit an Active Learning model from training set (X, y).
- Parameters:
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
The target values (class labels) as integers or strings.
- Returns:
- self : Active Learning Classifier
Fitted Active Learning model.
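The iterative process that fit runs is not spelled out here, so the following is only a minimal sketch of a generic pool-based active learning loop, assuming a random initial selection, a refit from scratch at every iteration (the continue_training=False case) and selection of the most uncertain unlabeled observations. The toy centroid classifier and all function names are hypothetical stand-ins, not part of mlresearch.

```python
import random


def fit_centroids(points, labels):
    """Toy 1-D classifier: one centroid (mean) per class."""
    groups = {}
    for x, y in zip(points, labels):
        groups.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in groups.items()}


def uncertainty(centroids, x):
    """Higher when x is almost equally close to its two nearest centroids."""
    distances = sorted(abs(x - c) for c in centroids.values())
    if len(distances) < 2:
        return 0.0  # only one class seen so far: no ranking information
    return -(distances[1] - distances[0])


def active_learning_loop(X, y, n_init, budget, max_iter, seed=0):
    rng = random.Random(seed)
    labeled = [False] * len(X)  # mask over the original dataset
    for i in rng.sample(range(len(X)), n_init):  # random initial selection
        labeled[i] = True
    model = None
    for _ in range(max_iter):
        train = [i for i, flag in enumerate(labeled) if flag]
        # Refit from scratch on everything labeled so far.
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        pool = [i for i, flag in enumerate(labeled) if not flag]
        if not pool:
            break  # the whole dataset has been labeled
        # Label the `budget` most uncertain observations in the pool.
        pool.sort(key=lambda i: uncertainty(model, X[i]), reverse=True)
        for i in pool[:budget]:
            labeled[i] = True
    return model, labeled


X = [float(v) for v in range(20)]
y = [0 if v < 10 else 1 for v in X]
model, labeled = active_learning_loop(X, y, n_init=4, budget=3, max_iter=2)
```

In this sketch the returned mask plays the role described for labeled_pool_, and the returned model corresponds to the classifier fitted in the last iteration.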
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- initialization(X, y, initial_selection=None, **kwargs)
- iteration(X, y, **kwargs)
- predict(X)
Predict class or regression value for X.
For a classification model, the predicted class for each sample in X is returned. For a regression model, the predicted value based on X is returned.
- Parameters:
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
The test input samples.
- Returns:
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
The predicted classes, or the predicted values.
- score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy, a harsh metric because it requires the entire label set of each sample to be predicted correctly.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Test samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- score : float
Mean accuracy of self.predict(X) with respect to y.
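To show what mean accuracy and subset accuracy measure, here is a small standalone sketch. The function mean_accuracy is a hypothetical helper written for this example, not the method's actual implementation. Comparing whole label tuples reproduces the subset-accuracy behaviour for multi-label targets.

```python
def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Weighted share of samples whose prediction matches the true label exactly."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    hits = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return hits / sum(sample_weight)


# Single-label case: three of four predictions are correct.
single = mean_accuracy(["a", "b", "a", "c"], ["a", "b", "c", "c"])

# Multi-label case: a sample counts only if its whole label set matches.
multi = mean_accuracy([(1, 0, 1), (0, 1, 1)], [(1, 0, 1), (0, 1, 0)])

# Sample weights change how much each sample contributes.
weighted = mean_accuracy([0, 1], [0, 0], sample_weight=[3.0, 1.0])
```

The second sample in the multi-label case has two of its three labels right but still counts as a miss, which is why subset accuracy is considered harsh.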
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
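The <component>__<parameter> naming can be illustrated by splitting parameter names on the first double underscore. The helper below is a hypothetical sketch of that convention, not the routine set_params uses internally. The parameter names n_estimators and max_depth are assumed to belong to the default RandomForestClassifier.

```python
def split_nested_params(params):
    """Hypothetical helper: separate top-level settings from component settings."""
    own, nested = {}, {}
    for key, value in params.items():
        component, delimiter, sub_key = key.partition("__")
        if delimiter:
            # "classifier__max_depth" targets max_depth on the classifier component.
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {"budget": 10, "classifier__n_estimators": 50, "classifier__max_depth": 3}
)
```

Here budget is set on the model itself, while the two prefixed entries are routed to its classifier. A deeper name such as classifier__step__param keeps step__param as the sub-key, so the same split can be applied again one level down.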