This is the full API documentation of the mlresearch package.


Toolbox to develop research in Machine Learning.

ml-research is a library containing the implementation of various algorithms developed in Machine Learning research, as well as utilities to facilitate the formatting of pandas dataframes into LaTeX tables.


Print debugging information.


Module which contains Active Learning implementations.

active_learning.StandardAL([classifier, ...])

Standard Active Learning model with random initial data selection.

active_learning.AugmentationAL([classifier, ...])

Active Learning with pipelined Data Augmentation.


Download, transform and simulate various datasets.

datasets.BinaryDatasets([names, data_home, ...])

Class to download, transform and save binary class datasets.

datasets.ImbalancedBinaryDatasets([names, ...])

Class to download, transform and save binary class imbalanced datasets.


Class to download, transform and save datasets with both continuous and categorical features.

datasets.MultiClassDatasets([names, ...])

Class to download, transform and save multi-class datasets.

datasets.RemoteSensingDatasets([names, ...])

Class to download, transform and save remote sensing datasets.


This module contains several functions to prepare and format tables for LaTeX documents.

latex.format_table(table[, indices, ...])

Sort and rename rows and columns.

latex.make_bold(table[, maximum, threshold, ...])

Bold the lowest or highest values per row or column, or the values below or above the value passed in threshold.
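The bolding rule can be illustrated with a small pure-Python sketch. The helper `bold_row_extremes` is hypothetical and is not the library's `make_bold`. It works on plain lists of rows instead of a pandas dataframe, and it assumes that with `threshold` set, `maximum=True` bolds values above the threshold and `maximum=False` bolds values below it. Column-wise bolding is obtained by transposing the input first.

```python
from typing import List, Optional


def bold_row_extremes(
    rows: List[List[float]],
    maximum: bool = True,
    threshold: Optional[float] = None,
    decimals: int = 2,
) -> List[List[str]]:
    """Hypothetical helper: format each row as LaTeX cell strings.

    Without a threshold, the row maximum (or minimum) is wrapped in \\textbf{}.
    With a threshold (assumed semantics), every value above it (maximum=True)
    or below it (maximum=False) is bolded instead.
    """
    formatted = []
    for row in rows:
        if threshold is None:
            target = max(row) if maximum else min(row)
            mask = [value == target for value in row]
        else:
            mask = [(v > threshold) if maximum else (v < threshold) for v in row]
        formatted.append(
            [
                f"\\textbf{{{v:.{decimals}f}}}" if bold else f"{v:.{decimals}f}"
                for v, bold in zip(row, mask)
            ]
        )
    return formatted


print(bold_row_extremes([[0.81, 0.93, 0.88]]))
```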

latex.make_mean_sem_table(mean_vals[, ...])

Generate a table with rounded decimals, bold maximum/minimum values or values above/below a given threshold, and combined mean and SEM values.
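Combining means with their standard errors amounts to pairing two equally shaped tables cell by cell. The sketch below is a hypothetical illustration, not `make_mean_sem_table` itself: it uses nested lists instead of dataframes, the sample standard error $\mathrm{SEM} = s / \sqrt{n}$ (with $s$ the sample standard deviation), and an assumed `mean $\pm$ sem` LaTeX cell format.

```python
import math
import statistics
from typing import List, Sequence


def sem(values: Sequence[float]) -> float:
    """Standard error of the mean, using the sample (n - 1) standard deviation."""
    return statistics.stdev(values) / math.sqrt(len(values))


def combine_mean_sem(
    means: List[List[float]], sems: List[List[float]], decimals: int = 2
) -> List[List[str]]:
    """Hypothetical sketch: merge mean and SEM tables into 'mean $\\pm$ sem' cells."""
    return [
        [f"{m:.{decimals}f} $\\pm$ {s:.{decimals}f}" for m, s in zip(mean_row, sem_row)]
        for mean_row, sem_row in zip(means, sems)
    ]


scores = [0.88, 0.91, 0.90]  # e.g. one model's scores over three runs
print(combine_mean_sem([[statistics.mean(scores)]], [[sem(scores)]]))
```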

latex.export_longtable(table[, path, ...])

Exports a pandas dataframe to LaTeX (longtable) format.


This module contains various performance metrics/scorers that are not included in scikit-learn's scorers dictionary. It also provides an expanded dictionary of scorers (compared with scikit-learn's).


Get a scorer from string.


Get the names of all available scorers.

metrics.geometric_mean_score_macro(y_true, ...)

Geometric mean score with macro average.
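One common way to define a macro-averaged geometric mean is to treat each class one-vs-rest, compute $\sqrt{\text{sensitivity} \cdot \text{specificity}}$ per class, and take the unweighted mean over classes. The sketch below assumes that definition; the exact aggregation used by `metrics.geometric_mean_score_macro` is not stated here and may differ. The function name `gmean_macro_sketch` is hypothetical.

```python
import math
from typing import Hashable, Sequence


def gmean_macro_sketch(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Assumed definition: unweighted mean over classes of sqrt(sensitivity * specificity)."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        tn = sum(t != c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        per_class.append(math.sqrt(sensitivity * specificity))
    return sum(per_class) / len(per_class)


print(gmean_macro_sketch([0, 0, 1, 1], [0, 1, 1, 1]))
```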

metrics.precision_at_k(y_true, y_score[, k, ...])

Calculate precision at k, where k is the number of top-ranked items to consider (items are sorted in descending order of their score).
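Precision at k is the fraction of relevant items among the k highest-scored ones. A minimal sketch, assuming `y_true` holds binary relevance labels and that k does not exceed the number of items; `precision_at_k_sketch` is a hypothetical name that only mirrors the parameters of `metrics.precision_at_k`.

```python
from typing import Sequence


def precision_at_k_sketch(y_true: Sequence[int], y_score: Sequence[float], k: int = 10) -> float:
    """Fraction of relevant items (y_true == 1) among the k highest-scored items."""
    # Rank item indices by descending score (stable for ties)
    ranked = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    top_k = ranked[:k]
    return sum(1 for i in top_k if y_true[i]) / k


print(precision_at_k_sketch([1, 0, 1, 0, 1], [0.9, 0.8, 0.7, 0.2, 0.1], k=3))
```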

metrics.area_under_learning_curve(metadata, ...)

Area under the learning curve.
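In Active Learning, a learning curve plots model performance against the amount of labeled data gathered over the iterations, and its area summarizes how quickly performance improves. The sketch below applies the trapezoidal rule to that curve. It assumes plain lists of labeled-set sizes and scores, because the structure of the `metadata` argument is not described here, and the optional normalization by the x-range is also an assumption.

```python
from typing import Sequence


def area_under_learning_curve_sketch(
    n_labeled: Sequence[float], scores: Sequence[float], normalize: bool = True
) -> float:
    """Trapezoidal area under (n_labeled, score); optionally divided by the x-range."""
    area = 0.0
    for i in range(1, len(scores)):
        width = n_labeled[i] - n_labeled[i - 1]
        area += width * (scores[i] + scores[i - 1]) / 2
    if normalize:
        # Scale so that a constant score s yields an area of s
        area /= n_labeled[-1] - n_labeled[0]
    return area


print(area_under_learning_curve_sketch([10, 20, 30], [0.5, 0.7, 0.9]))
```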

metrics.data_utilization_rate(metadata[, ...])

Data Utilization Rate.

metrics.ALScorer(score_func[, sign])

Make an Active Learning scorer from an AL-specific metric or loss function.

metrics.AlphaPrecision(scorer_real[, alpha])

Measures synthetic data fidelity.

metrics.BetaRecall([scorer_synth, beta, ...])

Checks whether the synthetic data is diverse enough to cover the variability of real data, i.e., a model should be able to generate a wide variety of good samples.

metrics.Authenticity([metric, n_jobs])

Quantifies the rate at which a model generates new samples.


The mlresearch.model_selection module includes the model and parameter search methods.

model_selection.ModelSearchCV(estimators, ...)

Exhaustive search over specified parameter values for a collection of estimators.
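An exhaustive search over several estimators evaluates every combination of parameter values within each estimator's grid. The sketch below only enumerates those candidates. The `{estimator_name: {param: [values]}}` layout and the function name are hypothetical and are not the actual `ModelSearchCV` input format.

```python
from itertools import product
from typing import Any, Dict, List, Tuple


def enumerate_candidates(
    param_grids: Dict[str, Dict[str, List[Any]]]
) -> List[Tuple[str, Dict[str, Any]]]:
    """Expand per-estimator grids into the flat list of (estimator, params) pairs."""
    candidates = []
    for est_name, grid in param_grids.items():
        keys = sorted(grid)
        # An empty grid yields a single candidate with default parameters
        for values in product(*(grid[k] for k in keys)):
            candidates.append((est_name, dict(zip(keys, values))))
    return candidates


grids = {
    "knn": {"n_neighbors": [3, 5]},
    "tree": {"max_depth": [2, 4], "criterion": ["gini"]},
    "baseline": {},
}
for candidate in enumerate_candidates(grids):
    print(candidate)
```

Each candidate would then be cross-validated, and the best-scoring (estimator, parameters) pair retained.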

model_selection.HalvingModelSearchCV(...[, ...])

Search over specified parameter values for a collection of estimators with successive halving.
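Successive halving evaluates all candidates with a small resource budget (for example, a fraction of the training samples), keeps the best `1 / factor` of them, multiplies the budget by `factor`, and repeats until one candidate remains. scikit-learn's experimental `HalvingGridSearchCV` follows the same idea. The sketch below is a simplified, hypothetical illustration in which `evaluate(candidate, resources)` stands in for a cross-validated score; it is not the `HalvingModelSearchCV` implementation.

```python
import math
from typing import Callable, Hashable, Iterable


def successive_halving(
    candidates: Iterable[Hashable],
    evaluate: Callable[[Hashable, int], float],
    min_resources: int,
    factor: int = 3,
) -> Hashable:
    """Return the surviving candidate after repeated keep-the-best-1/factor rounds."""
    resources = min_resources
    remaining = list(candidates)
    while len(remaining) > 1:
        # Score every remaining candidate with the current budget (higher is better)
        ranked = sorted(remaining, key=lambda c: evaluate(c, resources), reverse=True)
        n_keep = max(1, math.ceil(len(remaining) / factor))
        remaining = ranked[:n_keep]
        # Survivors get a larger budget in the next round
        resources *= factor
    return remaining[0]
```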



Unsupervised One-Class neural network.


Data preprocessing methods adapted or modified from sklearn.

preprocessing.PipelineEncoder([features, ...])

Pipeline-compatible wrapper of scikit-learn's Transformer objects.


Module which contains the implementation of variations of oversampling/data augmentation algorithms, as well as helper classes to use oversampling algorithms as data augmentation techniques.


Class to perform over-sampling using Geometric SMOTE.
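Geometric SMOTE generates each synthetic sample inside a hypersphere centred on a selected minority sample, with a radius equal to the distance to a chosen neighbour. That hypersphere can be truncated (only the half facing towards or away from the neighbour is used) and deformed (flattened towards the line joining the two points). The sketch below shows one generation step under these assumptions. The `truncation_factor` and `deformation_factor` names follow the published G-SMOTE description, but `geometric_sample` itself is hypothetical and not the library's class.

```python
import numpy as np


def geometric_sample(center, surface_point, truncation_factor=0.0,
                     deformation_factor=0.0, rng=None):
    """One G-SMOTE-style step: uniform point in a truncated/deformed hypersphere."""
    rng = np.random.default_rng(rng)
    center = np.asarray(center, dtype=float)
    surface_point = np.asarray(surface_point, dtype=float)
    radius = np.linalg.norm(surface_point - center)
    if radius == 0:
        return center.copy()
    n_features = center.size
    # Uniform point in the unit hypersphere: random direction scaled by U^(1/d)
    direction = rng.normal(size=n_features)
    direction /= np.linalg.norm(direction)
    point = rng.uniform() ** (1 / n_features) * direction
    # Unit vector from the center towards the selected neighbour
    parallel_unit = (surface_point - center) / radius
    projection = point @ parallel_unit
    # Truncation (factor in [-1, 1]): reflect points lying in the cut-off region
    if (truncation_factor > 0 and projection < truncation_factor - 1) or (
        truncation_factor < 0 and projection > truncation_factor + 1
    ):
        point = point - 2 * projection * parallel_unit
        projection = -projection
    # Deformation (factor in [0, 1]): shrink the component perpendicular to the axis
    parallel_part = projection * parallel_unit
    point = parallel_part + (1 - deformation_factor) * (point - parallel_part)
    return center + radius * point


print(geometric_sample([0.0, 0.0], [1.0, 0.0], truncation_factor=1.0, rng=0))
```

With `deformation_factor=1.0`, the perpendicular component vanishes and samples fall on the line through the two points, similar to the segment used by the original SMOTE.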


A wrapper to facilitate the use of imblearn.over_sampling objects for data augmentation.
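Over-samplers in imbalanced-learn accept a `sampling_strategy` dictionary that maps each class to its desired number of samples after resampling. Requesting more samples for every class, not only the minority ones, turns an over-sampler into a data augmentation step. The helper below is a hypothetical illustration of building such targets by growing each class by a factor (assumed to be at least 1); it is not the wrapper's actual interface.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def augmentation_targets(y: Sequence[Hashable], factor: float = 2.0) -> Dict[Hashable, int]:
    """Hypothetical helper: per-class target counts after growing every class by `factor`."""
    counts = Counter(y)
    return {label: int(round(count * factor)) for label, count in counts.items()}


y = ["a"] * 10 + ["b"] * 4
print(augmentation_targets(y, factor=1.5))
```

The resulting dictionary could then be passed as `sampling_strategy` to an imbalanced-learn over-sampler such as `SMOTE`.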


This module contains a variety of general utility functions and tools used to format and prepare tables to incorporate into LaTeX code.

utils.image_to_dataframe(X[, y, bands, ...])

Converts an image array (height, width, bands) to a pandas dataframe (height * width, bands).

utils.dataframe_to_image(df[, bands, ...])

Converts a pandas dataframe to an image.
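Both conversions are reshapes: each pixel becomes one row and each band one column, with pixels in row-major order (pixel `(r, c)` maps to row `r * width + c`). The sketch below uses NumPy arrays only. The function names are hypothetical, and wrapping the 2-D array in a pandas dataframe (with band names as columns) is left out.

```python
import numpy as np


def image_to_table(X: np.ndarray) -> np.ndarray:
    """(height, width, bands) -> (height * width, bands), pixels in row-major order."""
    height, width, bands = X.shape
    return X.reshape(height * width, bands)


def table_to_image(values: np.ndarray, height: int, width: int) -> np.ndarray:
    """Inverse of image_to_table: (height * width, bands) -> (height, width, bands)."""
    return values.reshape(height, width, -1)


X = np.arange(24).reshape(2, 3, 4)  # 2 x 3 pixels, 4 bands
table = image_to_table(X)
print(table.shape)
```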

utils.load_datasets(data_dir[, prefix, ...])

Load all datasets in a directory from sqlite databases and/or csv files.

utils.check_pipelines(*objects_list, ...)

Extract estimators and parameter grids to be passed to ModelSearchCV.

utils.check_pipelines_wrapper(*objects_list, ...)

Extract estimators within a wrapper object and parameter grids to be passed to ModelSearchCV.

utils.check_random_states(random_state, n_runs)

Create random states for experiments.
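A common pattern for reproducible experiments is to derive one integer seed per run from a single master seed, so that rerunning with the same master seed reproduces every run. The sketch below assumes that pattern and uses NumPy's `default_rng`; how `utils.check_random_states` actually derives its states is not stated here.

```python
from typing import List, Optional

import numpy as np


def make_random_states(random_state: Optional[int], n_runs: int) -> List[int]:
    """Sketch: one reproducible integer seed per experimental run."""
    rng = np.random.default_rng(random_state)
    return rng.integers(0, 2**31 - 1, size=n_runs).tolist()


print(make_random_states(42, 3))
```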

utils.set_matplotlib_style([font_size, ...])

Load LaTeX-style configurations for Matplotlib visualizations.


Returns a list of available fonts in the current system.


Check and display the available fonts in matplotlib.

utils.feature_to_color(col[, cmap])

Converts a column of values to hex-type colors.
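Mapping values to colors means rescaling them to [0, 1] and looking each one up in a colormap. The real function takes a `cmap` argument, presumably a Matplotlib colormap (where `matplotlib.colors.to_hex` would produce the strings). As a dependency-free stand-in, the hypothetical sketch below interpolates linearly between two hex colors.

```python
from typing import List, Sequence


def feature_to_hex(values: Sequence[float], low: str = "#0000ff", high: str = "#ff0000") -> List[str]:
    """Min-max scale values and interpolate linearly between two hex colors."""
    vmin, vmax = min(values), max(values)
    span = (vmax - vmin) or 1.0  # avoid division by zero for constant columns
    lo_rgb = [int(low[i:i + 2], 16) for i in (1, 3, 5)]
    hi_rgb = [int(high[i:i + 2], 16) for i in (1, 3, 5)]
    colors = []
    for v in values:
        t = (v - vmin) / span
        rgb = [round(a + t * (b - a)) for a, b in zip(lo_rgb, hi_rgb)]
        colors.append("#{:02x}{:02x}{:02x}".format(*rgb))
    return colors


print(feature_to_hex([0, 5, 10]))
```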

utils.parallel_loop(function, iterable[, ...])

Parallelize a loop and optionally add a progress bar.
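The sketch below shows one way to parallelize a loop while preserving result order and reporting progress as tasks finish. It uses a thread pool from the standard library and a minimal text counter in place of a real progress bar. Threads suit I/O-bound work; CPU-bound Python functions would need processes instead. The backend used by `utils.parallel_loop` is not stated here, and `parallel_loop_sketch` is a hypothetical name.

```python
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Iterable, List


def parallel_loop_sketch(
    function: Callable[[Any], Any],
    iterable: Iterable[Any],
    n_jobs: int = 4,
    progress_bar: bool = False,
) -> List[Any]:
    """Apply `function` to each item concurrently; results keep the input order."""
    items = list(iterable)
    results: List[Any] = [None] * len(items)
    with ThreadPoolExecutor(max_workers=n_jobs) as executor:
        futures = {executor.submit(function, item): i for i, item in enumerate(items)}
        for done, future in enumerate(as_completed(futures), start=1):
            results[futures[future]] = future.result()
            if progress_bar:
                print(f"\r{done}/{len(items)}", end="", file=sys.stderr)
    if progress_bar:
        print(file=sys.stderr)
    return results


print(parallel_loop_sketch(lambda x: x * x, range(5)))
```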


Generate data, results and analysis paths.