API

This is the full API documentation of the mlresearch package.

mlresearch.active_learning

Module which contains Active Learning implementations.

active_learning.StandardAL([classifier, ...])

Standard Active Learning model with a random initial data selection.

active_learning.AugmentationAL([classifier, ...])

Active Learning with pipelined Data Augmentation.

mlresearch.datasets

Download, transform and simulate various datasets.

datasets.BinaryDatasets([names, data_home, ...])

Class to download, transform and save binary class datasets.

datasets.ImbalancedBinaryDatasets([names, ...])

Class to download, transform and save binary class imbalanced datasets.

datasets.ContinuousCategoricalDatasets([...])

Class to download, transform and save datasets with both continuous and categorical features.

datasets.MultiClassDatasets([names, ...])

Class to download, transform and save multi-class datasets.

datasets.RemoteSensingDatasets([names, ...])

Class to download, transform and save remote sensing datasets.

mlresearch.latex

This module contains several functions to prepare and format tables for LaTeX documents.

latex.format_table(table[, indices, ...])

Sort and rename rows and columns.

latex.make_bold(table[, maximum, threshold, ...])

Make bold the lowest or highest values, or the values below or above the value passed in threshold, per row or column.
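
The per-row behavior described above can be illustrated with a small standalone sketch. It does not call mlresearch: the helper name bold_extreme and its decimals parameter are hypothetical, it works on plain nested lists rather than a pandas dataframe, and it covers only the highest/lowest case, not the threshold case.

```python
def bold_extreme(rows, maximum=True, decimals=2):
    """Format each row as strings and wrap its extreme value in \\textbf{}.

    Hypothetical helper for illustration; not the mlresearch implementation.
    """
    formatted = []
    for row in rows:
        # Pick the value to highlight in this row.
        target = max(row) if maximum else min(row)
        formatted.append(
            [
                f"\\textbf{{{value:.{decimals}f}}}"
                if value == target
                else f"{value:.{decimals}f}"
                for value in row
            ]
        )
    return formatted


scores = [[0.812, 0.794, 0.803], [0.671, 0.702, 0.688]]
for line in bold_extreme(scores):
    # Join the cells the way a LaTeX table row is written.
    print(" & ".join(line) + r" \\")
```

Comparing the raw values before rounding means that two cells which round to the same string are not both highlighted; whether the library resolves ties that way is not stated here.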

latex.make_mean_sem_table(mean_vals[, ...])

Generate table with rounded decimals, bold maximum/minimum values or values above/below a given threshold, and combine mean and sem values.
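
As a rough illustration of the "combine mean and sem values" step, the sketch below merges two equally shaped tables into formatted strings. The function name combine_mean_sem, the decimals parameter, and the "$\pm$" separator are assumptions made for this example; the library's actual output format may differ.

```python
def combine_mean_sem(mean_vals, sem_vals, decimals=3):
    """Merge two equally shaped tables into 'mean $\\pm$ sem' strings.

    Hypothetical helper for illustration; not the mlresearch implementation.
    """
    return [
        [
            f"{mean:.{decimals}f} $\\pm$ {sem:.{decimals}f}"
            for mean, sem in zip(mean_row, sem_row)
        ]
        for mean_row, sem_row in zip(mean_vals, sem_vals)
    ]


means = [[0.5, 0.25]]
sems = [[0.012, 0.02]]
print(combine_mean_sem(means, sems, decimals=2))
```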

latex.export_longtable(table[, path, ...])

Exports a pandas dataframe to LaTeX (longtable) format.

mlresearch.metrics

This module contains various performance metrics/scorers that are not included in scikit-learn’s scorers’ dictionary. Additionally, an expanded dictionary of scorers (as compared with scikit-learn’s) is also provided.

metrics.get_scorer(scoring)

Get a scorer from string.

metrics.get_scorer_names()

Get the names of all available scorers.

metrics.geometric_mean_score_macro(y_true, ...)

Geometric mean score with macro average.

metrics.precision_at_k(y_true, y_score[, k, ...])

Calculate precision at k, where k is the number of top-ranked items to consider (items are sorted in descending order by their scores).
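
The idea behind precision at k can be sketched in a few lines of plain Python. This is an independent illustration under the assumption of binary labels (1 = relevant, 0 = not relevant); the name precision_at_k_sketch and the default k=3 are hypothetical and do not reflect the library's defaults.

```python
def precision_at_k_sketch(y_true, y_score, k=3):
    """Fraction of relevant items (label 1) among the k highest-scored items."""
    if not 0 < k <= len(y_true):
        raise ValueError("k must be between 1 and the number of samples")
    # Rank sample indices by score, highest first.
    ranked = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    top_k = ranked[:k]
    return sum(y_true[i] for i in top_k) / k


y_true = [1, 0, 1, 0, 1]
y_score = [0.9, 0.8, 0.7, 0.3, 0.2]
# The three highest scores belong to labels 1, 0 and 1, so two of three are relevant.
print(precision_at_k_sketch(y_true, y_score, k=3))
```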

metrics.area_under_learning_curve(metadata, ...)

Area under the learning curve.
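
One common way to compute such an area is the trapezoidal rule over the per-iteration scores. The sketch below assumes the learning curve has already been extracted as a list of scores (how the library derives it from metadata is not shown here), and the normalization by the x-range is a choice made for this example. The name area_under_curve and the fractions parameter are hypothetical.

```python
def area_under_curve(scores, fractions=None):
    """Trapezoidal area under a learning curve, normalized by the x-range.

    scores: performance at each iteration.
    fractions: x-axis positions (e.g. share of labeled data); defaults to
    the iteration index. Hypothetical helper, not the mlresearch function.
    """
    if fractions is None:
        fractions = list(range(len(scores)))
    if len(scores) != len(fractions) or len(scores) < 2:
        raise ValueError("need at least two points with matching x positions")
    area = 0.0
    for i in range(1, len(scores)):
        width = fractions[i] - fractions[i - 1]
        # Area of the trapezoid between two consecutive points.
        area += width * (scores[i] + scores[i - 1]) / 2
    return area / (fractions[-1] - fractions[0])


print(area_under_curve([0.6, 0.8, 0.9]))
```

With this normalization a constant curve at score s has area s, which keeps the result on the same scale as the underlying metric.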

metrics.data_utilization_rate(metadata[, ...])

Data Utilization Rate.

metrics.ALScorer(score_func[, sign])

Make an Active Learning scorer from an AL-specific metric or loss function.

metrics.AlphaPrecision(scorer_real[, alpha])

Measures synthetic data fidelity.

metrics.BetaRecall([scorer_synth, beta, ...])

Checks whether the synthetic data is diverse enough to cover the variability of real data, i.e., a model should be able to generate a wide variety of good samples.

metrics.Authenticity([metric, n_jobs])

Quantifies the rate by which a model generates new samples.

mlresearch.neural_network

neural_network.OneClassMLP([...])

Unsupervised One-Class neural network.

mlresearch.preprocessing

Data preprocessing methods adapted or modified from sklearn.

preprocessing.PipelineEncoder([features, ...])

Pipeline-compatible wrapper of Scikit-learn's Transformer objects.

mlresearch.synthetic_data

Module which contains the implementation of variations of oversampling/data augmentation algorithms, as well as helper classes to use oversampling algorithms as data augmentation techniques.

synthetic_data.GeometricSMOTE([...])

Class to perform over-sampling using Geometric SMOTE.

synthetic_data.OverSamplingAugmentation([...])

A wrapper to facilitate the use of imblearn.over_sampling objects for data augmentation.

mlresearch.utils

This module contains a variety of general utility functions and tools for data handling, experiment setup, visualization and parallel execution.

utils.image_to_dataframe(X[, y, bands, ...])

Converts an image array (height, width, bands) to a pandas dataframe (height * width, bands).

utils.dataframe_to_image(df[, bands, ...])

Converts a pandas dataframe to an image.
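
The shape change behind these two conversions, (height, width, bands) to (height * width, bands) and back, can be sketched with nested lists. This is a dependency-free illustration: the names flatten_image and unflatten_image are hypothetical, row-major pixel order is an assumption, and the library functions operate on arrays and pandas dataframes rather than lists.

```python
def flatten_image(image):
    """Flatten a (height, width, bands) nested list into (height * width, bands) rows."""
    # Row-major order: all pixels of the first image row, then the second, and so on.
    return [list(pixel) for row in image for pixel in row]


def unflatten_image(rows, height, width):
    """Rebuild a (height, width, bands) nested list from flattened pixel rows."""
    if len(rows) != height * width:
        raise ValueError("number of rows must equal height * width")
    return [
        [list(rows[r * width + c]) for c in range(width)]
        for r in range(height)
    ]


# A 2 x 3 image with 2 bands per pixel.
image = [
    [[1, 10], [2, 20], [3, 30]],
    [[4, 40], [5, 50], [6, 60]],
]
table = flatten_image(image)
print(len(table), len(table[0]))
print(unflatten_image(table, height=2, width=3) == image)
```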

utils.load_datasets(data_dir[, prefix, ...])

Load all datasets in a directory from sqlite databases and/or csv files.

utils.check_pipelines(*objects_list, ...)

Extract estimators and parameter grids to be passed to ModelSearchCV.

utils.check_pipelines_wrapper(*objects_list, ...)

Extract estimators within a wrapper object and parameter grids to be passed to ModelSearchCV.

utils.check_random_states(random_state, n_runs)

Create random states for experiments.
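
A typical way to obtain reproducible random states for repeated runs is to draw one seed per run from a generator seeded with a master value. The sketch below uses only the standard library; the name make_random_states and the 32-bit seed range are assumptions, and the library's own function may rely on different tools.

```python
import random


def make_random_states(random_state, n_runs):
    """Draw n_runs integer seeds from a generator seeded with random_state.

    Hypothetical helper for illustration; not the mlresearch implementation.
    """
    rng = random.Random(random_state)
    return [rng.randrange(2**32) for _ in range(n_runs)]


seeds = make_random_states(random_state=0, n_runs=3)
# The same master seed always yields the same list of per-run seeds.
print(seeds == make_random_states(random_state=0, n_runs=3))
```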

utils.set_matplotlib_style([font_size, ...])

Load LaTeX-style configurations for Matplotlib Visualizations.

utils.feature_to_color(col[, cmap])

Converts a column of values to hex-type colors.
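
To show what mapping values to hex colors can look like, the sketch below interpolates linearly between two RGB colors. It is not the library's implementation: the name values_to_hex is hypothetical, the two endpoint colors are arbitrary choices, and a real colormap (the cmap parameter) would normally replace the two-color interpolation.

```python
def values_to_hex(values, low=(68, 1, 84), high=(253, 231, 37)):
    """Map numeric values to hex colors by linear interpolation between two RGB colors."""
    vmin, vmax = min(values), max(values)
    span = (vmax - vmin) or 1  # avoid division by zero for constant columns
    colors = []
    for value in values:
        t = (value - vmin) / span  # position of the value in [0, 1]
        rgb = [round(lo + t * (hi - lo)) for lo, hi in zip(low, high)]
        colors.append("#{:02x}{:02x}{:02x}".format(*rgb))
    return colors


print(values_to_hex([0, 5, 10]))
```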

utils.parallel_loop(function, iterable[, ...])

Parallelize a loop and optionally add a progress bar.
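
The general pattern of parallelizing a loop can be sketched with the standard library's concurrent.futures. This example is independent of mlresearch: the name parallel_map and the n_jobs parameter are hypothetical, the progress bar is omitted, and the backend the library uses is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_map(function, iterable, n_jobs=2):
    """Apply function to every item using a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=n_jobs) as executor:
        # executor.map yields results in the order of the inputs.
        return list(executor.map(function, iterable))


def square(x):
    return x * x


print(parallel_map(square, range(5)))  # prints [0, 1, 4, 9, 16]
```

Threads suit I/O-bound work; for CPU-bound functions a process-based pool is usually the better fit.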

utils.generate_paths(filepath)

Generate data, results and analysis paths.