API

This is the full API documentation of the research package.

research.active_learning

This submodule contains the code developed for experiments related to Active Learning.

active_learning.ALSimulation([classifier, ...])

Class to simulate Active Learning experiments.

research.data_augmentation

Contains the implementation of variations of oversampling/data augmentation algorithms, as well as helper classes to use oversampling algorithms as data augmentation techniques.

data_augmentation.GeometricSMOTE([...])

Class to perform over-sampling using Geometric SMOTE.

data_augmentation.OverSamplingAugmentation([...])

A wrapper to facilitate the use of imblearn.over_sampling objects for data augmentation.

research.datasets

Download, transform and simulate various datasets.

These classes were extracted from the utils.py script from AlgoWit’s publications repo, to which I have also contributed.

Link to related repo: https://github.com/AlgoWit/publications

datasets.Datasets([names])

Base class to download and save datasets.

datasets.BinaryDatasets([names])

Class to download, transform and save binary class datasets.

datasets.ImbalancedBinaryDatasets([names])

Class to download, transform and save binary class imbalanced datasets.

datasets.ContinuousCategoricalDatasets([names])

Class to download, transform and save datasets with both continuous and categorical features.

datasets.MulticlassDatasets([names])

Class to download, transform and save multiclass datasets.

datasets.RemoteSensingDatasets([names, ...])

Class to download, transform and save remote sensing datasets.

research.metrics

This submodule contains various performance metrics/scorers that are not included in scikit-learn’s scorers’ dictionary. Additionally, an expanded dictionary of scorers (as compared with scikit-learn’s) is also provided.

Parts of this code were taken from the utils.py script from AlgoWit’s publications repo, to which I have also contributed.

Link to related repo: https://github.com/AlgoWit/publications

metrics.geometric_mean_score_macro(y_true, ...)

Geometric mean score with macro average.
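The summary above does not spell out the formula, so the following is only a sketch of one common reading: compute one-vs-rest sensitivity and specificity per class, macro-average each, and take the square root of their product. The helper name `geometric_mean_macro` is hypothetical and is not part of the package; the package function may differ in details such as zero-division handling.

```python
from math import sqrt


def geometric_mean_macro(y_true, y_pred):
    """Sketch: sqrt(macro sensitivity * macro specificity), one-vs-rest.

    Assumption: this mirrors a common definition of the macro-averaged
    geometric mean; it is not taken from the package source.
    """
    labels = sorted(set(y_true))
    pairs = list(zip(y_true, y_pred))
    sensitivities, specificities = [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        tn = sum(t != label and p != label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        # Guard against empty denominators (assumed convention: score 0).
        sensitivities.append(tp / (tp + fn) if tp + fn else 0.0)
        specificities.append(tn / (tn + fp) if tn + fp else 0.0)
    macro_sensitivity = sum(sensitivities) / len(labels)
    macro_specificity = sum(specificities) / len(labels)
    return sqrt(macro_sensitivity * macro_specificity)


score = geometric_mean_macro([0, 0, 1, 1], [0, 1, 1, 1])
```

In the usage line, one of the two class-0 samples is misclassified, so both macro sensitivity and macro specificity fall below 1.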

metrics.area_under_learning_curve(...)

Area under the learning curve.
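As an illustration of the idea, the area under a learning curve can be approximated with the trapezoidal rule over (amount of labeled data, test score) points. The function names below are hypothetical, and dividing by the x-range to normalize the area is an assumption of this sketch, not a documented behavior of the package.

```python
def area_under_curve(x, y):
    """Trapezoidal area under the points (x[i], y[i]); x must be increasing."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length and at least two points")
    area = 0.0
    for i in range(1, len(x)):
        # Width of the interval times the average height of its two ends.
        area += (x[i] - x[i - 1]) * (y[i] + y[i - 1]) / 2
    return area


def normalized_area_under_curve(x, y):
    """Assumed normalization: divide by the x-range so the result lies on
    the same scale as the scores themselves."""
    return area_under_curve(x, y) / (x[-1] - x[0])


# Fractions of labeled data used at each iteration and the matching scores.
fractions = [0.1, 0.2, 0.3]
scores = [0.5, 0.7, 0.9]
aulc = normalized_area_under_curve(fractions, scores)
```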

metrics.data_utilization_rate(test_scores, ...)

Data Utilization Rate.
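The one-line summary leaves the computation implicit. A minimal sketch, assuming the rate compares how much labeled data a method needs to reach a performance threshold against how much a baseline needs, could look as follows. All names and parameters here are hypothetical and only illustrate that assumption.

```python
def data_needed(test_scores, data_fractions, threshold):
    """Return the first data fraction at which the score reaches the
    threshold, or None if the threshold is never reached."""
    for score, fraction in zip(test_scores, data_fractions):
        if score >= threshold:
            return fraction
    return None


def utilization_ratio(method_scores, baseline_scores, data_fractions, threshold):
    """Assumed definition: data needed by the method divided by data needed
    by the baseline. Values below 1 mean the method needs less data."""
    method = data_needed(method_scores, data_fractions, threshold)
    baseline = data_needed(baseline_scores, data_fractions, threshold)
    if method is None or baseline is None:
        return None
    return method / baseline


fractions = [0.1, 0.2, 0.3, 0.4]
ratio = utilization_ratio(
    method_scores=[0.6, 0.8, 0.85, 0.9],
    baseline_scores=[0.5, 0.6, 0.7, 0.8],
    data_fractions=fractions,
    threshold=0.8,
)
```

Here the method reaches the threshold with 20% of the data and the baseline with 40%, so the ratio is one half.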

metrics.ALScorer(score_func[, sign])

Make an Active Learning scorer from an AL-specific metric or loss function.

research.utils

This submodule contains a variety of general utility functions as well as tools used to format and prepare tables to incorporate into LaTeX code.

Additionally, an expanded (as compared with scikit-learn’s) dictionary of scorers is also provided.

This code was taken from the utils.py script from AlgoWit’s publications repo, to which I have also contributed.

Link to related repo: https://github.com/AlgoWit/publications

utils.generate_mean_std_tbl(mean_vals, std_vals)

Generate table that combines mean and sem values.
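To illustrate what combining the two tables can mean, the sketch below merges a table of means and a table of spread values (standard deviation or standard error) cell by cell into LaTeX-ready strings. It uses plain nested lists rather than the package's own table type; the helper name and the `decimals` parameter are assumptions made for the example.

```python
def combine_mean_spread(mean_rows, spread_rows, decimals=2):
    """Pair each mean with its spread value as 'mean $\\pm$ spread'.

    Hypothetical helper: both inputs are lists of rows with equal shape.
    """
    combined = []
    for mean_row, spread_row in zip(mean_rows, spread_rows):
        combined.append(
            [
                f"{mean:.{decimals}f} $\\pm$ {spread:.{decimals}f}"
                for mean, spread in zip(mean_row, spread_row)
            ]
        )
    return combined


table = combine_mean_spread([[0.812, 0.9]], [[0.011, 0.02]])
```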

utils.generate_pvalues_tbl(tbl)

Format p-values.

utils.sort_tbl(tbl[, ds_order, ovrs_order, ...])

Sort table rows and columns.

utils.generate_paths(filepath)

Generate data, results and analysis paths.

utils.make_bold(row[, maximum, ...])

Make bold the lowest or highest value(s).
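A small sketch of the idea, assuming the values are numeric and the output is meant for LaTeX: the extreme value(s) in a row are wrapped in `\textbf{}`, and ties are all highlighted. The helper name and the `decimals` parameter are hypothetical; only `maximum` appears in the signature above.

```python
def bold_extreme(values, maximum=True, decimals=2):
    """Format numbers and wrap the highest (or lowest) ones in \\textbf{}."""
    target = max(values) if maximum else min(values)
    formatted = []
    for value in values:
        text = f"{value:.{decimals}f}"
        # Every value equal to the extreme is highlighted, so ties share it.
        formatted.append(f"\\textbf{{{text}}}" if value == target else text)
    return formatted


highest = bold_extreme([0.5, 0.75, 0.75])
lowest = bold_extreme([0.5, 0.75, 0.75], maximum=False)
```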

utils.generate_mean_std_tbl_bold(mean_vals, ...)

Generate table that combines mean and sem values.

utils.img_array_to_pandas(X, y)

Converts an image stored as a NumPy array (with ground truth) to a pandas DataFrame.

utils.load_datasets(data_dir[, suffix, ...])

Load datasets from sqlite database and/or csv files.

utils.check_pipelines(objects_list, ...)

Extract estimators and parameters grids.

utils.check_pipelines_wrapper(objects_list, ...)

utils.load_plt_sns_configs([font_size])

Load LaTeX-style configurations for Matplotlib/Seaborn visualizations.

utils.val_to_color(col[, cmap])

Converts a column of values to hex-type colors.
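The mapping from values to colors can be pictured with the sketch below. It swaps the colormap (`cmap`) for a simple two-color linear ramp so that it runs without plotting libraries; the helper name and the `low`/`high` parameters are hypothetical and not part of the package.

```python
def values_to_hex(values, low=(255, 255, 255), high=(0, 0, 255)):
    """Map each value to a hex color on a linear ramp from `low` to `high`.

    Assumption: values are min-max scaled before the color lookup.
    """
    smallest, largest = min(values), max(values)
    span = largest - smallest
    colors = []
    for value in values:
        # Position of the value on the ramp, between 0 and 1.
        position = (value - smallest) / span if span else 0.0
        rgb = [round(a + position * (b - a)) for a, b in zip(low, high)]
        colors.append("#{:02x}{:02x}{:02x}".format(*rgb))
    return colors


colors = values_to_hex([0, 5, 10])
```

The smallest value gets the `low` color and the largest gets the `high` color; everything else falls in between.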