API

This is the full API documentation of the research package.

research.active_learning

This submodule contains the code developed for experiments related to Active Learning.

active_learning.ALWrapper([classifier, …])

Class to perform Active Learning experiments.

active_learning.entropy(unlabeled_ids, …)

Sample selection based on the entropy criterion.
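As a rough illustration of entropy-based selection, the sketch below ranks unlabeled samples by the Shannon entropy of their predicted class probabilities and returns the most uncertain ones. The function name `entropy_selection_sketch` and its `probabilities` argument are assumptions made for this example; they are not the signature of `active_learning.entropy`.

```python
import numpy as np


def entropy_selection_sketch(probabilities, unlabeled_ids, increment):
    """Return the ids of the `increment` samples with the highest entropy.

    Hypothetical helper for illustration only, not the package's function.
    """
    # Clip to avoid log(0) for classes predicted with zero probability.
    probs = np.clip(np.asarray(probabilities, dtype=float), 1e-12, 1.0)
    entropy = -np.sum(probs * np.log(probs), axis=1)
    # Sort by entropy, highest first, and keep the first `increment` rows.
    top = np.argsort(entropy)[::-1][:increment]
    return np.asarray(unlabeled_ids)[top]


probs = np.array([
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # near-uniform prediction, highest entropy
    [0.70, 0.20, 0.10],  # intermediate
])
selected = entropy_selection_sketch(probs, unlabeled_ids=[10, 11, 12], increment=2)
```

Here the near-uniform row is picked first, followed by the intermediate one.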

active_learning.breaking_ties(unlabeled_ids, …)

Sample selection based on the breaking-ties criterion.
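Breaking ties is usually understood as picking the samples whose two most likely classes have the closest predicted probabilities. The sketch below follows that reading; `breaking_ties_sketch` and its `probabilities` argument are hypothetical and may differ from what `active_learning.breaking_ties` actually expects.

```python
import numpy as np


def breaking_ties_sketch(probabilities, unlabeled_ids, increment):
    """Return the ids of the `increment` samples with the smallest margin.

    Hypothetical helper for illustration only, not the package's function.
    """
    probs = np.asarray(probabilities, dtype=float)
    # After sorting each row, the last two columns hold the two largest values.
    top_two = np.sort(probs, axis=1)[:, -2:]
    margin = top_two[:, 1] - top_two[:, 0]
    # A small margin means the classifier can barely separate the top classes.
    smallest = np.argsort(margin)[:increment]
    return np.asarray(unlabeled_ids)[smallest]


probs = np.array([
    [0.98, 0.01, 0.01],  # margin 0.97
    [0.34, 0.33, 0.33],  # margin 0.01
    [0.70, 0.20, 0.10],  # margin 0.50
])
ties = breaking_ties_sketch(probs, unlabeled_ids=[10, 11, 12], increment=2)
```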

active_learning.random(unlabeled_ids, increment)

Random sample selection.
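Random selection serves as the usual baseline for the criteria above. A minimal sketch, assuming `unlabeled_ids` is a sequence of indices and `increment` is the number of samples to draw; the `seed` parameter is an addition made for this example and is not part of the documented signature.

```python
import numpy as np


def random_selection_sketch(unlabeled_ids, increment, seed=None):
    """Draw `increment` distinct ids uniformly at random.

    Hypothetical helper for illustration only, not the package's function.
    """
    rng = np.random.default_rng(seed)
    # Never request more samples than are available.
    size = min(increment, len(unlabeled_ids))
    return rng.choice(np.asarray(unlabeled_ids), size=size, replace=False)


picked = random_selection_sketch([3, 8, 15, 21, 42], increment=2, seed=0)
```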

research.datasets

Download, transform and simulate various datasets.

These classes were extracted from the utils.py script in AlgoWit’s publications repo, to which I have also contributed.

Link to related repo: https://github.com/AlgoWit/publications

datasets.Datasets([names])

Class to download and save datasets.

datasets.BinaryDatasets([names])

Class to download, transform and save binary class datasets.

datasets.ImbalancedBinaryDatasets([names])

Class to download, transform and save binary class imbalanced datasets.

datasets.ContinuousCategoricalDatasets([names])

Class to download, transform and save datasets with both continuous and categorical features.

datasets.RemoteSensingDatasets([names, …])

Class to download, transform and save remote sensing datasets.

research.metrics

This submodule contains performance metrics/scorers that are not included in scikit-learn’s dictionary of scorers. It also provides an expanded dictionary of scorers (compared with scikit-learn’s).

Parts of this code were taken from the utils.py script in AlgoWit’s publications repo, to which I have also contributed.

Link to related repo: https://github.com/AlgoWit/publications

metrics.geometric_mean_score_macro(y_true, …)

Geometric mean score with macro average.
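One common definition of a macro-averaged geometric mean takes the square root of the product of macro-averaged sensitivity and macro-averaged specificity. The sketch below assumes that definition; the package may compute the score differently, and `gmean_macro_sketch` is a hypothetical name.

```python
import numpy as np


def gmean_macro_sketch(y_true, y_pred):
    """sqrt(macro sensitivity * macro specificity), one-vs-rest per class.

    Assumed definition for illustration; expects at least two classes.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    sensitivities, specificities = [], []
    for label in np.unique(y_true):
        pos, pred_pos = y_true == label, y_pred == label
        tp = np.sum(pos & pred_pos)
        fn = np.sum(pos & ~pred_pos)
        tn = np.sum(~pos & ~pred_pos)
        fp = np.sum(~pos & pred_pos)
        sensitivities.append(tp / (tp + fn))
        specificities.append(tn / (tn + fp))
    return float(np.sqrt(np.mean(sensitivities) * np.mean(specificities)))


score = gmean_macro_sketch([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
```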

metrics.area_under_learning_curve(…)

Area under the learning curve.
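To make the idea concrete, the sketch below integrates a learning curve (score versus fraction of labeled data) with the trapezoidal rule and divides by the width of the x-range, so a constant curve yields its own score. Both the normalization and the name `aulc_sketch` are assumptions; the arguments of `metrics.area_under_learning_curve` are elided in the listing above and are not reproduced here.

```python
import numpy as np


def aulc_sketch(fractions, scores):
    """Normalized trapezoidal area under a learning curve.

    Hypothetical helper for illustration only, not the package's function.
    """
    x = np.asarray(fractions, dtype=float)
    y = np.asarray(scores, dtype=float)
    # Trapezoidal rule: segment width times the mean height of its endpoints.
    area = np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0)
    return float(area / (x[-1] - x[0]))


aulc = aulc_sketch([0.1, 0.2, 0.3, 0.4], [0.6, 0.7, 0.8, 0.8])
```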

metrics.data_utilization_rate(test_scores, …)

Data Utilization Rate.
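The sketch below assumes that the data utilization rate is read as the amount of labeled data a strategy needs to reach a score threshold, divided by the amount a baseline (for example random selection) needs to reach the same threshold. That reading, the helper names, and all parameters are assumptions for illustration; they do not describe the actual behavior of `metrics.data_utilization_rate`.

```python
import numpy as np


def first_size_reaching(sizes, scores, threshold):
    """Smallest training size whose score meets the threshold, else None."""
    reached = np.flatnonzero(np.asarray(scores) >= threshold)
    return None if reached.size == 0 else sizes[reached[0]]


def dur_sketch(sizes, strategy_scores, baseline_scores, threshold):
    """Ratio of data needed by the strategy to data needed by the baseline.

    Hypothetical helper; values below 1 mean the strategy needs less data.
    """
    needed_strategy = first_size_reaching(sizes, strategy_scores, threshold)
    needed_baseline = first_size_reaching(sizes, baseline_scores, threshold)
    if needed_strategy is None or needed_baseline is None:
        return float("nan")  # threshold never reached by one of the curves
    return needed_strategy / needed_baseline


sizes = [100, 200, 300, 400]
dur = dur_sketch(
    sizes,
    strategy_scores=[0.60, 0.80, 0.85, 0.90],
    baseline_scores=[0.55, 0.70, 0.78, 0.82],
    threshold=0.80,
)
```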

metrics.ALScorer(score_func[, sign])


research.utils

This submodule contains a variety of general utility functions as well as tools used to format and prepare tables to incorporate into LaTeX code.

An expanded (compared with scikit-learn’s) dictionary of scorers is also provided.

This code was taken from the utils.py script in AlgoWit’s publications repo, to which I have also contributed.

Link to related repo: https://github.com/AlgoWit/publications

utils.generate_mean_std_tbl(mean_vals, std_vals)

Generate a table that combines mean and sem values.
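A table of this kind is typically built by formatting each mean together with its spread value as a single LaTeX-ready string. The sketch below assumes both inputs are pandas DataFrames with identical index and columns; `mean_spread_table_sketch` and the `num_decimals` parameter are hypothetical.

```python
import pandas as pd


def mean_spread_table_sketch(mean_vals, spread_vals, num_decimals=2):
    """Combine two aligned tables into 'mean $\\pm$ spread' strings.

    Hypothetical helper for illustration only, not the package's function.
    """
    fmt = "{:." + str(num_decimals) + "f}"
    return pd.DataFrame(
        {
            col: [
                fmt.format(m) + " $\\pm$ " + fmt.format(s)
                for m, s in zip(mean_vals[col], spread_vals[col])
            ]
            for col in mean_vals.columns
        },
        index=mean_vals.index,
    )


means = pd.DataFrame({"G-mean": [0.8, 0.753]}, index=["clf_a", "clf_b"])
spreads = pd.DataFrame({"G-mean": [0.021, 0.1]}, index=["clf_a", "clf_b"])
combined = mean_spread_table_sketch(means, spreads)
```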

utils.generate_pvalues_tbl(tbl)

Format p-values.

utils.sort_tbl(tbl[, ds_order, ovrs_order, …])

Sort table rows and columns.

utils.generate_paths(filepath)

Generate data, results and analysis paths.

utils.make_bold(row[, maximum, …])

Make bold the lowest or highest value(s).
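For LaTeX output, bolding usually means wrapping the best value(s) of a row in `\textbf{}`. The sketch below assumes `row` is a numeric pandas Series; the `maximum` flag mirrors the documented parameter name, while `num_decimals` and the function name `make_bold_sketch` are assumptions.

```python
import pandas as pd


def make_bold_sketch(row, maximum=True, num_decimals=2):
    """Format a numeric row as strings, wrapping the best value(s) in \\textbf.

    Hypothetical helper for illustration only, not the package's function.
    """
    target = row.max() if maximum else row.min()
    fmt = "{:." + str(num_decimals) + "f}"
    # Ties are all made bold, matching the "value(s)" wording of the summary.
    return row.apply(
        lambda v: "\\textbf{" + fmt.format(v) + "}" if v == target else fmt.format(v)
    )


row = pd.Series({"A": 0.81, "B": 0.93, "C": 0.93})
bold_max = make_bold_sketch(row)
bold_min = make_bold_sketch(row, maximum=False)
```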

utils.generate_mean_std_tbl_bold(mean_vals, …)

Generate a table that combines mean and sem values.

utils.img_array_to_pandas(X, y)

Convert an image stored as a NumPy array (with ground truth) to a pandas DataFrame.
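A minimal sketch of such a conversion, assuming `X` has shape (height, width, bands) and `y` has shape (height, width), so that every pixel becomes one row. The column names and the function name `img_to_frame_sketch` are hypothetical and may not match the output of `utils.img_array_to_pandas`.

```python
import numpy as np
import pandas as pd


def img_to_frame_sketch(X, y):
    """Flatten an image and its ground truth into one row per pixel.

    Hypothetical helper for illustration only, not the package's function.
    """
    height, width, n_bands = X.shape
    frame = pd.DataFrame(
        X.reshape(height * width, n_bands),
        columns=[f"band_{i}" for i in range(n_bands)],
    )
    # Row-major flattening keeps pixels and labels in the same order.
    frame["target"] = y.reshape(height * width)
    return frame


X = np.arange(2 * 3 * 4).reshape(2, 3, 4)  # 2x3 image with 4 bands
y = np.array([[0, 1, 1], [2, 2, 0]])       # per-pixel class labels
frame = img_to_frame_sketch(X, y)
```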

utils.load_datasets(data_dir[, suffix, …])

Load datasets from sqlite database and/or csv files.

utils.check_pipelines(objects_list, …)

Extract estimators and parameters grids.

utils.check_pipelines_wrapper(objects_list, …)

utils.load_plt_sns_configs([font_size])

Load LaTeX-style configurations for Matplotlib/Seaborn visualizations.

utils.val_to_color(col[, cmap])

Converts a column of values to hex-type colors.