API¶

This is the full API documentation of the mlresearch package.

`mlresearch`¶

Toolbox to develop research in Machine Learning.

ml-research is a library containing the implementation of various algorithms developed in Machine Learning research, as well as utilities to facilitate the formatting of pandas dataframes into LaTeX tables.

`show_versions`([github])	Print debugging information.

`mlresearch.active_learning`¶

Module which contains Active Learning implementations.

`active_learning.StandardAL`([classifier, ...])	Standard Active Learning model with a random initial data selection.
`active_learning.AugmentationAL`([classifier, ...])	Active Learning with pipelined Data Augmentation.
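
To make "random initial data selection" concrete, here is a minimal pool-based selection loop in plain Python. It is a sketch and does not use `StandardAL`. The function name, its parameters (`n_initial`, `budget`, `n_iterations`) and the `uncertainty` callback are all hypothetical, chosen only for this example.

```python
import random


def active_learning_loop(pool_size, n_initial, budget, n_iterations, uncertainty, seed=0):
    """Pool-based selection: random start, then most-uncertain-first.

    `uncertainty(index, labeled)` is a hypothetical callback returning a
    score for an unlabeled item; a real loop would refit a classifier on
    the labeled items before scoring.
    """
    rng = random.Random(seed)
    unlabeled = list(range(pool_size))
    # Random initial data selection.
    labeled = rng.sample(unlabeled, n_initial)
    unlabeled = [i for i in unlabeled if i not in labeled]
    for _ in range(n_iterations):
        if not unlabeled:
            break
        # Query the `budget` items with the highest uncertainty score.
        ranked = sorted(unlabeled, key=lambda i: uncertainty(i, labeled), reverse=True)
        queried = ranked[:budget]
        labeled.extend(queried)
        unlabeled = [i for i in unlabeled if i not in queried]
    return labeled


# Toy uncertainty: distance to the nearest already-labeled index.
selected = active_learning_loop(
    pool_size=20,
    n_initial=3,
    budget=2,
    n_iterations=4,
    uncertainty=lambda i, labeled: min(abs(i - j) for j in labeled),
)
print(sorted(selected))
```

The toy callback stands in for a model-based criterion such as prediction entropy. Only the control flow (random start, then repeated querying) is the point here.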

`mlresearch.datasets`¶

Download, transform and simulate various datasets.

`datasets.BinaryDatasets`([names, data_home, ...])	Class to download, transform and save binary class datasets.
`datasets.ImbalancedBinaryDatasets`([names, ...])	Class to download, transform and save binary class imbalanced datasets.
`datasets.ContinuousCategoricalDatasets`([...])	Class to download, transform and save datasets with both continuous and categorical features.
`datasets.MultiClassDatasets`([names, ...])	Class to download, transform and save multi-class datasets.
`datasets.RemoteSensingDatasets`([names, ...])	Class to download, transform and save remote sensing datasets.

`mlresearch.latex`¶

This module contains several functions to prepare and format tables for LaTeX documents.

`latex.format_table`(table[, indices, ...])	Sort and rename rows and columns.
`latex.make_bold`(table[, maximum, threshold, ...])	Bold the lowest or highest values, or the values below or above the passed `threshold`, per row or column.
`latex.make_mean_sem_table`(mean_vals[, ...])	Generate a table with rounded decimals, bold the maximum/minimum values or the values above/below a given threshold, and combine mean and SEM values.
`latex.export_table`(table[, path, caption, ...])	Exports a pandas dataframe to LaTeX (tabular or longtable) format.
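
As a rough illustration of combining mean and SEM values with a bolded extreme, here is a dependency-free sketch. It does not call `mlresearch.latex`. The helper name `mean_sem_cells`, the `$\pm$` cell layout and the input numbers are assumptions made up for this example, not the library's output format.

```python
def mean_sem_cells(means, sems, decimals=2, maximum=True):
    """Format one table row as 'mean $\\pm$ sem' cells, bolding the best mean.

    Hypothetical helper for illustration; not part of mlresearch.
    """
    best = max(means) if maximum else min(means)
    cells = []
    for mean, sem in zip(means, sems):
        cell = f"{mean:.{decimals}f} $\\pm$ {sem:.{decimals}f}"
        if mean == best:
            # Wrap the whole cell in \textbf{...}.
            cell = f"\\textbf{{{cell}}}"
        cells.append(cell)
    return cells


# Made-up scores for three methods on one dataset.
row = mean_sem_cells([0.812, 0.847, 0.790], [0.011, 0.009, 0.015])
print(" & ".join(row) + r" \\")
```

The comparison uses the unrounded means, so two values that round to the same string are not both bolded. Whether the library handles ties this way is not stated in this page.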

`mlresearch.metrics`¶

This module contains various performance metrics/scorers that are not included in scikit-learn’s scorers’ dictionary. Additionally, an expanded dictionary of scorers (as compared with scikit-learn’s) is also provided.

`metrics.get_scorer`(scoring)	Get a scorer from string.
`metrics.get_scorer_names`()	Get the names of all available scorers.
`metrics.geometric_mean_score_macro`(y_true, ...)	Geometric mean score with macro average.
`metrics.precision_at_k`(y_true, y_score[, k, ...])	Calculate precision at `k`, where `k` is the number of relevant items to consider (sorted in descending order by their scores).
`metrics.area_under_learning_curve`(metadata, ...)	Area under the learning curve.
`metrics.data_utilization_rate`(metadata[, ...])	Data Utilization Rate.
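
For readers unfamiliar with precision at `k`, the idea can be sketched in a few lines of plain Python. This is the textbook definition (share of positives among the `k` highest-scored items), written as a hypothetical helper. The signature, tie handling and edge cases of `metrics.precision_at_k` may differ.

```python
def precision_at_k_sketch(y_true, y_score, k=5):
    """Fraction of positives among the k highest-scored items.

    Illustrative only; not the mlresearch implementation.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    if not y_score:
        raise ValueError("y_score must not be empty")
    # Rank item indices by score, highest first (ties keep input order).
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    top_k = order[:k]
    # Divide by the number of items actually ranked, which is k unless
    # fewer than k items were supplied.
    return sum(y_true[i] for i in top_k) / len(top_k)


y_true = [1, 0, 1, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
result = precision_at_k_sketch(y_true, y_score, k=3)
print(result)
```

Here the three highest-scored items carry labels 1, 0 and 1, so two of the three are positives.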

`metrics.ALScorer`(score_func[, sign])	Make an Active Learning scorer from an AL-specific metric or loss function.
`metrics.AlphaPrecision`(scorer_real[, alpha])	Measures synthetic data fidelity.
`metrics.BetaRecall`([scorer_synth, beta, ...])	Checks whether the synthetic data is diverse enough to cover the variability of real data, i.e., a model should be able to generate a wide variety of good samples.
`metrics.Authenticity`([metric, n_jobs])	Quantifies the rate by which a model generates new samples.

`mlresearch.model_selection`¶

The mlresearch.model_selection module includes the model and parameter search methods.

`model_selection.ModelSearchCV`(estimators, ...)	Exhaustive search over specified parameter values for a collection of estimators.
`model_selection.HalvingModelSearchCV`(...[, ...])	Search over specified parameter values for a collection of estimators with successive halving.
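
An exhaustive search over a collection of estimators amounts to evaluating every parameter combination of every estimator. The sketch below only enumerates those candidates. The helper `expand_candidates` and its input layout (a list of names plus a dict of grids) are assumptions for illustration, not the input format of `ModelSearchCV`.

```python
from itertools import product


def expand_candidates(estimators, param_grids):
    """List every (estimator name, parameter dict) pair to evaluate.

    `estimators` is a list of names and `param_grids` maps each name to a
    dict of parameter -> list of values. Hypothetical layout for this sketch.
    """
    candidates = []
    for name in estimators:
        grid = param_grids.get(name, {})
        keys = sorted(grid)
        # product() over zero value lists yields one empty combination,
        # so an estimator without a grid still contributes one candidate.
        for values in product(*(grid[key] for key in keys)):
            candidates.append((name, dict(zip(keys, values))))
    return candidates


candidates = expand_candidates(
    ["knn", "tree"],
    {
        "knn": {"n_neighbors": [3, 5]},
        "tree": {"max_depth": [2, 4], "criterion": ["gini", "entropy"]},
    },
)
for name, params in candidates:
    print(name, params)
```

A successive-halving search would start from the same candidate list but evaluate it in rounds, discarding the weaker candidates after each round.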

`mlresearch.neural_network`¶

`neural_network.OneClassMLP`([...])	Unsupervised One-Class neural network.

`mlresearch.preprocessing`¶

Data preprocessing methods adapted or modified from sklearn.

`preprocessing.PipelineEncoder`([features, ...])	Pipeline-compatible wrapper of Scikit-learn's Transformer objects.

`mlresearch.synthetic_data`¶

Module which contains the implementation of variations of oversampling/data augmentation algorithms, as well as helper classes to use oversampling algorithms as data augmentation techniques.

`synthetic_data.GeometricSMOTE`([...])	Class to perform over-sampling using Geometric SMOTE.
`synthetic_data.OverSamplingAugmentation`([...])	A wrapper to facilitate the use of imblearn.over_sampling objects for data augmentation.
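
As background for the oversampling classes above, the sketch below shows the interpolation step of plain SMOTE, which places a synthetic point on the line segment between a sample and one of its neighbors. It is deliberately not Geometric SMOTE, whose generation mechanism is different and is not reproduced here. The function name and inputs are hypothetical.

```python
import random


def interpolate_sample(x, neighbor, rng):
    """Create one synthetic point on the segment between x and neighbor.

    Plain SMOTE interpolation, shown for illustration only; this is not
    the Geometric SMOTE generation mechanism.
    """
    gap = rng.random()  # uniform in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]


rng = random.Random(0)
x, neighbor = [1.0, 2.0], [3.0, 6.0]
synthetic = interpolate_sample(x, neighbor, rng)
print(synthetic)
```

Used for data augmentation rather than class balancing, the same step would be applied to samples of every class instead of only the minority class.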

`mlresearch.utils`¶

This module contains a variety of general utility functions and tools used to format and prepare tables to incorporate into LaTeX code.

`utils.image_to_dataframe`(X[, y, bands, ...])	Converts an image array (height, width, bands) to a pandas dataframe (height * width, bands).
`utils.dataframe_to_image`(df[, bands, ...])	Converts a pandas dataframe to an image.
`utils.load_datasets`(data_dir[, prefix, ...])	Load all datasets in a directory from sqlite databases and/or csv files.
`utils.check_pipelines`(*objects_list, ...)	Extract estimators and parameter grids to be passed to ModelSearchCV.
`utils.check_pipelines_wrapper`(*objects_list, ...)	Extract estimators within a wrapper object and parameter grids to be passed to ModelSearchCV.
`utils.check_random_states`(random_state, n_runs)	Create random states for experiments.
`utils.set_matplotlib_style`([font_size, ...])	Load LaTeX-style configurations for Matplotlib Visualizations.
`utils.list_available_fonts`()	Returns a list of available fonts in the current system.
`utils.display_available_fonts`([...])	Check and display the available fonts in matplotlib.
`utils.feature_to_color`(col[, cmap])	Converts a column of values to hex-type colors.
`utils.parallel_loop`(function, iterable[, ...])	Parallelize a loop and optionally add a progress bar.
`utils.generate_paths`(filepath)	Generate data, results and analysis paths.
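
The shape change described for `utils.image_to_dataframe` and `utils.dataframe_to_image`, from (height, width, bands) to (height * width, bands) and back, can be sketched with nested lists. The helpers below are hypothetical and dependency-free. They assume row-major pixel order, which may not match the library, and they return lists rather than pandas dataframes.

```python
def image_to_rows(image):
    """Flatten a nested-list image (height, width, bands) to (height * width, bands)."""
    return [list(pixel) for row in image for pixel in row]


def rows_to_image(rows, height, width):
    """Inverse of image_to_rows: regroup pixel rows into (height, width, bands)."""
    if len(rows) != height * width:
        raise ValueError("rows does not match height * width")
    return [rows[r * width:(r + 1) * width] for r in range(height)]


# A 2 x 3 image with 2 bands per pixel.
image = [
    [[0, 1], [2, 3], [4, 5]],
    [[6, 7], [8, 9], [10, 11]],
]
rows = image_to_rows(image)
print(len(rows), len(rows[0]))  # 6 pixels, 2 bands each
```

Each output row is one pixel and each column is one band, which is the tabular layout that per-pixel classifiers expect.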