pipeline package

Submodules

pipeline.argument_parser module

Handles the program arguments (default values, documentation, …).

pipeline.argument_parser.get_args(args)

Creates the parser and returns it.

Parameters: args – List[String]; args to parse
Returns: ArgumentParser; arguments of the program
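A minimal sketch of how a function with this signature could be built on the standard `argparse` module. The function name `get_args_sketch`, the options `--dataset` and `--seed`, and the choice to return the parsed namespace are all assumptions made for illustration; the actual options of the program are not listed in this documentation.

```python
import argparse


def get_args_sketch(args):
    """Hypothetical stand-in for pipeline.argument_parser.get_args."""
    parser = argparse.ArgumentParser(description="Imputation pipeline (sketch)")
    # Both options are illustrative assumptions, not the program's real CLI.
    parser.add_argument("--dataset", type=str, default="example",
                        help="name of the dataset")
    parser.add_argument("--seed", type=int, default=0,
                        help="seed used for the preprocessing steps")
    # Parse the explicit list instead of sys.argv so the function is testable.
    return parser.parse_args(args)


parsed = get_args_sketch(["--seed", "3"])
print(parsed.dataset, parsed.seed)
```

Passing the argument list explicitly, rather than reading `sys.argv` inside the function, keeps the parser easy to call from tests and notebooks.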

pipeline.datasets module

Contains all the dataset creation and preprocessing parts.

pipeline.datasets.get_dataset(name, scaler='Standard', ms_prop=0.2, ms_setting='mcar', ms_method='uniform', train_size=0.7, seed=0)

Downloads and returns the preprocessed dataset.

Parameters

name – String; name of the dataset (must be one of DATASETS)
scaler – 'Standard' or 'MinMax'; scaler to use
ms_prop – Float; proportion of missingness in the samples having missing values
ms_setting – 'mcar' or 'mnar'; missingness mechanism (missing completely at random or missing not at random)
ms_method – 'uniform' or 'random'; whether missingness is applied to all columns or to only half of them, respectively
train_size – Float in [0, 1]; proportion of training samples
seed – Integer; seed to use for the preprocessing steps

Returns

(np.ndarray(Float), np.ndarray(Bool), np.ndarray(Float)); (train_samples, train_masks, train_targets)
(np.ndarray(Float), np.ndarray(Bool), np.ndarray(Float)); (test_samples, test_masks, test_targets)
Bool; True if it is a classification dataset
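The interaction of `ms_prop` and `ms_method` in the 'mcar' setting can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the package's implementation: the helper name `mcar_mask_sketch` is hypothetical, `True` is assumed to mark a missing entry, and each entry of a selected column is assumed to be dropped independently with probability `ms_prop`.

```python
import numpy as np


def mcar_mask_sketch(n_samples, n_features, ms_prop=0.2,
                     ms_method="uniform", seed=0):
    """Hypothetical MCAR mask generator; True is assumed to mean 'missing'."""
    rng = np.random.default_rng(seed)
    if ms_method == "uniform":
        # Missingness may affect every column.
        cols = np.arange(n_features)
    elif ms_method == "random":
        # Missingness affects a randomly chosen half of the columns.
        cols = rng.choice(n_features, size=n_features // 2, replace=False)
    else:
        raise ValueError(f"unknown ms_method: {ms_method!r}")
    mask = np.zeros((n_samples, n_features), dtype=bool)
    # Drop each entry of the selected columns independently (MCAR).
    mask[:, cols] = rng.random((n_samples, len(cols))) < ms_prop
    return mask


uniform_mask = mcar_mask_sketch(1000, 6, ms_method="uniform")
random_mask = mcar_mask_sketch(1000, 6, ms_method="random")
print(uniform_mask.mean(), random_mask.any(axis=0).sum())
```

Under these assumptions the 'random' variant leaves half of the columns fully observed, so its overall missing rate is roughly half that of the 'uniform' variant.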

pipeline.metrics module

Contains all the metrics.

pipeline.metrics.dml_metric(imputed_samples_train, imputed_samples_test, targets_train, targets_test, classif, measures=10)

Computes the downstream machine-learning metric using random forests trained on the imputed samples: NRMS for regression tasks, accuracy for classification tasks.

Parameters

imputed_samples_train – np.ndarray(Float); imputed train-set samples
imputed_samples_test – np.ndarray(Float); imputed test-set samples
targets_train – np.ndarray(Float); targets of the train set
targets_test – np.ndarray(Float); targets of the test set
classif – Bool; True for a classification task
measures – Integer; number of random forests to run, to make the result more precise

Returns

Float; downstream result using the imputation
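One way to realise such a downstream metric with scikit-learn is sketched below. Several details are assumptions and may differ from the package: the name `dml_metric_sketch`, the forest size of 50 trees, averaging the score over the `measures` runs, and normalising the regression RMS error by the standard deviation of the test targets.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor


def dml_metric_sketch(x_train, x_test, y_train, y_test, classif, measures=10):
    """Hypothetical downstream metric: mean score over several forests."""
    scores = []
    for run in range(measures):
        if classif:
            model = RandomForestClassifier(n_estimators=50, random_state=run)
            model.fit(x_train, y_train)
            # Accuracy: fraction of correctly predicted test labels.
            scores.append(np.mean(model.predict(x_test) == y_test))
        else:
            model = RandomForestRegressor(n_estimators=50, random_state=run)
            model.fit(x_train, y_train)
            err = model.predict(x_test) - y_test
            # Assumed normalisation: RMS error divided by the target std.
            scores.append(np.sqrt(np.mean(err ** 2)) / np.std(y_test))
    return float(np.mean(scores))


# Synthetic, fully observed data standing in for imputed samples.
rng = np.random.default_rng(0)
x = rng.normal(size=(300, 3))
y = (x[:, 0] > 0).astype(int)
accuracy = dml_metric_sketch(x[:200], x[200:], y[:200], y[200:],
                             classif=True, measures=3)
print(accuracy)
```

Using a different `random_state` for each run is what makes repeating the measurement informative; with a fixed seed every run would return the same score.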

pipeline.metrics.nrms(ground_truth_samples, imputed_samples, masks=None, sigma2=None)

Computes the NRMS score measured only on missing values.

Parameters

ground_truth_samples – np.ndarray(Float); ground-truth samples
imputed_samples – np.ndarray(Float); imputed samples to be evaluated
masks – np.ndarray(Bool); corresponding mask matrix
sigma2 – per-column variances used for the NRMS computation (defaults to the variances of the ground-truth data)

Returns

Float; NRMS of the imputation
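A possible NumPy formulation of this score is sketched below. The exact formula is not given in this documentation, so the sketch assumes that the squared error of each entry is divided by the variance of its column and that the root of the mean is taken over the masked entries only. It also assumes that `True` in `masks` marks a missing (imputed) entry and that `masks=None` means every entry is evaluated. The name `nrms_sketch` is hypothetical.

```python
import numpy as np


def nrms_sketch(ground_truth, imputed, masks=None, sigma2=None):
    """Hypothetical NRMS: variance-normalised RMS error on masked entries."""
    ground_truth = np.asarray(ground_truth, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    if masks is None:
        # Assumption: without a mask, every entry is evaluated.
        masks = np.ones(ground_truth.shape, dtype=bool)
    if sigma2 is None:
        # Default to the per-column variances of the ground-truth data.
        sigma2 = ground_truth.var(axis=0)
    # Squared error of each entry, normalised by its column variance.
    sq_err = (ground_truth - imputed) ** 2 / sigma2
    return float(np.sqrt(sq_err[masks].mean()))


truth = np.array([[0.0, 0.0], [2.0, 4.0]])    # column variances: 1 and 4
guess = np.array([[1.0, 2.0], [2.0, 4.0]])    # first row was imputed
mask = np.array([[True, True], [False, False]])
score = nrms_sketch(truth, guess, mask)
print(score)
```

Here both imputed entries are off by exactly one standard deviation of their column, so each contributes a normalised squared error of 1. A column with zero variance would cause a division by zero, which this sketch does not guard against.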

pipeline.utils module

Contains the helper functions and constants.

pipeline.utils.fix_seed(seed)

Fixes the seeds of NumPy and PyTorch.

Parameters: seed – Integer; seed to use
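A seeding helper of this kind typically looks like the sketch below. The name `fix_seed_sketch` is hypothetical, and treating PyTorch as an optional import is an assumption made so the example also runs where PyTorch is not installed.

```python
import numpy as np


def fix_seed_sketch(seed):
    """Hypothetical stand-in for pipeline.utils.fix_seed."""
    # Seed NumPy's global random number generator.
    np.random.seed(seed)
    try:
        import torch
    except ImportError:
        # Assumption: skip PyTorch seeding when it is not installed.
        return
    # Seed PyTorch's random number generator.
    torch.manual_seed(seed)


fix_seed_sketch(0)
first = np.random.rand(3)
fix_seed_sketch(0)
second = np.random.rand(3)
print(np.array_equal(first, second))
```

Calling the helper with the same seed before each experiment makes the NumPy draws repeat exactly, which is what the comparison at the end checks.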

Module contents

Contains the pipeline used to test models.