pipeline package

Submodules

pipeline.argument_parser module

Handles the program arguments (default values, documentation, …).

pipeline.argument_parser.get_args(args)

Creates the parser and returns it.

Parameters

args – List[String]; args to parse

Returns

ArgumentParser; arguments of the program
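As an illustration of the pattern described above, here is a minimal sketch built on the standard argparse module. The option names (--dataset, --seed, --train-size), their defaults, and the helper names build_parser and parse are hypothetical; the actual options handled by pipeline.argument_parser are not listed in this reference.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Create a parser with a few illustrative (hypothetical) options."""
    parser = argparse.ArgumentParser(description="Illustrative pipeline arguments.")
    # Each option carries its default value and its help text in one place.
    parser.add_argument("--dataset", type=str, default="example",
                        help="name of the dataset to load")
    parser.add_argument("--seed", type=int, default=0,
                        help="seed used for the preprocessing steps")
    parser.add_argument("--train-size", type=float, default=0.7,
                        help="proportion of training samples")
    return parser


def parse(args: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse a list of argument strings with the parser defined above."""
    return build_parser().parse_args(args)


parsed = parse(["--dataset", "toy", "--seed", "3"])
print(parsed.dataset, parsed.seed, parsed.train_size)
```

Passing the argument list explicitly, rather than reading sys.argv inside the function, keeps the parsing step easy to call from tests and notebooks.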

pipeline.datasets module

Contains all the dataset creation and preprocessing parts.

pipeline.datasets.get_dataset(name, scaler='Standard', ms_prop=0.2, ms_setting='mcar', ms_method='uniform', train_size=0.7, seed=0)

Downloads and returns the preprocessed dataset.

Parameters
  • name – String; name of the dataset (has to be in DATASETS)

  • scaler – “Standard” or “MinMax”; scaler to use

  • ms_prop – Float; proportion of missing entries within the samples that contain missing values

  • ms_setting – ‘mcar’ or ‘mnar’; type of missingness

  • ms_method – ‘uniform’ or ‘random’; whether missingness is applied to all columns or to only half of them

  • train_size – Float in [0, 1]; proportion of training samples

  • seed – Integer; seed to use for the preprocessing steps

Returns

  • (np.ndarray(Float), np.ndarray(Bool), np.ndarray(Float)); (train_samples, train_masks, train_targets)

  • (np.ndarray(Float), np.ndarray(Bool), np.ndarray(Float)); (test_samples, test_masks, test_targets)

  • Bool; True if it is a classification dataset
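To make the missingness parameters more concrete, the following sketch generates an MCAR (missing completely at random) Boolean mask with NumPy. It is not the package's implementation: the function name mcar_mask, the half_columns flag, and the convention that True marks a missing entry are all assumptions made for illustration.

```python
import numpy as np


def mcar_mask(n_rows, n_cols, ms_prop=0.2, half_columns=False, seed=0):
    """Return a Boolean mask where True marks a missing entry (assumed convention)."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((n_rows, n_cols), dtype=bool)
    if half_columns:
        # Restrict missingness to a random half of the columns.
        cols = rng.choice(n_cols, size=n_cols // 2, replace=False)
    else:
        # Apply missingness to every column.
        cols = np.arange(n_cols)
    # MCAR: each selected entry is dropped independently with probability ms_prop.
    mask[:, cols] = rng.random((n_rows, len(cols))) < ms_prop
    return mask


mask_all = mcar_mask(1000, 6, ms_prop=0.2, seed=0)
mask_half = mcar_mask(1000, 6, ms_prop=0.2, half_columns=True, seed=0)
print(mask_all.shape, mask_all.dtype)
print("columns with missing values:", int(mask_half.any(axis=0).sum()))
```

Because the mask is drawn independently of the data values, the missingness carries no information about the samples, which is what distinguishes MCAR from the MNAR setting.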

pipeline.metrics module

Contains all the metrics.

pipeline.metrics.dml_metric(imputed_samples_train, imputed_samples_test, targets_train, targets_test, classif, measures=10)

Computes the downstream machine-learning metric (NRMS for regression tasks, accuracy for classification tasks) using random forests.

Parameters
  • imputed_samples_train – np.ndarray(Float); imputed train-set samples

  • imputed_samples_test – np.ndarray(Float); imputed test-set samples

  • targets_train – np.ndarray(Float); targets of the train set

  • targets_test – np.ndarray(Float); targets of the test set

  • classif – Bool; True for a classification task

  • measures – Integer; number of random-forest runs used to obtain a more precise estimate

Returns

Float; downstream result using the imputation
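The idea of a downstream metric can be sketched with scikit-learn as follows. This is an assumption-laden illustration, not the package's code: the function name downstream_score, the averaging over runs, the number of trees, and the normalization of the RMSE by the standard deviation of the test targets are all choices made here for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor


def downstream_score(x_train, x_test, y_train, y_test, classif, measures=3):
    """Average a random-forest score over several differently seeded runs."""
    scores = []
    for run in range(measures):
        if classif:
            model = RandomForestClassifier(n_estimators=20, random_state=run)
            model.fit(x_train, y_train)
            # Accuracy: fraction of correctly predicted test labels.
            scores.append(float(np.mean(model.predict(x_test) == y_test)))
        else:
            model = RandomForestRegressor(n_estimators=20, random_state=run)
            model.fit(x_train, y_train)
            rmse = np.sqrt(np.mean((model.predict(x_test) - y_test) ** 2))
            # Assumed normalization: divide by the spread of the test targets.
            scores.append(float(rmse / np.std(y_test)))
    return float(np.mean(scores))


# Toy data standing in for imputed samples and their targets.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 4))
y_class = (x[:, 0] + x[:, 1] > 0).astype(int)
y_reg = 2.0 * x[:, 0] + 0.1 * rng.normal(size=200)

acc = downstream_score(x[:140], x[140:], y_class[:140], y_class[140:], classif=True)
err = downstream_score(x[:140], x[140:], y_reg[:140], y_reg[140:], classif=False)
print("accuracy:", acc, "normalized RMSE:", err)
```

Repeating the fit with different seeds reduces the variance that comes from the randomness of the forests themselves, so differences between imputation methods are easier to read.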

pipeline.metrics.nrms(ground_truth_samples, imputed_samples, masks=None, sigma2=None)

Computes the NRMS score measured only on missing values.

Parameters
  • ground_truth_samples – np.ndarray(Float); ground-truth samples

  • imputed_samples – np.ndarray(Float); imputed samples to be evaluated

  • masks – np.ndarray(Bool); corresponding mask matrix

  • sigma2 – per-column variances used in the NRMS computation (defaults to the variances of the ground-truth samples)

Returns

Float; NRMS of the imputation
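One common way to define such a score is the root of the mean squared error over the masked entries, with each squared error divided by the variance of its column. The sketch below follows that definition; the exact formula used by pipeline.metrics.nrms is not given here, so the normalization, the name nrms_sketch, and the convention that True in the mask marks a missing entry are assumptions.

```python
import numpy as np


def nrms_sketch(ground_truth, imputed, masks=None, sigma2=None):
    """Variance-normalized RMS error, restricted to the masked entries."""
    ground_truth = np.asarray(ground_truth, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    if masks is None:
        # Without a mask, evaluate every entry.
        masks = np.ones(ground_truth.shape, dtype=bool)
    if sigma2 is None:
        # Default to the per-column variances of the ground truth.
        # A constant column (zero variance) would need special handling.
        sigma2 = ground_truth.var(axis=0)
    squared_error = (ground_truth - imputed) ** 2 / sigma2
    return float(np.sqrt(squared_error[masks].mean()))


truth = np.array([[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]])
masks = np.array([[True, False], [False, False], [False, True]])
# Impute the two masked entries with their column means (3.0 and 20.0).
imputed = truth.copy()
imputed[0, 0] = 3.0
imputed[2, 1] = 20.0

score = nrms_sketch(truth, imputed, masks)
print(round(score, 4))  # ≈ 1.2247, i.e. sqrt(1.5)
```

Dividing by the column variance puts columns with different scales on the same footing, so a large-valued column does not dominate the score.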

pipeline.utils module

Contains the helper functions and constants.

pipeline.utils.fix_seed(seed)

Fixes the seeds of NumPy and PyTorch.

Parameters

seed – Integer; seed to use
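A helper of this kind typically looks like the sketch below. The name fix_seed_sketch is hypothetical, and the guarded import of torch is an addition made here so the example also runs where PyTorch is not installed.

```python
import numpy as np


def fix_seed_sketch(seed):
    """Seed NumPy's global generator and, if available, PyTorch's."""
    np.random.seed(seed)
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; only NumPy is seeded.
        return
    torch.manual_seed(seed)


fix_seed_sketch(0)
first = np.random.rand(3)
fix_seed_sketch(0)
second = np.random.rand(3)
print(np.array_equal(first, second))
```

Calling the helper once at the start of a run makes the random draws of the preprocessing and of the models repeatable for a given seed.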

Module contents

Contains the pipeline used to test models.