pipeline package
Submodules
pipeline.argument_parser module
Handles the program arguments (default values, documentation, …).
- pipeline.argument_parser.get_args(args)
Creates the parser and returns it.
- Parameters
args – List[String]; args to parse
- Returns
ArgumentParser; arguments of the program
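As an illustration of how such a function might be organised, here is a minimal sketch built on the standard `argparse` module. The function name `get_args_sketch` and the options `--dataset`, `--seed` and `--ms_prop` are hypothetical; the real option names and defaults are not listed in this reference. The sketch also assumes that the caller wants the parsed arguments (an `argparse.Namespace`), which may differ from the documented return type.

```python
import argparse
from typing import List, Optional


def get_args_sketch(args: Optional[List[str]] = None) -> argparse.Namespace:
    """Build a parser and parse ``args`` (all options here are hypothetical)."""
    parser = argparse.ArgumentParser(description="Imputation pipeline (illustrative)")
    # Hypothetical options; the actual pipeline may expose different ones.
    parser.add_argument("--dataset", type=str, default="example", help="dataset name")
    parser.add_argument("--seed", type=int, default=0, help="random seed")
    parser.add_argument("--ms_prop", type=float, default=0.2, help="missingness proportion")
    # Passing None makes argparse read sys.argv[1:] instead of an explicit list.
    return parser.parse_args(args)


parsed = get_args_sketch(["--dataset", "toy", "--seed", "3"])
print(parsed.dataset, parsed.seed, parsed.ms_prop)
```

Passing the argument list explicitly, rather than reading `sys.argv` directly, keeps the function easy to call from tests and notebooks.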
pipeline.datasets module
Contains all the dataset creation and preprocessing parts.
- pipeline.datasets.get_dataset(name, scaler='Standard', ms_prop=0.2, ms_setting='mcar', ms_method='uniform', train_size=0.7, seed=0)
Downloads and returns the preprocessed dataset.
- Parameters
name – String; name of the dataset (has to be in DATASETS)
scaler – 'Standard' or 'MinMax'; scaler to use
ms_prop – Float; proportion of missing entries in the samples that have missing values
ms_setting – 'mcar' or 'mnar'; type of missingness
ms_method – 'uniform' or 'random'; whether missingness is applied to all columns or to only half of them
train_size – Float in [0, 1]; proportion of training samples
seed – Integer; seed to use for the preprocessing steps
- Returns
(np.ndarray(Float), np.ndarray(Bool), np.ndarray(Float)); (train_samples, train_masks, train_targets)
(np.ndarray(Float), np.ndarray(Bool), np.ndarray(Float)); (test_samples, test_masks, test_targets)
Bool; True if it is a classification dataset
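To make the missingness and split parameters more concrete, the following NumPy sketch generates an MCAR mask and a seeded train/test split. It is an assumption-laden illustration, not the module's code: the helper names `make_mcar_mask` and `split_train_test` are hypothetical, the mask convention (True marks a missing entry) is assumed, and the way half of the columns is chosen is a guess.

```python
import numpy as np


def make_mcar_mask(n_samples, n_features, ms_prop=0.2, all_columns=True, seed=0):
    """Return a Boolean mask; True marks a missing entry (assumed convention)."""
    rng = np.random.default_rng(seed)
    # MCAR: each entry is dropped independently with probability ms_prop.
    mask = rng.random((n_samples, n_features)) < ms_prop
    if not all_columns:
        # Restrict missingness to a randomly chosen half of the columns.
        kept = rng.choice(n_features, size=n_features // 2, replace=False)
        allowed = np.zeros(n_features, dtype=bool)
        allowed[kept] = True
        mask &= allowed  # broadcasts the column filter over all rows
    return mask


def split_train_test(samples, masks, targets, train_size=0.7, seed=0):
    """Shuffle row indices with a fixed seed and split them into train and test."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(samples))
    cut = int(round(train_size * len(samples)))
    train, test = order[:cut], order[cut:]
    return (
        (samples[train], masks[train], targets[train]),
        (samples[test], masks[test], targets[test]),
    )


data_rng = np.random.default_rng(0)
samples = data_rng.normal(size=(100, 4))
targets = samples[:, 0] + samples[:, 1]
mask_all = make_mcar_mask(100, 4, ms_prop=0.2, all_columns=True, seed=0)
mask_half = make_mcar_mask(100, 4, ms_prop=0.2, all_columns=False, seed=0)
train_part, test_part = split_train_test(samples, mask_all, targets, train_size=0.7, seed=0)
print(train_part[0].shape, test_part[0].shape)
```

Using the same seed for the mask and the split keeps the whole preprocessing step reproducible.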
pipeline.metrics module
Contains all the metrics.
- pipeline.metrics.dml_metric(imputed_samples_train, imputed_samples_test, targets_train, targets_test, classif, measures=10)
Computes the downstream machine-learning metric (NRMS for regression tasks, accuracy for classification tasks) using random forests.
- Parameters
imputed_samples_train – np.ndarray(Float); imputed train-set samples
imputed_samples_test – np.ndarray(Float); imputed test-set samples
targets_train – np.ndarray(Float); targets of the train set
targets_test – np.ndarray(Float); targets of the test set
classif – Bool; True for a classification task
measures – Integer; number of random forests to run (more runs give a more precise estimate)
- Returns
Float; downstream result using the imputation
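A downstream evaluation of this kind could be sketched with scikit-learn as follows. Several details are assumptions rather than documented behaviour: the name `downstream_score` is hypothetical, the scores of the `measures` runs are averaged, each forest uses 50 trees, and the regression error is normalised by the standard deviation of the test targets.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor


def downstream_score(x_train, x_test, y_train, y_test, classif, measures=10):
    """Average accuracy (classification) or normalised RMS error (regression)."""
    scores = []
    for run in range(measures):
        if classif:
            model = RandomForestClassifier(n_estimators=50, random_state=run)
            model.fit(x_train, y_train)
            scores.append(np.mean(model.predict(x_test) == y_test))
        else:
            model = RandomForestRegressor(n_estimators=50, random_state=run)
            model.fit(x_train, y_train)
            residual = model.predict(x_test) - y_test
            # Assumed normalisation: RMS error divided by the target spread.
            scores.append(np.sqrt(np.mean(residual ** 2)) / np.std(y_test))
    return float(np.mean(scores))


toy_rng = np.random.default_rng(0)
features = toy_rng.normal(size=(200, 3))
labels = (features[:, 0] > 0).astype(float)
values = 2.0 * features[:, 0]

accuracy = downstream_score(features[:150], features[150:], labels[:150], labels[150:],
                            classif=True, measures=3)
error = downstream_score(features[:150], features[150:], values[:150], values[150:],
                         classif=False, measures=3)
print(accuracy, error)
```

Varying `random_state` across runs is what makes repeated measurements informative: each forest sees the same data but different bootstrap samples and feature subsets.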
- pipeline.metrics.nrms(ground_truth_samples, imputed_samples, masks=None, sigma2=None)
Computes the NRMS score measured only on missing values.
- Parameters
ground_truth_samples – np.ndarray(Float); ground-truth samples
imputed_samples – np.ndarray(Float); imputed samples to be evaluated
masks – np.ndarray(Bool); corresponding mask matrix
sigma2 – per-column variances used for the NRMS computation (defaults to the variances of the ground-truth data)
- Returns
Float; NRMS of the imputation
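One common way to compute such a score is to divide each squared error by the variance of its column, average over the masked entries, and take the square root. The sketch below follows that reading. The function name `nrms_sketch` is hypothetical, and it assumes that True in `masks` marks a missing (imputed) entry and that all entries are scored when `masks` is None; the actual implementation may use different conventions.

```python
import numpy as np


def nrms_sketch(ground_truth, imputed, masks=None, sigma2=None):
    """Variance-normalised RMS error over the entries selected by ``masks``."""
    ground_truth = np.asarray(ground_truth, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    if masks is None:
        # Assumption: without a mask, every entry is evaluated.
        masks = np.ones_like(ground_truth, dtype=bool)
    if sigma2 is None:
        # Default to the per-column variances of the ground-truth data.
        sigma2 = ground_truth.var(axis=0)
    squared_error = (ground_truth - imputed) ** 2 / sigma2
    return float(np.sqrt(squared_error[masks].mean()))


truth = np.array([[0.0, 0.0], [2.0, 4.0]])      # column variances: 1 and 4
guess = np.array([[1.0, 0.0], [2.0, 2.0]])
missing = np.array([[True, False], [False, True]])

score = nrms_sketch(truth, guess, missing)       # sqrt((1/1 + 4/4) / 2) = 1.0
perfect = nrms_sketch(truth, truth, missing)     # 0.0
print(score, perfect)
```

A column with zero variance would cause a division by zero here, so constant columns would need to be handled separately.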
pipeline.utils module
Contains the helper functions and constants.
- pipeline.utils.fix_seed(seed)
Fixes the seeds of NumPy and PyTorch.
- Parameters
seed – Integer; seed to use
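A seeding helper of this kind might look like the sketch below. The name `fix_seed_sketch` is hypothetical, and making the `torch` import optional is an assumption added so that the example also runs where PyTorch is not installed.

```python
import numpy as np


def fix_seed_sketch(seed):
    """Seed NumPy's global generator and, if available, PyTorch's."""
    np.random.seed(seed)
    try:
        import torch
    except ImportError:
        return  # PyTorch is not installed; only NumPy has been seeded
    torch.manual_seed(seed)


fix_seed_sketch(0)
first_draw = np.random.rand(3)
fix_seed_sketch(0)
second_draw = np.random.rand(3)
print(np.array_equal(first_draw, second_draw))
```

Seeding again with the same value restarts the random streams, so both draws are identical.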
Module contents
Contains the pipeline used to test models.