generators
muppet.components.perturbator.generator.base
Base generator classes for producing perturbation values in MUPPET XAI framework.
This module defines the foundation for generators used by perturbators to create realistic replacement values for masked regions during the perturbation process. Generators are essential components that enable sophisticated perturbation strategies beyond simple replacement with zeros or noise.
In the MUPPET four-step framework (generate masks → apply perturbations → calculate attributions → aggregate results), generators support the perturbation step by providing contextually appropriate replacement values. This is crucial for maintaining data realism and producing meaningful explanations.
The module contains
- Generator: Abstract base class for generators that don't require training, suitable for simple statistical sampling or rule-based value generation.
- TrainableGenerator: Extended abstract class with built-in training infrastructure for neural network-based generators that learn data distributions.
Key Design Principles
- Generators focus solely on producing replacement values
- Training is handled transparently with early stopping and validation splits
- Deterministic sampling support through optional seed parameters
- Extensible architecture for domain-specific perturbation strategies
Note
Generators are typically not used directly but are embedded within perturbator implementations. They enable advanced explanation methods like conditional sampling, learned imputations, and distribution-aware perturbations.
Classes
Generator
Bases: ABC
Abstract base class for data generators in perturbation methods.
Generators create synthetic data to replace masked or perturbed regions in input examples. They provide the core imputation functionality for creating meaningful perturbations.
Abstract class for generators that don't need to be trained on data.
Attributes:
-
device–The used device. Will get updated from the main explainer after initialization.
Source code in muppet/components/perturbator/generator/base.py
Functions
abstractmethod
Responsible for generating the perturbed values. It is called by the Perturbator.perturbate method.
Fully customizable; it must be implemented in any child generator required by a perturbator.
For deterministic sampling at inference time, pass a seed parameter and use it to fix the random state (e.g., torch.manual_seed(seed)), as in GaussianFeatureGenerator.
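A minimal sketch of this interface, with an illustrative rule-based subclass (the method name, signature, and class names here are assumptions, not the framework's actual API):

```python
import random
from abc import ABC, abstractmethod

class Generator(ABC):
    """Minimal sketch of the non-trainable generator interface."""
    device = "cpu"  # updated by the main explainer after initialization

    @abstractmethod
    def generate(self, current, features_to_perturb, seed=None):
        """Produce replacement values for the masked features."""

class UniformNoiseGenerator(Generator):
    """Hypothetical rule-based generator: masked features get uniform noise."""
    def generate(self, current, features_to_perturb, seed=None):
        rng = random.Random(seed)  # fixed seed => deterministic sampling
        perturbed = list(current)
        for idx in features_to_perturb:
            perturbed[idx] = rng.uniform(-1.0, 1.0)
        return perturbed
```

Passing the same seed twice yields identical perturbations, which is what makes attribution runs reproducible.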
Source code in muppet/components/perturbator/generator/base.py
TrainableGenerator
Bases: Module, Generator
Abstract base class for trainable neural network generators.
Extends the basic Generator with PyTorch neural network capabilities and built-in training infrastructure. Supports complex learned perturbation strategies through gradient-based optimization.
Implementing a subclass of this trainable generator only requires implementing the run_epoch method.
Abstract class for generators with the train method implemented.
Parameters:
-
lr(float) –Learning rate
-
num_epochs(int) –Number of epochs
Attributes:
-
device–The used device. Will get updated from the main explainer after initialization.
Source code in muppet/components/perturbator/generator/base.py
Functions
Train the model.
Parameters:
-
train_loader(DataLoader) –The train data loader
Returns:
-
Tuple[list, list]–The history of training results (loss trends)
Source code in muppet/components/perturbator/generator/base.py
abstractmethod
Run one training epoch. This method is customizable and depends on the nature of the generator.
Source code in muppet/components/perturbator/generator/base.py
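The train/run_epoch split can be sketched without PyTorch; the class and attribute names below are illustrative, and the toy subclass fits the mean of the data by gradient descent:

```python
class TrainableGeneratorSketch:
    """Sketch of the train/run_epoch contract (names illustrative):
    train_generator loops over epochs, run_epoch does the per-epoch work."""
    def __init__(self, lr, num_epochs):
        self.lr, self.num_epochs = lr, num_epochs

    def train_generator(self, train_loader):
        train_hist, val_hist = [], []
        for _ in range(self.num_epochs):
            train_hist.append(self.run_epoch(train_loader, in_train=True))
            val_hist.append(self.run_epoch(train_loader, in_train=False))
        return train_hist, val_hist

    def run_epoch(self, dataloader, in_train):
        raise NotImplementedError

class MeanFitGenerator(TrainableGeneratorSketch):
    """Toy subclass: learns the mean of the data via gradient descent
    on the squared error, stepping only when in_train is True."""
    def __init__(self, lr, num_epochs):
        super().__init__(lr, num_epochs)
        self.mu = 0.0

    def run_epoch(self, dataloader, in_train):
        losses = []
        for batch in dataloader:
            grad = sum(2.0 * (self.mu - x) for x in batch) / len(batch)
            if in_train:
                self.mu -= self.lr * grad  # gradient step in training mode only
            losses.append(sum((self.mu - x) ** 2 for x in batch) / len(batch))
        return sum(losses) / len(losses)
```

The base class owns the epoch loop and history bookkeeping, so a concrete generator only describes what one pass over the data means.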
muppet.components.perturbator.generator.conditional_timestep_generator
Conditional Gaussian generator for time series perturbations using RNN-based VAE.
This module implements a sophisticated conditional generator for time series data that learns to impute missing values by modeling the conditional distribution P(X_t|X_0:t-1). The generator uses a variational autoencoder architecture with RNN encoder and Gaussian decoder to generate contextually appropriate perturbations for temporal explanations.
As part of the MUPPET perturbation framework, this generator enables advanced time series explanation methods by producing realistic substitute values that maintain temporal dependencies and feature correlations. This is essential for explaining models that depend on sequential patterns and temporal dynamics.
The module contains
- ConditionalGaussianFeatureGenerator: Main trainable generator combining encoder-decoder with conditional sampling capabilities for multivariate time series
- GaussianRNNEncoder: RNN-based encoder that maps time series to latent Gaussian parameters
- GaussianDecoder: Decoder that generates likelihood distributions from latent representations
- check_cov_pd: Utility function ensuring positive definite covariance matrices
Key Technical Features
- Variational autoencoder with RNN encoder for temporal modeling
- Conditional sampling P(X_S'|X_S) for feature subsets
- Multivariate Gaussian distributions with learned covariances
- Positive definite covariance correction with noise injection
- Support for both univariate and multivariate time series
- Deterministic sampling for reproducible explanations
The generator is designed for use with time series explanation methods like temporal LIME, SHAP for sequences, or custom perturbation-based attributions that require realistic temporal imputations rather than simple masking strategies.
Classes
ConditionalGaussianFeatureGenerator
ConditionalGaussianFeatureGenerator(
feature_size,
hidden_size,
latent_size,
mid_layer_size,
prediction_size,
num_samples,
cov_noise_level,
max_noise_correction,
lr,
num_epochs,
timesteps_divide_num,
seed=None,
)
Bases: TrainableGenerator
Conditional Gaussian generator for time series perturbations.
Implements a variational autoencoder with RNN encoder and Gaussian decoder for learning conditional distributions P(X_t|X_{0:t-1}). Enables sophisticated temporal perturbations that preserve realistic time series patterns and feature dependencies.
Conditional generator model to predict perturbed values.
Parameters:
-
feature_size(int) –Number of features in the input (f)
-
hidden_size(int) –The encoder's hidden layer size
-
latent_size(int) –The encoder's latent space size
-
mid_layer_size(int) –The mid-layer size used in Encoder and Decoder
-
prediction_size(int) –The number of predictions to make. The prediction window [t:t+p] (p)
-
num_samples(int) –Number of Zs to sample from the latent distribution (n)
-
cov_noise_level(float) –The noise to add to the covariance to make it positive definite (PD)
-
max_noise_correction(int) –Maximum number of covariance PD correction iterations
-
lr(float) –Training learning rate used with Adam optimizer
-
num_epochs(int) –Training number of epochs
-
timesteps_divide_num(int) –Used to divide the time series. E.g., when set to 1, predict only at time t=T using X0:T-1
-
seed(int | None, default:None) –The seed used for reproducible sampling at inference time. If not provided, sampling is nondeterministic
Source code in muppet/components/perturbator/generator/conditional_timestep_generator.py
Functions
Estimate the mean and (co)variance of the joint distribution P(X_t|X_0:t-1).
Parameters:
-
past(Tensor) –Batch of past data of the shape (b, f, t).
Returns: mean and (co)variance
Source code in muppet/components/perturbator/generator/conditional_timestep_generator.py
Generate the missing measurements at time current(=t) based on past (X0:t-1) through sampling from the joint distribution P(X_t|X_0:t-1).
Parameters:
-
past(Tensor) –Batch of previous data measurements (b, f, t).
Returns:
-
Tensor–torch.Tensor: A sample from the Gaussian distribution of P(X_t|X_0:t-1).
Source code in muppet/components/perturbator/generator/conditional_timestep_generator.py
Generate values for the features_to_perturb at time current (=t), based on past (historical data), through conditional sampling from P(X_{S^,t}|X_{S,t}).
Takes 'current', the measurements at time t, and returns the same 'current' with the features in S^ replaced by values estimated from the Gaussian distribution.
Parameters:
-
past(Tensor) –Batch of previous data measurements (b, f, t)
-
current(Tensor) –Batch of measurements at time t (b, f)
-
features_to_perturb(set) –Set of feature indices that are not known/measured; these are the features we sample
Returns:
-
full_sample(Tensor) –The imputed sample at time t with the generated values for missing measurements (S^). (b, f, t)
Source code in muppet/components/perturbator/generator/conditional_timestep_generator.py
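The conditional sampling step can be illustrated with NumPy using the standard Gaussian conditioning formulas (a standalone sketch; `conditional_gaussian_sample` is a hypothetical helper, not the class's API):

```python
import numpy as np

def conditional_gaussian_sample(mean, cov, current, features_to_perturb, seed=None):
    """Replace the features in `features_to_perturb` with a draw from the
    conditional Gaussian P(X_missing | X_observed) implied by (mean, cov)."""
    rng = np.random.default_rng(seed)
    mi = sorted(features_to_perturb)                        # S^: missing/perturbed
    oi = [i for i in range(len(mean)) if i not in set(mi)]  # S: observed
    cov_mm = cov[np.ix_(mi, mi)]
    cov_mo = cov[np.ix_(mi, oi)]
    cov_oo = cov[np.ix_(oi, oi)]
    # Gaussian conditioning: mu_m + C_mo C_oo^-1 (x_o - mu_o); C_mm - C_mo C_oo^-1 C_om
    cond_mean = mean[mi] + cov_mo @ np.linalg.solve(cov_oo, current[oi] - mean[oi])
    cond_cov = cov_mm - cov_mo @ np.linalg.solve(cov_oo, cov_mo.T)
    full_sample = current.copy()
    full_sample[mi] = rng.multivariate_normal(cond_mean, cond_cov)
    return full_sample
```

Observed features pass through untouched; only the perturbed subset is resampled, which is what keeps the imputed time step consistent with the measurements that are known.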
Run one training epoch
Parameters:
-
dataloader(DataLoader) –The train loader
-
in_train(bool) –Whether training or evaluating; e.g., set to True for training mode.
Returns:
-
float(float) –The epoch loss.
Source code in muppet/components/perturbator/generator/conditional_timestep_generator.py
GaussianRNNEncoder
Bases: Module
RNN encoder for mapping input sequences to Gaussian latent spaces.
Encodes time series data using a GRU and maps it to latent Gaussian parameters (mean and standard deviation). Used in variational approaches for conditional time series generation.
An RNN encoder responsible for transforming the input x into an encoding space (hidden) using a one-layer GRU, then mapping it to the latent space that represents the Gaussian parameters for every input sample.
Parameters:
-
feature_size(int) –The number of input features
-
hidden_size(int) –The RNN/GRU hidden space size.
-
latent_size(int) –The latent space size. It is multiplied by two as it represents the mean and covariance of the latent representation Z of x.
-
mid_layer_size(int) –The size of a mid layer between the hidden space (RNN encoding) and the latent space (Z).
-
device(str) –The device to use.
Source code in muppet/components/perturbator/generator/conditional_timestep_generator.py
Functions
Estimate mean and the std of the distribution of the latent representation Z of X.
Parameters:
-
X(Tensor) –Input time series (b, f, t)
Returns:
-
Tuple[Tensor, Tensor]–A tuple of mu and std. Each of shape (b, latent_size)
Source code in muppet/components/perturbator/generator/conditional_timestep_generator.py
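The final projection from hidden state to Gaussian parameters can be sketched as two linear heads (illustrative names and weights; the positivity transform for the std is an assumption, not necessarily the one GaussianRNNEncoder uses):

```python
import numpy as np

def encode_to_gaussian(hidden, w_mu, w_std):
    """Project an RNN hidden state to latent Gaussian parameters.
    softplus keeps the std strictly positive."""
    mu = hidden @ w_mu                      # (b, latent_size)
    std = np.log1p(np.exp(hidden @ w_std))  # softplus => std > 0
    return mu, std
```

This is why the encoder's latent output size is twice latent_size: one half parameterizes the mean, the other the spread of Z.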
GaussianDecoder
Bases: Module
Gaussian decoder for generating distributions from latent representations.
Decodes latent variables into likelihood distributions over the output space. Supports both univariate (with variance) and multivariate (with covariance) Gaussian distributions for flexible time series generation.
A Gaussian decoder that estimates the likelihood distribution of the latent representation Z (encoding) of X.
Parameters:
-
feature_size(int) –The number of input features
-
output_size(int) –The expected output size (something like number of predictions to make * number of input features)
-
latent_size(int) –The latent representation size (output size of encoder).
-
mid_layer_size(int) –The size of a mid-layer between the latent space and final output mapping.
-
device(str) –The device to use.
Source code in muppet/components/perturbator/generator/conditional_timestep_generator.py
Functions
Estimate the likelihood Gaussian distribution of the output (proportional to number of features and needed predictions) given the latent representation Z (encoding) of X.
Parameters:
-
mu(Tensor) –The mean of the latent distribution. Shape = (b, latent_size)
-
std(Tensor) –The std of the latent distribution. Shape = (b, latent_size)
-
num_samples(int) –Number of Zs to sample from the latent distribution, in case multi-sampling is needed
-
cov_noise_level(float) –The noise to add to the covariance to make it positive definite (PD).
-
max_noise_correction(int) –Maximum number of covariance PD correction iterations.
Returns:
-
tuple(Tuple[Tensor, Tensor]) –estimated mean and covariance or variance if univariate case
Note: n = b * num_samples and output_size = p * f (number of predictions to make * number of input features), where p = prediction_size is the prediction window.
Source code in muppet/components/perturbator/generator/conditional_timestep_generator.py
Functions
check_cov_pd
Check whether a covariance matrix is positive definite (PD); if not, keep adding noise to it until it becomes PD. If max_noise_correction is exceeded, return the identity matrix.
Parameters:
-
covariance_matrix(Tensor) –A matrix of shape (n, k, k)
-
cov_noise_level(float) –A noise value to be added to make the covariance PD
-
max_noise_correction(int, default:20) –Number of tries to correct the matrix, if exceeded return I.
-
device(str) –The device to use.
Returns:
-
Tensor–torch.Tensor: A PD covariance matrix with noise added to the original one or the identity matrix I of same shape.
Source code in muppet/components/perturbator/generator/conditional_timestep_generator.py
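The correction loop can be sketched in NumPy (the framework version operates on torch tensors; `check_cov_pd_sketch` is an illustrative reimplementation that tests positive definiteness via a Cholesky factorization):

```python
import numpy as np

def check_cov_pd_sketch(covariance_matrix, cov_noise_level, max_noise_correction=20):
    """Add identity-scaled noise until the (batched) covariance is PD;
    fall back to the identity matrix when the budget is exhausted."""
    cov = covariance_matrix.copy()
    k = cov.shape[-1]
    for _ in range(max_noise_correction):
        try:
            np.linalg.cholesky(cov)  # succeeds only for PD matrices
            return cov
        except np.linalg.LinAlgError:
            cov = cov + cov_noise_level * np.eye(k)  # noise injection
    return np.broadcast_to(np.eye(k), covariance_matrix.shape).copy()
```

Adding a small multiple of the identity shifts every eigenvalue up by that amount, so a finite number of injections pushes a nearly-PD estimate across the threshold without distorting its structure much.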
muppet.components.perturbator.generator.tabular_generator
Tabular data generators for perturbation-based explanations.
This module provides generators specifically designed for tabular data perturbations in the MUPPET XAI framework. These generators create realistic substitute values for masked features during the perturbation process, enabling meaningful explanations for tabular machine learning models.
Tabular data presents unique challenges for perturbation-based explanations due to mixed data types (numerical and categorical), feature correlations, and distribution properties. The generators in this module address these challenges by implementing different sampling strategies tailored to tabular characteristics.
The module contains
- GaussianSamplingGenerator: Simple statistical generator using Gaussian distributions estimated from historical data for time series or sequential tabular data
- StandardGaussianTabularGenerator: Advanced generator for mixed tabular data with separate handling of numerical and categorical features
- RandomSampleTabularGenerator: Frequency-based generator that samples from observed feature value distributions in training data
Key Features
- Handles mixed numerical and categorical features appropriately
- Preserves feature distributions and correlations from training data
- Supports instance-centered perturbations for local explanations
- Configurable sampling strategies (statistical vs. frequency-based)
- Deterministic sampling for reproducible explanations
These generators are typically used with tabular perturbators and are essential for methods like LIME, SHAP, and other feature attribution techniques applied to structured data, enabling realistic counterfactual analysis and feature importance discovery.
Classes
GaussianSamplingGenerator
Bases: Generator
Simple Gaussian sampling generator for tabular data imputation.
Generates replacement values for perturbed features by sampling from normal distributions. Provides basic statistical imputation without considering feature correlations or data distributions.
A simple random sampling generator. Used for imputing missing values with values sampled from a normal distribution.
Parameters:
-
seed(int, default:None) –Seed to control reproducibility
Source code in muppet/components/perturbator/generator/tabular_generator.py
Functions
Return sampled values from a Normal Distribution.
Parameters:
-
past(Tensor) –Past measurements from which the mean and std will be estimated (b=1, f, t)
-
current(Tensor) –The current time step to perturbate (b=1, f, 1)
-
features_to_perturb(Tensor) –The features to perturb
Returns:
-
Tensor–torch.Tensor: sampled values
Source code in muppet/components/perturbator/generator/tabular_generator.py
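The idea can be sketched with NumPy (`gaussian_sample` is an illustrative standalone helper, not the class's method signature):

```python
import numpy as np

def gaussian_sample(past, current, features_to_perturb, seed=None):
    """For each feature to perturb, estimate mean/std over the past
    window (past: (f, t)) and replace the current value (current: (f,))
    with a draw from the resulting normal distribution."""
    rng = np.random.default_rng(seed)
    out = current.copy()
    for f in features_to_perturb:
        out[f] = rng.normal(past[f].mean(), past[f].std())
    return out
```

Because the statistics come from the instance's own history rather than a global dataset, the imputed values stay on the scale of the signal being explained.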
StandardGaussianTabularGenerator
StandardGaussianTabularGenerator(
train_data,
categorical_features=[],
sample_around_instance=True,
)
Bases: Generator
Advanced generator for mixed tabular data with statistical modeling.
Handles both numerical and categorical features by computing separate statistics and frequencies. Provides instance-centered perturbations for local explanations and maintains feature distributions from training data.
Initialize the StandardGaussianTabularGenerator for mixed data types.
Sets up a generator that handles both numerical and categorical features by computing separate statistics and frequencies, enabling realistic perturbations for tabular machine learning explanations.
Parameters:
-
train_data(Tensor) –Training dataset tensor used to compute feature statistics and categorical frequencies. Shape: (n_samples, n_features).
-
categorical_features(list[int], default:[]) –List of column indices that contain categorical data. These features will be handled using frequency-based sampling.
-
sample_around_instance(bool, default:True) –If True, generates perturbations centered around the instance being explained. If False, samples from training data distribution. Useful for local vs. global explanation strategies.
Source code in muppet/components/perturbator/generator/tabular_generator.py
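The effect of sample_around_instance on numerical features can be sketched as follows (function and parameter names are illustrative; the class computes its statistics internally from train_data):

```python
import numpy as np

def perturb_numerical(x_instance, train_mean, train_std,
                      sample_around_instance=True, n_samples=5, seed=0):
    """Draw perturbations centered on the instance (local explanations)
    or on the training mean (global sampling), scaled by the training std."""
    rng = np.random.default_rng(seed)
    center = x_instance if sample_around_instance else train_mean
    noise = rng.standard_normal((n_samples, x_instance.shape[0]))
    return center + noise * train_std
```

Centering on the instance probes the model's behavior in the instance's neighborhood, which is what local attribution methods such as LIME need; centering on the training mean probes the global data distribution instead.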
Functions
Train the generator to compute summary statistics from the training data.
Source code in muppet/components/perturbator/generator/tabular_generator.py
Generate a perturbed sample based on the learned statistics.
Parameters:
-
x_instance(Tensor) –The instance to be explained, of shape (1, f).
-
data_scaled(Tensor) –Pre-scaled data based on normal distribution, of shape (n, 1, f).
Returns:
-
Tensor–torch.Tensor: Generated sample tensor with perturbations.
Source code in muppet/components/perturbator/generator/tabular_generator.py
RandomSampleTabularGenerator
Bases: Generator
Generate random sample vectors based on feature values and frequencies from training data.
Attributes:
-
train_data(Tensor) –The training data from which feature values will be sampled.
-
n_features(int) –The number of features in the training data.
-
feature_values(list[list]) –List of unique values for each feature.
-
method(str) –The method to generate samples, either 'freq' or 'mean'.
Methods:
-
train_generator–A static method reserved for future use; currently does nothing.
-
generate–Generates a specified number of random sample vectors from the feature values in the training data based on the specified method ('freq' or 'mean').
Initializes the RandomSampleTabularGenerator with training data.
Parameters:
-
train_data(Tensor) –The training data used to fit samplers. Expected shape: (num_train_samples, num_features).
-
method(str, default:'freq') –The method to generate samples, either 'freq' or 'mean'. Default is 'freq'.
-
seed(int, default:None) –The seed for random number generation. Default is None, which means no fixed seed.
Source code in muppet/components/perturbator/generator/tabular_generator.py
Functions
This method can be extended or implemented in the future if additional training logic is required.
Source code in muppet/components/perturbator/generator/tabular_generator.py
Generates random samples either by frequency-based sampling of feature values or by using the mean value of each feature.
Parameters:
-
n_samples(int) –The number of samples to generate.
Returns:
-
Tensor–A tensor containing the generated random samples with shape (n_samples, 1, n_features).
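Frequency-based sampling can be sketched with the standard library (`freq_sample` is an illustrative helper; the class stores the unique values per feature and samples them proportionally to their observed frequencies):

```python
import random

def freq_sample(train_data, n_samples, seed=None):
    """Each feature of each generated sample is drawn independently from
    the empirical value distribution of that feature in the training data.
    train_data: list of rows, each a list of feature values."""
    rng = random.Random(seed)
    n_features = len(train_data[0])
    columns = [[row[f] for row in train_data] for f in range(n_features)]
    # choosing from the raw column reproduces the observed frequencies
    return [[rng.choice(col) for col in columns] for _ in range(n_samples)]
```

Sampling per feature preserves each marginal distribution but breaks cross-feature correlations, which is the usual trade-off of frequency-based tabular perturbation.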