Attributor
muppet.components.attributor.base
Base attributor component for MUPPET XAI.
This module defines the abstract base class for all attributors in the MUPPET XAI framework. Attributors are responsible for calculating attribution scores from model predictions on perturbed inputs. These attributions quantify how much each perturbation affects the model's output and serve as the basis for feature importance calculations.
The attribution process is the third step in the four-step perturbation-based XAI approach: generate masks → apply perturbations → calculate attributions → aggregate results.
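The four-step loop can be illustrated outside the framework. Everything below (the drop-in-score attribution, the weighted-mask aggregation, and the toy model) is a hypothetical sketch of the control flow, not the MUPPET API, and NumPy stands in for the torch tensors the framework actually uses:

```python
import numpy as np

def explain(x, model, n_masks=4, rng=None):
    """Toy sketch of the mask -> perturb -> attribute -> aggregate pipeline."""
    rng = rng or np.random.default_rng(0)
    # 1. generate masks: random binary keep/drop patterns over the features
    masks = rng.integers(0, 2, size=(n_masks, x.shape[0]))
    # 2. apply perturbations: dropped features (mask == 0) are zeroed out
    perturbed = masks * x
    # 3. calculate attributions: drop in the model's score vs. the original input
    base = model(x)
    attributions = np.array([base - model(p) for p in perturbed])
    # 4. aggregate: weight each mask's dropped features by its attribution
    heatmap = (attributions[:, None] * (1 - masks)).sum(axis=0) / n_masks
    return heatmap

# toy "model": sums the (non-negative) input features
heatmap = explain(np.array([1.0, 2.0, 3.0]), model=lambda v: v.sum())
```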
Classes:
- Attributor – Abstract base class defining the interface for all attributor components.
Classes
Attributor
Bases: ABC
Abstract base class for attributor components in MUPPET XAI.
A global component that defines the 'calculate_attribution' method, which is responsible for filling in the premises' attributions. An attribution can be the model's output or any other value the aggregator uses to compute the final heatmap.
Attributes:
- device – The device in use. It is updated by the main explainer after initialization.
- convention – The attribution convention (constructive or destructive).
- allowed_conventions – The allowed convention types from AttributionConvention.
Example
Typical usage involves subclassing the Attributor base class:
class CustomAttributor(Attributor):
    def __init__(self, convention="destructive"):
        self.convention = convention
        super().__init__()

    def calculate_attribution(self, x, perturbed_inputs, model, memory):
        # Calculate attributions and store in memory
        predictions = model(perturbed_inputs)
        # Custom attribution logic here
Initialize the attributor component.
Sets up the default device and attribution convention for the attributor. If no convention is set, defaults to 'destructive' with a warning.
Source code in muppet/components/attributor/base.py
Attributes
Functions
Reset the attributor to its initial state.
This method restores the attributor to its original configuration, clearing any internal state or cached data that may affect subsequent attribution calculations.
Source code in muppet/components/attributor/base.py
abstractmethod
Calculates the attribution based on the example's perturbations (x'), the model, and, if needed, the original example (x).
It is generally expected that the most impactful masks receive the highest attribution scores: at the end of the pipeline, most Aggregators use a mask's attribution as a direct proxy for its importance.
Parameters:
- x (Tensor) – The original input example. Shape (b=1, *x.shape[1:]), where b is the batch size and x.shape[1:] holds the input data dimensions, e.g. (c, w, h) for images: channels, width and height.
- perturbed_inputs (Tensor) – The perturbations calculated by the Perturbator.
- model (Module) – The black-box model.
- memory (Memory) – The memory structure in use.
Returns:
- None – Fills up the memory in place.
Source code in muppet/components/attributor/base.py
muppet.components.attributor.classification
Classification-based attributors for MUPPET XAI.
This module provides attribution methods for classification models. These attributors calculate attribution scores based on class probabilities, making them ideal for explaining image classification, text classification, and other discrete classification tasks.
Classes:
- ClassScoreAttributor – Calculates attributions from the class-probability score of the non-perturbed input's predicted class, measuring how much each perturbation affects the model's confidence in the correct prediction. Supports both destructive and constructive attribution conventions.
Technical Details
The ClassScoreAttributor computes attributions by:
1. Determining the true class from the original input
2. Evaluating the model's probability for this class on each perturbation
3. Converting probabilities to attribution scores based on the convention:
   - Destructive: Higher scores for perturbations that reduce class confidence
   - Constructive: Higher scores for perturbations that maintain class confidence
This method is computationally efficient and provides intuitive explanations for classification models across various domains and architectures.
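The three steps above can be sketched in isolation. The softmax model, the function name, and the toy data below are illustrative assumptions, with NumPy standing in for the framework's torch tensors:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def class_score_attributions(x, perturbed, model, convention="destructive"):
    """Sketch: true class, per-perturbation probability, convention-based sign."""
    # 1. determine the true class from the original input
    true_class = int(np.argmax(softmax(model(x))))
    # 2. model's probability for that class on each perturbation
    probs = softmax(model(perturbed))[:, true_class]
    # 3. destructive: high score when confidence drops, so negate the probability
    return -probs if convention == "destructive" else probs

# toy linear "model" producing 3 class logits
W = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
model = lambda v: v @ W.T
x = np.array([2.0, 0.0])                          # predicted class: 0
perturbed = np.array([[2.0, 0.0], [0.0, 0.0]])    # identity vs. zeroing
scores = class_score_attributions(x, perturbed, model)
```

Under the destructive convention, the zeroing perturbation (which lowers the class-0 probability) receives the higher score.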
Classes
ClassScoreAttributor
Bases: Attributor
Attribution based on probability score of the true class for classification tasks.
This attributor calculates the probability score of the true class (determined from the original example) and stores it in each premise's attribution. Since in most cases we expect the most impactful perturbations to have the highest attribution, by default the attribution is MINUS the probability of the true class.
Attributes:
- true_class – True class index determined from the original input.
Initialize the class score attributor.
Parameters:
- convention (Union[AttributionConvention, str], default: 'destructive') – The attribution convention, either 'destructive' or 'constructive'.
Source code in muppet/components/attributor/classification.py
Functions
Calculates the attribution of perturbed inputs.
Parameters:
- x (Tensor) – Example to explain. Shape (b=1, *x.shape[1:]).
- perturbed_inputs (Tensor) – The example's perturbations. Shape (N, *x.shape).
- model (Module) – The black-box model.
- memory (PremiseList) – Premises' memory where the attributions are saved.
Where b is the batch size (=1) and N is the number of generated masks.
Source code in muppet/components/attributor/classification.py
muppet.components.attributor.differentiable
Gradient-based and differentiable attributors for MUPPET XAI.
This module provides attribution methods that leverage gradient-based optimization and differentiable loss functions. These attributors are designed for scenarios where the attribution interacts with optimization-based exploration techniques.
Classes:
- DifferentiableAttributor – Abstract base class for differentiable attributors that use customizable loss functions to compute attributions through backpropagation.
- MaskRegularizedScoreAttributor – Concrete implementation that combines classification scores with mask regularization terms (L1 and Total Variation) to find minimal and smooth explanatory masks.
Classes
DifferentiableAttributor
Bases: Attributor
Base class for gradient-based attribution methods.
This class is used alongside gradient-based exploration methods. It loops through the premises to fill up their attributions by calling a customizable loss function. All needed arguments for the loss calculation must be initialized within the child class.
This base class sets up the true class placeholder for gradient-based attribution methods that can benefit from backpropagation and differentiable loss functions.
Attributes:
- true_class – The true class index, calculated once from the original input x.
Initialize the differentiable attributor.
Source code in muppet/components/attributor/differentiable.py
Functions
Calculates the loss of the objective function defined by the calculate_attribution_loss method.
Parameters:
- x (Tensor) – Example to explain. Shape (1, *x.shape[1:]), e.g. (b=1, c, w, h) for images, where b is the number of input examples, c the channel dimension, w the width and h the height.
- perturbed_inputs (Tensor) – Perturbed versions of the example. Shape (N, *x.shape), e.g. (N, b, c, w, h), where N is the number of perturbations applied to the example.
- model (Module) – The black-box model.
- memory (FlatList) – Structure holding the premises where attributions will be saved.
Source code in muppet/components/attributor/differentiable.py
abstractmethod
Calculates the optimization loss from a premise element and the model's output for the class predicted on the original input example.
Parameters:
- premise (Premise) – The memory element that represents the perturbation.
- output (Tensor) – The model's output for the corresponding input example.
Raises:
- NotImplementedError – Must be implemented in child classes.
Source code in muppet/components/attributor/differentiable.py
MaskRegularizedScoreAttributor
Bases: DifferentiableAttributor
Regularized mask attribution using L1 and total variation loss.
This attributor calculates a loss function combining a minimal-mask penalty, total-variation denoising, and the true-class probability from the perturbed input: Loss = λ·|m| + λ'·TV(1−m) + f(x'). By default, no regularization is applied on the mask.
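A minimal sketch of this loss, assuming an anisotropic TV norm and using NumPy in place of torch; the helper names and defaults are illustrative, not the class's actual implementation:

```python
import numpy as np

def total_variation(m, beta=2.0):
    """Anisotropic TV norm of a 2-D mask, with differences raised to the power beta."""
    dh = np.abs(np.diff(m, axis=0)) ** beta   # vertical neighbor differences
    dw = np.abs(np.diff(m, axis=1)) ** beta   # horizontal neighbor differences
    return dh.sum() + dw.sum()

def mask_regularized_loss(mask, true_class_prob,
                          l1_coeff=0.0, tv_coeff=0.0, tv_beta=2.0):
    """Sketch of Loss = l1 * |m| + tv * TV(1 - m) + f(x')."""
    return (l1_coeff * np.abs(mask).mean()
            + tv_coeff * total_variation(1.0 - mask, tv_beta)
            + true_class_prob)

mask = np.array([[1.0, 1.0], [0.0, 0.0]])
loss = mask_regularized_loss(mask, true_class_prob=0.8,
                             l1_coeff=0.1, tv_coeff=0.05)
```

With the default zero coefficients the loss reduces to the true-class probability alone, matching the "no regularization by default" behavior stated above.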
Attributes:
- l1_coeff – L1 regularization coefficient for mask sparsity.
- tv_coeff – Total Variation coefficient for smoothness regularization.
- tv_beta – Degree of the Total Variation denoising norm.
- convention – The attribution convention (constructive or destructive).
- true_class – The true class index, calculated once from the original input x.
Initialize the mask regularized score attributor.
Parameters:
- l1_coeff (float, default: 0) – L1 regularization coefficient for the mask.
- tv_coeff (float, default: 0) – Total variation regularization coefficient for the mask.
- tv_beta (float, default: 0) – Beta parameter for the total variation calculation.
- convention (Union[AttributionConvention, str], default: 'destructive') – Attribution convention, either 'destructive' or 'constructive'.
Source code in muppet/components/attributor/differentiable.py
Functions
Calculates the attribution/loss as the sum of the premise's mask mean, the mask's TV norm, and the predicted probability of the true class.
Parameters:
- premise – The premise element representing the perturbation.
- output – The true-class predicted probability.
Returns:
- loss (Tensor) – The calculated loss.
Source code in muppet/components/attributor/differentiable.py
muppet.components.attributor.distribution
Probability distribution-based attributors for MUPPET XAI.
This module provides attribution methods that analyze changes in probability distributions over time, particularly designed for time series and sequential data explanation. These attributors measure how perturbations affect the model's distributional predictions and temporal dynamics.
Classes:
- ProbaShiftAttributor – Calculates attributions based on the difference between temporal distribution shifts and perturbation-induced distribution changes, implementing the FIT (Feature Importance in Time) methodology for time series explanation.
Classes
ProbaShiftAttributor
Bases: Attributor
Attribution based on probability distribution shifts for time series classification.
The distribution-based attributors are especially valuable for understanding sequential models where the temporal evolution of predictions is as important as the final output. They quantify feature importance by analyzing distributional shifts caused by perturbations.
This attributor works on probability distributions over classes for classification tasks. The attribution is calculated as the difference between \(KL(P(y|X_{0:t}) || P(y|X_{0:t-1}))\) and \(KL(P(y|X_{0:t}) || P(y|X'_{0:t}))\) summed over all classes, where \(X'_{0:t}\) denotes the input with the features' values perturbed at time t.
This attributor calculates feature importance based on probability distribution shifts over temporal sequences, implementing the FIT methodology for time series explanation.
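The difference of KL divergences above can be sketched numerically; the distributions below are made-up toy values, the function names are illustrative, and NumPy stands in for torch:

```python
import numpy as np

def kl(p, q):
    """KL divergence between two discrete class distributions (all entries > 0)."""
    return float(np.sum(p * np.log(p / q)))

def fit_attribution(p_t, p_t_minus_1, p_t_perturbed):
    """Sketch of the FIT score: temporal shift minus perturbation-induced shift."""
    return kl(p_t, p_t_minus_1) - kl(p_t, p_t_perturbed)

p_t = np.array([0.7, 0.3])        # P(y | X_0:t)
p_prev = np.array([0.5, 0.5])     # P(y | X_0:t-1)
p_pert = np.array([0.69, 0.31])   # P(y | X'_0:t): perturbation barely matters
score = fit_attribution(p_t, p_prev, p_pert)
```

Here the perturbation barely changes the prediction, so the second KL term is near zero and the score is dominated by the temporal shift; the sign interpretation follows the formula as written above.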
Attributes:
- outputs – Key-value mapping from timestep to the model's output when calculating P(y|X_0:t).
- padding – The padding strategy for time series inputs.
- convention – The attribution convention (destructive).
Initialize the ProbaShiftAttributor.
Parameters:
- padding (str) – Padding strategy for sequences. Options:
  - "left": Zero-pad sequences on the left (common for RNNs)
  - "right": Zero-pad sequences on the right
  - None: No padding, for models handling variable lengths
Source code in muppet/components/attributor/distribution.py
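A rough sketch of what these padding options mean for a truncated series of shape (f, t); the helper name is hypothetical and NumPy stands in for torch:

```python
import numpy as np

def pad_sequence(x_t, full_len, padding="left"):
    """Toy sketch of the padding strategies for a series of shape (f, t)."""
    f, t = x_t.shape
    if padding is None:
        return x_t                                     # model handles variable lengths
    zeros = np.zeros((f, full_len - t))
    if padding == "left":
        return np.concatenate([zeros, x_t], axis=1)    # zeros before the series
    return np.concatenate([x_t, zeros], axis=1)        # "right": zeros after

seq = np.ones((2, 3))
padded = pad_sequence(seq, full_len=5, padding="left")
```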
Functions
For every premise stored in the memory, fills in its attribution, calculated as the difference between the two KL divergences described above.
Parameters:
- x (Tensor) – The input example to be explained. Shape (b=1, f, t).
- perturbed_inputs (Tensor) – The perturbations calculated by the Perturbator. Shape (N, *x.shape).
- model (Module) – The black-box model.
- memory (Memory, default: PremiseList) – The simple list memory structure.
Source code in muppet/components/attributor/distribution.py
muppet.components.attributor.embedding
Embedding-based attributors for MUPPET XAI.
This module provides attribution methods that work with embedding spaces and latent representations. These attributors are designed for models that output vector embeddings rather than discrete classifications, making them ideal for explaining representation learning models, autoencoders, and embedding-based systems.
Classes:
- EmbeddingDistanceAttributor – Calculates attributions based on the L2 distance between original and perturbed embeddings, measuring how much each perturbation changes the model's internal representation of the input.
- DiceScoreAttributor – Specialized attributor for semantic segmentation tasks that uses the Dice coefficient, instead of the L2 distance between embeddings, to measure segmentation-quality changes caused by perturbations.
Classes
EmbeddingDistanceAttributor
Bases: Attributor
Attribution based on distance between original and perturbed embeddings.
A perturbation's value is equal to how much it changed the original embedding (i.e., original model output). The end goal is to find perturbations that make the perturbed embedding as far away from the original embedding as possible.
This attributor measures how perturbations affect the model's embedding representations by calculating L2 distances between original and perturbed embeddings. The model output is expected to have shape (batch, *embedding_dim).
The EmbeddingDistanceAttributor computes attributions by:
1. Reference computation: Computing the original input's embedding E₀ = model(x)
2. Distance measurement: For each perturbation xᵢ, computing Eᵢ = model(xᵢ)
3. Attribution scoring: Computing distance d(E₀, Eᵢ) = ||E₀ - Eᵢ||₂
4. Sign adjustment: Applying negative sign to maximize distance (destructive convention)
The L2 distance provides a natural measure of representation change.
The method works with any model that outputs continuous vector representations, regardless of the embedding dimensionality or architecture (CNNs, transformers, autoencoders, etc.).
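The four numbered steps above can be sketched compactly, with an identity toy model standing in for a real encoder and NumPy in place of torch; per the sign convention stated above, the distance is negated:

```python
import numpy as np

def embedding_distance_attributions(x, perturbed, model):
    """Sketch: attribution = negative L2 distance between E0 and each Ei."""
    e0 = model(x)                                        # 1. reference embedding E0
    scores = []
    for xi in perturbed:
        ei = model(xi)                                   # 2. perturbed embedding Ei
        scores.append(-float(np.linalg.norm(e0 - ei)))   # 3-4. negated distance
    return np.array(scores)

# toy identity "model": the embedding is the input itself
identity = lambda v: v
x = np.array([0.0, 0.0])
perturbed = np.array([[0.0, 0.0], [3.0, 4.0]])
scores = embedding_distance_attributions(x, perturbed, identity)
```

Negating the distance turns "maximize the distance" into a quantity to minimize, matching the destructive-convention framing in these docs.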
Attributes:
- input_embedding – Stores the true embedding output for comparison.
Initialize the EmbeddingDistanceAttributor.
Source code in muppet/components/attributor/embedding.py
Functions
Calculate the L2 distance between x and its perturbations as the attribution. Note that the expected shape for x and perturbed_inputs is (1 for batch, nb_rows, embedding_dim).
Parameters:
- x (Tensor) – The input example to be explained.
- perturbed_inputs (Tensor) – The perturbations calculated by the Perturbator.
- model (Module) – The black-box model.
- memory (Memory) – The simple list memory structure.
Source code in muppet/components/attributor/embedding.py
Calculate similarity between perturbed and original embeddings.
Computes the negative L2 distance between embeddings to maximize the distance (higher score for more different embeddings).
Parameters:
- embedding (Tensor) – The perturbed embedding.
- true_embedding (Tensor) – The original input embedding.
Returns:
- Tensor – Negative L2 distance (higher values indicate greater difference).
Source code in muppet/components/attributor/embedding.py
DiceScoreAttributor
Bases: Attributor
Attribution based on Dice score between probability distributions.
This attributor calculates the Dice score between the predicted probability distribution of a perturbed input and the original example's output. The Dice score measures the overlap between the two distributions, providing a similarity measure for classification outputs.
This attributor is specifically designed for segmentation tasks where it measures how perturbations affect segmentation quality by calculating Dice coefficient changes between original and perturbed predictions.
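The Dice coefficient itself is standard, 2|A ∩ B| / (|A| + |B|); a minimal NumPy sketch, where the helper name and the toy binary maps are illustrative:

```python
import numpy as np

def dice_score(pred_a, pred_b, eps=1e-8):
    """Dice coefficient between two binary (or soft) prediction maps."""
    inter = (pred_a * pred_b).sum()                       # |A ∩ B|
    return float(2.0 * inter / (pred_a.sum() + pred_b.sum() + eps))

original = np.array([[1, 1], [0, 0]])     # original segmentation
perturbed = np.array([[1, 0], [0, 0]])    # perturbation erased one pixel
score = dice_score(original, perturbed)
```

Identical maps score close to 1; a perturbation that degrades the segmentation pulls the score toward 0.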
Attributes:
- true_class – The true class index calculated from the original input.
Initialize the DiceScoreAttributor.
Inherits from Attributor, with true_class initialized to None.
Source code in muppet/components/attributor/embedding.py
Functions
Reset the attributor to its initial state.
Clears the cached true class to ensure fresh calculations for new inputs.
Calculates the attribution (Dice score) for perturbed inputs.
Parameters:
- x (Tensor) – Example input. Shape (1, C, H, W).
- perturbed_inputs (Tensor) – Perturbed inputs. Shape (N, 1, C, H, W).
- model (Module) – Model to explain.
- memory (PremiseList) – Memory structure to attach attributions to.
Source code in muppet/components/attributor/embedding.py
muppet.components.attributor.similarity
Similarity-based attributors for MUPPET XAI.
This module provides attribution methods that incorporate similarity measures between original and perturbed inputs. These attributors are essential for local explanation methods like LIME and SHAP, where the importance of perturbations is weighted by their similarity to the original input.
Classes:
- SimilarityAttributor – Generic attributor that combines model predictions with configurable similarity functions for flexible local explanation methods.
Functions:
- lime_similarity – Gaussian kernel similarity function for LIME-style explanations.
- kernel_shap_similarity – SHAP kernel similarity function based on coalition size.
Classes
SimilarityAttributor
Bases: Attributor
Attribution based on similarity measures between original and perturbed inputs.
This attributor calculates similarities relative to a provided similarity function. The similarity function returns high values when inputs are highly different, making it suitable for LIME-style explanations where we need to weight samples by their distance from the original input.
Similarity-based attribution combines two key components:
1. Model response: How the model's prediction changes with perturbation
2. Input similarity: How similar the perturbation is to the original input
The SimilarityAttributor stores both values:
- LIME Similarity: Uses a Gaussian kernel with Euclidean distance, \(w = \exp(-\|x - x'\|^2 / \sigma^2)\).
- SHAP Kernel: Based on coalition size with theoretical guarantees, \(w = \frac{M-1}{\binom{M}{|S|}\,|S|\,(M-|S|)}\), where M is the total number of features and |S| is the coalition size.
- Dice Score: For segmentation, measures the overlap between predictions.
These methods are particularly effective for:
- Local explanations: LIME- and SHAP-style interpretability
- Faithful approximations: Ensuring explanations reflect local model behavior
- Segmentation analysis: Understanding model performance on different regions
- Coalition-based methods: Game-theoretic explanation approaches
Attributes:
-
predicted_class–The predicted class from the original input.
-
similarity_fun–The similarity function used for calculations.
-
convention–The attribution convention (perturbed_input_similarity).
Example
Using LIME-style similarity weighting:
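The example block is missing from this extract. As an illustration only, here is a toy similarity function that follows the convention stated in these docs (higher values for more different inputs) by inverting a LIME-style Gaussian kernel; the function name and data are hypothetical, and NumPy stands in for torch:

```python
import numpy as np

def toy_similarity(x_vec, pert_vec, sigma=1.0):
    """Toy similarity per the docs' convention: higher = more different."""
    d2 = np.sum((x_vec - pert_vec) ** 2)        # squared Euclidean distance
    return float(1.0 - np.exp(-d2 / sigma ** 2))  # 0 for identical, -> 1 far away

x = np.array([1.0, 1.0])
perturbations = np.array([[1.0, 1.0], [0.0, 0.0], [5.0, 5.0]])
scores = [toy_similarity(x, p) for p in perturbations]
```

In actual use, a function with this signature (plus the premise argument documented for similarity_fun) would be passed to the SimilarityAttributor constructor.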
Initialize the SimilarityAttributor.
Parameters:
- similarity_fun (Callable[[Tensor, Tensor, Premise], Tensor]) – Function that calculates the similarity between the original and perturbed inputs. Takes (original_tensor, perturbed_tensor, premise) and returns similarity scores. Higher values indicate greater difference.
Source code in muppet/components/attributor/similarity.py
Functions
Calculates the attribution of perturbed inputs.
Parameters:
- x (Tensor) – Example to explain. Shape (b=1, *x.shape[1:]).
- perturbed_inputs (Tensor) – The example's perturbations. Shape (N, *x.shape).
- model (Module) – The model we want to explain.
- memory (Memory) – Memory where the premises are stored.
Where b is the batch size (=1) and N is the number of generated masks.
Source code in muppet/components/attributor/similarity.py
Functions
lime_similarity
Example similarity function using a Gaussian kernel, as in LIME. This function computes similarities for all perturbations at once.
Source code in muppet/components/attributor/similarity.py
kernel_shap_similarity
Calculates similarity based on kernel SHAP.
Parameters:
- x_vector (Tensor) – The original input tensor of shape (1, N).
- perturbed_vector (Tensor) – The perturbed input tensor of shape (1, N).
- premise (Premise) – The premise object containing the key tensor of shape (1, f).
Returns:
- Tensor – A tensor of shape [1] containing the similarity score.
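The Shapley kernel weight underlying kernel SHAP is standard, w = (M − 1) / (C(M, |S|) · |S| · (M − |S|)). A standalone sketch of that weight, not the library's kernel_shap_similarity implementation:

```python
import math

def kernel_shap_weight(M, s):
    """Shapley kernel weight for a coalition of size s out of M features."""
    if s == 0 or s == M:
        # empty and full coalitions get (theoretically) infinite weight
        return float("inf")
    return (M - 1) / (math.comb(M, s) * s * (M - s))

w = kernel_shap_weight(M=4, s=2)
```

In practice, libraries clamp or specially handle the infinite-weight endpoint coalitions rather than passing inf to a solver.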