deep_lift_shap

tangermeme.deep_lift_shap._captum_deep_lift_shap(model: ~torch.nn.modules.module.Module, X: ~torch.Tensor, args: tuple | None = None, target: int = 0, batch_size: int = 32, references: ~collections.abc.Callable[[...], ~typing.Any] | ~torch.Tensor = <function dinucleotide_shuffle>, n_shuffles: int = 20, return_references: bool = False, hypothetical: bool = False, device: str | ~torch.device | None = None, random_state: int | ~numpy.random.mtrand.RandomState | None = None, verbose: bool = False) → Tensor | AttributionReferencesResult

Calculate attributions using captum’s DeepLiftShap and a given model.

This function calculates DeepLIFT/SHAP attributions on a set of sequences by delegating to captum’s DeepLiftShap, using either a user-supplied reference function (dinucleotide_shuffle by default) or a pre-computed reference tensor. This is the same references contract used by deep_lift_shap. It does NOT make any assumption about the structure of the model output (no BPNet logits/counts split) and it does NOT generate GC-matched negatives.

This is an internal/debugging function that is mostly meant to be used to check for differences with the deep_lift_shap method.

Parameters

model: torch.nn.Module: A PyTorch model to use for making predictions. These models can take in any number of inputs and make any number of outputs. The additional inputs must be specified in the args parameter.
X: torch.tensor, shape=(-1, len(alphabet), length): A set of one-hot encoded sequences to calculate attribution values for.
args: tuple or None, optional: An optional set of additional arguments to pass into the model. If provided, each element in the tuple or list is one input to the model and the element must be formatted to be the same batch size as X. If None, no additional arguments are passed into the forward function. Default is None.
target: int, optional: The output of the model to calculate gradients/attributions for. This will index the last dimension of the predictions. Default is 0.
batch_size: int, optional: The number of sequence-reference pairs to pass through DeepLiftShap at a time. Importantly, this is the total number of X-reference pairs processed at once, not the number of elements in X processed simultaneously alongside all of their references. In a memory-limited setting where not even the references for a single sequence fit at once, the work is therefore split so that only a few references are processed at a time. Default is 32.
references: func or torch.Tensor, optional: If a function is passed in, the function must accept (X, n=…, random_state=…). It is called once per example with n=n_shuffles and should return a tensor shaped (1, n_shuffles, *X.shape[1:]) (the leading axis is indexed off). The function should transform a sequence into some form of signal-null background, such as by shuffling it. If a torch.Tensor is passed in, that tensor must have shape (len(X), n_shuffles, *X.shape[1:]), in that for each sequence a number of shuffles are provided. Default is the function dinucleotide_shuffle.
n_shuffles: int, optional: The number of shuffles to use if a function is given for references. If a torch.Tensor is provided, this number is ignored. Default is 20.
return_references: bool, optional: Whether to return the references that were generated during this process. Only use if references is not a torch.Tensor. Default is False.
hypothetical: bool, optional: Whether to return attributions for all possible characters at each position (True) or only for the character that is actually in the sequence (False). When False, the per-character attributions are multiplied by the one-hot encoded input so that only the observed character has a non-zero attribution at each position. Default is False.
device: str or torch.device or None, optional: The device to move the model and batches to when making predictions. If None, use CUDA when available and fall back to CPU otherwise. Default is None.
random_state: int or None, optional: The random seed to use to ensure determinism. Passed through to the references callable; numpy.random.RandomState instances are not supported. If None, the process is not deterministic. Default is None.
verbose: bool, optional: Whether to display a progress bar. Default is False.
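The effect of hypothetical=False described above can be illustrated with plain Python lists. This is only a sketch of the masking step, not the library's code, and the numbers in `hypothetical_scores` are made up for illustration.

```python
# Made-up hypothetical attributions for a length-3 sequence (rows: A, C, G, T).
hypothetical_scores = [
    [0.2, -0.1, 0.0],   # A
    [0.4, 0.3, -0.2],   # C
    [-0.3, 0.1, 0.5],   # G
    [0.1, -0.4, 0.6],   # T
]

# One-hot encoding of the observed sequence "ACT".
one_hot = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
    [0, 0, 1],
]

# Multiplying element-wise by the one-hot input keeps only the score of the
# character that is actually present at each position.
observed_scores = [
    [score * present for score, present in zip(score_row, one_hot_row)]
    for score_row, one_hot_row in zip(hypothetical_scores, one_hot)
]
```

After masking, each position has at most one non-zero entry, which is the value reported when hypothetical=False.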

Returns

attributions: torch.tensor: The attributions calculated for each input sequence, with the same shape as the input sequences.
references: torch.tensor, optional: The references used for each input sequence, with the shape (n_input_sequences, n_shuffles, 4, length). Only returned if return_references = True.

tangermeme.deep_lift_shap.deep_lift_shap(model: ~torch.nn.modules.module.Module, X: ~torch.Tensor, args: tuple | None = None, target: int = 0, batch_size: int = 32, references: ~collections.abc.Callable[[...], ~typing.Any] | ~torch.Tensor = <function dinucleotide_shuffle>, n_shuffles: int = 20, return_references: bool = False, hypothetical: bool = False, warning_threshold: float = 0.001, additional_nonlinear_ops: dict | None = None, print_convergence_deltas: bool = False, raw_outputs: bool = False, only_warn: bool = False, dtype: str | ~torch.dtype | None = None, device: str | ~torch.device | None = None, random_state: int | ~numpy.random.mtrand.RandomState | None = None, verbose: bool = False) → Tensor | AttributionReferencesResult

Calculate attributions for a set of sequences using DeepLIFT/SHAP.

This function will calculate the DeepLIFT/SHAP attributions on a set of sequences given a model. These attributions have the additive property that the sum of the attributions is ~equal to the difference in prediction between the original sequence and the reference sequences.
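The additive property can be checked by hand on a toy linear model, where the multipliers reduce to the weights themselves. The sketch below does not call tangermeme; `weights`, `score`, and the sequences are made-up values chosen only to show the bookkeeping.

```python
# Toy linear "model" over a one-hot sequence of length 3 (rows: A, C, G, T).
# For a linear model the multipliers are simply the weights, so each
# attribution is weight * (input - reference), averaged over the references.
weights = [
    [0.5, -1.0, 2.0],   # A
    [1.5, 0.0, -0.5],   # C
    [-2.0, 1.0, 0.25],  # G
    [0.0, 3.0, 1.0],    # T
]


def score(seq):
    """Weighted sum of a (4, length) one-hot sequence."""
    return sum(w * x for w_row, x_row in zip(weights, seq)
               for w, x in zip(w_row, x_row))


x = [[1, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 1]]       # "ACT"
references = [
    [[0, 0, 1], [1, 0, 0], [0, 0, 0], [0, 1, 0]],      # "CTA"
    [[0, 1, 0], [0, 0, 1], [0, 0, 0], [1, 0, 0]],      # "TAC"
]

attributions = [
    [sum(w * (xi - ref[c][p]) for ref in references) / len(references)
     for p, (w, xi) in enumerate(zip(weights[c], x[c]))]
    for c in range(4)
]

total_attribution = sum(sum(row) for row in attributions)
mean_reference_score = sum(score(r) for r in references) / len(references)
difference = score(x) - mean_reference_score
# total_attribution and difference are both -1.0 for these values.
```

For non-linear models the equality is only approximate, which is why the relationship is stated with "~equal".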

As an implementation note, the batch size refers to the number of example-reference pairs that are run simultaneously. When the batch size is smaller than the number of references, multiple batches are run per example, and the attributions are averaged across the references only after all of them have been covered. You may want to do this if the model or examples are so large that only a few can fit in memory at a time. The result is identical to what you would get if every batch contained all the references for its examples.
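The batching behavior can be sketched with scalars standing in for attribution tensors. This is an illustration under simplifying assumptions, not the library's implementation; `average_over_references` and its arguments are hypothetical names.

```python
import random


def average_over_references(pair_values, n_examples, n_references, batch_size):
    """Average per-pair values per example, visiting pairs in fixed-size batches.

    pair_values[i * n_references + j] is the value for example i paired with
    reference j (a scalar stand-in for a full attribution tensor).
    """
    sums = [0.0] * n_examples
    for start in range(0, len(pair_values), batch_size):
        batch = pair_values[start:start + batch_size]
        for offset, value in enumerate(batch):
            example_idx = (start + offset) // n_references
            sums[example_idx] += value
    # Divide only after every reference for every example has been seen.
    return [s / n_references for s in sums]


rng = random.Random(0)
n_examples, n_references = 3, 20
pair_values = [rng.random() for _ in range(n_examples * n_references)]

# Batches of 8 pairs split each example's 20 references across batches.
small_batches = average_over_references(
    pair_values, n_examples, n_references, batch_size=8)
# Batches of 20 pairs hold exactly one example with all of its references.
full_batches = average_over_references(
    pair_values, n_examples, n_references, batch_size=20)
```

Because the per-example sums accumulate in the same order in both runs, `small_batches` and `full_batches` match exactly in this sketch.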

Convergence deltas are calculated automatically for each example-reference pair. Theoretically, these should be zero, but in practice they may be small non-zero numbers due to machine precision issues with non-linear models. If a delta exceeds the warning threshold, a non-terminating warning is issued to let you know that the threshold has been exceeded.
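A convergence check of this kind can be sketched as follows, assuming the delta is the absolute gap between the summed attributions and the difference in predictions for one example-reference pair. The function name `check_convergence` and the example numbers are hypothetical.

```python
import warnings


def check_convergence(attribution_sum, prediction, reference_prediction,
                      warning_threshold=0.001):
    """Return the convergence delta and warn if it exceeds the threshold."""
    delta = abs(attribution_sum - (prediction - reference_prediction))
    if delta > warning_threshold:
        # A warning, not an exception: the computation carries on.
        warnings.warn(
            f"Convergence delta {delta:.4g} exceeds {warning_threshold}",
            RuntimeWarning,
        )
    return delta


# A delta near machine precision passes silently.
small_delta = check_convergence(0.7500001, prediction=1.25,
                                reference_prediction=0.5)

# A large delta triggers a warning, captured here for inspection.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    large_delta = check_convergence(0.9, prediction=1.25,
                                    reference_prediction=0.5)
```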

NOTE: predictions MUST yield a (batch_size, n_targets) tensor, even if n_targets is 1. If your model yields something more complicated you must wrap the model in a small class that operates on the outputs in a manner that yields such a tensor, e.g., by slicing the output or summing along a relevant axis.
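As a sketch of the wrapping described in this note, suppose a model returns a (profile, counts) tuple. `TwoHeadModel`, `CountWrapper`, and `ProfileSumWrapper` are hypothetical classes written for illustration; they show one way to slice or sum outputs into a (batch_size, n_targets) tensor.

```python
import torch


class TwoHeadModel(torch.nn.Module):
    """Hypothetical model returning a (profile, counts) tuple."""

    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv1d(4, 8, kernel_size=3, padding=1)
        self.relu = torch.nn.ReLU()
        self.profile_head = torch.nn.Conv1d(8, 2, kernel_size=1)
        self.count_head = torch.nn.Linear(8, 1)

    def forward(self, X):
        h = self.relu(self.conv(X))
        profile = self.profile_head(h)            # (batch, 2, length)
        counts = self.count_head(h.mean(dim=-1))  # (batch, 1)
        return profile, counts


class CountWrapper(torch.nn.Module):
    """Keep only the counts output, which is already (batch, n_targets)."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, X, *args):
        _, counts = self.model(X, *args)
        return counts


class ProfileSumWrapper(torch.nn.Module):
    """Sum the profile along the length axis to get (batch, n_targets)."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, X, *args):
        profile, _ = self.model(X, *args)
        return profile.sum(dim=-1)


X = torch.zeros(5, 4, 10)
X[:, 0] = 1.0  # poly-A one-hot sequences
model = TwoHeadModel()
counts = CountWrapper(model)(X)             # shape (5, 1)
profile_sums = ProfileSumWrapper(model)(X)  # shape (5, 2)
```

The wrapped module, rather than the original model, would then be the one passed as the model argument, with target selecting a column of its output.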

Parameters

model: torch.nn.Module: A PyTorch model to use for making predictions. These models can take in any number of inputs and make any number of outputs. The additional inputs must be specified in the args parameter.
X: torch.tensor, shape=(-1, len(alphabet), length): A set of one-hot encoded sequences to calculate attribution values for.
args: tuple or None, optional: An optional set of additional arguments to pass into the model. If provided, each element in the tuple or list is one input to the model and the element must be formatted to be the same batch size as X. If None, no additional arguments are passed into the forward function. Default is None.
target: int, optional: The output of the model to calculate gradients/attributions for. This will index the last dimension of the predictions. Default is 0.
batch_size: int, optional: The number of sequence-reference pairs to pass through DeepLiftShap at a time. Importantly, this is the total number of X-reference pairs processed at once, not the number of elements in X processed simultaneously alongside all of their references. In a memory-limited setting where not even the references for a single sequence fit at once, the work is therefore split so that only a few references are processed at a time. Default is 32.
references: func or torch.Tensor, optional: If a function is passed in, the function must accept (X, n=…) and (when random_state is not None) (X, n=…, random_state=…). It is called with n=1 per shuffle and once per (example, shuffle_idx) pair when seeded, or once per batch when not. It should return a tensor shaped (batch, 1, *X.shape[1:]) (the second axis is squeezed off). The function should transform a sequence into some form of signal-null background, such as by shuffling it. If a torch.Tensor is passed in, that tensor must have shape (len(X), n_shuffles, *X.shape[1:]), in that for each sequence a number of shuffles are provided. Default is the function dinucleotide_shuffle.
n_shuffles: int, optional: The number of shuffles to use if a function is given for references. If a torch.Tensor is provided, this number is ignored. Default is 20.
return_references: bool, optional: Whether to return the references that were generated during this process. Only use if references is not a torch.Tensor. Default is False.
hypothetical: bool, optional: Whether to return attributions for all possible characters at each position (True) or only for the character that is actually in the sequence (False). When False, the per-character attributions are multiplied by the one-hot encoded input so that only the observed character has a non-zero attribution at each position. Default is False.
warning_threshold: float, optional: The convergence-delta threshold above which a warning is always raised. Normal deltas are in the range of 1e-8 to 1e-6. Note that convergence deltas are calculated on the gradients prior to the aggr_func being applied to them. Default is 0.001.
additional_nonlinear_ops: dict or None, optional: If additional nonlinear ops need to be added to the dictionary of operations that can be handled by DeepLIFT/SHAP, pass a dictionary here where the keys are class types and the values are the name of the function that handles that sort of class. Make sure that the signature matches those of _nonlinear and _maxpool. This can also be used to overwrite the hard-coded operations by passing in a dictionary with overlapping key names. If None, do not add any additional operations. Default is None.
print_convergence_deltas: bool, optional: Whether to print the convergence deltas for each example when using DeepLiftShap. Default is False.
raw_outputs: bool, optional: Whether to return the raw outputs from the method – in this case, the multipliers for each example-reference pair – or the processed attribution values. Default is False.
only_warn: bool, optional: Whether to only warn when a violation is recorded instead of raising a terminating error. This applies to input validation on X and (if passed) references; it does NOT suppress the runtime convergence-delta warnings emitted from the DeepLIFT computation itself. Default is False.
dtype: str or torch.dtype or None, optional: The dtype to use with mixed precision autocasting. If None, use the dtype of the model. This allows you to use int8 to represent large data sets and only convert batches to the higher precision, saving memory. Default is None.
device: str or torch.device or None, optional: The device to move the model and batches to when making predictions. If None, use CUDA when available and fall back to CPU otherwise. Default is None.
random_state: int or None, optional: The random seed to use to ensure determinism. Must be an int (or None); the value is added to per-shuffle offsets when calling references, so numpy.random.RandomState instances are not supported here. If None, the process is not deterministic. Default is None.
verbose: bool, optional: Whether to display a progress bar. Default is False.
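To make the references contract concrete, here is a minimal sketch of a custom reference function that accepts (X, n=…, random_state=…) and returns a tensor shaped (batch, n, *X.shape[1:]). `position_shuffle` is a hypothetical helper, not part of tangermeme. It permutes positions, which preserves nucleotide composition but not dinucleotide content.

```python
import torch


def position_shuffle(X, n=20, random_state=None):
    """Hypothetical reference function: shuffle positions of one-hot sequences.

    X has shape (batch, alphabet, length). The result has shape
    (batch, n, alphabet, length). Each shuffle uses one permutation that is
    shared across the batch.
    """
    generator = torch.Generator()
    if random_state is not None:
        generator.manual_seed(int(random_state))
    else:
        generator.seed()  # non-deterministic seed

    shuffles = []
    for _ in range(n):
        order = torch.randperm(X.shape[-1], generator=generator).to(X.device)
        shuffles.append(X[:, :, order])
    return torch.stack(shuffles, dim=1)


# Two one-hot sequences of length 10 cycling through A, C, G, T.
positions = torch.arange(10)
X = torch.zeros(2, 4, 10)
X[:, positions % 4, positions] = 1.0

single = position_shuffle(X, n=1, random_state=0)   # shape (2, 1, 4, 10)
many = position_shuffle(X, n=20, random_state=0)    # shape (2, 20, 4, 10)
```

Under the contract described above, such a function could be supplied as the references argument, or the (len(X), n_shuffles, *X.shape[1:]) tensor it produces could be passed in directly as pre-computed references.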

Returns

attributions: torch.tensor: If raw_outputs=False (default), the attribution values with shape equal to X. If raw_outputs=True, the multipliers for each example-reference pair with shape equal to (X.shape[0], n_shuffles, X.shape[1], X.shape[2]).
references: torch.tensor, optional: The references used for each input sequence, with the shape (n_input_sequences, n_shuffles, 4, length). Only returned if return_references = True.

tangermeme.deep_lift_shap.hypothetical_attributions(multipliers: tuple[Tensor], X: tuple[Tensor], references: tuple[Tensor]) → tuple[Tensor]

A function for aggregating contributions into hypothetical attributions.

When handling categorical data, like one-hot encodings, the gradients returned by a method like DeepLIFT/SHAP may need to be modified because the choice of one character at a position explicitly means that the other characters are not there. So, one needs to account for each character change actually being the addition of one character AND the subtraction of another character. Basically, once you’ve calculated the multipliers, you need to subtract out the contribution of the nucleotide actually present and then add in the contribution of the nucleotide being substituted in.

Each element in the tensor is considered an independent example.

As an implementation note: to be compatible with Captum, each input must be a tuple of length 1 and the returned value will be a tuple of length 1. I know this sounds silly but it’s the most convenient implementation choice to make the function compatible across DeepLiftShap implementations.

Parameters

multipliers: tuple of one torch.tensor, shape=(n_baselines, 4, length): The multipliers/gradients calculated by a method like DeepLIFT/SHAP. These should include values for both the observed characters and the unobserved characters at each position.
X: tuple of one torch.tensor, shape=(n_baselines, 4, length): The one-hot encoded sequence being explained.
references: tuple of one torch.tensor, shape=(n_baselines, 4, length): The one-hot encoded reference sequences, usually a shuffled version of the corresponding sequence in X.

Returns

projected_contribs: tuple of one torch.tensor, shape=(1, 4, length): The attribution values for each nucleotide in the input.
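The add-one-character, subtract-another logic can be sketched as below. This is an illustrative reimplementation under the tuple-of-length-1 convention, not necessarily the library's exact code. `hypothetical_attributions_sketch` is a hypothetical name, and the sketch returns one projected tensor per multiplier/reference pair, leaving any averaging to the caller.

```python
import torch


def hypothetical_attributions_sketch(multipliers, X, references):
    """Project multipliers onto every possible character at every position.

    Each argument is a tuple holding one (n, alphabet, length) tensor.
    """
    mult, x, ref = multipliers[0], X[0], references[0]
    projected = torch.zeros_like(ref, dtype=x.dtype)
    for i in range(x.shape[1]):
        # Pretend character i is present at every position.
        hypothetical = torch.zeros_like(x)
        hypothetical[:, i] = 1.0
        # Character i is added and the reference character is removed.
        diffs = hypothetical - ref
        projected[:, i] = (diffs * mult).sum(dim=1)
    return (projected,)


torch.manual_seed(0)
mult = torch.randn(2, 4, 5)
x = torch.nn.functional.one_hot(
    torch.randint(0, 4, (2, 5)), num_classes=4).permute(0, 2, 1).float()
ref = torch.nn.functional.one_hot(
    torch.randint(0, 4, (2, 5)), num_classes=4).permute(0, 2, 1).float()

(projected,) = hypothetical_attributions_sketch((mult,), (x,), (ref,))

# Masking by the one-hot input keeps only the observed character's value.
observed_only = projected * x
```

For one-hot references, the value for character i at a position works out to the multiplier for i minus the multiplier for the reference character at that position.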