deep_lift_shap
- tangermeme.deep_lift_shap._captum_deep_lift_shap(model, X, args=None, target=0, batch_size=32, references=<function dinucleotide_shuffle>, n_shuffles=20, return_references=False, hypothetical=False, device='cuda', random_state=None, verbose=False)
Calculate attributions using DeepLift/Shap and a given model.
This function will calculate DeepLift/Shap attributions on a set of sequences. It assumes that the model returns “logits” in the first output, not softmax probabilities, and count predictions in the second output. It will generate references for each sequence (by default using dinucleotide_shuffle) and proceed using the given batch size.
This is an internal/debugging function that is mostly meant to be used to check for differences with the deep_lift_shap method.
Parameters
- model: torch.nn.Module
A PyTorch model to use for making predictions. These models can take in any number of inputs and make any number of outputs. The additional inputs must be specified in the args parameter.
- X: torch.tensor, shape=(-1, len(alphabet), length)
A set of one-hot encoded sequences to calculate attribution values for.
- args: tuple or None, optional
An optional set of additional arguments to pass into the model. If provided, each element in the tuple or list is one input to the model and the element must be formatted to be the same batch size as X. If None, no additional arguments are passed into the forward function. Default is None.
- target: int, optional
The output of the model to calculate gradients/attributions for. This will index the last dimension of the predictions. Default is 0.
- batch_size: int, optional
The number of sequence-reference pairs to pass through DeepLiftShap at a time. Importantly, this is not the number of elements in X that are processed simultaneously (alongside ALL their references) but the total number of X-reference pairs that are processed at once. In a memory-limited setting where you cannot process all references for even a single sequence simultaneously, the work is therefore broken down into only a few references at a time. Default is 32.
- references: func or torch.Tensor, optional
If a function is passed in, this function is applied to each sequence with the provided random state and number of shuffles. This function should serve to transform a sequence into some form of signal-null background, such as by shuffling it. If a torch.Tensor is passed in, that tensor must have shape (len(X), n_shuffles, *X.shape[1:]), i.e., n_shuffles references are provided for each sequence. Default is the function dinucleotide_shuffle.
- n_shuffles: int, optional
The number of shuffles to use if a function is given for references. If a torch.Tensor is provided, this number is ignored. Default is 20.
- return_references: bool, optional
Whether to return the references that were generated during this process. Only use if references is not a torch.Tensor. Default is False.
- hypothetical: bool, optional
Whether to return attributions for all possible characters at each position or only for the character that is actually present at that position. Practically, when False, the attributions returned from captum are multiplied by the one-hot encoded sequence. Default is False.
- device: str or torch.device, optional
The device to move the model and batches to when making predictions. If set to ‘cuda’ on a machine without a GPU, this function will crash; set it to ‘cpu’ instead. Default is ‘cuda’.
- random_state: int or None or numpy.random.RandomState, optional
The random seed to use to ensure determinism. If None, the process is not deterministic. Default is None.
- verbose: bool, optional
Whether to display a progress bar. Default is False.
Returns
- attributions: torch.tensor
The attributions calculated for each input sequence, with the same shape as the input sequences.
- references: torch.tensor, optional
The references used for each input sequence, with the shape (n_input_sequences, n_shuffles, 4, length). Only returned if return_references = True.
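The effect of the hypothetical flag can be pictured with a small array example. This is a minimal sketch in NumPy, assuming the scores for every character are already available; the arrays `hyp` and `X` are made-up values for illustration and do not come from a real model.

```python
import numpy as np

# Made-up attribution scores for every character at every position.
hyp = np.array([[ 0.5, -0.2],
                [-0.1,  0.3],
                [ 0.0,  0.1],
                [ 0.2, -0.4]])   # shape (4, length=2)

# One-hot sequence "AC": A at position 0, C at position 1.
X = np.array([[1, 0],
              [0, 1],
              [0, 0],
              [0, 0]])

# With hypothetical=False only the observed characters keep a score,
# which amounts to an elementwise product with the one-hot sequence.
observed = hyp * X
```

With hypothetical=True the full `hyp` array would be kept instead, which is useful when asking what a different character at a position would have contributed.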
- tangermeme.deep_lift_shap.deep_lift_shap(model, X, args=None, target=0, batch_size=32, references=<function dinucleotide_shuffle>, n_shuffles=20, return_references=False, hypothetical=False, warning_threshold=0.001, additional_nonlinear_ops=None, print_convergence_deltas=False, raw_outputs=False, dtype=None, device='cuda', random_state=None, verbose=False)
Calculate attributions for a set of sequences using DeepLIFT/SHAP.
This function will calculate the DeepLIFT/SHAP attributions on a set of sequences given a model. These attributions have the additive property that the sum of the attributions is approximately equal to the difference in prediction between the original sequence and the reference sequences.
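The additive property is easiest to see on a model where it holds exactly. The following is a minimal sketch, assuming a purely linear scoring function; the names `w`, `f`, `random_one_hot`, `x`, and `refs` are hypothetical and not part of tangermeme. For a linear model the multipliers are simply the weights, so each example-reference pair contributes `w * (x - ref)`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear "model": one weight per (character, position).
w = rng.normal(size=(4, 10))

def f(seq):
    """Score a one-hot sequence of shape (4, length)."""
    return float((w * seq).sum())

def random_one_hot(length):
    """Draw a random one-hot encoded sequence of shape (4, length)."""
    idx = rng.integers(0, 4, size=length)
    return np.eye(4)[idx].T

x = random_one_hot(10)
refs = np.stack([random_one_hot(10) for _ in range(20)])  # (n_refs, 4, length)

# Contribution of each example-reference pair, averaged over references.
attributions = (w * (x - refs)).mean(axis=0)

# Prediction on the sequence minus the mean prediction on the references.
diff = f(x) - np.mean([f(r) for r in refs])
```

Here `attributions.sum()` matches `diff` up to floating-point rounding. For models with nonlinearities the match is only approximate, which is what the convergence deltas measure.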
As an implementation note, the batch size refers to the number of example-reference pairs that are being run simultaneously. When the batch size is smaller than the number of references, multiple batches will be run per example and the attributions will only be averaged over the references after they have all been covered. You may want to do this if the model or examples are so large that only a few can fit in memory at a time. The result will be identical to what you would get if all examples could fit in memory and each batch contained all the references.
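One way to picture why splitting the references across batches does not change the result is a running sum that is divided by the number of references only at the end. This is an illustrative sketch with random stand-in values, not the library's internal code; `per_ref` is a hypothetical array of per-reference attributions for a single example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_refs, batch_size = 20, 8

# Hypothetical per-reference attributions for one example: (n_refs, 4, length).
per_ref = rng.normal(size=(n_refs, 4, 12))

# Process the references in chunks of at most batch_size, keeping a running sum.
running_sum = np.zeros(per_ref.shape[1:])
for start in range(0, n_refs, batch_size):
    running_sum += per_ref[start:start + batch_size].sum(axis=0)

# Average only once every reference has been covered.
batched_mean = running_sum / n_refs

# Reference result: all references processed in a single batch.
full_mean = per_ref.mean(axis=0)
```

The two means agree up to floating-point rounding even though 20 references do not divide evenly into batches of 8.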
Convergence deltas are calculated automatically for each example-reference pair. Theoretically, these should be zero, but in practice they may just be small numbers due to machine precision issues with non-linear models. If these deltas exceed a warning threshold, a non-terminating warning will be issued to let you know that the threshold has been exceeded.
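The check described above can be sketched as follows. This is a hedged illustration, assuming the delta for a pair is the gap between the summed contributions and the difference in predictions; the helpers `convergence_delta` and `exceeds_threshold` and the toy linear model are hypothetical and are not tangermeme functions.

```python
import warnings
import numpy as np

def convergence_delta(multipliers, x, ref, f):
    """Absolute gap between summed contributions and the prediction difference."""
    contribs = (multipliers * (x - ref)).sum()
    return abs(contribs - (f(x) - f(ref)))

def exceeds_threshold(delta, warning_threshold=0.001):
    """Issue a non-terminating warning when the delta exceeds the threshold."""
    if delta > warning_threshold:
        warnings.warn(f"Convergence delta {delta:.3g} exceeds {warning_threshold}")
        return True
    return False

length = 8
w = np.zeros((4, length))
w[0] = 1.0                      # toy linear model that only rewards 'A'

def f(seq):
    return float((w * seq).sum())

x = np.zeros((4, length))
x[0] = 1.0                      # sequence of all 'A'
ref = np.zeros((4, length))
ref[1] = 1.0                    # reference of all 'C'

exact = convergence_delta(w, x, ref, f)           # correct multipliers
inflated = convergence_delta(1.5 * w, x, ref, f)  # deliberately wrong multipliers

# Record warnings instead of printing them so the outcome can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flags = [exceeds_threshold(exact), exceeds_threshold(inflated)]
```

Only the deliberately inflated multipliers trigger a warning, and execution continues afterwards, mirroring the non-terminating behavior described above.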
NOTE: predictions MUST yield a (batch_size, n_targets) tensor, even if n_targets is 1. If your model yields something more complicated you must wrap the model in a small class that operates on the outputs in a manner that yields such a tensor, e.g., by slicing the output or summing along a relevant axis.
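A wrapper of the kind described in the note might look like the following. This is a minimal sketch, assuming a model that returns a (profile, counts) tuple; `CountWrapper` and `ToyTwoHeadModel` are hypothetical names used for illustration only.

```python
import torch

class CountWrapper(torch.nn.Module):
    """Hypothetical wrapper that exposes only the count head as (batch, n_targets)."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, X, *args):
        outputs = self.model(X, *args)
        counts = outputs[1]                      # slice out the second output
        return counts.reshape(counts.shape[0], -1)

class ToyTwoHeadModel(torch.nn.Module):
    """Stand-in model with a profile head and a count head."""

    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv1d(4, 8, kernel_size=5, padding=2)
        self.relu = torch.nn.ReLU()
        self.linear = torch.nn.Linear(8, 1)

    def forward(self, X):
        h = self.relu(self.conv(X))
        profile = h.sum(dim=1)                   # (batch, length)
        counts = self.linear(h.mean(dim=2))      # (batch, 1)
        return profile, counts

wrapped = CountWrapper(ToyTwoHeadModel())

X = torch.zeros(3, 4, 20)
X[:, 0, :] = 1.0                                 # three all-'A' sequences
y = wrapped(X)                                   # shape (3, 1)
```

The wrapped module, rather than the original model, would then be the one passed in as the model argument. Summing a profile output along its length axis inside forward is another option along the same lines.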
Parameters
- model: torch.nn.Module
A PyTorch model to use for making predictions. These models can take in any number of inputs and make any number of outputs. The additional inputs must be specified in the args parameter.
- X: torch.tensor, shape=(-1, len(alphabet), length)
A set of one-hot encoded sequences to calculate attribution values for.
- args: tuple or None, optional
An optional set of additional arguments to pass into the model. If provided, each element in the tuple or list is one input to the model and the element must be formatted to be the same batch size as X. If None, no additional arguments are passed into the forward function. Default is None.
- target: int, optional
The output of the model to calculate gradients/attributions for. This will index the last dimension of the predictions. Default is 0.
- batch_size: int, optional
The number of sequence-reference pairs to pass through DeepLiftShap at a time. Importantly, this is not the number of elements in X that are processed simultaneously (alongside ALL their references) but the total number of X-reference pairs that are processed at once. In a memory-limited setting where you cannot process all references for even a single sequence simultaneously, the work is therefore broken down into only a few references at a time. Default is 32.
- references: func or torch.Tensor, optional
If a function is passed in, this function is applied to each sequence with the provided random state and number of shuffles. This function should serve to transform a sequence into some form of signal-null background, such as by shuffling it. If a torch.Tensor is passed in, that tensor must have shape (len(X), n_shuffles, *X.shape[1:]), i.e., n_shuffles references are provided for each sequence. Default is the function dinucleotide_shuffle.
- n_shuffles: int, optional
The number of shuffles to use if a function is given for references. If a torch.Tensor is provided, this number is ignored. Default is 20.
- return_references: bool, optional
Whether to return the references that were generated during this process. Only use if references is not a torch.Tensor. Default is False.
- hypothetical: bool, optional
Whether to return attributions for all possible characters at each position or only for the character that is actually present at that position. Practically, when False, the attributions are multiplied by the one-hot encoded sequence before being returned. Default is False.
- warning_threshold: float, optional
A threshold on the convergence delta; a warning is raised whenever a delta is larger than it. Normal deltas are in the range of 1e-6 to 1e-8. Note that convergence deltas are calculated on the gradients prior to the aggr_func being applied to them. Default is 0.001.
- additional_nonlinear_ops: dict or None, optional
If additional nonlinear ops need to be added to the dictionary of operations that can be handled by DeepLIFT/SHAP, pass a dictionary here where the keys are class types and the values are the functions that handle that sort of class. Make sure that the signatures match those of _nonlinear and _maxpool above. This can also be used to overwrite the hard-coded operations by passing in a dictionary with overlapping keys. If None, do not add any additional operations. Default is None.
- print_convergence_deltas: bool, optional
Whether to print the convergence deltas for each example when using DeepLiftShap. Default is False.
- raw_outputs: bool, optional
Whether to return the raw outputs from the method – in this case, the multipliers for each example-reference pair – or the processed attribution values. Default is False.
- dtype: str or torch.dtype or None, optional
The dtype to use with mixed precision autocasting. If None, use the dtype of the model. This allows you to use int8 to represent large data sets and only convert batches to the higher precision, saving memory. Default is None.
- device: str or torch.device, optional
The device to move the model and batches to when making predictions. If set to ‘cuda’ on a machine without a GPU, this function will crash; set it to ‘cpu’ instead. Default is ‘cuda’.
- random_state: int or None or numpy.random.RandomState, optional
The random seed to use to ensure determinism. If None, the process is not deterministic. Default is None.
- verbose: bool, optional
Whether to display a progress bar. Default is False.
Returns
- attributions: torch.tensor
If raw_outputs=False (default), the attribution values with shape equal to X. If raw_outputs=True, the multipliers for each example-reference pair with shape equal to (X.shape[0], n_shuffles, X.shape[1], X.shape[2]).
- references: torch.tensor, optional
The references used for each input sequence, with the shape (n_input_sequences, n_shuffles, 4, length). Only returned if return_references = True.
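When references are precomputed rather than generated by a function, they need the (len(X), n_shuffles, *X.shape[1:]) layout described for the references parameter. The sketch below builds such an array in NumPy with a simple position shuffle. It is a stand-in for illustration and not the dinucleotide_shuffle used by default; the function name `shuffle_references` is hypothetical.

```python
import numpy as np

def shuffle_references(X, n_shuffles=20, random_state=None):
    """Return position-shuffled copies of X with shape (len(X), n_shuffles, 4, length)."""
    rng = np.random.default_rng(random_state)
    n, _, length = X.shape
    refs = np.empty((n, n_shuffles) + X.shape[1:], dtype=X.dtype)
    for i in range(n):
        for j in range(n_shuffles):
            # Permute the positions (columns); character composition is preserved.
            refs[i, j] = X[i][:, rng.permutation(length)]
    return refs

# Two random one-hot encoded sequences of length 30, shape (2, 4, 30).
idx = np.random.default_rng(0).integers(0, 4, size=(2, 30))
X = np.eye(4, dtype=np.int8)[idx].transpose(0, 2, 1)

refs = shuffle_references(X, n_shuffles=5, random_state=0)   # (2, 5, 4, 30)
```

Because the documented parameter expects a torch.Tensor, an array built this way would still need converting, for example with torch.from_numpy. Fixing the seed makes the references reproducible across calls.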
- tangermeme.deep_lift_shap.hypothetical_attributions(multipliers, X, references)
A function for aggregating contributions into hypothetical attributions.
When handling categorical data, like one-hot encodings, the gradients returned by a method like DeepLIFT/SHAP may need to be modified because the choice of one character at a position explicitly means that the other characters are not there. So, one needs to account for each character change actually being the addition of one character AND the subtraction of another character. Basically, once you’ve calculated the multipliers, you need to subtract out the contribution of the nucleotide actually present and then add in the contribution of the nucleotide you are becoming.
Each element in the tensor is considered an independent example.
As an implementation note: to be compatible with Captum, each input must be a tuple of length 1 and the returned value will be a tuple of length 1. I know this sounds silly but it’s the most convenient implementation choice to make the function compatible across DeepLiftShap implementations.
Parameters
- multipliers: tuple of one torch.tensor, shape=(n_baselines, 4, length)
The multipliers/gradients calculated by a method like DeepLIFT/SHAP. These should include values for both the observed characters and the unobserved characters at each position.
- X: tuple of one torch.tensor, shape=(n_baselines, 4, length)
The one-hot encoded sequence being explained.
- references: tuple of one torch.tensor, shape=(n_baselines, 4, length)
The one-hot encoded reference sequences, usually a shuffled version of the corresponding sequence in X.
Returns
- projected_contribs: tuple of one torch.tensor, shape=(1, 4, length)
The attribution values for each nucleotide in the input.
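The add-one-character, subtract-another logic described above can be sketched in NumPy. This is an illustrative reimplementation under stated assumptions, not the library's code: it works on bare arrays of shape (n_baselines, 4, length) rather than the length-1 tuples the real function expects, and the name `hypothetical_attributions_sketch` is made up.

```python
import numpy as np

def hypothetical_attributions_sketch(multipliers, X, references):
    """Project multipliers onto each possible character; inputs are (n_baselines, 4, length)."""
    projected = np.zeros_like(references, dtype=float)
    n_chars = X.shape[1]
    for i in range(n_chars):
        # Pretend character i is present at every position.
        hypothetical_input = np.zeros_like(X, dtype=float)
        hypothetical_input[:, i] = 1.0
        # Adding character i also means removing the reference's character.
        diffs = hypothetical_input - references
        projected[:, i] = (diffs * multipliers).sum(axis=1)
    return projected

rng = np.random.default_rng(0)
references = np.eye(4)[rng.integers(0, 4, size=(3, 6))].transpose(0, 2, 1)
X = np.eye(4)[rng.integers(0, 4, size=(3, 6))].transpose(0, 2, 1)
multipliers = rng.normal(size=(3, 4, 6))

projected = hypothetical_attributions_sketch(multipliers, X, references)
```

Each entry works out to the multiplier of the hypothetical character minus the multiplier of the character present in the reference, so a character that matches the reference at a position receives a score of zero there.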