Release History

Version 1.4.0

design

Adds design.beam_substitution, a beam-search generalization of greedy_substitution. Instead of committing to the single best edit each round, it keeps the beam_size lowest-loss complete sequences and expands all of them, so it can recover good multi-edit combinations that the greedy search prunes after a locally-suboptimal first edit. beam_size=1 reproduces greedy_substitution exactly. Current beam members are carried forward (the beam never regresses), candidates are ranked by absolute loss, and identical sequences are de-duplicated to avoid the beam collapsing onto a single sequence. n_best returns the lowest-loss sequences ranked low-to-high. Unlike the greedy functions, max_iter=-1 means no iteration limit (with tol as the stop), matching screen.
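
The beam idea described above can be illustrated with a minimal, self-contained sketch. This is not tangermeme's implementation or signature: the function name toy_beam_substitution, the string-based sequences, and the Hamming-distance loss are all assumptions made for illustration. It carries the current beam members forward, de-duplicates candidates by sequence, and ranks by absolute loss.

```python
from typing import Callable, List, Tuple


def toy_beam_substitution(
    seq: str,
    loss: Callable[[str], float],
    alphabet: str = "ACGT",
    beam_size: int = 2,
    max_iter: int = 3,
) -> List[Tuple[float, str]]:
    """Toy beam search over single-character substitutions (hypothetical helper)."""
    beam = [(loss(seq), seq)]
    for _ in range(max_iter):
        # Carry the current members forward so the beam never regresses.
        candidates = {s: l for l, s in beam}
        for _, member in beam:
            for i, base in enumerate(member):
                for ch in alphabet:
                    if ch == base:
                        continue
                    edited = member[:i] + ch + member[i + 1:]
                    # Keying by sequence de-duplicates identical candidates.
                    if edited not in candidates:
                        candidates[edited] = loss(edited)
        # Rank by absolute loss (ties broken alphabetically) and keep the best.
        new_beam = sorted((l, s) for s, l in candidates.items())[:beam_size]
        if new_beam == beam:
            break  # no candidate displaced a beam member
        beam = new_beam
    return beam


def hamming_to_target(s: str, target: str = "ACGT") -> float:
    """Toy loss: number of positions that differ from a fixed target."""
    return float(sum(a != b for a, b in zip(s, target)))


greedy = toy_beam_substitution("TTTT", hamming_to_target, beam_size=1)
wide = toy_beam_substitution("TTTT", hamming_to_target, beam_size=2)
```

With beam_size=1 the loop keeps only the single best edit per round, which is the greedy behavior; larger beams keep several partial solutions alive at the cost of more loss evaluations per round.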

Reorganizes tangermeme.design from a single module into a subpackage with one module per algorithm (screen, greedy_substitution, beam_substitution, greedy_marginalize, plus a private _substitute numba kernel). This is purely structural: every function is re-exported from tangermeme.design, so imports such as from tangermeme.design import greedy_substitution are unchanged.

Documentation

Corrects the stale greedy_substitution and greedy_marginalize call signatures in the README and the design tutorial; the current order is (model, X, y, motifs, ...) with the output_mask= keyword. The design tutorial (Tutorial B6) was re-executed against the Beluga model and gains a beam-search section comparing beam_substitution to greedy_substitution.

Renders screen and greedy_marginalize on the design API page; they were previously omitted from the autodoc members list.

Version 1.3.0

Claude Code skill

Bundles a Claude Code Agent Skill (a SKILL.md router plus on-demand reference files) that documents tangermeme’s API contracts, footguns, and multi-step workflows for coding agents.

Adds the tangermeme-install-skills console script, which copies the bundled skill into ~/.claude/skills/ so it is available to Claude Code in every project. Use --force to refresh after upgrading or --print-path for the CLAUDE_SKILLS_PATH route.

saturation_mutagenesis

Fixes saturation_mutagenesis for models that return multiple output tensors: the per-output reshape previously transposed the alphabet and length axes, returning scrambled values on the default span and raising when start/end subset the sequence.

Skips the identity substitution at each position (the “edit” that re-applies the existing base), reconstructing those slots from the reference prediction y0 instead of recomputing them. This removes ~25% of the model forward passes with no change to the output, for a wall-clock speedup that approaches 25% as the model becomes inference-bound.
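
As a rough sketch of why skipping the identity substitution saves about a quarter of the forward passes for a four-letter alphabet, the toy below fills the identity slots from the reference prediction. The names toy_ism and gc_count are hypothetical, and the string-based "model" is an assumption standing in for a real predictor.

```python
from typing import Callable, List

calls = 0  # counts how many times the toy model is evaluated


def gc_count(seq: str) -> float:
    """Toy 'model': the number of G/C characters in the sequence."""
    global calls
    calls += 1
    return float(seq.count("G") + seq.count("C"))


def toy_ism(seq: str, predict: Callable[[str], float],
            alphabet: str = "ACGT") -> List[List[float]]:
    """Per-position, per-character predictions, reusing y0 for identity edits."""
    y0 = predict(seq)  # reference prediction, computed once
    table = []
    for i, base in enumerate(seq):
        row = []
        for ch in alphabet:
            if ch == base:
                row.append(y0)  # identity edit: no forward pass needed
            else:
                row.append(predict(seq[:i] + ch + seq[i + 1:]))
        table.append(row)
    return table


table = toy_ism("ACGTAC", gc_count)
```

For a length-6 sequence this makes 1 reference call plus 6 x 3 = 18 edit calls, rather than 6 x 4 = 24 edit calls.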

Adds a func= argument, forwarded to predict and applied identically to the reference and perturbed predictions (e.g. to select an output head or apply a final non-linearity).

Validates that 0 <= start < end <= length rather than silently producing out-of-bounds edits, and raises a clear error when a multi-output model is used without raw_outputs=True.

Warns (TangermemeWarning) when X holds non-integer values, which the internal int8 cast would otherwise truncate toward zero.
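
The start/end bounds check mentioned above can be sketched as follows. The helper name validate_span and the error message are assumptions; the point is only that a chained comparison rejects empty, reversed, negative, and out-of-range spans in one place instead of letting Python's negative indexing wrap silently.

```python
def validate_span(start: int, end: int, length: int) -> None:
    """Raise if the span does not satisfy 0 <= start < end <= length."""
    if not (0 <= start < end <= length):
        raise ValueError(
            f"expected 0 <= start < end <= {length}, "
            f"got start={start}, end={end}"
        )


def is_valid(start: int, end: int, length: int) -> bool:
    """Convenience wrapper that reports validity as a boolean."""
    try:
        validate_span(start, end, length)
    except ValueError:
        return False
    return True
```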

plot

Adds plot.interactive_logo, an interactive counterpart to plot_logo. Annotations are drawn as translucent, pastel boxes (colored by any annot_cmap) behind the logo glyphs, with the motif name in the box corner and a hover tooltip listing the length and every column of the annotation (e.g. seqlet p-value, annotation p-value, summed attribution). Interactivity is provided by mpld3, available via the optional interactive extra (pip install tangermeme[interactive]).

Extends the color argument of plot.plot_logo to accept a per-position array-like in addition to the existing None/str/dict (per-character) forms. Pass a length-matched sequence of either color specifications (names, hex strings, or RGB(A) values), used verbatim, or numeric values, mapped through color_cmap with optional color_vmin/color_vmax bounds. The array-like is sliced alongside X_attr; an array-like whose length does not match the sequence raises a TangermemeWarning and falls back to per-character coloring. plot.interactive_logo accepts the same color forms and forwards color_cmap/color_vmin/color_vmax.

Version 1.2.0

Highlights

Perturbation-style entry points now return public NamedTuple objects so results can be accessed by attribute (result.y_before) while remaining tuple-compatible with the existing positional-unpacking API.

Every public module ships type hints.

predict, deep_lift_shap, pisa, product.*, saturation_mutagenesis, and design.* default device=None, which resolves to CUDA if available and falls back to CPU. The caller’s model device and training mode are restored after the call.

deep_lift_shap and pisa now preserve model state and clean up registered hooks even when the call raises.

additional_func_kwargs is now copied defensively in ablate, marginalize, space, product.*, variant_effect.*, and design.screen; passing a dict no longer mutates the caller’s object.

Substantial bug fixes, new diagnostics, broad test backfill, and a full docstring sweep.

Breaking changes

plot.plot_pwm no longer manages its own figure; callers must pass an ax= matplotlib axis. The function previously created (and sometimes leaked) its own figure.

results (new module)

Adds PerturbationResult (ablate, marginalize, variant_effect.*), PerturbationAnnotationsResult (ablate_annotations, marginalize_annotations), AttributionReferencesResult (deep_lift_shap, pisa with return_references=True), SpaceResult (space), and SaturationMutagenesisRawResult (saturation_mutagenesis(raw_outputs=True)).

All of these subclass tuple, so positional unpacking and isinstance(_, tuple) continue to work unchanged.
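
A small sketch of why a NamedTuple return type stays compatible with existing callers. ToyPerturbationResult and toy_ablate are hypothetical stand-ins, and the y_after field name is an assumption (only y_before appears in these notes).

```python
from typing import List, NamedTuple


class ToyPerturbationResult(NamedTuple):
    """Hypothetical result type with attribute and positional access."""
    y_before: List[float]
    y_after: List[float]


def toy_ablate() -> ToyPerturbationResult:
    """Stand-in for a perturbation entry point returning fixed toy values."""
    return ToyPerturbationResult(y_before=[0.9, 0.8], y_after=[0.2, 0.1])


result = toy_ablate()

# New style: access by attribute.
before = result.y_before

# Old style: positional unpacking keeps working unchanged.
y_before, y_after = toy_ablate()
```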

predict

device=None auto-resolves to CUDA / CPU; caller’s model device and training mode are preserved.

Empty-input rejection with a clear error.

deep_lift_shap / pisa

Return AttributionReferencesResult when return_references=True.

device=None auto-resolves; model state and hooks are restored on success and on exception.

pisa now threads args through every shuffle iteration (previously dropped).
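
Restoring model state and clearing hooks on both success and exception is typically done with try/finally. The sketch below assumes a toy model object with a training flag and a plain list of hooks; it does not use PyTorch and is not the library's code.

```python
class ToyModel:
    """Hypothetical stand-in for a model with a mode flag and hooks."""

    def __init__(self) -> None:
        self.training = True
        self.hooks = []


def toy_attribute(model: ToyModel, fail: bool = False) -> str:
    """Switch to eval mode, register a hook, and always undo both."""
    was_training = model.training
    model.training = False
    model.hooks.append("forward_hook")
    try:
        if fail:
            raise RuntimeError("simulated failure mid-attribution")
        return "attributions"
    finally:
        # Runs on normal return and on exception alike.
        model.hooks.clear()
        model.training = was_training


model = ToyModel()
ok = toy_attribute(model)

try:
    toy_attribute(model, fail=True)
    raised = False
except RuntimeError:
    raised = True
```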

ablate / marginalize / space

Return PerturbationResult / SpaceResult.

additional_func_kwargs is copied defensively (no longer mutated).

New device-mismatch errors instead of silent failures when motifs/args sit on a different device than X.

plot_attributions (and related plot helpers) now honor the func= argument.

variant_effect

substitution_effect / deletion_effect / insertion_effect return PerturbationResult.

deletion_effect rejects X shorter than the maximum deletion.

additional_func_kwargs is copied defensively in all three sub-functions.
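
The defensive copy of additional_func_kwargs can be sketched as below. The wrapper toy_apply, the scale function, and the internal verbose override are all hypothetical; the sketch only shows that modifying a shallow copy leaves the caller's dict untouched.

```python
from typing import Any, Callable, Dict, Optional


def toy_apply(
    func: Callable[..., float],
    x: float,
    additional_func_kwargs: Optional[Dict[str, Any]] = None,
) -> float:
    """Call func with extra kwargs without mutating the caller's dict."""
    kwargs = dict(additional_func_kwargs or {})  # shallow copy
    kwargs["verbose"] = False  # internal override touches only the copy
    return func(x, **kwargs)


def scale(x: float, factor: float = 1.0, verbose: bool = True) -> float:
    """Toy function standing in for a predict-like callable."""
    return x * factor


user_kwargs = {"factor": 2.0}
result = toy_apply(scale, 3.0, user_kwargs)
```

A shallow copy protects the dict's keys; nested mutable values would still be shared with the caller.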

saturation_mutagenesis

raw_outputs=True returns SaturationMutagenesisRawResult.

device=None auto-resolves to CUDA / CPU.

design

device=None auto-resolves; additional_func_kwargs is copied defensively in screen.

product

apply_pairwise / apply_product auto-detect device and preserve model state.

apply_pairwise rejects mismatched args lengths and empty inputs with a clear error.

ersatz

dinucleotide_shuffle now works on CUDA-resident inputs.

randomize accepts end == X.shape[-1].

Switched print calls to warnings.warn and dropped unused imports.

io

extract_loci keeps loci whose window ends exactly at the chromosome end.

extract_loci accepts a pre-opened pyfaidx.Fasta and respects ownership semantics.

_extract_locus_signal uses warnings.warn and narrows bare except clauses.

annotate

pairwise_annotations_spacing fixes an IndexError at max_distance and rejects empty annotations with a clear error.

match

GC-bin 0 no longer receives spillover from higher bins.

Narrowed bare except clauses in the bigwig extraction helper.

kmers

gapped_kmers properly handles scores=None.

seqlet

Empty-input validation added to public entry points.

plot

plot_pwm gains an ax= parameter and no longer manages its own figure (see Breaking changes).

plot_attributions honors the func= argument.

plot_categorical_scatter respects a user-supplied ax=.

place_new_box / place_new_bar no longer mutate the caller’s Bbox.

Narrowed bare ImportError clauses.

utils

New diagnostics: set_seed, gc_content, entropy, information_content.

_validate_input accepts all-zero one-hot columns when allow_N=True.

print calls replaced with warnings.warn.

Documentation / tests

Type hints added across the entire public surface.

Comprehensive new test coverage for func= plug-points, CUDA paths, verbose=True smoke, args= plumbing, dtype matrices, edge cases, and regression values.

Cross-module integration tests for the func= plug-point in ablate, marginalize, space, product.*, and variant_effect.*.

Sweeping docstring corrections across ablate, annotate, deep_lift_shap, design, ersatz, io, kmers, marginalize, match, pisa, plot, predict, product, saturation_mutagenesis, seqlet, space, utils, and variant_effect (return types, device assumptions, validation behavior, kwarg collisions, dtype coercion footguns).

Version 1.1.0

Highlights

Migrated the build/install workflow from setup.py to pyproject.toml with the hatchling build backend.

Added first-class support for uv for development and reproducible environments. uv sync --extra dev now sets up the contributor environment from uv.lock.

End users can still pip install tangermeme exactly as before. The wheel and sdist are standard PyPI artifacts; the migration is invisible at install time.

Packaging

The minimum supported Python version is now 3.10. The CI matrix runs on 3.10, 3.11, 3.12, and 3.13.

Dependency floors have been tightened to reflect what the code actually uses, replacing pre-2022 minima inherited from the original setup.py:

numpy >= 1.23

scipy >= 1.10

pandas >= 2.0

torch >= 2.0

scikit-learn >= 1.3

numba >= 0.58

pybigtools >= 0.2

memelite >= 0.2

New [dev] extra bundles the contributor toolchain (pytest, captum, ruff, build, twine).

New [docs] extra bundles the Sphinx documentation toolchain. ReadTheDocs now installs via this extra instead of a separate docs/requirements.txt file.

The package version is now sourced dynamically from tangermeme/__init__.py so the literal lives in one place.

CI / Tooling

The GitHub Actions workflow now uses astral-sh/setup-uv with caching, reducing matrix install time by roughly an order of magnitude.

The flake8 CI step (whose checks were already commented out) has been removed; ruff is available via the [dev] extra for contributors who want to lint locally.

A [tool.pytest.ini_options] block registers the cmd marker and pins the not cmd default so the documented test invocation works without command-line flags.

Documentation

docs/conf.py now sources the displayed version from the installed package metadata rather than a hard-coded literal.

docs/api/variant_effect.rst has been updated to reference the current function names (substitution_effect, deletion_effect, insertion_effect).

docs/api/ism.rst has been renamed to docs/api/saturation_mutagenesis.rst to match the actual module name.

docs/api/deep_lift_shap.rst, docs/api/plot.rst, and docs/api/saturation_mutagenesis.rst are now included in the API toctree.

Removed stale references to Tutorial_B8_Seqlets, Tutorial_D1_FIMO, and Tutorial_D2_TOMTOM from docs/index.rst.

Version 1.0.3

Highlights

Sped up saturation_mutagenesis moderately by replacing _edit_distance_one with a numba function.

Fixed a bug in saturation_mutagenesis when the sequence passed in is not on the CPU.

Removed the upper bound on the numpy requirement, so numpy >= 2.0.0 can be used.

Added an only_warn option to _validate_input and deep_lift_shap to override warnings if needed.

Version 1.0.2

Highlights

Fixed extract_matching_loci not respecting the chrom parameter.

Version 1.0.1

Highlights

The slowest unit tests have been refocused, bringing total unit test time from ~75s to ~33s.

deep_lift_shap

Add conversion to model dtype to improve usability, and appropriate unit test.

design

Changed greedy_substitution without y to not make a pseudo target and instead truly just try to maximize predictions. In principle, the pseudo target works identically, but can lead to overflow of values in some settings and is generally less precise.

match

Set default n_jobs from -1 to 1 to avoid Child Process Errors on small tasks.

Version 1.0.0

Highlights

Our first major release, corresponding to the paper publication.

Changes the recursive_seqlet calling algorithm slightly to be more principled

Adds in new design methods and features

seqlets

The recursive_seqlet algorithm has been slightly altered to make the calculated p-values more faithful. Previously, the null was the empirically observed attribution sum across different lengths, so the “p-value” was just the probability that the observed attribution is higher. Now, null distributions for different lengths are inferred from the previous lengths.

design

screen is added as a new design method that randomly generates sequences and chooses the one with the best predictions. Each batch is fast because nothing beyond generation and prediction is done, but each batch is also independent of the others, so there is no guarantee that each iteration yields better results.

Design methods now allow you to not pass in a y target value and instead will try to just maximize the predictions.

Version 0.5.1

Highlights

Add summits to extract_loci to center on summits when a BED10 file is provided

Improve casting of indexes for variant effect predictions

Slight improvement to the usability of deep_lift_shap

Version 0.5.0

Highlights

The Tomtom and FIMO tools have been moved to memesuite-lite so they can be used without a PyTorch dependency

All internal tools that used Tomtom and FIMO now call the memesuite-lite versions

annotate

The call to tomtom now goes to memesuite-lite

io

read_meme now calls the memesuite-lite function and wraps the numpy arrays into torch tensors.

return_filtered has been added as an optional parameter to extract_loci where, if set to true, returns a list of indexes for the loci that are kept or discarded. Note that the indexes are into the INTERLEAVED LOCI, not the original set of indices.

plot

Improved the placement of annotation labels in plot_logo. Thanks Nikolaus Mandlburger!

Fixed a bug where annotations were extended an additional basepair to the right

seqlet

The recursive_seqlet algorithm has been slightly modified to more closely match the provided description. This change involves using the calculated p-values instead of the maximum p-value for each position across all seqlets of smaller size. As a consequence, motifs should no longer be shifted to the right.

utils

Added example_to_fasta_coordinates, which converts the relative coordinates in examples to exact coordinates on the genome when provided with a BED file of examples and a FASTA file. This is useful if you have seqlet coordinates for each example and need to convert them to positions on the genome.

Version 0.4.4

io

Added one_hot_to_fasta which takes a 3D one-hot encoded tensor and an optional list of headers and outputs a FASTA file with those sequences.

plot

Added plot_attributions which wraps the calculation and the visualization of attributions between multiple models and multiple sequences.

Added show_score to plot_logo where you can optionally hide the score from the visualization

predict

Added dtype to predict, which will autocast the model and the data to the desired dtype to increase speed. Currently only supports the dtypes supported by torch.autocast. This allows datasets to be represented as torch.uint8 and only converted to higher precision in each batch, yielding significant memory savings.

tools/fimo

Fixed a bug to allow dict[str: numpy.ndarray] to be used for the motifs. Thanks @SeppeDeWinter!

Version 0.4.3

ersatz

Substitute now accepts Ns or all-zero positions as inputs and, at those positions, will not alter the original sequence. If only one motif is given, this will be the same across all background sequences. If one motif is given per background sequence, this is done on a per-background example.

The above change means that higher-level functions like marginalize can now be run with motifs that contain missing characters, without any changes needed.

The default start and end of dinucleotide_shuffle have been set to None because using 0 and -1 meant that the last provided position never got shuffled.
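
The off-by-one behind the dinucleotide_shuffle default change follows from Python slicing, where an end of -1 excludes the final element. In the sketch below, resolve_span is a hypothetical helper showing how None defaults can be mapped to the full range.

```python
from typing import Optional, Tuple


def resolve_span(start: Optional[int], end: Optional[int],
                 length: int) -> Tuple[int, int]:
    """Map None defaults to the full [0, length) range."""
    start = 0 if start is None else start
    end = length if end is None else end
    return start, end


seq = "ACGTAC"

# Old-style defaults: the slice stops before the last position.
old_region = seq[0:-1]

# None defaults resolve to the whole sequence.
s, e = resolve_span(None, None, len(seq))
new_region = seq[s:e]
```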

design

Changed mask parameter to output_mask

Added input_mask which restricts what positions can be the start of motifs, so design can be restricted to subsets of the sequence or certain important elements can be ignored.

Significantly sped up the creation of sequences with tiled motifs implanted using a numba function, which can speed up design 3-10x.

Added greedy_marginalize, which designs constructs using marginalizations.

Version 0.4.1

Highlights

plots

Fixed a bug where plot_logo raises an error when start and end are not provided but annotations are.

Fixed a bug where plot_logo plots annotations using calls to plt instead of directly on the provided artboard.

tools

Sped up tomtom by using more compact dtypes and avoiding cache misses

Added symmetric_tomtom which takes in a set of items and orders them such that the smaller item is always the query and the larger one is always the target. This reduces the number of background distributions that need to be made from a quadratic number to a linear one, significantly speeding up the algorithm.

utils

Added reverse_complement function that can convert one-hot encodings and strings. Thanks @Al-Murphy!
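
For reference, reverse-complementing works the same way on strings and on one-hot encodings. The sketch below is an illustration with hypothetical names, not the library function; it assumes an A, C, G, T row order for the one-hot encoding, stored as a list of four rows, so that flipping both axes yields the reverse complement.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def toy_reverse_complement_str(seq: str) -> str:
    """Complement each character, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def toy_reverse_complement_onehot(onehot: List[List[int]]) -> List[List[int]]:
    """Flip the alphabet axis (A<->T, C<->G) and the length axis."""
    return [row[::-1] for row in onehot[::-1]]


# One-hot for "AACG" with rows ordered A, C, G, T.
x = [
    [1, 1, 0, 0],  # A
    [0, 0, 1, 0],  # C
    [0, 0, 0, 1],  # G
    [0, 0, 0, 0],  # T
]
rc_str = toy_reverse_complement_str("AACG")
rc_onehot = toy_reverse_complement_onehot(x)
```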

Version 0.4.0

Highlights

At a high level, this release focuses on quick ways to understand what a model has learned. This means extending seqlet calling functionality as well as introducing handling of annotations, which are any sort of notation of span along the genome – seqlet calls, motif matches, and hit calls.

annotate

Added in a new file for handling annotations.

Includes a count_annotations function for converting a sparse list of annotations into a dense matrix of counts.

Also includes a pairwise_annotations function for looking at pairs of motifs that are learned.

Also includes a pairwise_annotations_space function for looking at spacing between pairs of motifs.

Also includes an annotate_seqlet function for annotating seqlets using TOMTOM and a reference database.

seqlet

Added in recursive_seqlets, which calls seqlets using a recursive definition that all spans within a seqlet must also be independently called as seqlets.

plot

Added in plot_pwm that takes in a PWM whose rows sum to 1 and plots the information content weighted characters as well as the reverse complement.
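
Information-content weighting is commonly defined per position as log2(alphabet size) minus the Shannon entropy, with each character's height equal to its probability times that value. The sketch below uses this standard definition; whether plot_pwm computes it in exactly this way is an assumption, and the function names are hypothetical.

```python
import math
from typing import List


def position_information_content(probs: List[float]) -> float:
    """log2(alphabet size) minus the Shannon entropy of one PWM position."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return math.log2(len(probs)) - entropy


def letter_heights(probs: List[float]) -> List[float]:
    """Height of each character: probability times information content."""
    ic = position_information_content(probs)
    return [p * ic for p in probs]


uniform = position_information_content([0.25, 0.25, 0.25, 0.25])
certain = position_information_content([1.0, 0.0, 0.0, 0.0])
heights = letter_heights([1.0, 0.0, 0.0, 0.0])
```

A uniform position carries 0 bits and draws nothing, while a position with a single certain base carries the full 2 bits.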

utils

Added in a pwm_consensus function that takes in a single PWM and returns a one-hot encoded version of the consensus sequence.

Added in an extract_signal function for extracting sums over variable-length spans from tensors.

Version 0.3.0

Highlights

Added in a new TOMTOM implementation and a revamped FIMO implementation

TOMTOM and FIMO both have command-line tools in tangermeme

FIMO

The PyTorch implementation has been exchanged for a numba based one.

The new signature is a single function called fimo

A command-line tool can be used with the signature tangermeme fimo …

TOMTOM

A numba-based implementation has been added in the function tomtom

A command-line tool can be used with the signature tangermeme tomtom …

utils

chunk and unchunk have been added in to chunk long sequences into blocks that can be operated on by methods with fixed-window inputs, such as machine learning models, and for converting the predictions from these approaches back into a contiguous format.
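
The chunk/unchunk round trip can be sketched with plain lists. The names toy_chunk and toy_unchunk, the non-overlapping windows, and the requirement that the length be divisible by the window size are assumptions of this sketch, not the library's signatures or behavior.

```python
from typing import List


def toy_chunk(values: List[int], size: int) -> List[List[int]]:
    """Split a long sequence into non-overlapping fixed-size blocks."""
    if size <= 0 or len(values) % size != 0:
        raise ValueError("length must be a positive multiple of size")
    return [values[i:i + size] for i in range(0, len(values), size)]


def toy_unchunk(chunks: List[List[int]]) -> List[int]:
    """Concatenate blocks back into one contiguous sequence."""
    return [v for block in chunks for v in block]


def fixed_window_model(block: List[int]) -> List[int]:
    """Toy model that only accepts windows of length 4."""
    assert len(block) == 4
    return [2 * v for v in block]


long_input = list(range(12))
blocks = toy_chunk(long_input, 4)
predictions = toy_unchunk([fixed_window_model(b) for b in blocks])
```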

match

Implemented updates to substantially reduce memory use and runtime of extract_matching_loci. This was mainly achieved by:

Avoiding io.extract_loci, which one-hot encodes all loci into a single large tensor. Instead, the locus sequences are extracted one by one, keeping only one in memory at a time. The N and GC percentages are calculated directly from the sequence, and only those values are stored.

Calculating genome-wide N and GC percentages by taking slices of the chromosomal DNA sequences and using the count method of Python strings. This is significantly faster than the previous approach using numpy isin, and avoids keeping several copies of the sequence in memory at the same time.
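
Counting with str.count, as described above, needs no intermediate arrays. A minimal sketch follows; the helper name gc_and_n_fraction and the choice to upper-case the input first are assumptions.

```python
from typing import Tuple


def gc_and_n_fraction(seq: str) -> Tuple[float, float]:
    """Return (GC fraction, N fraction) of a sequence using str.count."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    s = seq.upper()
    gc = (s.count("G") + s.count("C")) / len(s)
    n = s.count("N") / len(s)
    return gc, n


gc_frac, n_frac = gc_and_n_fraction("ACGTNNGC")
```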

Various other changes:

Counts from regions that cannot be extracted from a provided bigwig file (such as for a missing chromosome) are now set to nan rather than 0. This will affect the threshold value used for filtering background regions.

Small change to the binning strategy for gc values, which could mean that matching loci generated in a previous version will not be reproduced exactly in all cases, even when using the same random seed.

Enable the handling of ‘N’ in sequences or [0,0,0,0], i.e., ambiguous genomic positions. Updated the characters() and the _validate_input() in utils module to enable this.

Version 0.2.3

match

Expanded the ignore parameter to ignore all non-ACGT characters.

Version 0.2.2

plot

Fixed issue in plot_logo raised by @sandyfloren where passing in annotations without passing in start or end would raise an error. Now, start defaults to 0 and end defaults to the length of the sequence.

tools

FIMO is now base 2 instead of base e, to better match the MEME-suite tool. p-values should remain the same but scores will change.

FIMO hits will now return p-values, and will no longer return an uninformative attr column
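
One way to see why the base change alters scores but not p-values: a log-odds score in base 2 is the natural-log score divided by ln 2, a constant positive rescaling that preserves the ordering of scores. The sketch below uses a hypothetical toy_log_odds helper with made-up probabilities; it is not FIMO's scoring code.

```python
import math


def toy_log_odds(p: float, background: float, base2: bool = True) -> float:
    """Log-odds of a motif probability against a background probability."""
    ratio = p / background
    return math.log2(ratio) if base2 else math.log(ratio)


score_bits = toy_log_odds(0.7, 0.25, base2=True)
score_nats = toy_log_odds(0.7, 0.25, base2=False)

# Converting between the two is a division by ln 2 (about 0.693).
converted = score_nats / math.log(2)
```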

product

apply_pairwise has been added along with documentation and unit tests

match

Fixes an issue with calculating the mean over an array of integers by changing the array to dtype float (via @adamyhe).

Version 0.2.1

deep_lift_shap

Removed the autocasting to 32-bit floats, enabling attributions to be calculated at other resolutions

Removes ~100 LOC and the DeepLiftShap object, integrating that code directly into the deep_lift_shap function

Only assigns hooks once at the beginning of the function and clears them upon an error or completion of function, instead of assigning and clearing hooks every batch

Version 0.2.0

Highlights

Alters the API of several functions to make them more general, with the option of taking in a function to apply instead of defaulting to predict, while still backwards compatible

Adds in deep_lift_shap and seqlet to operate on attributions

deep_lift_shap

Added in a stand-alone implementation of deep_lift_shap

This implementation resolves several issues with Captum, e.g., with pooling layers

Allows batching of example-reference pairs across examples (so batch_size can be greater than n_shuffles)

Allows batch_size to be much smaller than n_shuffles with the results aggregated once all references have been processed to allow large models to be run

Allows additional non_linear operations to be registered by passing in a dictionary

Allows either the raw multipliers (with raw_output=True) or the aggregated attribution scores to be returned

ism

Changes the default output from the raw output (which you can get with raw_output=True) to aggregated attribution values to make the API compatible with the rest of the library

marginalize

Change the signature to take in an optional function that gets applied before/after the substitution, default is predict

Change the signature to take in **kwargs that get passed into the optional function

Change the signature to take in additional_func_kwargs that is an alternative and safer way to pass arguments into the function

ablate

Change the signature to take in an optional function that gets applied before/after the ablation, default is predict

Change the signature to take in **kwargs that get passed into the optional function

Change the signature to take in additional_func_kwargs that is an alternative and safer way to pass arguments into the function

space

Change the signature to take in an optional function that gets applied before/after the substitutions, default is predict

Change the signature to take in **kwargs that get passed into the optional function

Change the signature to take in additional_func_kwargs that is an alternative and safer way to pass arguments into the function

variant_effect

Change the name of marginal_substitution_effect to substitution_effect

Change the API of substitution_effect to take in a tensor of original sequences and a tensor of substitutions

Change the API of substitution_effect to take in an optional function and **kwargs and additional_func_kwargs to pass into func

Change the name of marginal_deletion_effect to deletion_effect

Change the API of deletion_effect to take in a tensor of original sequences and a tensor of deletions

Change the API of deletion_effect to take in an optional function and **kwargs and additional_func_kwargs to pass into func

Change the name of marginal_insertion_effect to insertion_effect

Change the API of insertion_effect to take in a tensor of original sequences and a tensor of insertions

Change the API of insertion_effect to take in an optional function and **kwargs and additional_func_kwargs to pass into func

seqlet

Add a new file for the identification of seqlets

Add tfmodisco_seqlets which is a simplified and documented version of the seqlet calling in tfmodisco that returns dataframes

Version 0.1.0

Highlights

This is the first major release of tangermeme and contains the first version of the core functionality.

ersatz

This module implements common sequence manipulation methods such as substitutions, insertions, deletions, and shufflings of sequences.

predict

This module implements efficient batched prediction that can handle models that accept multiple inputs or multiple outputs.

marginalize

This module implements marginalization experiments, where predictions are made for a set of sequences, a motif is substituted into the middle, and then new predictions are made for the new sequences.

space

This module implements spacing experiments where predictions are made for a set of sequences, a set of motifs are inserted with a given spacing, and then new predictions are made for the new sequences.

io

This module implements I/O functions for common data types as well as for extracting examples for machine learning models.

ism

This module implements in silico saturation mutagenesis (ISM).

variant_effect

This module implements functions for evaluating the marginal effect of variants on model predictions.

Version 0.0.1

Highlights

Initial release