tmap2


ESM Atlas proteins, mapped with TMAP

I have been testing the new ESMC embeddings and ESMFold2 structures with the latest version of TMAP (a preprint is coming soon). TMAP runs on protein embeddings out of the box, so this was mostly a matter of pointing it at the data and waiting.

Below are 50,000 metagenomic proteins from the ESM Atlas. Each one is embedded with ESMC-600M and folded with ESMFold2, then laid out as a single map. You can hover a point for its dominant function, click it to see the predicted structure, and color, search, filter or lasso-select from there.

I made two versions of the same set. One comes from the raw ESMC embeddings, the other from the sparse autoencoder (SAE) features: 16,384 interpretable directions, pooled per protein. They lay the proteins out quite differently, and that contrast is the part worth poking at.
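The post doesn't say how per-residue outputs become a single vector per protein. Here is a minimal sketch, assuming mean pooling over residues for both maps and a standard ReLU sparse-autoencoder encoder applied to each residue. The helpers `mean_pool` and `sae_encode` are hypothetical, and the weights are random placeholders rather than the actual SAE; only the sizes (1,152-d embeddings, 16,384 features) come from the text.

```python
import numpy as np

def mean_pool(per_residue: np.ndarray) -> np.ndarray:
    """Average per-residue vectors of shape (L, dim) into one per-protein vector (dim,)."""
    return per_residue.mean(axis=0)

def sae_encode(per_residue, W_enc, b_enc, b_dec):
    """Standard ReLU SAE encoder applied to every residue: (L, d) -> (L, n_features)."""
    return np.maximum((per_residue - b_dec) @ W_enc + b_enc, 0.0)

# Hypothetical protein: 120 residues; d = 1,152 (ESMC-600M width), 16,384 SAE features.
rng = np.random.default_rng(0)
L, d, n_feat = 120, 1152, 16384
residues = rng.standard_normal((L, d), dtype=np.float32)                # placeholder embeddings
W_enc = rng.standard_normal((d, n_feat), dtype=np.float32) * d ** -0.5  # placeholder weights
b_enc = np.full(n_feat, -2.0, dtype=np.float32)   # negative bias keeps most features at zero
b_dec = np.zeros(d, dtype=np.float32)

raw_vec = mean_pool(residues)                                     # (1152,): one row of Map A's input
sae_vec = mean_pool(sae_encode(residues, W_enc, b_enc, b_dec))    # (16384,): one row of Map B's input
```

With real ESMC output you would likely drop the special start and end tokens before pooling, and max pooling is a common alternative to the mean; which one the maps use is not stated.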

Map A: raw ESMC-600M embeddings (1,152-d). Proteins sit close together when their full embeddings are similar.

What I like about TMAP is that each point is almost always connected to its true nearest neighbor in the full embedding space. So the connections actually mean something, more than they do in UMAP, and there is no parameter tuning to get there. The whole thing is about ten lines of code and runs locally on a laptop.
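The nearest-neighbor guarantee follows from a standard property of minimum spanning trees: with distinct edge weights, each vertex's shortest incident edge always belongs to the MST, because any other path to that vertex must start with an edge that is at least as long. The sketch below checks this on random data, using exact cosine k-NN and Kruskal's algorithm as stand-ins for TMAP's LSH forest; the helper names and the 64-d toy data are illustrative only. Since TMAP's k-NN search is approximate, the "almost always" presumably comes from the occasional true neighbor the index misses.

```python
import numpy as np

def cosine_knn(X, k):
    """Exact k-NN under cosine distance; returns (indices, distances), each of shape (N, k)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    D = 1.0 - Xn @ Xn.T
    D = (D + D.T) / 2                      # force exact symmetry despite rounding
    np.fill_diagonal(D, np.inf)            # a point is not its own neighbor
    idx = np.argsort(D, axis=1)[:, :k]
    return idx, np.take_along_axis(D, idx, axis=1)

def kruskal_mst(n, edges):
    """edges: iterable of (weight, u, v). Returns MST (or forest) edges as (min, max) pairs."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    tree = set()
    for _, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                       # edge joins two components: keep it
            parent[ru] = rv
            tree.add((min(u, v), max(u, v)))
    return tree

rng = np.random.default_rng(42)
X = rng.standard_normal((500, 64))
idx, dist = cosine_knn(X, k=20)
edges = [(dist[i, j], i, int(idx[i, j])) for i in range(len(X)) for j in range(idx.shape[1])]
tree = kruskal_mst(len(X), edges)

# Each point's nearest neighbor (column 0) should appear among its tree edges.
nn_in_tree = [(min(i, int(idx[i, 0])), max(i, int(idx[i, 0]))) in tree for i in range(len(X))]
print(f"{np.mean(nn_in_tree):.0%} of points are linked to their nearest neighbor")
```

The MST is also why the picture stays readable: it keeps at most N - 1 edges out of the N × k in the k-NN graph, dropping every edge that would close a cycle.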

The code

The whole run is one script. The core that produces a map is short:

esm_atlas_tmap.py
from tmap import TMAP

# esmc_embeddings, cif_urls and plddt are prepared earlier in the full script (not shown)
# 1,152-d ESMC-600M embeddings  ->  Map A   (16,384-d SAE features -> Map B)
X = esmc_embeddings                          # shape (N, 1152)

viz = (TMAP(metric="cosine", n_neighbors=20,
            layout_iterations=1000, seed=42)
        .fit(X)                # LSH-forest k-NN, then a minimum spanning tree
        .to_tmapviz())

viz.add_3d_structures(cif_urls, fmt="cif")      # ESMFold2 folds, click to view
viz.add_color_layout("pLDDT", plddt, color="viridis")
viz.write_static("esm_atlas_out/")             # one self-contained page
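The post doesn't describe how the LSH forest handles `metric="cosine"` internally (the original TMAP indexed MinHash signatures). To show why an approximate index finds the true nearest neighbor almost always rather than always, here is a sketch that swaps in random-hyperplane LSH, a standard hashing family for cosine similarity. It is not TMAP's implementation; `build_tables`, `approx_nn`, `n_tables` and `n_bits` are hypothetical, and the clustered 64-d data stands in for protein embeddings.

```python
import numpy as np
from collections import defaultdict

def build_tables(X, n_tables=8, n_bits=12, seed=0):
    """Random-hyperplane LSH: hash each row to the sign pattern of n_bits random projections."""
    rng = np.random.default_rng(seed)
    weights = 1 << np.arange(n_bits)                  # packs a bit pattern into one integer
    tables = []
    for _ in range(n_tables):
        planes = rng.standard_normal((X.shape[1], n_bits))
        keys = (X @ planes > 0).astype(np.int64) @ weights
        buckets = defaultdict(list)
        for i, key in enumerate(keys):
            buckets[int(key)].append(i)
        tables.append((keys, buckets))
    return tables

def approx_nn(Xn, tables, i):
    """Best cosine match for row i among points sharing a bucket with it in any table."""
    cands = set()
    for keys, buckets in tables:
        cands.update(buckets[int(keys[i])])
    cands.discard(i)
    if not cands:
        return -1                                     # no candidates at all: counts as a miss
    cands = np.fromiter(cands, dtype=np.int64)
    return int(cands[np.argmax(Xn[cands] @ Xn[i])])

# Toy data: 100 clusters of 20 points each in 64-d.
rng = np.random.default_rng(1)
centers = rng.standard_normal((100, 64))
X = np.repeat(centers, 20, axis=0) + 0.1 * rng.standard_normal((2000, 64))
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)

S = Xn @ Xn.T
np.fill_diagonal(S, -np.inf)
exact = S.argmax(axis=1)                              # true nearest neighbor of every point

tables = build_tables(X)
approx = np.array([approx_nn(Xn, tables, i) for i in range(len(X))])
recall = float(np.mean(approx == exact))
print(f"nearest-neighbor recall: {recall:.3f}")
```

Two vectors at angle θ agree on each hyperplane bit with probability 1 - θ/π, so more bits make buckets more selective and more tables recover neighbors that one table splits apart. The search only compares each point against its bucket mates instead of all N points, at the cost of a small chance of missing the true neighbor.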