ESM-C 300M — GGUF (esmc.cpp)

GGUF conversions of ESM Cambrian (ESM-C) 300M, an encoder-only protein language model, for fast, low-memory per-residue and per-sequence embeddings on CPU and Apple Metal. No Python or PyTorch is needed at inference time.

Runtime: esmc.cpp (C/C++ on ggml / llama.cpp)
Upstream model: EvolutionaryScale/esmc-300m-2024-12
Task: feature extraction (protein embeddings)

These files use a custom GGUF architecture (general.architecture = "esmc") and are not loadable by stock llama.cpp / llama-cli. Use the esmc.cpp runtime (the esmc-embed tool) shown below.

Which file should I download?

File	Size (MiB)	sha256 (first 16 hex chars)	When to use
esmc-300m-Q4_K_M.gguf	237.5	96c08911822906dc	Best 4-bit choice: small, with good quality.
esmc-300m-Q4_K_S.gguf	228.1	02328ea3555903ef	Smallest file; lowest peak RAM.
esmc-300m-Q8_0.gguf	336.9	d7a57a5ab21c172b	Recommended default: near-F16 quality at about half the size.
esmc-300m-f16.gguf	633.5	7c37c24e156920bd	Highest fidelity; numerical reference.
esmc-300m-f32.gguf	1266.4	7e3e319c9bd00abb	Full precision; mainly the quantization source (largest).

If unsure, start with esmc-300m-Q8_0.gguf: its embeddings are near-identical to PyTorch at about half the size of the F16 file. Use Q4_K_M for the smallest deployment that keeps good quality, or F16 when you want the closest possible match to the reference.
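
The size column can be turned into a rough bits-per-weight figure, which makes the files easier to compare. The following is a minimal sketch, assuming the f32 file stores every weight in 4 bytes and that GGUF metadata overhead is negligible. Both are simplifications, so the results are estimates rather than measured values.

```python
# Sizes in MiB, copied from the table above.
SIZES_MIB = {
    "Q4_K_S": 228.1,
    "Q4_K_M": 237.5,
    "Q8_0": 336.9,
    "f16": 633.5,
    "f32": 1266.4,
}

MIB = 1024 * 1024


def estimated_param_count(f32_size_mib: float) -> float:
    """Estimate the weight count, assuming 4 bytes per weight and no metadata."""
    return f32_size_mib * MIB / 4


def effective_bits_per_weight(size_mib: float, n_params: float) -> float:
    """Average number of bits stored per weight for a file of the given size."""
    return size_mib * MIB * 8 / n_params


n_params = estimated_param_count(SIZES_MIB["f32"])
bits = {
    name: effective_bits_per_weight(size, n_params)
    for name, size in SIZES_MIB.items()
}

for name, value in bits.items():
    print(f"{name:7s} ~{value:5.2f} bits/weight")
```

Under these assumptions the two 4-bit files come out at roughly 6 bits per weight, so the size gap to Q8_0 is smaller than the names alone suggest.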

Quick start

1. Build the esmc.cpp runtime

git clone --recursive https://github.com/AnanyaP-WDW/esmc.cpp
cd esmc.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release && cmake --build build -j8

2. Download a model

pip install -U huggingface_hub
huggingface-cli download AnanyaPathak/esmc-300m-gguf esmc-300m-Q8_0.gguf --local-dir ./models
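
The same download can be done from Python. This is a minimal sketch using `hf_hub_download` from `huggingface_hub`; the helper name `download_model` and its defaults are illustrative choices, not part of the esmc.cpp tooling.

```python
from pathlib import Path

REPO_ID = "AnanyaPathak/esmc-300m-gguf"
DEFAULT_FILE = "esmc-300m-Q8_0.gguf"


def download_model(filename: str = DEFAULT_FILE, local_dir: str = "./models") -> Path:
    """Download one GGUF file from the Hub and return its local path."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir=local_dir)
    return Path(path)


# Example call (needs network access):
# model_path = download_model()
```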

3. Embed a protein sequence

# Mean-pooled sequence embedding -> one vector per sequence ([n_embd])
./build/esmc-embed -m ./models/esmc-300m-Q8_0.gguf \
    -s "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGY" \
    --pool mean --output embedding.npy

# Per-residue embeddings -> matrix ([n_tokens, n_embd])
./build/esmc-embed -m ./models/esmc-300m-Q8_0.gguf \
    -s "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGY" \
    --pool none --output residues.npy

# Force CPU (skip the Metal/GPU backend)
./build/esmc-embed -m ./models/esmc-300m-Q8_0.gguf -s "..." --pool mean --no-metal

Outputs are NumPy .npy arrays. Mean pooling excludes the <cls> and <eos> tokens from the average.
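
To embed several sequences from a script, the command above can be wrapped with `subprocess`. This is a sketch that uses only the flags shown in this section (`-m`, `-s`, `--pool`, `--output`). The function names and default paths are assumptions. It launches one process per sequence, which reloads the model each time; whether esmc-embed offers a batch input mode is not covered here.

```python
import subprocess
from pathlib import Path


def build_embed_command(binary, model, sequence, output, pool="mean"):
    """Return the esmc-embed argument list for one sequence."""
    # Only the two pooling modes shown above are accepted by this wrapper.
    if pool not in ("mean", "none"):
        raise ValueError(f"unsupported pool mode: {pool!r}")
    return [
        str(binary), "-m", str(model),
        "-s", sequence,
        "--pool", pool,
        "--output", str(output),
    ]


def embed_sequences(sequences, out_dir,
                    binary="./build/esmc-embed",
                    model="./models/esmc-300m-Q8_0.gguf",
                    pool="mean"):
    """Run esmc-embed once per named sequence; return {name: output path}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = {}
    for name, seq in sequences.items():
        out_path = out_dir / f"{name}.npy"
        cmd = build_embed_command(binary, model, seq, out_path, pool)
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
        outputs[name] = out_path
    return outputs
```

Passing the arguments as a list rather than a shell string avoids quoting problems with the sequence text.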

4. Load the embedding in Python

import numpy as np

emb = np.load("embedding.npy")   # mean pool: shape (960,)
res = np.load("residues.npy")    # per-residue: shape (n_tokens, 960)
print(emb.shape, res.shape)
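
A common next step is comparing two sequence embeddings. This is a minimal sketch of cosine similarity with NumPy; it uses synthetic 960-dimensional vectors as stand-ins for arrays loaded with `np.load`, so it runs without any model files.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(a, b) / denom)


# Synthetic stand-ins for two mean-pooled embeddings.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=960)
emb_b = emb_a + 0.1 * rng.normal(size=960)  # a slightly perturbed copy

print(f"similarity: {cosine_similarity(emb_a, emb_b):.4f}")
```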

Benchmarks (300M)

Measured on an Apple M1 (16 GB) against the official PyTorch ESM-C 300M. Full methodology and per-sequence data are in the esmc.cpp repository.

Numerical fidelity vs PyTorch (per-residue cosine, 100 Swiss-Prot sequences)

Precision	Aggregate mean cosine	Worst min cosine	Max mean-pool L2	Pass rate
F16	0.99999	0.9997	0.0030	100/100
Q8_0	0.99971	0.9943	0.0164	100/100
Q4_K_M	0.99597	0.9401	0.0656	91/100
Q4_K_S	0.99523	0.9281	0.0709	75/100

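F16 and Q8_0 clear the per-sequence threshold (mean cosine > 0.999) on all 100 sequences. Q4_K_M and Q4_K_S clear the aggregate threshold (mean cosine > 0.995) but fail on some individual sequences (91/100 and 75/100 pass); the 4-bit misses concentrate in very short sequences.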
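
The three fidelity metrics in the table can be computed from a pair of per-residue matrices. The sketch below shows one way to do it, not the benchmark's actual code. It assumes both matrices have shape (n_tokens, n_embd) and that the pooled vector is the mean over all rows; how the benchmark treats special tokens is an open assumption here. The data is synthetic.

```python
import numpy as np


def per_residue_cosine(ref: np.ndarray, test: np.ndarray) -> np.ndarray:
    """Row-wise cosine between two (n_tokens, n_embd) matrices."""
    if ref.shape != test.shape:
        raise ValueError("matrices must have the same shape")
    dots = np.sum(ref * test, axis=1)
    norms = np.linalg.norm(ref, axis=1) * np.linalg.norm(test, axis=1)
    return dots / norms


def fidelity_summary(ref: np.ndarray, test: np.ndarray) -> dict:
    """Mean and minimum per-residue cosine, plus L2 distance of pooled vectors."""
    cos = per_residue_cosine(ref, test)
    pooled_l2 = np.linalg.norm(ref.mean(axis=0) - test.mean(axis=0))
    return {
        "mean_cosine": float(cos.mean()),
        "min_cosine": float(cos.min()),
        "mean_pool_l2": float(pooled_l2),
    }


# Synthetic reference and a lightly perturbed copy standing in for a quantized run.
rng = np.random.default_rng(0)
ref = rng.normal(size=(50, 960))
quant = ref + 0.01 * rng.normal(size=ref.shape)

summary = fidelity_summary(ref, quant)
print(summary)
```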

Throughput (seq/s, best esmc.cpp config vs PyTorch)

Bucket	Tokens	Best esmc.cpp config	esmc.cpp (seq/s)	PyTorch CPU (seq/s)	PyTorch MPS (seq/s)	esmc.cpp vs PyTorch CPU
short	47	metal/q4_k_s	14.54	10.31	29.29	1.41x
medium	235	metal/q4_k_m	5.62	4.56	10.11	1.23x
long	850	metal/q8_0	1.33	1.74	2.83	0.76x
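
The last column of the throughput table is the esmc.cpp figure divided by the PyTorch CPU figure. The short sketch below reproduces that ratio from the table values and applies the same arithmetic to the PyTorch MPS column, which the table does not tabulate as a ratio.

```python
# (bucket, esmc.cpp seq/s, PyTorch CPU seq/s, PyTorch MPS seq/s), from the table above.
ROWS = [
    ("short", 14.54, 10.31, 29.29),
    ("medium", 5.62, 4.56, 10.11),
    ("long", 1.33, 1.74, 2.83),
]


def speedup(candidate: float, baseline: float) -> float:
    """Throughput ratio; values above 1 mean the candidate is faster."""
    return candidate / baseline


ratios = {
    bucket: (speedup(esmc, cpu), speedup(esmc, mps))
    for bucket, esmc, cpu, mps in ROWS
}

for bucket, (vs_cpu, vs_mps) in ratios.items():
    print(f"{bucket:6s} vs CPU {vs_cpu:.2f}x   vs MPS {vs_mps:.2f}x")
```

On these numbers esmc.cpp beats PyTorch CPU for short and medium sequences but stays below PyTorch MPS in every bucket.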

Peak memory (long sequences, 16 GiB budget)

Lowest peak RAM: pytorch/pytorch_mps/f32 at 282 MiB (long sequences).
Highest peak RAM: esmc.cpp/cpu/f16 at 7426 MiB.
All 12/12 measured configurations fit within a 16 GiB machine.

Downstream variant-effect preservation (ProteinGym, 10 assays x 1000 variants)

Precision	Assays	Mean abs Spearman delta	Max abs Spearman delta	Metric rows pass
F16	10	0.0006	0.0014	50/50
Q8_0	10	0.0031	0.0092	45/50
Q4_K_M	10	0.0068	0.0231	38/50
Q4_K_S	10	0.0110	0.0258	32/50

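Variants are scored by the cosine between mean-pooled mutant and wild-type embeddings. Deltas are measured against the PyTorch reference, so this is a preservation probe: it shows how closely each GGUF precision reproduces the reference scores, not how well the model predicts variant effects.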
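
The scoring and comparison described above can be sketched as follows. This is one reading of the procedure, not the benchmark's code: it assumes the Spearman delta is the absolute difference between the reference and quantized rank correlations against the assay measurements. The `spearman` helper is a simplified version that assumes no tied values, and all data is synthetic.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def variant_scores(wild_type: np.ndarray, mutants: np.ndarray) -> np.ndarray:
    """Cosine of each mean-pooled mutant embedding against the wild type."""
    return np.array([cosine(wild_type, m) for m in mutants])


def spearman(x, y) -> float:
    """Spearman rank correlation (simplified: assumes no tied values)."""
    rx = np.argsort(np.argsort(x)).astype(np.float64)
    ry = np.argsort(np.argsort(y)).astype(np.float64)
    return float(np.corrcoef(rx, ry)[0, 1])


def spearman_delta(ref_scores, test_scores, assay_values) -> float:
    """Absolute change in rank correlation relative to the reference model."""
    return abs(spearman(ref_scores, assay_values) - spearman(test_scores, assay_values))


# Synthetic embeddings: a reference run and a slightly perturbed "quantized" run.
rng = np.random.default_rng(0)
wt_ref = rng.normal(size=960)
mut_ref = wt_ref + 0.2 * rng.normal(size=(200, 960))
wt_q = wt_ref + 0.001 * rng.normal(size=960)
mut_q = mut_ref + 0.001 * rng.normal(size=mut_ref.shape)
assay = rng.normal(size=200)  # placeholder for measured variant effects

ref_scores = variant_scores(wt_ref, mut_ref)
q_scores = variant_scores(wt_q, mut_q)
delta = spearman_delta(ref_scores, q_scores, assay)
print(f"abs Spearman delta: {delta:.4f}")
```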

Model details

Architecture: encoder-only transformer; 30 layers, d_model 960, 15 heads (head dim 64), SwiGLU FFN (width 2560), pre-LayerNorm, RoPE-NeoX (theta 10000), query/key LayerNorm, no biases, context length 2048.
Tokenizer: 33-token amino-acid alphabet; <cls> prepended and <eos> appended (direct character lookup, no subword splitting).
Provenance: converted from the upstream safetensors checkpoint to GGUF (fused QKV and SwiGLU projections split); quantized variants use ggml block quantization. Weight values are otherwise unchanged from the upstream release.
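
The tokenizer description above (direct character lookup, with <cls> prepended and <eos> appended) can be illustrated with a toy implementation. The vocabulary below is hypothetical: it lists only the 20 standard amino acids plus four special tokens in an arbitrary order, and the `<pad>` and `<unk>` entries are assumptions. The real ESM-C vocabulary has 33 entries with its own id assignment.

```python
# Illustrative vocabulary only; ids do not match the real ESM-C tokenizer.
SPECIAL = ["<cls>", "<pad>", "<eos>", "<unk>"]
AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")
VOCAB = {tok: i for i, tok in enumerate(SPECIAL + AMINO_ACIDS)}


def tokenize(sequence: str) -> list[int]:
    """Map each residue to an id and wrap the result in <cls> ... <eos>."""
    unk = VOCAB["<unk>"]
    ids = [VOCAB.get(ch, unk) for ch in sequence.upper()]
    return [VOCAB["<cls>"], *ids, VOCAB["<eos>"]]


seq = "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGY"
ids = tokenize(seq)
print(f"{len(seq)} residues -> {len(ids)} tokens")
```

Because there is no subword splitting, the model input is always two tokens longer than the sequence.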

Verify downloads

shasum -a 256 models/*.gguf   # compare the first 16 hex characters against the sha256 column above
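
The comparison can also be automated. This is a minimal sketch using only the standard library; the expected values are the 16-character prefixes from the table above, and the function names are illustrative. A prefix match is a weaker check than comparing the full digest.

```python
import hashlib
from pathlib import Path

# First 16 hex characters of each file's SHA-256, copied from the table above.
EXPECTED_PREFIX = {
    "esmc-300m-Q4_K_M.gguf": "96c08911822906dc",
    "esmc-300m-Q4_K_S.gguf": "02328ea3555903ef",
    "esmc-300m-Q8_0.gguf": "d7a57a5ab21c172b",
    "esmc-300m-f16.gguf": "7c37c24e156920bd",
    "esmc-300m-f32.gguf": "7e3e319c9bd00abb",
}


def sha256_hex(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large models need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path) -> bool:
    """True if the file's digest starts with the published prefix."""
    path = Path(path)
    expected = EXPECTED_PREFIX.get(path.name)
    if expected is None:
        raise KeyError(f"no published checksum for {path.name}")
    return sha256_hex(path).startswith(expected)


def verify_dir(model_dir="models") -> dict:
    """Check every known GGUF file in a directory; returns {filename: ok}."""
    return {
        p.name: verify(p)
        for p in sorted(Path(model_dir).glob("*.gguf"))
        if p.name in EXPECTED_PREFIX
    }
```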

Reproduce

The full replication guide (convert, quantize, validate, benchmark) is in the esmc.cpp README.

License

Built with ESM.

These GGUF files are Derivative Works of the ESM-C 300M Open Model and are distributed under the EvolutionaryScale Cambrian Open License Agreement (the permissive license that governs ESM-C 300M), subject to the Acceptable Use Policy. The ESMC 300M Model is licensed under the EvolutionaryScale Cambrian Open License Agreement.

Citation

If you use these models, please cite the ESM Cambrian work by EvolutionaryScale and link to the esmc.cpp runtime.
