ESM-C 300M GGUF (esmc.cpp)
GGUF conversions of ESM Cambrian (ESM-C) 300M, an encoder-only protein language model, for fast, low-memory per-residue and per-sequence embeddings on CPU and Apple Metal, with no Python or PyTorch needed at inference time.
- Runtime: esmc.cpp (C/C++ on ggml / llama.cpp)
- Upstream model: EvolutionaryScale/esmc-300m-2024-12
- Task: feature extraction (protein embeddings)
These files use a custom GGUF architecture (general.architecture = "esmc") and cannot be loaded by stock llama.cpp/llama-cli. Use the esmc.cpp runtime (the esmc-embed tool) shown below.
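Stock llama.cpp cannot load these files, so it can help to check a file's architecture key before handing it to a tool. The sketch below reads `general.architecture` directly from the GGUF header using only the standard library. It assumes GGUF version 2 or later (little-endian, 64-bit counts) as described in the GGUF specification. `gguf_architecture` is a hypothetical helper and is not part of esmc.cpp.

```python
import struct
from typing import BinaryIO

# GGUF metadata value-type IDs mapped to struct formats (little-endian).
_SCALAR_FORMATS = {
    0: "<B",   # UINT8
    1: "<b",   # INT8
    2: "<H",   # UINT16
    3: "<h",   # INT16
    4: "<I",   # UINT32
    5: "<i",   # INT32
    6: "<f",   # FLOAT32
    7: "<?",   # BOOL (1 byte)
    10: "<Q",  # UINT64
    11: "<q",  # INT64
    12: "<d",  # FLOAT64
}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str):
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    # GGUF strings are a uint64 byte length followed by UTF-8 bytes.
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8")


def _read_value(f: BinaryIO, vtype: int):
    if vtype in _SCALAR_FORMATS:
        return _read(f, _SCALAR_FORMATS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def gguf_architecture(path: str) -> str:
    """Return the general.architecture string from a GGUF file's metadata."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        version = _read(f, "<I")
        if version < 2:
            raise ValueError(f"GGUF v{version} uses 32-bit counts; not handled here")
        _n_tensors = _read(f, "<Q")
        n_kv = _read(f, "<Q")
        # Walk the key/value pairs, decoding (and discarding) values until the key matches.
        for _ in range(n_kv):
            key = _read_string(f)
            value = _read_value(f, _read(f, "<I"))
            if key == "general.architecture":
                return value
    raise KeyError("general.architecture not found")
```

For example, `gguf_architecture("models/esmc-300m-Q8_0.gguf")` should return `"esmc"` for the files in this repository. The check stops reading as soon as it finds the key, so it never touches the tensor data.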
Which file should I download?
| File | Size (MiB) | sha256 (first 16) | When to use |
|---|---|---|---|
| esmc-300m-Q4_K_M.gguf | 237.5 | 96c08911822906dc | Best 4-bit choice: good quality in a small file. |
| esmc-300m-Q4_K_S.gguf | 228.1 | 02328ea3555903ef | Smallest footprint; lowest peak RAM. |
| esmc-300m-Q8_0.gguf | 336.9 | d7a57a5ab21c172b | Recommended default: near-F16 quality at ~half the size. |
| esmc-300m-f16.gguf | 633.5 | 7c37c24e156920bd | Highest fidelity; numerical reference. |
| esmc-300m-f32.gguf | 1266.4 | 7e3e319c9bd00abb | Full precision; mainly the quantization source (largest). |
If unsure, start with esmc-300m-Q8_0.gguf (near-identical to PyTorch at ~half the size). Use Q4_K_M for the smallest deployment with good quality, or F16 when you want the closest possible match to the reference.
Quick start
1. Build the esmc.cpp runtime
git clone --recursive https://github.com/AnanyaP-WDW/esmc.cpp
cd esmc.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release && cmake --build build -j8
2. Download a model
pip install -U huggingface_hub
huggingface-cli download AnanyaPathak/esmc-300m-gguf esmc-300m-Q8_0.gguf --local-dir ./models
3. Embed a protein sequence
# Mean-pooled sequence embedding -> one vector per sequence ([n_embd])
./build/esmc-embed -m ./models/esmc-300m-Q8_0.gguf \
-s "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGY" \
--pool mean --output embedding.npy
# Per-residue embeddings -> matrix ([n_tokens, n_embd])
./build/esmc-embed -m ./models/esmc-300m-Q8_0.gguf \
-s "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGY" \
--pool none --output residues.npy
# Force CPU (skip the Metal/GPU backend)
./build/esmc-embed -m ./models/esmc-300m-Q8_0.gguf -s "..." --pool mean --no-metal
Outputs are NumPy .npy arrays. Mean pooling excludes the <cls>/<eos> tokens and averages only the residue positions.
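Mean pooling can also be reproduced from the per-residue output. The sketch below assumes that the `--pool none` matrix includes the <cls> row first and the <eos> row last, so dropping those two rows leaves only the amino acids. `mean_pool_residues` is a hypothetical helper, not an esmc.cpp API.

```python
import numpy as np


def mean_pool_residues(per_residue: np.ndarray) -> np.ndarray:
    """Average the residue rows of an (n_tokens, n_embd) matrix.

    Assumes row 0 is <cls> and the last row is <eos>, so only rows 1..n-2
    (the actual amino acids) contribute to the mean.
    """
    if per_residue.ndim != 2 or per_residue.shape[0] < 3:
        raise ValueError("expected (n_tokens, n_embd) with <cls>, residues, <eos>")
    return per_residue[1:-1].mean(axis=0)


# Hypothetical consistency check between the two files written by esmc-embed:
# res = np.load("residues.npy"); emb = np.load("embedding.npy")
# print(np.allclose(mean_pool_residues(res), emb, atol=1e-4))
```

If the check prints `False`, the per-residue layout probably differs from the assumption above (for example, if the special tokens are already stripped).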
4. Load the embedding in Python
import numpy as np
emb = np.load("embedding.npy") # mean pool: shape (960,)
res = np.load("residues.npy") # per-residue: shape (n_tokens, 960)
print(emb.shape, res.shape)
Benchmarks (300M)
Measured on an Apple M1 (16 GB) against the official PyTorch ESM-C 300M. Full methodology and per-sequence data are in the esmc.cpp repository.
Numerical fidelity vs PyTorch (per-residue cosine, 100 Swiss-Prot sequences)
| Precision | Aggregate mean cosine | Worst min cosine | Max mean-pool L2 | Pass rate |
|---|---|---|---|---|
| F16 | 0.99999 | 0.9997 | 0.0030 | 100/100 |
| Q8_0 | 0.99971 | 0.9943 | 0.0164 | 100/100 |
| Q4_K_M | 0.99597 | 0.9401 | 0.0656 | 91/100 |
| Q4_K_S | 0.99523 | 0.9281 | 0.0709 | 75/100 |
F16 and Q8_0 keep per-sequence mean cosine above 0.999. Q4_K_M and Q4_K_S meet only the aggregate threshold (> 0.995), and their misses concentrate in very short sequences.
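The per-sequence quantities behind the fidelity table can be sketched in NumPy. The sketch makes three assumptions:

- both runs return (n_tokens, n_embd) matrices with <cls> first and <eos> last;
- only residue rows are compared;
- "mean-pool L2" is the plain Euclidean distance between the pooled vectors.

The esmc.cpp benchmark scripts may define these details differently, and `fidelity_metrics` is a hypothetical name.

```python
import numpy as np


def fidelity_metrics(ref: np.ndarray, test: np.ndarray) -> dict:
    """Compare reference and test (n_tokens, n_embd) per-residue embeddings."""
    if ref.shape != test.shape:
        raise ValueError("shape mismatch")
    # Assumption: drop the <cls> (first) and <eos> (last) rows.
    r, t = ref[1:-1], test[1:-1]
    # Cosine similarity between the two runs, one value per residue.
    cos = np.sum(r * t, axis=1) / (np.linalg.norm(r, axis=1) * np.linalg.norm(t, axis=1))
    # Euclidean distance between the mean-pooled sequence vectors.
    pooled_l2 = float(np.linalg.norm(r.mean(axis=0) - t.mean(axis=0)))
    return {
        "mean_cos": float(cos.mean()),
        "min_cos": float(cos.min()),
        "mean_pool_l2": pooled_l2,
    }
```

Under these assumptions, the table's "Aggregate mean cosine" would average `mean_cos` over the 100 sequences. "Worst min cosine" and "Max mean-pool L2" would take the minimum `min_cos` and the maximum `mean_pool_l2`.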
Throughput (seq/s, best esmc.cpp config vs PyTorch)
| Bucket | Tokens | Best esmc.cpp config | esmc.cpp (seq/s) | PyTorch CPU (seq/s) | PyTorch MPS (seq/s) | Speedup vs PyTorch CPU |
|---|---|---|---|---|---|---|
| short | 47 | metal/q4_k_s | 14.54 | 10.31 | 29.29 | 1.41x |
| medium | 235 | metal/q4_k_m | 5.62 | 4.56 | 10.11 | 1.23x |
| long | 850 | metal/q8_0 | 1.33 | 1.74 | 2.83 | 0.76x |
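The speedup column is the esmc.cpp throughput divided by the PyTorch CPU throughput for the same bucket. The short sketch below recomputes it from the table's own numbers. No new measurements are involved.

```python
# Throughput in seq/s, copied from the table: (esmc.cpp best config, PyTorch CPU).
THROUGHPUT = {
    "short": (14.54, 10.31),
    "medium": (5.62, 4.56),
    "long": (1.33, 1.74),
}

# Ratio > 1 means esmc.cpp processed more sequences per second than PyTorch on CPU.
SPEEDUP_VS_CPU = {
    bucket: round(esmc_cpp / torch_cpu, 2)
    for bucket, (esmc_cpp, torch_cpu) in THROUGHPUT.items()
}

for bucket, ratio in SPEEDUP_VS_CPU.items():
    print(f"{bucket}: {ratio:.2f}x")  # short: 1.41x, medium: 1.23x, long: 0.76x
```

The same division against the PyTorch MPS column gives ratios below 1 in every bucket.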
Peak memory (long sequences, 16 GiB budget)
- Lowest peak RAM: pytorch/pytorch_mps/f32 at 282 MiB.
- Highest peak RAM: esmc.cpp/cpu/f16 at 7426 MiB.
- All 12 measured configurations fit within a 16 GiB machine.
Downstream variant-effect preservation (ProteinGym, 10 assays x 1000 variants)
| Precision | Assays | Mean abs Spearman delta | Max abs Spearman delta | Metric rows pass |
|---|---|---|---|---|
| F16 | 10 | 0.0006 | 0.0014 | 50/50 |
| Q8_0 | 10 | 0.0031 | 0.0092 | 45/50 |
| Q4_K_M | 10 | 0.0068 | 0.0231 | 38/50 |
| Q4_K_S | 10 | 0.0110 | 0.0258 | 32/50 |
Each variant is scored by the cosine similarity between its mean-pooled embedding and the mean-pooled wild-type embedding. Spearman deltas are measured against the PyTorch reference run through the same preservation probe.
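The preservation probe can be sketched as follows. It assumes, following the usual ProteinGym convention, that each Spearman correlation is taken between the per-variant scores and the assay's measured values. Under that assumption, the reported delta is the absolute difference between the quantized model's correlation and the PyTorch reference's. All function names are hypothetical. Ranking is done in NumPy with average ranks for ties, which matches SciPy's default, so the sketch has no SciPy dependency.

```python
import numpy as np


def rankdata(x) -> np.ndarray:
    """1-based ranks; tied values share the average of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def spearman(a, b) -> float:
    """Spearman rho as the Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(rankdata(a), rankdata(b))[0, 1])


def variant_scores(wt_emb: np.ndarray, mutant_embs: np.ndarray) -> np.ndarray:
    """Cosine between each mean-pooled mutant embedding (one per row) and the wild type."""
    num = mutant_embs @ wt_emb
    den = np.linalg.norm(mutant_embs, axis=1) * np.linalg.norm(wt_emb)
    return num / den


def spearman_delta(assay, scores_ref, scores_quant) -> float:
    """|rho(quantized scores, assay) - rho(reference scores, assay)|."""
    return abs(spearman(scores_quant, assay) - spearman(scores_ref, assay))
```

Because only the absolute delta is reported, the sign convention of the score does not matter. Whether higher similarity means "less disruptive" affects both correlations equally.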
Model details
- Architecture: encoder-only transformer; 30 layers, d_model 960, 15 heads (head dim 64), SwiGLU FFN (width 2560), pre-LayerNorm, RoPE-NeoX (theta 10000), query/key LayerNorm, no biases, context length 2048.
- Tokenizer: 33-token amino-acid alphabet; <cls> prepended and <eos> appended (direct character lookup, no subword splitting).
- Provenance: converted from the upstream safetensors checkpoint to GGUF (fused QKV and SwiGLU projections split); quantized variants use ggml block quantization. Weight values are otherwise unchanged from the upstream release.
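Two of the model details can be sanity-checked in a few lines.

- **Tokenizer.** Character lookup with <cls>/<eos> wrapping means a sequence of L residues becomes L + 2 tokens. The token order below is an assumption based on the ESM-family alphabet and should be confirmed against the tokenizer metadata stored in the GGUF.
- **Parameter count.** Counting only the attention and SwiGLU weight matrices from the listed dimensions gives roughly the size of the f32 file.

`tokenize` and `approx_matrix_params` are hypothetical helpers, not esmc.cpp functions.

```python
# ASSUMPTION: token order follows the ESM-family alphabet; verify against the
# tokenizer metadata in the GGUF before relying on these IDs.
VOCAB = [
    "<cls>", "<pad>", "<eos>", "<unk>",
    "L", "A", "G", "V", "S", "E", "R", "T", "I", "D", "P", "K",
    "Q", "N", "F", "Y", "M", "H", "W", "C",
    "X", "B", "U", "Z", "O", ".", "-", "|", "<mask>",
]
TOKEN_ID = {tok: i for i, tok in enumerate(VOCAB)}


def tokenize(seq: str) -> list[int]:
    """<cls> + one token per residue character + <eos>; unknown letters map to <unk>."""
    unk = TOKEN_ID["<unk>"]
    return [TOKEN_ID["<cls>"]] + [TOKEN_ID.get(ch, unk) for ch in seq] + [TOKEN_ID["<eos>"]]


def approx_matrix_params(n_layers: int = 30, d_model: int = 960, d_ffn: int = 2560) -> int:
    """Weights in the attention (Q, K, V, out) and SwiGLU (gate, up, down) matrices only."""
    attn = 4 * d_model * d_model
    ffn = 3 * d_model * d_ffn
    return n_layers * (attn + ffn)


if __name__ == "__main__":
    print(tokenize("MKT"))  # [0, 20, 15, 11, 2] under the assumed ordering
    n = approx_matrix_params()
    print(n, n * 4 / 2**20)  # 331776000 1265.625 (MiB at 4 bytes per weight)
```

The matrix weights alone come to about 332M parameters, or 1265.6 MiB in f32. That is close to the 1266.4 MiB f32 file listed above. Embeddings, norm weights, and GGUF metadata are not counted.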
Verify downloads
shasum -a 256 models/*.gguf # compare against the sha256 column above
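On systems without `shasum`, the same comparison can be done in Python. This sketch assumes the files keep their original names under `models/`. The prefixes are copied from the download table, and `verify` is a hypothetical helper.

```python
import hashlib
from pathlib import Path

# First 16 hex characters of each file's sha256, from the download table.
EXPECTED_PREFIX = {
    "esmc-300m-Q4_K_M.gguf": "96c08911822906dc",
    "esmc-300m-Q4_K_S.gguf": "02328ea3555903ef",
    "esmc-300m-Q8_0.gguf": "d7a57a5ab21c172b",
    "esmc-300m-f16.gguf": "7c37c24e156920bd",
    "esmc-300m-f32.gguf": "7e3e319c9bd00abb",
}


def sha256_hex(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large models are never read into RAM at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, expected: dict = EXPECTED_PREFIX) -> bool:
    """True if the file's sha256 starts with the published prefix for its name."""
    prefix = expected.get(path.name)
    if prefix is None:
        raise KeyError(f"no expected checksum for {path.name}")
    return sha256_hex(path).startswith(prefix)


if __name__ == "__main__":
    for p in sorted(Path("models").glob("*.gguf")):
        print(p.name, "OK" if verify(p) else "MISMATCH")
```

Matching a 16-character prefix catches corrupted or truncated downloads. It is weaker than comparing the full digest.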
Reproduce
The full replication guide (convert, quantize, validate, benchmark) is in the esmc.cpp README.
License
Built with ESM.
These GGUF files are Derivative Works of the ESM-C 300M Open Model and are distributed under the EvolutionaryScale Cambrian Open License Agreement (the permissive license that governs ESM-C 300M), subject to the Acceptable Use Policy. The ESMC 300M Model is licensed under the EvolutionaryScale Cambrian Open License Agreement.
Citation
If you use these models, please cite the ESM Cambrian work by EvolutionaryScale and link the esmc.cpp runtime.
Base model: biohub/esmc-300m-2024-12