gLM-650M
Minimal HuggingFace port of the 650M parameter variant of gLM2 -- a mixed-modality genomic language model that encodes a genomic scaffold using both amino-acid and DNA tokens. Pretrained with masked language modeling on the OMG dataset.
Architecture
| Parameter | Value |
|---|---|
| Layers | 33 |
| Attention heads | 20 |
| Embedding dimension | 1280 |
| FFN hidden dimension | 3584 (SwiGLU, multiple_of=256) |
| Vocabulary size | 37 |
| Positional encoding | RoPE (base=10000, non-interleaved) |
| Normalization | RMSNorm |
| Architecture | Pre-LN Transformer with SwiGLU FFN |
| Max sequence length | 4096 |
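The FFN width in the table is consistent with the LLaMA-style SwiGLU sizing rule (4x expansion, scaled by 2/3, rounded up to a multiple of `multiple_of`). The sketch below works through that arithmetic; the rule itself is an assumption that happens to reproduce 3584, not something the model card states, and `swiglu_hidden_dim` is a hypothetical helper name.

```python
def swiglu_hidden_dim(dim: int, multiple_of: int = 256) -> int:
    """LLaMA-style SwiGLU sizing (assumed convention, not confirmed by the source)."""
    hidden = 4 * dim                # classic 4x FFN expansion
    hidden = int(2 * hidden / 3)    # scale by 2/3 because SwiGLU uses three projections
    # Round up to the next multiple of `multiple_of`.
    return multiple_of * ((hidden + multiple_of - 1) // multiple_of)


ffn_dim = swiglu_hidden_dim(1280)   # embedding dimension from the table
head_dim = 1280 // 20               # per-head dimension implied by 20 heads
print(ffn_dim, head_dim)            # 3584 64
```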
Vocabulary: <cls>, <pad>, <eos>, <unk>, the 25 IUPAC amino-acid
letters (L A G V S E R T I D P K Q N F Y M H W C X B U Z O, uppercase),
the 4 DNA nucleotides (a t c g, lowercase), strand markers <+> / <->,
and <mask> / <sep>, for 37 tokens in total. Amino-acid and nucleotide
tokens share the alphabet and are distinguished by case (uppercase =
amino acid, lowercase = nucleotide).
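As a quick sanity check, the token groups listed above can be tallied against the vocabulary size in the table. This is only a counting sketch: the ordering of the list is assumed for illustration and does not reflect the real token ids, which should be read from the tokenizer.

```python
# Token groups as listed in the vocabulary description.
special = ["<cls>", "<pad>", "<eos>", "<unk>"]
amino_acids = list("LAGVSERTIDPKQNFYMHWCXBUZO")  # uppercase = amino acid
nucleotides = list("atcg")                        # lowercase = nucleotide
strand_markers = ["<+>", "<->"]
extra = ["<mask>", "<sep>"]

# Illustrative ordering only; real ids come from the tokenizer.
vocab = special + amino_acids + nucleotides + strand_markers + extra


def modality(token: str) -> str:
    """Classify a single-letter token by case, following the rule above."""
    if len(token) != 1:
        return "special"
    return "amino_acid" if token.isupper() else "nucleotide"


print(len(amino_acids), len(vocab))  # 25 37
```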
Pretraining
- Objective: Masked language modeling (30% mask rate)
- Data: OMG dataset (open metagenomic corpus, semantically deduplicated)
- Pretraining tokens: 315B (bfloat16, context length 4096)
- Source checkpoint: tattabio/gLM2_650M
Parity Verification
All 34 representation levels (embedding output + 33 transformer blocks) were
verified to be bit-exact (max abs diff = 0.00) against the original
tattabio/gLM2_650M weights with attn_implementation="sdpa". The added eager
and flash_attention_2 backends agree with that reference within fp32 kernel
drift (atol = 1e-3) and with bf16 cosine similarity >= 0.999, respectively.
Verified on GPU with PyTorch 2.7 / CUDA 12.
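The two metrics quoted above (maximum absolute difference and cosine similarity) can be computed per layer along the following lines. This is a minimal sketch on dummy tensors, not the script used for the verification; `compare_hidden_states` is a hypothetical helper.

```python
import torch
import torch.nn.functional as F


def compare_hidden_states(ref: torch.Tensor, test: torch.Tensor):
    """Return (max abs diff, min per-token cosine similarity) for two
    hidden-state tensors of shape (batch, seq_len, dim)."""
    ref32, test32 = ref.float(), test.float()   # compare in fp32 regardless of input dtype
    max_abs = (ref32 - test32).abs().max().item()
    cos = F.cosine_similarity(ref32, test32, dim=-1)
    return max_abs, cos.min().item()


# Dummy data standing in for one layer's output from two backends.
torch.manual_seed(0)
ref = torch.randn(1, 8, 1280)
identical = ref.clone()
drifted = ref + 1e-4 * torch.randn_like(ref)

exact_diff, _ = compare_hidden_states(ref, identical)
drift_diff, drift_cos = compare_hidden_states(ref, drifted)
```

With real models, the same function would be applied to each entry of `hidden_states` from the reference and ported checkpoints.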
Related Models
See the full gLM2 collection.
Usage
Embedding generation
import torch
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("Taykhoom/gLM-650M", trust_remote_code=True)
model = AutoModel.from_pretrained("Taykhoom/gLM-650M", trust_remote_code=True)
model.eval()
# Canonical gLM2 input: amino acids (uppercase) + DNA (lowercase) + strand markers.
sequence = (
    "<+>MALTKVEKRNRIKRRVRGKISGTQASPRLSVYKSNK"
    "<+>aatttaaggaa"
    "<->MLGIDNIERVKPGGLELVDRLVAVNRVTKVTKGGRAFGFSAIVVVGNED"
)
enc = tokenizer([sequence], return_tensors="pt")
with torch.no_grad():
    out = model(**enc)
cls_emb = out.last_hidden_state[:, 0, :]  # (batch, 1280) -- CLS token
token_emb = out.last_hidden_state  # (batch, seq_len, 1280)
# Intermediate layers (hidden_states[0] is the embedding output)
with torch.no_grad():
    out_all = model(**enc, output_hidden_states=True)
layer16_emb = out_all.hidden_states[16]  # after block 16
The tokenizer also accepts plain DNA strings (no strand marker) and
prepares them automatically by lowercasing, replacing U/u with t, and
prepending <+>. The three inputs below produce identical token sequences:
tokenizer(["ATCGATCG", "atcgatcg", "AUCGAUCG"], return_tensors="pt")
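The preparation steps described above can be mimicked in plain Python. The sketch below is a hypothetical re-implementation for illustration only (`prepare_plain_dna` is not part of the tokenizer's API), and it assumes the caller already knows the marker-less input is DNA rather than protein.

```python
def prepare_plain_dna(seq: str) -> str:
    """Illustrative version of the described auto-preparation:
    lowercase, map u -> t, and prepend the forward-strand marker."""
    if "<+>" in seq or "<->" in seq:
        return seq  # already carries strand markers; leave it untouched
    return "<+>" + seq.lower().replace("u", "t")


inputs = ["ATCGATCG", "atcgatcg", "AUCGAUCG"]
prepared = [prepare_plain_dna(s) for s in inputs]
print(prepared)  # ['<+>atcgatcg', '<+>atcgatcg', '<+>atcgatcg']
```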
MLM logits
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("Taykhoom/gLM-650M", trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained("Taykhoom/gLM-650M", trust_remote_code=True)
model.eval()
enc = tokenizer(["<+>MA<mask>K"], return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits  # (1, seq_len, 37)
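To turn these logits into predictions, one common approach is to take the top-k tokens at each masked position. The sketch below runs on dummy tensors so it needs no model download; `top_k_at_mask` and the value of `MASK_ID` are hypothetical, and the real mask id should be read from the tokenizer.

```python
import torch


def top_k_at_mask(logits, input_ids, mask_token_id, k=5):
    """Return (token ids, probabilities) of the k most likely tokens
    at every <mask> position."""
    mask_positions = input_ids == mask_token_id   # (batch, seq_len) bool
    masked_logits = logits[mask_positions]        # (num_masks, vocab)
    probs = masked_logits.float().softmax(dim=-1)
    top = probs.topk(k, dim=-1)                   # sorted in descending order
    return top.indices, top.values


# Dummy tensors standing in for real tokenizer/model output (vocab size 37).
torch.manual_seed(0)
MASK_ID = 36  # hypothetical id, used only for this illustration
input_ids = torch.tensor([[0, 4, 5, MASK_ID, 6, 2]])
logits = torch.randn(1, 6, 37)

ids, probs = top_k_at_mask(logits, input_ids, MASK_ID, k=3)
```

With the real model, `enc["input_ids"]` and `tokenizer.mask_token_id` would replace the dummy values, and `tokenizer.convert_ids_to_tokens` would map the ids back to letters, assuming the custom tokenizer exposes these standard attributes.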
Faster attention backends
import torch
from transformers import AutoModel
# SDPA (PyTorch 2.0+, default upstream backend) -- recommended for fp32
model = AutoModel.from_pretrained(
    "Taykhoom/gLM-650M", trust_remote_code=True, attn_implementation="sdpa"
)
# Flash Attention 2 (requires flash-attn package) -- fastest on long sequences
model = AutoModel.from_pretrained(
    "Taykhoom/gLM-650M",
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
    dtype=torch.bfloat16,
)
Fine-tuning
The model follows standard Hugging Face fine-tuning conventions. For sequence-level tasks, either mean-pool the hidden states over non-padding positions or use the CLS token embedding as input to a prediction head.
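Pooling over non-padding positions can be sketched as a masked mean. The example uses small dummy tensors in place of `out.last_hidden_state` and the tokenizer's attention mask; `masked_mean_pool` is a hypothetical helper, and it assumes the tokenizer returns a standard `attention_mask` with 1 for real tokens and 0 for padding.

```python
import torch


def masked_mean_pool(hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average hidden states over positions where attention_mask == 1.

    hidden:         (batch, seq_len, dim)
    attention_mask: (batch, seq_len), 1 = real token, 0 = padding
    """
    mask = attention_mask.unsqueeze(-1).to(hidden.dtype)  # (batch, seq_len, 1)
    summed = (hidden * mask).sum(dim=1)                   # padding contributes zero
    counts = mask.sum(dim=1).clamp(min=1.0)               # avoid division by zero
    return summed / counts


# Toy example: the third position is padding and must be ignored.
hidden = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
attention_mask = torch.tensor([[1, 1, 0]])
pooled = masked_mean_pool(hidden, attention_mask)
print(pooled)  # tensor([[2., 3.]])
```

The pooled vector (1280-dimensional for this model) can then feed a task head such as `torch.nn.Linear(1280, num_labels)`.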
Implementation Notes
The original gLM2 implementation uses PyTorch SDPA as its only attention
backend. This HF port adds eager and flash_attention_2 as separate
implementations selectable via attn_implementation. The model falls back to
eager automatically when output_attentions=True is requested.
The eager kernel computes the QK matmul and softmax in fp32 even when the
model is loaded in bf16, matching the numerical behaviour of SDPA and
flash_attention_2 in mixed precision.
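The fp32 upcast described above can be illustrated with a stripped-down attention function. This is a sketch of the general technique, not the port's actual kernel: it omits RoPE, attention masks, and dropout, and casting the attention weights back to the input dtype before the value matmul is an assumption.

```python
import math
import torch


def eager_attention_fp32(q, k, v):
    """Attention with the QK matmul and softmax done in fp32.

    q, k, v: (batch, heads, seq_len, head_dim), any floating dtype.
    """
    scale = 1.0 / math.sqrt(q.shape[-1])
    scores = torch.matmul(q.float(), k.float().transpose(-2, -1)) * scale  # fp32 QK^T
    weights = scores.softmax(dim=-1)                                       # fp32 softmax
    # Assumed detail: cast weights back so the output keeps the input dtype.
    return torch.matmul(weights.to(v.dtype), v)


# bf16 inputs shaped like this model's heads (20 heads, head_dim 64).
torch.manual_seed(0)
q, k, v = (torch.randn(1, 20, 16, 64, dtype=torch.bfloat16) for _ in range(3))
out = eager_attention_fp32(q, k, v)
```

Keeping the score computation in fp32 avoids the precision loss that a bf16 softmax would introduce over long sequences, which is why the result stays close to a full-fp32 reference.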
Citation
@article{cornman2024_glm2,
title = {The {OMG} dataset: An Open MetaGenomic corpus for mixed-modality genomic language modeling},
author = {Cornman, Andre and West-Roberts, Jacob and Camargo, Antonio Pedro and Roux, Simon and Beracochea, Martin and Mirdita, Milot and Ovchinnikov, Sergey and Hwang, Yunha},
journal = {bioRxiv},
year = {2024},
doi = {10.1101/2024.08.14.607850}
}
Credits
Original model and code by Cornman et al. (Tatta Bio). Source: GitHub,
tattabio/gLM2_650M on the Hub.
The HF conversion code was authored primarily by Claude Code
and reviewed manually by Taykhoom Dalal.
License
Apache 2.0, following the original repository.