ChemSage 32B v5 — Round 5 fused model

ChemSage is a QLoRA-tuned Qwen2.5-32B chemistry assistant for drug discovery, fine-tuned on Apple Silicon with MLX-LM. It reasons about small molecules, emits verified RDKit and PyMOL tool calls, and is grounded by a RAG layer over a curated cheminformatics corpus.

This is the Round 5 fused model — the current production checkpoint, fused from iter 1600 (best available checkpoint; val loss 0.055).

Model details

Base model mlx-community/Qwen2.5-32B-Instruct-4bit
Fine-tune method QLoRA (RSLoRA, rank 64, scale 90.0)
Adapted layers 32 of 64 transformer layers
Trainable parameters 268M (0.82% of 32.8B) — double R4
Training examples 16,000 (from 20,000 total; 80/10/10 split)
Behaviour classes 78 (19 new vs R4)
Training iters 2,000 of 3,000 (early stop — val plateau at 0.055 from iter 1600)
Fused checkpoint iter 1600 (best saved; true best 0.054 at iters 1650/1950 fell between saves)
Best val loss 0.055
Final train loss 0.054
Peak memory 30.3 GB
Fused model size ~17.2 GB
Training duration ~37 h (2026-06-27 to 2026-06-29)

Evaluation (5-round comparative, 2026-06-29)

Evaluated on 100 shared R5 test examples (seed=42) via eval/compare/eval_compare.py. Scores = examples where all instances correct / 100 examples (per-example pass/fail). Full report: eval/compare/results/compare_20260629_1928.html.

Metric R1 R2 R3 R4 R5 (this)
SMILES validity 100% 100% 100% 100% 100%
SMARTS validity N/A 0% 100% 55% 100%
Tool executability 0% 28% 79% 71% 95%
Code attempted 14% 41% 100% 97% 100%
Python extended 36% 34% 78% 69% 99%
Code-then-quote N/A 0% 47% 19% 61%
Numerical fidelity N/A 18% 57% 47% 89%
Rounding precision 100% 100% 98% 98% 99%
Refusal accuracy 98% 98% 97% 98% 100%
QED range 100% 100% 100% 100% 100%
PDB ID validity 100% 100% 100% 100% 100%
PyMOL syntax 77% 97% 89% 89% 90%
Degeneration-free 93% 96% 91% 98% 100%
Overall 72% 63% 87% 80% 95%

Usage

# Download from HuggingFace
huggingface-cli download Dellboy/chem_sage_32b_v5 --local-dir models/chem_sage_32b_v5

mlx_lm.server --model models/chem_sage_32b_v5 --port 8081
.venv/bin/python scripts/chat.py --model models/chem_sage_32b_v5

Round 5 additions (78 behaviour classes, 19 new vs R4)

New generators targeting R4 exec/fidelity gaps and breadth:

Category New classes
Exec drills pyexec_drill (×6), code_then_quote_v2 (×5), rounding_explicit (×5), fidelity_multistep (×5)
ADMET / SAR herg_liability, selectivity_profile, prodrug_bcs, sar_delta
Cheminformatics mdanalysis, conformer_3d, mcs_search, reaction_smarts, recap_fragmentation
Structural biology dssp, ppi_interface, electron_density, biological_assembly
Drug targets drug_target_family (kinases, GPCRs, NHRs)
External APIs uniprot_api

Training history

Round Val loss Improvement
Round 1 (7B) 0.389
Round 2 (32B) 0.347 10.8% over R1
Round 3 (32B) 0.054 6.4x over R2
Round 4 (32B) 0.041 24% over R3
Round 5 (32B, this model) 0.055 early stop; harder task (78 classes, 20k examples)

R5's higher val floor (0.055 vs R4's 0.041) reflects the harder task: 78 behaviour classes vs 59, 20k examples vs 6.4k, rank=64 adapter. The model is more capable but the task is larger.

Built by

Marc C. Deller, D.Phil. · marcdeller.com · marc@marcdeller.com

Downloads last month
-
Safetensors
Model size
33B params
Tensor type
F16
·
U32
·
MLX
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Dellboy/chem_sage_32b_v5

Base model

Qwen/Qwen2.5-32B
Quantized
(4)
this model