NanoG: Code release
NanoG, pushing the limits of biology through AI: a cancer foundation model that simulates outcomes inside its own reasoning.
A two-model multimodal cancer foundation-model family (NanoG0 ~95M scout, NanoG1 ~1B production), built on the single-primitive Quatrix routing architecture (Q-Compass / SAVO / MH-QVC). The signature capability is mid-CoT hypothetical simulation: the model emits structured <simulate> blocks during its own reasoning, predicting drug-response curves, histopathology appearances, 3D tumour evolution, protein conformations, and spatial-transcriptomics states, then conditions downstream reasoning on those simulated outcomes. All of this happens in one autoregressive pass, with no external tools.
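To make the mid-CoT idea concrete, here is a minimal sketch of how a generated trace could be split into its simulated blocks and the surrounding reasoning. It is not the repo's parser. The closing `</simulate>` tag, the `split_trace` helper, and the payload format in the example are assumptions made for illustration; only the opening `<simulate>` tag is named above.

```python
import re

# Assumption: blocks are delimited as <simulate> ... </simulate>.
# Only the opening tag is documented; the closing tag is hypothetical.
SIMULATE_RE = re.compile(r"<simulate>(.*?)</simulate>", re.DOTALL)


def split_trace(trace: str) -> tuple[list[str], str]:
    """Return (simulate payloads, reasoning text with the blocks removed)."""
    payloads = [m.strip() for m in SIMULATE_RE.findall(trace)]
    reasoning = SIMULATE_RE.sub("", trace)
    # Collapse the whitespace left behind where blocks were removed.
    reasoning = re.sub(r"\s+", " ", reasoning).strip()
    return payloads, reasoning


# Hypothetical trace; the payload fields are placeholders, not the real schema.
example = (
    "Consider the first treatment option. "
    "<simulate>modality=drug_response; drug=DRUG_A</simulate> "
    "Given the simulated curve, continue the assessment."
)
payloads, reasoning = split_trace(example)
print(payloads)
print(reasoning)
```

A splitter like this is only useful for inspecting traces after generation; inside the model the simulated outcome stays in the context so that later tokens can condition on it.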
Companion dataset: Abd0r/nanog-cancer-data, which contains TCGA mutations, expression, and clinical data; COSMIC SBS96; Reactome pathways; AlphaFold structures; a PubMed oncology corpus; WSI H&E tiles; TCIA volumetric scans; Visium spatial-transcriptomics data; and the 100K mid-CoT trace corpus.
What's in this repo
| Path | What |
|---|---|
| quatrix/ | Q-Compass primitive, encoders, world model, MH-QVC blocks, mHC, FiLM modulation, QACC adaptive depth, MTP head, Muon optimiser |
| flash-qcompass/ | Triton kernel for MH-QVC routing (fp16/bf16 fast path, 6× faster than eager) |
| bio/ | TCGA + COSMIC + Reactome + AlphaFold + IDC + Visium loaders + tokenisers |
| paper/NanoG1/code/ | nanog1_model.py (model + decoders), train_nanog1.py (4-phase trainer), multimodal_trace_gen.py (CoT trace synthesis), train_bpe_tokeniser.py, multimodal_tokeniser.py, eval_nanog1.py, opd_train.py, visualise.py |
| paper/NanoG1/data/ | multimodal_cot_traces_100k.jsonl (100K synthetic reasoning traces, 5.9% multimodal-grounded) |
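Going by its .jsonl extension, the trace file holds one JSON object per line. The record schema is not described here, so the sketch below assumes nothing about field names: it only counts records and tallies whichever top-level keys it finds. `summarise_jsonl` is a hypothetical helper, not part of the repo.

```python
import json
from collections import Counter
from pathlib import Path


def summarise_jsonl(path, limit=None):
    """Count records and top-level keys in a JSON Lines file."""
    n_records = 0
    key_counts = Counter()
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            n_records += 1
            if isinstance(record, dict):
                key_counts.update(record.keys())
            if limit is not None and n_records >= limit:
                break  # stop early when only a sample is wanted
    return n_records, key_counts


# Path taken from the table above; guarded so the snippet runs without the data.
trace_file = Path("paper/NanoG1/data/multimodal_cot_traces_100k.jsonl")
if trace_file.exists():
    n, keys = summarise_jsonl(trace_file, limit=1000)
    print(n, dict(keys))
```

Reading line by line keeps memory use flat, which matters more for the larger pretraining files than for the 100K-trace file.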
Quickstart
hf download Abd0r/nanog-cancer-code --local-dir ./nanog-code
hf download Abd0r/nanog-cancer-data --repo-type dataset --local-dir ./nanog-data
cd nanog-code
pip install -e ./quatrix
pip install -e ./flash-qcompass
# Sanity-check the architecture
python3 paper/NanoG1/code/nanog1_model.py --preset nanog1_150m # NanoG0
python3 paper/NanoG1/code/nanog1_model.py --preset nanog1_1b # NanoG1
# Train (1-epoch unified pretraining over the full corpus)
python3 paper/NanoG1/code/train_nanog1.py \
--phase 1 --steps 1975000 \
--preset nanog1_150m \
--batch 2 --grad_accum 8 --workers 4 \
--traces_jsonl paper/NanoG1/data/unified_pretrain.jsonl \
--out_dir bio/cancer_checkpoints/nanog0
See paper/NanoG1/README.md for the full architecture description, eval gates G1 to G16, and data plan. See paper/NanoG1/PROJECT_DETAIL_README.md for deep technical details.
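As a quick check on the training command, the arithmetic below works out how many sequences the run would see. It assumes the usual meaning of gradient accumulation (`--grad_accum` micro-batches per optimiser update). Whether `--steps` counts micro-batches or optimiser updates is not stated here, so both readings are shown; consult train_nanog1.py for the actual semantics.

```python
# Values copied from the Quickstart training command.
micro_batch = 2      # --batch
grad_accum = 8       # --grad_accum
steps = 1_975_000    # --steps

# Assumption: one optimiser update accumulates grad_accum micro-batches.
sequences_per_update = micro_batch * grad_accum

# Two readings of --steps, since the flag's unit is not documented here.
seqs_if_steps_are_micro_batches = steps * micro_batch
seqs_if_steps_are_updates = steps * sequences_per_update

print(f"sequences per optimiser update: {sequences_per_update}")
print(f"total sequences (steps = micro-batches): {seqs_if_steps_are_micro_batches:,}")
print(f"total sequences (steps = updates):       {seqs_if_steps_are_updates:,}")
```

The two readings differ by a factor of eight, so it is worth confirming which one applies before scaling the step count to a different batch size.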
Model family
| | NanoG0 | NanoG1 |
|---|---|---|
| Params | ~95M | ~1B |
| Hidden / layers / heads | 512 / 20 / 8 | 1280 / 56 / 20 |
| Context | 8 192 | 32 768 |
| Train data | ~2 B tokens (Chinchilla-optimal) | ~20-40 B tokens |
| Compute | Vast.ai 4090, ~$60, 8-9 days | Vast.ai 4090, ~$200, 10-15 days |
| Inference target | RTX 4050 (6 GB) at INT4/INT8 | RTX 4050 (6 GB) at INT4/INT8 |
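The token budgets and the 6 GB inference target can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes the common Chinchilla rule of thumb of roughly 20 training tokens per parameter and counts quantised weights only: no KV cache, activations, or quantisation scales are included. The helper names are illustrative, and the parameter counts are the approximate figures from the table.

```python
TOKENS_PER_PARAM = 20  # approximate Chinchilla rule of thumb (assumption)

# Approximate parameter counts from the table above.
models = {"NanoG0": 95e6, "NanoG1": 1e9}


def chinchilla_tokens(n_params):
    """Rough compute-optimal token budget for a given parameter count."""
    return n_params * TOKENS_PER_PARAM


def weight_gib(n_params, bits):
    """Size of the weights alone, in GiB, at the given bit width."""
    return n_params * bits / 8 / 2**30


for name, n_params in models.items():
    tokens_b = chinchilla_tokens(n_params) / 1e9
    print(
        f"{name}: ~{tokens_b:.1f} B tokens, "
        f"INT8 weights {weight_gib(n_params, 8):.2f} GiB, "
        f"INT4 weights {weight_gib(n_params, 4):.2f} GiB"
    )
```

Under these assumptions NanoG0 lands at about 1.9 B tokens, in line with the ~2 B in the table, and NanoG1 at about 20 B, the lower end of its stated range. Even at INT8 the NanoG1 weights come to under 1 GiB, which leaves most of a 6 GB card for the KV cache and activations at long context.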
License
- Code: MIT
- Weights + synthetic CoT traces: OpenRAIL-M (use-based behavioural restrictions)
- All upstream data sources have been verified as compatible with these licenses
Cite
@article{nanog1_2026,
title = {NanoG: A Unified Quatrix Cancer Foundation Model with Mid-CoT Hypothetical Simulation},
author = {Ali, Syed Abdur Rehman},
journal = {Nature Biotechnology (under review)},
year = {2026}
}
@misc{quatrix_2026,
title = {Quatrix: An Empirical Evaluation of Q-Compass and SAVO on Multimodal Sequence Modeling},
author = {Ali, Syed Abdur Rehman},
year = {2026},
doi = {10.5281/zenodo.19839718}
}