[This model is part of a manuscript under review. Public access to the model will be granted upon manuscript acceptance. If you wish to use the model before then, please submit your request to one of the Insilico Medicine team members on Hugging Face.]
Longevity-LLM (L-LLM)
A domain-adapted Qwen3.5-9B for aging and longevity biology. L-LLM is the result of continued pretraining + supervised fine-tuning + a reasoning-augmented continuation pass on a multi-domain corpus spanning clinical aging, epigenomics, transcriptomics, proteomics, and genetics. The two trained LoRA adapters were concatenated into a single rank-64 LoRA and merged into the base weights to produce this standalone bf16 checkpoint.
Methods
L-LLM was built by LoRA fine-tuning Qwen3.5-9B, a 9B-parameter hybrid transformer that interleaves Gated DeltaNet linear-attention layers with standard self-attention in a 3:1 ratio. Training data was assembled across the following domains:
| Domain | Sources |
|---|---|
| Knowledge priors | UniProt protein/gene annotations, Gene Ontology, protein–protein interactions, pathway membership; published aging-clock formulas and CpG-site coefficients (Biolearn) |
| Clinical & epidemiology | NHANES (age, mortality) |
| Epigenomics | GEO DNA-methylation cohorts, CpG methylation profiles, aging-clock proxy tasks |
| Transcriptomics | GTEx (tissue age), TCGA (cancer survival), expression-profile generation |
| Proteomics | Olink plasma-proteomics panels, proteomic clock proxy tasks |
| Genetics & longevity | OpenGenes (expression directionality), SynergyAge (lifespan), CellAge (senescence), anti-aging target classification |
| Reasoning corpus | Prediction tasks augmented with frontier-model chain-of-thought traces |
Approximate scale across all domains: ≈10⁶ training prompts, on the order of several billion tokens in total. Exact composition, prompt counts, and token counts will be reported in the forthcoming preprint.
Training proceeded in three stages on Qwen3.5-9B:
- Continued pretraining — knowledge priors only, raw text packed into 4,096-token blocks. Rank-32 LoRA with rsLoRA scaling, LR 2 × 10⁻⁵, 3 epochs.
- Supervised fine-tuning — aging prediction tasks in conversation format. Rank-32 LoRA initialized from the phase-1 adapter, standard α/r scaling, LR 1 × 10⁻⁴, 3 epochs.
- Reasoning continuation — continued the SFT adapter on the reasoning corpus, LR 3 × 10⁻⁵, ≈1 epoch, context length 32,768 with example packing.
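The first stage packs raw text into 4,096-token blocks. The sketch below shows one common way to do such packing, under stated assumptions: documents are already tokenized, each is followed by an end-of-sequence token, and a trailing partial block is dropped. The function name `pack_into_blocks` and the `eos_id` value are hypothetical, and the packing rules actually used for L-LLM are not described here.

```python
from typing import Iterable, Iterator


def pack_into_blocks(
    docs: Iterable[list[int]],
    block_size: int = 4096,
    eos_id: int = 0,  # hypothetical EOS token id, for illustration only
) -> Iterator[list[int]]:
    """Concatenate tokenized documents and cut the stream into fixed-size blocks."""
    buffer: list[int] = []
    for doc in docs:
        buffer.extend(doc)
        buffer.append(eos_id)  # mark the document boundary
        while len(buffer) >= block_size:
            yield buffer[:block_size]
            buffer = buffer[block_size:]
    # Any trailing partial block is dropped (an assumption of this sketch).


# Tiny demonstration with block_size=4 instead of 4096.
blocks = list(pack_into_blocks([[1] * 5, [2] * 3], block_size=4, eos_id=0))
print(blocks)  # [[1, 1, 1, 1], [1, 0, 2, 2]]
```

Packing keeps every training block full, so no compute is spent on padding tokens.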
All adapters targeted the 12 linear projections, including the GatedDeltaNet modules. After training, the CPT and continued-SFT adapters were concatenated into a single rank-64 LoRA and merged into the base weights to produce this checkpoint. Training used DeepSpeed ZeRO-2 on 2× NVIDIA H100 NVL 94 GB GPUs in bf16, with FlashAttention-2 on the self-attention layers. Full details will appear in the forthcoming preprint.
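Concatenating two rank-32 adapters along the rank dimension gives a rank-64 adapter whose update equals the sum of the two individual updates. The NumPy sketch below checks that identity on random matrices for a single weight. The layer dimensions, the value of α, and the helper `concat_loras` are hypothetical; the scaling factors follow the usual conventions (α/√r for rsLoRA, α/r for standard LoRA). This illustrates the algebra only and is not the authors' merge script.

```python
import numpy as np


def concat_loras(adapters):
    """Concatenate LoRA adapters along the rank axis.

    adapters: list of (A, B, scale) with A of shape (r, d_in) and B of shape (d_out, r).
    Each scale is folded into B so the result needs no extra scaling factor.
    """
    a_cat = np.concatenate([a for a, _, _ in adapters], axis=0)              # (sum r, d_in)
    b_cat = np.concatenate([scale * b for _, b, scale in adapters], axis=1)  # (d_out, sum r)
    return a_cat, b_cat


rng = np.random.default_rng(0)
d_in, d_out, rank, alpha = 128, 96, 32, 32.0  # hypothetical sizes and alpha

a_cpt, b_cpt = rng.normal(size=(rank, d_in)), rng.normal(size=(d_out, rank))
a_sft, b_sft = rng.normal(size=(rank, d_in)), rng.normal(size=(d_out, rank))
scale_cpt = alpha / np.sqrt(rank)  # rsLoRA scaling
scale_sft = alpha / rank           # standard LoRA scaling

a_cat, b_cat = concat_loras([(a_cpt, b_cpt, scale_cpt), (a_sft, b_sft, scale_sft)])
delta_sum = scale_cpt * b_cpt @ a_cpt + scale_sft * b_sft @ a_sft

assert a_cat.shape == (2 * rank, d_in)         # rank 32 + 32 = 64
assert np.allclose(b_cat @ a_cat, delta_sum)   # same update as summing both adapters

# Merging into the base weight is then a single addition.
w_base = rng.normal(size=(d_out, d_in))
w_merged = w_base + b_cat @ a_cat
```

Folding each scale into its `B` matrix is what lets adapters trained with different scaling rules share one concatenated adapter.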
Example usage
```python
from transformers import AutoProcessor
from transformers.models.qwen3_5 import Qwen3_5ForConditionalGeneration

MODEL_ID = "insilicomedicine/longevity-llm"

# Load the merged bf16 checkpoint and its processor.
model = Qwen3_5ForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype="bfloat16",
    trust_remote_code=True,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)

messages = [
    {"role": "system", "content":
        "You are a helpful assistant with expertise in aging biology."},
    {"role": "user", "content": "What are the hallmarks of aging?"},
]

# Render the chat template to a prompt string (thinking mode off), then tokenize it.
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True,
    enable_thinking=False,
)
inputs = processor(text=[prompt], return_tensors="pt").to(model.device)

# Greedy decoding; print only the newly generated tokens.
out = model.generate(**inputs, max_new_tokens=400, do_sample=False)
print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
For vLLM serving (OpenAI-compatible), start the server with `--trust-remote-code --dtype bfloat16`. Thinking mode is not a server flag: to opt in, pass `chat_template_kwargs={"enable_thinking": true}` with each request.
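As a rough illustration of the per-request opt-in, the sketch below builds an OpenAI-compatible chat request that carries `chat_template_kwargs`. It uses only the Python standard library. The helper names `build_chat_request` and `send_chat_request` are hypothetical, and the base URL assumes a vLLM server on its default local port 8000. Only the payload is built when the script runs; nothing is sent unless `send_chat_request` is called.

```python
import json
import urllib.request

MODEL_ID = "insilicomedicine/longevity-llm"


def build_chat_request(prompt: str, enable_thinking: bool = False) -> dict:
    """Build a chat-completions payload with a per-request thinking switch."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        # Forwarded to the chat template by the server.
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }


def send_chat_request(payload: dict, base_url: str = "http://localhost:8000") -> dict:
    """POST the payload to a running server (base_url is an assumption)."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = build_chat_request("What are the hallmarks of aging?", enable_thinking=True)
print(json.dumps(payload, indent=2))
# With a server running: reply = send_chat_request(payload)
```

With the official `openai` Python client, the same field can be supplied through the `extra_body` argument of `chat.completions.create`.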
License
The fine-tuning additions in this repository are released under CC-BY-ND-4.0. The underlying Qwen3.5-9B weights remain under their original Apache-2.0 license.
Citation
@misc{insilicomedicine_longevity_llm_2026,
title = {Longevity-LLM: a domain-adapted Qwen3.5-9B for aging biology},
author = {Insilico Medicine},
year = {2026},
howpublished = {\url{https://huggingface.co/insilicomedicine/longevity-llm}}
}