Instructions for using insilicomedicine/longevity-llm with libraries, inference servers, and local apps.

Libraries

How to use insilicomedicine/longevity-llm with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

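# The chat messages below include an image and are passed as `text=`,
# which matches the image-text-to-text pipeline rather than text-generation.
pipe = pipeline("image-text-to-text", model="insilicomedicine/longevity-llm")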
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("insilicomedicine/longevity-llm")
model = AutoModelForImageTextToText.from_pretrained("insilicomedicine/longevity-llm")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use insilicomedicine/longevity-llm with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "insilicomedicine/longevity-llm"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "insilicomedicine/longevity-llm",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/insilicomedicine/longevity-llm

SGLang

How to use insilicomedicine/longevity-llm with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "insilicomedicine/longevity-llm" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "insilicomedicine/longevity-llm",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "insilicomedicine/longevity-llm" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "insilicomedicine/longevity-llm",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use insilicomedicine/longevity-llm with Docker Model Runner:
```
docker model run hf.co/insilicomedicine/longevity-llm
```

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

[This model is part of a manuscript in review. Public access to the model will be granted upon manuscript acceptance. If you wish to use this model before then, please submit your request to one of the Insilico Medicine team members on Hugging Face.]

Longevity-LLM (L-LLM)

A domain-adapted Qwen3.5-9B for aging and longevity biology. L-LLM is the result of continued pretraining + supervised fine-tuning + a reasoning-augmented continuation pass on a multi-domain corpus spanning clinical aging, epigenomics, transcriptomics, proteomics, and genetics. The two trained LoRA adapters were concatenated into a single rank-64 LoRA and merged into the base weights to produce this standalone bf16 checkpoint.

Methods

L-LLM was built by LoRA fine-tuning Qwen3.5-9B, a 9B-parameter hybrid transformer that interleaves Gated DeltaNet linear-attention layers with standard self-attention in a 3:1 ratio. Training data was assembled from the following domains:

| Domain | Sources |
| --- | --- |
| Knowledge priors | UniProt protein/gene annotations, Gene Ontology, protein–protein interactions, pathway membership; published aging-clock formulas and CpG-site coefficients (Biolearn) |
| Clinical & epidemiology | NHANES (age, mortality) |
| Epigenomics | GEO DNA-methylation cohorts, CpG methylation profiles, aging-clock proxy tasks |
| Transcriptomics | GTEx (tissue age), TCGA (cancer survival), expression-profile generation |
| Proteomics | Olink plasma-proteomics panels, proteomic clock proxy tasks |
| Genetics & longevity | OpenGenes (expression directionality), SynergyAge (lifespan), CellAge (senescence), anti-aging target classification |
| Reasoning corpus | Prediction tasks augmented with frontier-model chain-of-thought traces |

Approximate scale across all domains: ≈10⁶ training prompts, on the order of several billion tokens in total. Exact composition, prompt counts, and token counts will be reported in the forthcoming preprint.

Training proceeded in three stages on Qwen3.5-9B:

1. Continued pretraining: knowledge priors only, raw text packed into 4,096-token blocks. Rank-32 LoRA with rsLoRA scaling, LR 2 × 10⁻⁵, 3 epochs.
2. Supervised fine-tuning: aging prediction tasks in conversation format. Rank-32 LoRA initialized from the stage-1 adapter, standard α/r scaling, LR 1 × 10⁻⁴, 3 epochs.
3. Reasoning continuation: continued the SFT adapter on the reasoning corpus, LR 3 × 10⁻⁵, ≈1 epoch, context length 32,768 with example packing.
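The packing of raw text into fixed-length blocks mentioned for the continued-pretraining stage can be illustrated with a minimal sketch. The card does not say how packing was implemented, so the separator token (`eos_id`), the decision to drop the trailing partial block, and the function name `pack_into_blocks` are all assumptions made for illustration.

```python
from typing import Iterable, List


def pack_into_blocks(
    docs: Iterable[List[int]],
    block_size: int = 4096,  # block length stated in the card
    eos_id: int = 0,         # hypothetical separator token id
) -> List[List[int]]:
    """Concatenate tokenized documents (each followed by eos_id) and cut
    the stream into full blocks of exactly block_size tokens."""
    buffer: List[int] = []
    blocks: List[List[int]] = []
    for ids in docs:
        buffer.extend(ids)
        buffer.append(eos_id)
        # Emit every complete block currently held in the buffer.
        while len(buffer) >= block_size:
            blocks.append(buffer[:block_size])
            buffer = buffer[block_size:]
    # Assumption: a trailing partial block is dropped rather than padded.
    return blocks


# Toy example with a block size of 4 instead of 4,096.
toy_docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print(pack_into_blocks(toy_docs, block_size=4, eos_id=0))
# -> [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 0]]
```

Packing this way avoids padding, at the cost of blocks that may span document boundaries.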

All adapters targeted the 12 linear projections, including those in the GatedDeltaNet modules. After training, the continued-pretraining (CPT) and continued-SFT adapters were concatenated into a single rank-64 LoRA and merged into the base weights to produce this checkpoint. Training used DeepSpeed ZeRO-2 on 2× NVIDIA H100 NVL 94 GB GPUs in bf16, with FlashAttention-2 on the self-attention layers. Full details will appear in the forthcoming preprint.
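The algebra behind concatenating two rank-32 adapters into one rank-64 adapter can be sketched as follows. It is not the authors' merge script. It only checks the identity s₁·B₁A₁ + s₂·B₂A₂ = [s₁B₁, s₂B₂]·[A₁; A₂] on random matrices, folding each adapter's scale into its B factor. The matrix sizes and the `alpha` value are hypothetical; only the ranks and the two scaling rules (rsLoRA α/√r for one adapter, standard α/r for the other) come from the description above.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def concat_lora(b1, a1, s1, b2, a2, s2):
    """Return (B_cat, A_cat) such that B_cat @ A_cat == s1*B1@A1 + s2*B2@A2.

    B factors are joined column-wise (with each scale folded in) and
    A factors are stacked row-wise, so the ranks add up.
    """
    b_cat = [[s1 * x for x in r1] + [s2 * x for x in r2] for r1, r2 in zip(b1, b2)]
    a_cat = [row[:] for row in a1] + [row[:] for row in a2]
    return b_cat, a_cat


rng = random.Random(0)
d_out, d_in, rank = 48, 40, 32       # toy layer size (hypothetical); rank from the card
alpha = 32.0                         # hypothetical LoRA alpha, not given in the card
s_cpt = alpha / math.sqrt(rank)      # rsLoRA scaling
s_sft = alpha / rank                 # standard alpha/r scaling

b_cpt, a_cpt = rand_matrix(d_out, rank, rng), rand_matrix(rank, d_in, rng)
b_sft, a_sft = rand_matrix(d_out, rank, rng), rand_matrix(rank, d_in, rng)

b_cat, a_cat = concat_lora(b_cpt, a_cpt, s_cpt, b_sft, a_sft, s_sft)
delta_cat = matmul(b_cat, a_cat)

# Reference: sum of the two separately scaled low-rank updates.
delta_cpt, delta_sft = matmul(b_cpt, a_cpt), matmul(b_sft, a_sft)
max_err = max(
    abs(delta_cat[i][j] - (s_cpt * delta_cpt[i][j] + s_sft * delta_sft[i][j]))
    for i in range(d_out)
    for j in range(d_in)
)
print("combined rank:", len(a_cat), "| matches sum of updates:", max_err < 1e-9)
```

Because the scales are folded into the concatenated B factor, the combined adapter is applied with an effective scale of 1 before being added to the base weight.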

Example usage

from transformers import AutoProcessor
from transformers.models.qwen3_5 import Qwen3_5ForConditionalGeneration

model = Qwen3_5ForConditionalGeneration.from_pretrained(
    "insilicomedicine/longevity-llm",
    torch_dtype="bfloat16",
    trust_remote_code=True,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    "insilicomedicine/longevity-llm", trust_remote_code=True
)
messages = [
    {"role": "system", "content": [
        {"type": "text", "text": "You are a helpful assistant with expertise in aging biology."}
    ]},
    {"role": "user", "content": [
        {"type": "text", "text": "What are the hallmarks of aging?"}
    ]},
]
# tokenize=True with return_dict=True returns tensors (input_ids, attention_mask)
# rather than a formatted prompt string, so .to(model.device) works.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    enable_thinking=False,  # chat-template option: disable thinking mode
).to(model.device)
out = model.generate(**inputs, max_new_tokens=400, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(processor.decode(out[0][prompt_len:], skip_special_tokens=True))

For vLLM serving (OpenAI-compatible), start the server with --trust-remote-code --dtype bfloat16. Thinking mode is not a server flag; opt in per request by passing chat_template_kwargs={"enable_thinking": true}.
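A per-request opt-in might look like the sketch below, which uses only the Python standard library. It assumes a vLLM server is already running on the default http://localhost:8000 address used in the curl example earlier; the helper names `build_chat_request` and `send_chat_request` are hypothetical.

```python
import json
import urllib.request


def build_chat_request(question, enable_thinking=False,
                       model="insilicomedicine/longevity-llm"):
    """Build an OpenAI-style chat payload with per-request template kwargs."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        # Forwarded to the chat template; toggles thinking mode per request.
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }


def send_chat_request(payload, base_url="http://localhost:8000"):
    """POST the payload to the server's chat completions endpoint."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


payload = build_chat_request("What are the hallmarks of aging?", enable_thinking=True)
print(json.dumps(payload, indent=2))

# Uncomment once a server is running:
# reply = send_chat_request(payload)
# print(reply["choices"][0]["message"]["content"])
```

Keeping payload construction separate from the network call makes it easy to inspect the request body before sending it.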

License

The fine-tuning additions in this repository are released under CC-BY-ND-4.0. The underlying Qwen3.5-9B weights remain under their original Apache-2.0 license.

Citation

@misc{insilicomedicine_longevity_llm_2026,
  title  = {Longevity-LLM: a domain-adapted Qwen3.5-9B for aging biology},
  author = {Insilico Medicine},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/insilicomedicine/longevity-llm}}
}

Model details

Format: Safetensors
Model size: 9B params
Tensor type: BF16
Model tree: Qwen/Qwen3.5-9B-Base (base model) → Qwen/Qwen3.5-9B (fine-tuned) → insilicomedicine/longevity-llm (fine-tuned, this model)