Instructions for using LocusForge/VariantAssist-Gemma4-31B-LoRA with libraries, inference providers, notebooks, and local apps.

Libraries

How to use LocusForge/VariantAssist-Gemma4-31B-LoRA with PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("google/gemma-4-31B-it")
model = PeftModel.from_pretrained(base_model, "LocusForge/VariantAssist-Gemma4-31B-LoRA")

Transformers

How to use LocusForge/VariantAssist-Gemma4-31B-LoRA with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="LocusForge/VariantAssist-Gemma4-31B-LoRA")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly (with peft installed, Transformers resolves the adapter and its base model)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("LocusForge/VariantAssist-Gemma4-31B-LoRA", dtype="auto")

Local Apps

vLLM

How to use LocusForge/VariantAssist-Gemma4-31B-LoRA with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
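# Start the vLLM server with the base model and attach the adapter.
# The adapter repository holds LoRA weights only, so it cannot be served by itself.
# --max-lora-rank 32 matches the adapter rank (r=32); "variantassist" is an arbitrary alias.
vllm serve "google/gemma-4-31B-it" \
    --enable-lora \
    --max-lora-rank 32 \
    --lora-modules variantassist=LocusForge/VariantAssist-Gemma4-31B-LoRA
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "variantassist",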
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/LocusForge/VariantAssist-Gemma4-31B-LoRA

SGLang

How to use LocusForge/VariantAssist-Gemma4-31B-LoRA with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "LocusForge/VariantAssist-Gemma4-31B-LoRA" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LocusForge/VariantAssist-Gemma4-31B-LoRA",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "LocusForge/VariantAssist-Gemma4-31B-LoRA" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LocusForge/VariantAssist-Gemma4-31B-LoRA",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use LocusForge/VariantAssist-Gemma4-31B-LoRA with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for LocusForge/VariantAssist-Gemma4-31B-LoRA to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for LocusForge/VariantAssist-Gemma4-31B-LoRA to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for LocusForge/VariantAssist-Gemma4-31B-LoRA to start chatting

Load model with FastModel

# Install first (shell): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="LocusForge/VariantAssist-Gemma4-31B-LoRA",
    max_seq_length=2048,
)

Docker Model Runner
How to use LocusForge/VariantAssist-Gemma4-31B-LoRA with Docker Model Runner:
```
docker model run hf.co/LocusForge/VariantAssist-Gemma4-31B-LoRA
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

variantassist.com · GitHub · License

Compatibility note: this VariantAssist-tuned adapter is currently intended only for Level-1 Annotation. For other VariantAssist workflow stages, use the original Q8 model rather than this tuned adapter.

VariantAssist Gemma 4 31B LoRA

VariantAssist Gemma 4 31B LoRA is a PEFT adapter that adapts Gemma 4 31B IT to the VariantAssist clinical variant-review workflow.

The adapter is trained to improve structured, local-first support for variant interpretation: consistent adherence to the input/output format, parseable JSON generation, and preservation of evidence structure for expert review. It is not a diagnostic device and must not be used as a replacement for a clinician, medical geneticist, laboratory director, or ACMG/AMP-trained reviewer.

Base Model

Upstream base model:

google/gemma-4-31B-it

Training/export used the Unsloth distribution of the same Gemma 4 31B IT model:

unsloth/gemma-4-31B-it

The Unsloth repository was used for training compatibility and export workflow. The model lineage for this adapter should be treated as google/gemma-4-31B-it -> VariantAssist LoRA adapter.

Intended Use

Use this adapter for:

VariantAssist-style structured evidence review;
producing machine-checkable JSON draft outputs;
local or private deployments after merging/converting the adapter;
research and reproducibility around the VariantAssist GGUF release.

For normal local inference, most users should use the linked GGUF quantization repository instead of loading the LoRA adapter directly.

Out of Scope

Do not use this adapter for:

autonomous clinical diagnosis;
direct patient-facing medical advice;
final ACMG/AMP classification without expert review;
interpretation outside the supplied evidence context;
high-stakes clinical workflows without local validation.

Adapter Details

LoRA adapter, r=32, alpha=32, dropout 0.0, PEFT 0.19.1. The public adapter repository intentionally excludes optimizer, scheduler, RNG state, and other training-only files.
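
As a rough illustration of what r=32 and alpha=32 imply, the sketch below computes the LoRA scaling factor and the number of parameters a rank-32 adapter adds to one linear layer. It assumes the standard LoRA scaling of alpha / r (not a rank-stabilized variant), and the 4096 x 4096 layer shape is a hypothetical example, not a confirmed Gemma 4 dimension.

```python
def lora_scaling(r: int, alpha: float) -> float:
    """Standard LoRA scaling factor applied to the low-rank update: alpha / r."""
    if r <= 0:
        raise ValueError("rank r must be positive")
    return alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added to one linear layer: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# Values stated in this card: r=32, alpha=32.
print(lora_scaling(r=32, alpha=32))        # 1.0

# Hypothetical 4096 x 4096 projection, NOT a confirmed Gemma 4 shape.
print(lora_param_count(4096, 4096, r=32))  # 262144
```

With alpha equal to r, the low-rank update is added to the base weights without extra rescaling.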

Minimal Usage

from transformers import AutoModelForCausalLM, AutoProcessor
from peft import PeftModel

base_model = "google/gemma-4-31B-it"
adapter = "LocusForge/VariantAssist-Gemma4-31B-LoRA"

processor = AutoProcessor.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    device_map="auto",
    dtype="auto",
)
model = PeftModel.from_pretrained(model, adapter)
model.eval()

For production or local serving, merge the adapter into the base model, or use the GGUF release, which is already built from the merged model.
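
The merge step can be sketched with PEFT's `merge_and_unload()`. This is a minimal sketch rather than the project's actual export pipeline: the function name `merge_adapter` and the output directory are hypothetical, the `dtype="auto"` argument follows the usage shown above, and the GGUF conversion and quantization steps are not covered.

```python
def merge_adapter(base_id: str, adapter_id: str, out_dir: str) -> None:
    """Fold the LoRA weights into the base model and save a standalone checkpoint."""
    # Imported lazily so this file can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoProcessor

    # Load on the default device (CPU) so that every weight is materialized for merging.
    base = AutoModelForCausalLM.from_pretrained(base_id, dtype="auto")
    model = PeftModel.from_pretrained(base, adapter_id)

    # merge_and_unload() returns the base model with the LoRA deltas applied.
    merged = model.merge_and_unload()
    merged.save_pretrained(out_dir)

    # Save the processor next to the weights so the directory is self-contained.
    AutoProcessor.from_pretrained(base_id).save_pretrained(out_dir)
```

A call such as `merge_adapter("google/gemma-4-31B-it", "LocusForge/VariantAssist-Gemma4-31B-LoRA", "./variantassist-merged")` needs enough memory to hold the full 31B model.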

Prompts, Schema, And Benchmark

For reproducible prompting, use the public ATP7B prompt archive. That directory contains the system prompt, expected output schema, annotation rules, and per-variant prompt archives. It is the best starting point for users who want to send correctly formatted prompts to the adapter or compare another model against the same benchmark inputs.
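
A prompt from the archive might be assembled into chat messages as in the sketch below. The helper `build_messages` and the placeholder strings are hypothetical; the real system prompt and per-variant prompts should be read from the archive. The sketch also assumes the chat template accepts a `system` role.

```python
def build_messages(system_prompt: str, variant_prompt: str) -> list[dict]:
    """Pair the shared system prompt with one per-variant prompt as chat messages."""
    if not system_prompt.strip() or not variant_prompt.strip():
        raise ValueError("both prompts must be non-empty")
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": variant_prompt},
    ]


# Placeholder text only; load the real prompts from the archive files.
messages = build_messages("<system prompt from the archive>", "<per-variant prompt>")
print([m["role"] for m in messages])  # ['system', 'user']
```

The resulting list has the same shape as the `messages` argument used in the Transformers and OpenAI-compatible examples.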

Benchmark dataset:

ATP7B classification benchmark
100 ATP7B variants;
consensus labels from five independent expert annotations;
strict majority consensus as primary ground truth;
metrics include exact match, one-step errors, and strong errors.
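
The consensus rule and metrics above can be sketched as follows. The definitions are assumptions, not the benchmark's published scoring code: labels are taken to be the five ACMG/AMP tiers on an ordinal scale, a one-step error is a prediction one tier away from the consensus, and a strong error is two or more tiers away.

```python
from collections import Counter

# Assumed ordinal scale: the five ACMG/AMP tiers, benign to pathogenic.
TIERS = ["Benign", "Likely benign", "VUS", "Likely pathogenic", "Pathogenic"]
RANK = {name: i for i, name in enumerate(TIERS)}


def strict_majority(labels):
    """Return the label chosen by more than half of the annotators, else None."""
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes > len(labels) / 2 else None


def score(predictions, truths):
    """Count exact matches, one-step errors (distance 1), and strong errors (distance >= 2)."""
    counts = {"exact": 0, "one_step": 0, "strong": 0}
    for pred, truth in zip(predictions, truths):
        distance = abs(RANK[pred] - RANK[truth])
        key = "exact" if distance == 0 else "one_step" if distance == 1 else "strong"
        counts[key] += 1
    return counts


votes = ["Pathogenic", "Pathogenic", "Likely pathogenic", "Pathogenic", "VUS"]
truth = strict_majority(votes)
print(truth)                                  # Pathogenic
print(score(["Likely pathogenic"], [truth]))  # {'exact': 0, 'one_step': 1, 'strong': 0}
```

Variants without a strict majority return `None` from `strict_majority` and would need separate handling before scoring.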

Benchmark results and figures are maintained with the GGUF/local-inference release:

VariantAssist Gemma 4 31B GGUF benchmark results

Use the benchmark repository for inputs and ground truth; use the GGUF release page for result plots and practical local-runtime comparisons.

Training Data

The full fine-tuning corpus is not distributed with this release because it may include clinical-context and literature-derived materials requiring separate privacy and licensing review. Public benchmark data, prompt templates, response schema, and de-identified examples are provided separately to support reproducible evaluation.

Relationship To GGUF Release

The GGUF quantization repository should be treated as the primary user-facing local inference release. Those files are produced by merging this adapter with the base model and converting/quantizing the merged model for llama.cpp-compatible runtimes.

Safety Notes

Outputs should be treated as structured draft material. Every claim should be checked against the supplied evidence, the underlying annotation databases, and applicable clinical laboratory procedures.

Recommended safeguards:

validate generated JSON against the expected schema;
preserve source evidence and provenance;
keep patient-specific context private;
log model version, adapter version, prompt, and runtime parameters;
require expert review before any clinical conclusion.
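
Two of the safeguards above, schema validation and run logging, can be sketched with the standard library alone. The field names in `REQUIRED_FIELDS` are hypothetical stand-ins for the published response schema, and `parse_draft` and `run_record` are illustrative helpers, not part of VariantAssist.

```python
import json

# HYPOTHETICAL field names; replace with the keys from the published schema.
REQUIRED_FIELDS = {"variant": str, "classification": str, "evidence": list}


def parse_draft(raw_text):
    """Parse model output as JSON and check required fields; return (draft, errors)."""
    try:
        draft = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(draft, dict):
        return None, ["top-level value must be a JSON object"]
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in draft:
            errors.append(f"missing field: {field}")
        elif not isinstance(draft[field], expected_type):
            errors.append(f"wrong type for field: {field}")
    return draft, errors


def run_record(model_version, adapter_version, prompt, params):
    """Bundle the details recommended for logging into one JSON line."""
    return json.dumps(
        {
            "model_version": model_version,
            "adapter_version": adapter_version,
            "prompt": prompt,
            "runtime_params": params,
        },
        sort_keys=True,
    )


draft, errors = parse_draft('{"variant": "example", "classification": "VUS"}')
print(errors)  # ['missing field: evidence']
```

Because the logged prompt may contain patient-specific context, such records should stay in local, access-controlled storage.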

Citation

Citation will be added upon publication.
