variantassist.com · GitHub · License
VariantAssist Gemma 4 31B LoRA
VariantAssist Gemma 4 31B LoRA is a PEFT LoRA adapter that tunes Gemma 4 31B IT for the VariantAssist clinical variant-review workflow.
The adapter is trained to improve structured, local-first support for variant interpretation: consistent adherence to the expected input/output format, generation of parseable JSON, and preservation of the evidence structure for expert review. It is not a diagnostic device and must not be used as a replacement for a clinician, medical geneticist, laboratory director, or ACMG/AMP-trained reviewer.
Base Model
Upstream base model:
google/gemma-4-31B-it
Training/export used the Unsloth distribution of the same Gemma 4 31B IT model:
unsloth/gemma-4-31B-it
The Unsloth repository was used for training compatibility and for the export workflow.
The model lineage for this adapter should be treated as google/gemma-4-31B-it -> VariantAssist LoRA adapter.
Intended Use
Use this adapter for:
- VariantAssist-style structured evidence review;
- producing machine-checkable JSON draft outputs;
- local or private deployments after merging/converting the adapter;
- research and reproducibility around the VariantAssist GGUF release.
For normal local inference, most users should use the linked GGUF quantization repository instead of loading the LoRA adapter directly.
Out of Scope
Do not use this adapter for:
- autonomous clinical diagnosis;
- direct patient-facing medical advice;
- final ACMG/AMP classification without expert review;
- interpretation outside the supplied evidence context;
- high-stakes clinical workflows without local validation.
Adapter Details
LoRA adapter, r=32, alpha=32, dropout 0.0, PEFT 0.19.1.
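For readers less familiar with LoRA, the sketch below illustrates the standard update rule these hyperparameters feed into: an adapted weight W is effectively used as W + (lora_alpha / r) · B·A, where A has r rows and B has r columns. With r = lora_alpha = 32, the scaling factor is 1.0. This is plain Python for illustration, not PEFT's implementation; it assumes PEFT's default scaling (rsLoRA not enabled), and the helper names and toy matrices are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, r, lora_alpha):
    """Return W + (lora_alpha / r) * (B @ A), the weight LoRA effectively applies.

    Shapes: w is (d_out, d_in), a is (r, d_in), b is (d_out, r).
    Assumes standard alpha / r scaling (rsLoRA disabled).
    """
    scaling = lora_alpha / r  # 32 / 32 == 1.0 for this adapter
    delta = matmul(b, a)
    return [
        [w_ij + scaling * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy rank-1 example (not the real adapter): a 2x2 weight plus a rank-1 update.
w = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 2.0]]     # (r=1, d_in=2)
b = [[0.5], [0.25]]  # (d_out=2, r=1)
print(lora_effective_weight(w, a, b, r=1, lora_alpha=1))  # [[1.5, 1.0], [0.25, 1.5]]
```

Because B is initialized to zero in standard LoRA training, the update starts at zero and only the small A and B matrices are trained; the base weights W stay frozen.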
The public adapter repository intentionally excludes optimizer, scheduler, RNG state, and other training-only files.
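For anyone publishing a comparable adapter, the sketch below separates inference files from training checkpoint state. The listed names are the ones the transformers `Trainer` commonly writes into a `checkpoint-*` directory; treat the list as an assumption to check against your own output directory. `publishable_files` is a hypothetical helper, not the script used for this release.

```python
from pathlib import Path

# Files the transformers Trainer commonly writes next to adapter weights.
# Assumed list; verify against your own checkpoint-* directory.
TRAINING_ONLY_NAMES = {
    "optimizer.pt",
    "scheduler.pt",
    "scaler.pt",
    "trainer_state.json",
    "training_args.bin",
}


def is_training_only(filename: str) -> bool:
    """True for optimizer, scheduler, RNG, or trainer state that inference never needs."""
    name = Path(filename).name
    # Multi-process runs write one RNG file per rank, e.g. rng_state_0.pth.
    return name in TRAINING_ONLY_NAMES or (
        name.startswith("rng_state") and name.endswith(".pth")
    )


def publishable_files(filenames):
    """Keep only the files worth uploading to a public adapter repository."""
    return sorted(f for f in filenames if not is_training_only(f))


checkpoint = [
    "adapter_config.json",
    "adapter_model.safetensors",
    "optimizer.pt",
    "rng_state.pth",
    "scheduler.pt",
    "trainer_state.json",
    "training_args.bin",
    "tokenizer.json",
]
print(publishable_files(checkpoint))
# ['adapter_config.json', 'adapter_model.safetensors', 'tokenizer.json']
```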
Minimal Usage
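```python
from transformers import AutoModelForCausalLM, AutoProcessor
from peft import PeftModel

base_model = "google/gemma-4-31B-it"
adapter = "LocusForge/VariantAssist-Gemma4-31B-LoRA"

# The processor provides the Gemma 4 tokenizer and chat template.
processor = AutoProcessor.from_pretrained(base_model)

# Load the base weights, sharded across available devices in their native dtype.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    device_map="auto",
    dtype="auto",
)

# Attach the LoRA adapter to the base model and switch to inference mode.
model = PeftModel.from_pretrained(model, adapter)
model.eval()
```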
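Once the adapter is loaded, a draft can be generated through the processor's chat template. This is a sketch under stated assumptions, not the project's reference inference code: `generate_draft` is a hypothetical helper, `max_new_tokens=2048` is an arbitrary budget, and greedy decoding (`do_sample=False`) is chosen only because it makes reruns easier to compare when runtime parameters are logged. Message contents should come from the ATP7B prompt archive described in the next section.

```python
def generate_draft(model, processor, messages, max_new_tokens=2048):
    """Greedily decode one draft response for a list of chat messages.

    Assumes `model` and `processor` were created as in the snippet above.
    """
    # Render the chat template and tokenize in one call; return_dict=True
    # returns input_ids plus attention_mask, ready to pass to generate().
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    prompt_len = inputs["input_ids"].shape[-1]
    # do_sample=False selects greedy decoding, which makes reruns on the
    # same prompt and runtime more comparable.
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Decode only the tokens generated after the prompt.
    return processor.decode(output_ids[0][prompt_len:], skip_special_tokens=True)


# Example (requires the loaded model; the message content is a placeholder):
# draft = generate_draft(model, processor, [{"role": "user", "content": "..."}])
```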
For production or local serving, either merge the adapter into the base model yourself or use the GGUF release, which is built from the merged model.
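A minimal merge sketch, assuming `model` and `processor` come from the snippet above and the base weights were loaded unquantized (merging into a 4-bit or 8-bit base is lossy or unsupported, depending on the PEFT version). `merge_and_save` and the output directory name are hypothetical. The saved folder is a standard transformers checkpoint that can then be converted for llama.cpp-compatible runtimes, for example with llama.cpp's `convert_hf_to_gguf.py`; the exact conversion and quantization settings used for the GGUF release are not documented on this card.

```python
from pathlib import Path


def merge_and_save(peft_model, processor, out_dir):
    """Fold LoRA weights into the base model and save a standalone checkpoint.

    peft_model: a peft.PeftModel, e.g. `model` from the Minimal Usage snippet.
    processor: the matching AutoProcessor, saved so the folder is self-contained.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # merge_and_unload() adds the scaled LoRA update into each targeted weight
    # and removes the PEFT wrappers, returning a regular transformers model.
    merged = peft_model.merge_and_unload()
    merged.save_pretrained(str(out), safe_serialization=True)
    processor.save_pretrained(str(out))
    return merged


# Example (requires the loaded model; the directory name is a placeholder):
# merge_and_save(model, processor, "variantassist-gemma4-31b-merged")
```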
Prompts, Schema, And Benchmark
For reproducible prompting, use the public ATP7B prompt archive:
That directory contains the system prompt, expected output schema, annotation rules, and per-variant prompt archives. It is the best starting point for users who want to send correctly formatted prompts to the adapter or compare another model against the same benchmark inputs.
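The archive's exact file layout is not reproduced on this card, so the sketch below assumes the system prompt and one per-variant prompt have already been read into strings. `build_messages` is a hypothetical helper that produces the role/content message list expected by `apply_chat_template`. Whether the Gemma 4 chat template accepts a separate system role is worth confirming in its tokenizer configuration; the `fold_system` option (also hypothetical) prepends the system text to the user turn for templates that do not.

```python
def build_messages(system_prompt: str, variant_prompt: str, fold_system: bool = False):
    """Return a chat message list for one benchmark variant.

    fold_system=True merges the system text into the user turn, for chat
    templates that do not accept a separate "system" role.
    """
    system_prompt = system_prompt.strip()
    variant_prompt = variant_prompt.strip()
    if fold_system:
        return [{"role": "user", "content": f"{system_prompt}\n\n{variant_prompt}"}]
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": variant_prompt},
    ]


# Placeholder strings; substitute the real texts from the prompt archive.
messages = build_messages("<system prompt>", "<per-variant prompt>")
print([m["role"] for m in messages])  # ['system', 'user']
```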
Benchmark dataset:
- ATP7B classification benchmark
  - 100 ATP7B variants;
  - consensus labels from five independent expert annotations;
  - strict-majority consensus as the primary ground truth;
  - metrics include exact match, one-step errors, and strong errors.
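The sketch below shows one way to compute these quantities. With five annotators, a strict majority means at least three agree; variants without one get no primary label here. The five-tier ordinal scale (B, LB, VUS, LP, P) and the error definitions (one-step = adjacent tier, strong = two or more tiers apart) are assumptions made for illustration; the benchmark repository's annotation rules are authoritative and may define strong errors differently. All data below is toy data, not benchmark results.

```python
from collections import Counter

# Assumed ordinal ACMG/AMP five-tier scale, from benign to pathogenic.
TIERS = ["B", "LB", "VUS", "LP", "P"]
RANK = {label: i for i, label in enumerate(TIERS)}


def strict_majority(labels):
    """Return the label chosen by more than half of the annotators, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count * 2 > len(labels) else None


def score(predictions, truths):
    """Count exact matches, one-step errors, and strong errors (>= 2 tiers off).

    Pairs whose ground truth is None (no strict majority) are skipped.
    """
    tally = {"exact": 0, "one_step": 0, "strong": 0, "skipped": 0}
    for pred, truth in zip(predictions, truths):
        if truth is None:
            tally["skipped"] += 1
            continue
        distance = abs(RANK[pred] - RANK[truth])
        if distance == 0:
            tally["exact"] += 1
        elif distance == 1:
            tally["one_step"] += 1
        else:
            tally["strong"] += 1
    return tally


annotations = [
    ["P", "P", "LP", "P", "LP"],      # 3 of 5 agree -> P
    ["VUS", "LB", "VUS", "B", "LB"],  # 2-2-1 split -> no strict majority
    ["LB", "LB", "LB", "B", "VUS"],   # 3 of 5 agree -> LB
]
truths = [strict_majority(a) for a in annotations]
print(truths)  # ['P', None, 'LB']
print(score(["LP", "VUS", "LP"], truths))
# {'exact': 0, 'one_step': 1, 'strong': 1, 'skipped': 1}
```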
Benchmark results and figures are maintained with the GGUF/local-inference release:
Use the benchmark repository for inputs and ground truth; use the GGUF release page for result plots and practical local-runtime comparisons.
Training Data
The full fine-tuning corpus is not distributed with this release because it may include clinical-context and literature-derived materials requiring separate privacy and licensing review. Public benchmark data, prompt templates, response schema, and de-identified examples are provided separately to support reproducible evaluation.
Relationship To GGUF Release
The GGUF quantization repository should be treated as the primary user-facing local inference release. Those files are produced by merging this adapter with the base model and converting/quantizing the merged model for llama.cpp-compatible runtimes.
Safety Notes
Outputs should be treated as structured draft material. Every claim should be checked against the supplied evidence, the underlying annotation databases, and applicable clinical laboratory procedures.
Recommended safeguards:
- validate generated JSON against the expected schema;
- preserve source evidence and provenance;
- keep patient-specific context private;
- log model version, adapter version, prompt, and runtime parameters;
- require expert review before any clinical conclusion.
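Two of these safeguards lend themselves to code. The stdlib-only sketch below uses hypothetical helper names: `parse_draft` strips an optional Markdown code fence, parses the response as a JSON object, and checks required top-level keys (a real deployment would validate against the full expected schema, for example with a JSON Schema validator); `run_record` builds a log entry with the model, adapter, adapter revision, a SHA-256 hash of the prompt, and the runtime parameters. The key names in the example are placeholders, not the VariantAssist schema.

```python
import hashlib
import json
import re
from datetime import datetime, timezone

# Matches a whole response wrapped in a ```json ... ``` (or bare ```) fence.
_FENCE = re.compile(r"^\s*```(?:json)?\s*(.*?)\s*```\s*$", re.DOTALL)


def parse_draft(text, required_keys):
    """Parse a model response as a JSON object and check required top-level keys.

    Raises ValueError if the text is not a JSON object or keys are missing.
    """
    match = _FENCE.match(text)
    if match:
        text = match.group(1)
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("response JSON is not an object")
    missing = [k for k in required_keys if k not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return data


def run_record(model_id, adapter_id, adapter_revision, prompt, generation_params):
    """Build a provenance log entry for one generation call."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model_id,
        "adapter": adapter_id,
        "adapter_revision": adapter_revision,
        # The hash identifies the exact prompt without copying patient context into logs.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "generation_params": dict(generation_params),
    }


# Placeholder keys and values for illustration only.
draft = parse_draft('```json\n{"classification": "VUS", "evidence": []}\n```',
                    ["classification", "evidence"])
record = run_record("google/gemma-4-31B-it", "LocusForge/VariantAssist-Gemma4-31B-LoRA",
                    "main", "<prompt text>", {"do_sample": False, "max_new_tokens": 2048})
print(json.dumps(record, indent=2))
```

Hashing the prompt keeps logs reproducible while honoring the privacy safeguard; where local policy allows, the full prompt can be stored alongside the hash.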
Citation
Citation will be added upon publication.
Links
- Main project: https://github.com/LocusForge/VariantAssist
- Supplementary benchmark: https://github.com/LocusForge/VariantAssist-supplement/tree/main/benchmark
- Upstream base model: https://huggingface.co/google/gemma-4-31B-it
- Training/export base distribution: https://huggingface.co/unsloth/gemma-4-31B-it
- GGUF quantization release: LocusForge/VariantAssist-Gemma4-31B-GGUF
- Website: https://variantassist.com/
- License: Apache License 2.0
- Notice: NOTICE.md