LLM4Variants-Qwen3-4B
Dual-head sentence classifier that predicts the ACMG evidence code and its strength from ClinVar submission comments. This checkpoint is the #1 run of the grid search, ranked by joint test accuracy.
The model wraps the backbone `Qwen/Qwen3-4B` with two heads on top of mean-pooled hidden states:

- **code head**: 28-way ACMG evidence code (PVS1, PS1–PS4, PM1–PM6, PP1–PP5, BA1, BS1–BS4, BP1–BP7, plus `NO_KEYWORD`)
- **strength head**: 6-way strength (Supporting, Moderate, Strong, VeryStrong, NotMet, NoStrength), conditioned on a learned embedding of the predicted code
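To make the data flow of the two heads concrete, here is a minimal PyTorch sketch operating on precomputed backbone hidden states. It is not the `DualHeadLLM` from `train_dual_head.py`: the class name `DualHeadSketch`, the single linear layer per head, and concatenating the pooled vector with the code embedding are assumptions. The attribute names echo `code_head`, `code_embeddings` and `strength_head` from the state dict, and the 64-dim code embedding matches the training configuration. Whether the original conditions on gold or predicted codes during training is not stated here; this sketch always uses the argmax prediction.

```python
import torch
import torch.nn as nn


class DualHeadSketch(nn.Module):
    """Hypothetical two-head classifier on top of backbone hidden states."""

    def __init__(self, hidden_size: int, num_keywords: int = 28,
                 num_strengths: int = 6, code_emb_dim: int = 64):
        super().__init__()
        self.code_head = nn.Linear(hidden_size, num_keywords)
        self.code_embeddings = nn.Embedding(num_keywords, code_emb_dim)
        # Assumption: strength head sees [pooled sentence vector ; code embedding].
        self.strength_head = nn.Linear(hidden_size + code_emb_dim, num_strengths)

    def forward(self, hidden_states: torch.Tensor, attention_mask: torch.Tensor):
        # Masked mean pooling: average only over non-padding tokens.
        mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)
        pooled = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)

        code_logits = self.code_head(pooled)
        # Condition the strength head on the predicted (argmax) code.
        pred_code = code_logits.argmax(dim=-1)
        code_vec = self.code_embeddings(pred_code)
        strength_logits = self.strength_head(torch.cat([pooled, code_vec], dim=-1))
        return code_logits, strength_logits


if __name__ == "__main__":
    head = DualHeadSketch(hidden_size=32)  # toy size, not Qwen3-4B's width
    h = torch.randn(2, 5, 32)              # (batch, seq_len, hidden)
    m = torch.tensor([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])
    code_logits, strength_logits = head(h, m)
    print(code_logits.shape, strength_logits.shape)
```

Because padding positions are zeroed before averaging, changing the hidden states at padded positions does not change either head's output.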
Test metrics
| Metric | Value |
|---|---|
| Code accuracy | 0.9309 |
| Strength accuracy | 0.9374 |
| Joint accuracy | 0.8822 |
| Strength accuracy given correct code | 0.9477 |
| Code weighted-F1 | 0.9307 |
| Strength weighted-F1 | 0.9351 |
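The reported numbers are consistent with joint accuracy (code and strength both correct) being the product of code accuracy and strength accuracy given a correct code: 0.9309 × 0.9477 ≈ 0.8822. The sketch below shows how these four accuracies relate; the function name `dual_head_accuracies` and the toy labels are illustrative only, not taken from the training script.

```python
def dual_head_accuracies(code_true, code_pred, str_true, str_pred):
    """Code, strength, joint, and conditional strength accuracy (hypothetical helper)."""
    n = len(code_true)
    code_ok = [c == p for c, p in zip(code_true, code_pred)]
    str_ok = [s == p for s, p in zip(str_true, str_pred)]
    # Joint: both the evidence code and its strength are correct.
    joint_ok = [a and b for a, b in zip(code_ok, str_ok)]
    n_code_ok = sum(code_ok)
    return {
        "code_acc": n_code_ok / n,
        "strength_acc": sum(str_ok) / n,
        "joint_acc": sum(joint_ok) / n,
        # Strength accuracy restricted to examples whose code was predicted correctly.
        "strength_acc_given_code": sum(joint_ok) / n_code_ok if n_code_ok else float("nan"),
    }


if __name__ == "__main__":
    r = dual_head_accuracies(
        ["PVS1", "PM2", "BA1", "PP3"], ["PVS1", "PM2", "BS1", "PP3"],
        ["VeryStrong", "Moderate", "NoStrength", "Supporting"],
        ["VeryStrong", "Supporting", "NoStrength", "Supporting"],
    )
    print(r)
```

By construction, `joint_acc == code_acc * strength_acc_given_code`, which is why the three table rows multiply out.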
Training configuration
| Hyperparameter | Value |
|---|---|
| Learning rate | 0.0001 |
| Effective batch size | 128 |
| Epochs | 8 |
| Max length | 256 |
| λ (strength-loss weight) | 1.0 |
| Code emb dim | 64 |
| Negative ratio | 0.25 |
| Seed | 42 |
| Train / val / test size | 19161 / 1278 / 5110 |
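The λ row suggests the two heads are trained jointly on a weighted sum of per-head losses. Assuming cross-entropy for both heads (this card does not state the loss functions), with code logits $z^{\text{code}}$, strength logits $z^{\text{str}}$ and gold labels $y^{\text{code}}, y^{\text{str}}$, the per-example objective would read:

$$
\mathcal{L} = \mathrm{CE}\big(z^{\text{code}}, y^{\text{code}}\big) + \lambda \, \mathrm{CE}\big(z^{\text{str}}, y^{\text{str}}\big), \qquad \lambda = 1.0
$$

Under this reading, λ = 1.0 weights the two heads equally.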
Files
- `model.safetensors`: full state dict (backbone + `code_head` + `code_embeddings` + `strength_head`).
- `label_mappings.json`: `keyword2id` / `strength2id` (and their reverse mappings).
- tokenizer files + `chat_template.jinja`.
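To turn predicted class indices back into ACMG labels, the forward maps in `label_mappings.json` can be inverted. This sketch assumes `keyword2id` and `strength2id` map label strings to integer ids; rather than guess the key names of the stored reverse maps, it rebuilds them. The helpers `load_label_mappings`, `build_decoders` and `decode` are hypothetical.

```python
import json


def load_label_mappings(path):
    """Read label_mappings.json (layout assumed: {"keyword2id": {...}, "strength2id": {...}})."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def build_decoders(maps):
    """Invert label->id maps into id->label maps; int() tolerates ids stored as strings."""
    id2keyword = {int(i): k for k, i in maps["keyword2id"].items()}
    id2strength = {int(i): s for s, i in maps["strength2id"].items()}
    return id2keyword, id2strength


def decode(code_id, strength_id, id2keyword, id2strength):
    """Map a (code index, strength index) prediction to label strings."""
    return id2keyword[code_id], id2strength[strength_id]


if __name__ == "__main__":
    # Toy mapping for illustration; the real file has 28 codes and 6 strengths.
    maps = {"keyword2id": {"PVS1": 0, "PM2": 1}, "strength2id": {"Strong": 0, "Moderate": 1}}
    id2kw, id2st = build_decoders(maps)
    print(decode(1, 1, id2kw, id2st))
```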
Loading
This is a custom `nn.Module` (`DualHeadLLM`), not a `transformers` `AutoModel`. Reconstruct the module (see `train_dual_head.py`), then load the weights:
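```python
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from train_dual_head import DualHeadLLM  # assumes train_dual_head.py is importable

model = DualHeadLLM("Qwen/Qwen3-4B", num_keywords=28, num_strengths=6)
state = load_file(hf_hub_download("HFXM/LLM4Variants-Qwen3-4B", "model.safetensors"))
model.load_state_dict(state)
model.eval()  # inference mode: disables dropout
```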