Score SLM (Merged DoRA Model)

Merged full-weight model produced from a DoRA fine-tune of Qwen/Qwen3.5-4B.

Field	Value
Base model	`Qwen/Qwen3.5-4B`
Fine-tuning method	DoRA (PEFT LoRA with `use_dora=True`)
Adapter checkpoint	`checkpoint-1790`
Task	Causal language modeling (`CAUSAL_LM`)
Dtype	`bfloat16`
Parameters	~4B
Size on disk	~7.9 GB (2 safetensors shards)

This directory contains the merged model. The adapter weights are already baked into the base weights, so PEFT is not required at inference time.

Requirements

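pip install "transformers>=5.12" torch accelerate

Qwen3.5 support requires a recent transformers release. accelerate is needed for device_map="auto". peft is only needed to reproduce the merge from the adapter, not for inference. A CUDA-capable GPU is recommended; CPU inference works but is considerably slower for a 4B-parameter model.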
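To confirm that the installed transformers meets the minimum version above before loading the model, a small standard-library check can help. This is a sketch: meets_minimum and check_transformers are hypothetical helpers, and the comparison only looks at the numeric release segments, so pre-release tags such as "rc1" or ".dev0" are ignored.

```python
from importlib.metadata import PackageNotFoundError, version


def _release_tuple(v: str) -> tuple[int, ...]:
    """Turn '5.12.1' or '5.12.0.dev0' into (5, 12, 1) / (5, 12, 0), dropping non-numeric suffixes."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if the installed release number is >= the minimum (missing segments count as 0)."""
    a, b = _release_tuple(installed), _release_tuple(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_transformers(minimum: str = "5.12") -> str:
    """Report whether the installed transformers package satisfies the minimum version."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return "transformers is not installed"
    if meets_minimum(installed, minimum):
        return f"transformers {installed} OK"
    return f"transformers {installed} is older than {minimum}; upgrade before loading Qwen3.5"
```

Running print(check_transformers()) before the loading code gives a clearer message than a failed architecture lookup. The packaging library offers a fuller version parser if it is available.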

Model files

model/
├── config.json
├── generation_config.json
├── model-00001-of-00002.safetensors
├── model-00002-of-00002.safetensors
├── model.safetensors.index.json
├── tokenizer.json
├── tokenizer_config.json
└── chat_template.jinja
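After downloading the repository to a local directory, a quick check that every listed file is present (including every shard referenced by model.safetensors.index.json, whose "weight_map" maps tensor names to shard files) can catch an interrupted download. A minimal sketch; missing_files and EXPECTED_FILES are hypothetical names, and the file list is copied from the tree above.

```python
import json
from pathlib import Path

# File names copied from the listing above.
EXPECTED_FILES = [
    "config.json",
    "generation_config.json",
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
    "model.safetensors.index.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "chat_template.jinja",
]


def missing_files(model_dir: str) -> list[str]:
    """Return expected files (and indexed shards) that are absent from model_dir."""
    root = Path(model_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    index_path = root / "model.safetensors.index.json"
    if index_path.is_file():
        # The index maps each tensor name to the shard file that stores it.
        weight_map = json.loads(index_path.read_text())["weight_map"]
        for shard in sorted(set(weight_map.values())):
            if not (root / shard).is_file() and shard not in missing:
                missing.append(shard)
    return missing
```

For example, print(missing_files("model") or "all files present"), where "model" is a hypothetical local download path.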

Load the model

Python (Transformers)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "NDIJayant/scores-slm"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    dtype=torch.bfloat16,      # use torch.float32 on CPU if needed
    device_map="auto",         # or "cuda" / "cpu"
    trust_remote_code=True,
)
model.eval()

Inference

Format inputs with the model's chat template. Qwen3.5 supports a thinking mode that emits a reasoning trace before the answer; pass enable_thinking=False to apply_chat_template to get the answer directly.
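If thinking is left enabled, the decoded response contains the reasoning trace before the JSON. A minimal sketch for stripping it, assuming Qwen3.5 delimits the trace with <think>...</think> tags the way Qwen3 does and that those tags survive decoding; strip_thinking is a hypothetical helper, not part of transformers.

```python
import re

# Assumption: the reasoning trace is wrapped in <think>...</think>, as in Qwen3.
_THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_thinking(text: str) -> str:
    """Remove any <think>...</think> reasoning trace and return the remaining answer."""
    text = _THINK_BLOCK.sub("", text)
    # If the opening tag was part of the prompt, only the closing tag appears in the output.
    if "</think>" in text:
        text = text.split("</think>", 1)[1]
    return text.strip()
```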

This model is trained for clinical assessment score extraction. Always pass the system prompt below, then provide the clinical note as the user message.

System prompt (score detection)

You are a clinical information extraction system.

Extract clinical assessment scores from the note context.

Output format:
- Return only valid JSON wrapped in ```json and ```.
- Do not write any text before or after the fenced JSON block.
- Each array item inside the JSON must have exactly:
  - "score_type"
  - "score_value"
  - "reasoning"
- Do not add any other keys to array items.
- "reasoning" must be brief, factual, and grounded in the note.
- Use null when a score type is mentioned but no valid numeric value is available.

No-score handling:
- If no clinical assessment score types are present in the note, return [].
- Do not infer, estimate, calculate, or derive a score from symptoms, diagnoses, severity descriptions, functional findings, examination findings, subscores, or narrative clinical observations.
- Only return a score when the note explicitly documents a clinical assessment score and/or its numeric value.

Encounter-relative recency:
- If an encounter date is present anywhere in the note, use it as the reference date.
- Encounter dates may appear in formats such as:
  - Encounter date
  - Date of Service (DOS)
  - Visit Date
  - Evaluation Date
  - Progress Note Date
  - or any other clearly documented encounter date.
- For each score_type, return the most recent valid score on or before the encounter date.
- A score does not need to be measured on the encounter date itself; it may have been documented earlier and still be the most relevant score for the encounter.
- If no encounter date can be identified, return the most recent valid score documented in the note.

Date precedence:
- Use explicit dates to determine recency whenever available.
- If narrative phrases such as "most recent", "latest", "current", "recent", or similar conflict with explicit dates, prefer the explicit dates.
- When multiple dated scores exist for the same score_type, select the score whose date is closest to, but not after, the encounter date.
- Ignore scores documented after the encounter date.

Exclude older unrelated scores:
- Do NOT return scores from earlier unrelated visits, baseline history lists, historical summaries, prior episodes of care, problem lists, educational examples, or reference material.
- Ignore mentions framed as:
  - previous
  - prior
  - earlier
  - baseline
  - historical
  - history includes
  - on admission
  - last visit
  when a more recent valid score for the same score_type exists.
- If a historical score is the only documented score for that score_type and no more recent score exists, it may be returned if it is the most recent valid score relative to the encounter date.

One score per score_type:
- Return at most ONE JSON object per score_type.
- Do not output multiple objects with the same score_type.

Score extraction rules:
- Extract only explicitly documented clinical assessment scores.
- Ignore thresholds, severity categories, eligibility criteria, educational cutoffs, reference ranges, example values, scoring instructions, and explanatory text.
- Ignore partial subscores unless the note clearly identifies them as the patient's final documented score.
- Preserve the score type as documented, using a canonical name when possible.
- If a score type is mentioned but no valid numeric value is available, return score_value as null.

Other rules:
- Do not hallucinate score types.
- Do not hallucinate score values.
- Do not create scores from clinical reasoning.
- Return only the final JSON output in the required format.
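The encounter-relative recency, date precedence, and one-score-per-type rules above can be expressed as a small selection function, which may be useful when building reference labels to evaluate the model. This is an illustrative sketch of the rules, not the model's internals: ScoreMention and select_scores are hypothetical names, and it assumes each explicit score has already been assigned a date (for example, "today" resolved to the encounter date).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ScoreMention:
    """One explicitly documented score; documented_on is assumed to be resolved already."""
    score_type: str
    score_value: Optional[float]  # None mirrors "score type mentioned, no valid value"
    documented_on: date


def select_scores(mentions: list[ScoreMention], encounter: Optional[date] = None) -> dict[str, ScoreMention]:
    """Return at most one mention per score_type, applying the encounter-relative recency rules."""
    best: dict[str, ScoreMention] = {}
    for mention in mentions:
        # Ignore scores documented after the encounter date.
        if encounter is not None and mention.documented_on > encounter:
            continue
        current = best.get(mention.score_type)
        # Keep the latest date on or before the encounter (or the latest overall
        # when there is no encounter date); on a tie, the first mention wins.
        if current is None or mention.documented_on > current.documented_on:
            best[mention.score_type] = mention
    return best
```

With the example note used below (PHQ-9 of 12 on 2024-03-15, prior PHQ-9 of 18 on 2024-01-10, GAD-7 of 8), select_scores keeps PHQ-9 = 12 and GAD-7 = 8 for an encounter date of 2024-03-15.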

Score detection inference

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "NDIJayant/scores-slm"

SCORE_DETECTION_SYSTEM_PROMPT = """You are a clinical information extraction system.

Extract clinical assessment scores from the note context.

Output format:
- Return only valid JSON wrapped in ```json and ```.
- Do not write any text before or after the fenced JSON block.
- Each array item inside the JSON must have exactly:
  - "score_type"
  - "score_value"
  - "reasoning"
- Do not add any other keys to array items.
- "reasoning" must be brief, factual, and grounded in the note.
- Use null when a score type is mentioned but no valid numeric value is available.

No-score handling:
- If no clinical assessment score types are present in the note, return [].
- Do not infer, estimate, calculate, or derive a score from symptoms, diagnoses, severity descriptions, functional findings, examination findings, subscores, or narrative clinical observations.
- Only return a score when the note explicitly documents a clinical assessment score and/or its numeric value.

Encounter-relative recency:
- If an encounter date is present anywhere in the note, use it as the reference date.
- Encounter dates may appear in formats such as:
  - Encounter date
  - Date of Service (DOS)
  - Visit Date
  - Evaluation Date
  - Progress Note Date
  - or any other clearly documented encounter date.
- For each score_type, return the most recent valid score on or before the encounter date.
- A score does not need to be measured on the encounter date itself; it may have been documented earlier and still be the most relevant score for the encounter.
- If no encounter date can be identified, return the most recent valid score documented in the note.

Date precedence:
- Use explicit dates to determine recency whenever available.
- If narrative phrases such as "most recent", "latest", "current", "recent", or similar conflict with explicit dates, prefer the explicit dates.
- When multiple dated scores exist for the same score_type, select the score whose date is closest to, but not after, the encounter date.
- Ignore scores documented after the encounter date.

Exclude older unrelated scores:
- Do NOT return scores from earlier unrelated visits, baseline history lists, historical summaries, prior episodes of care, problem lists, educational examples, or reference material.
- Ignore mentions framed as:
  - previous
  - prior
  - earlier
  - baseline
  - historical
  - history includes
  - on admission
  - last visit
  when a more recent valid score for the same score_type exists.
- If a historical score is the only documented score for that score_type and no more recent score exists, it may be returned if it is the most recent valid score relative to the encounter date.

One score per score_type:
- Return at most ONE JSON object per score_type.
- Do not output multiple objects with the same score_type.

Score extraction rules:
- Extract only explicitly documented clinical assessment scores.
- Ignore thresholds, severity categories, eligibility criteria, educational cutoffs, reference ranges, example values, scoring instructions, and explanatory text.
- Ignore partial subscores unless the note clearly identifies them as the patient's final documented score.
- Preserve the score type as documented, using a canonical name when possible.
- If a score type is mentioned but no valid numeric value is available, return score_value as null.

Other rules:
- Do not hallucinate score types.
- Do not hallucinate score values.
- Do not create scores from clinical reasoning.
- Return only the final JSON output in the required format."""

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model.eval()

clinical_note = """
Encounter date: 2024-03-15
Progress Note

Patient presents for follow-up. PHQ-9 score today is 12.
Prior visit PHQ-9 was 18 on 2024-01-10.
GAD-7: 8.
"""

messages = [
    {"role": "system", "content": SCORE_DETECTION_SYSTEM_PROMPT},
    {"role": "user", "content": clinical_note.strip()},
]

chat_input = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)

inputs = tokenizer(chat_input, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=False,
    )

generated_ids = output_ids[0, inputs["input_ids"].shape[1]:]
response = tokenizer.decode(generated_ids, skip_special_tokens=True)
print(response)

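Illustrative output shape (the system prompt asks the model to wrap this array in a ```json fenced block, and the wording of "reasoning" may vary):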

[
  {
    "score_type": "PHQ-9",
    "score_value": 12,
    "reasoning": "PHQ-9 score of 12 documented on the 2024-03-15 encounter."
  },
  {
    "score_type": "GAD-7",
    "score_value": 8,
    "reasoning": "GAD-7 score of 8 documented in the note."
  }
]
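Because the response arrives inside a ```json fence, it needs a small parsing step before use. A minimal sketch that extracts the fenced block and checks it against the output rules in the system prompt (exact keys, one object per score_type); parse_scores is a hypothetical helper, and it assumes score_value is emitted as a JSON number or null.

```python
import json
import re

REQUIRED_KEYS = {"score_type", "score_value", "reasoning"}
_FENCED_JSON = re.compile(r"```json\s*(.*?)\s*```", flags=re.DOTALL)


def parse_scores(response: str) -> list[dict]:
    """Extract the ```json block from a model response and validate it against the output rules."""
    match = _FENCED_JSON.search(response)
    payload = match.group(1) if match else response.strip()  # tolerate a missing fence
    items = json.loads(payload)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array")
    seen = set()
    for item in items:
        if not isinstance(item, dict) or set(item) != REQUIRED_KEYS:
            raise ValueError(f"item must have exactly {sorted(REQUIRED_KEYS)}: {item!r}")
        value = item["score_value"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if value is not None and (isinstance(value, bool) or not isinstance(value, (int, float))):
            raise ValueError(f"score_value must be a number or null: {item!r}")
        if item["score_type"] in seen:
            raise ValueError(f"duplicate score_type: {item['score_type']!r}")
        seen.add(item["score_type"])
    return items
```

Calling parse_scores(response) on the decoded output from the example above returns a list of dicts, or raises ValueError (json.JSONDecodeError is a subclass) when the output does not follow the format.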

Quick start

Run score extraction with the system prompt and a clinical note:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "NDIJayant/scores-slm"

# SCORE_DETECTION_SYSTEM_PROMPT must already be defined (copy it from the score detection example above).
messages = [
    {"role": "system", "content": SCORE_DETECTION_SYSTEM_PROMPT},
    {"role": "user", "content": "Encounter date: 2024-03-15. PHQ-9: 12. GAD-7: 8."},
]

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model.eval()

chat_input = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer(chat_input, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=512, do_sample=False)

response = tokenizer.decode(
    output_ids[0, inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
print(response)
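To process several notes at once, the same steps can be batched into one generate() call. This is a sketch under stated assumptions: extract_scores_batch is a hypothetical helper, the tokenizer is assumed to define a pad token (Qwen tokenizers normally do), and left padding is used because decoder-only models must continue generating from the end of each prompt. Batched greedy outputs may differ slightly from single-note runs because of padding.

```python
def extract_scores_batch(model, tokenizer, system_prompt: str, notes: list[str], max_new_tokens: int = 512) -> list[str]:
    """Run score extraction on several notes in one generate() call and return decoded responses."""
    import torch  # imported lazily so the helper can be defined without torch installed

    prompts = [
        tokenizer.apply_chat_template(
            [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": note.strip()},
            ],
            tokenize=False,
            add_generation_prompt=True,
            enable_thinking=False,
        )
        for note in notes
    ]
    # Left-pad so every prompt ends exactly where generation starts (note: mutates the tokenizer).
    tokenizer.padding_side = "left"
    inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # With left padding all prompts share the padded length, so one slice removes them all.
    prompt_len = inputs["input_ids"].shape[1]
    return tokenizer.batch_decode(output_ids[:, prompt_len:], skip_special_tokens=True)
```

Usage: responses = extract_scores_batch(model, tokenizer, SCORE_DETECTION_SYSTEM_PROMPT, [note_a, note_b]), where note_a and note_b are hypothetical note strings. Each returned string still contains the ```json fence.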

Model creation

This checkpoint was created by merging the DoRA adapter from checkpoint-1790 into the full weights of Qwen/Qwen3.5-4B, producing a standalone model that loads without PEFT.
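For reference, a merge like this is typically done with PEFT's merge_and_unload(). The sketch below is not necessarily the exact script used for this checkpoint: merge_dora_adapter is a hypothetical helper, it assumes the adapter was saved in PEFT format in a local directory, and it copies the tokenizer from the base model (use the adapter's tokenizer instead if the fine-tune changed it). Running it requires peft.

```python
def merge_dora_adapter(base_model_id: str, adapter_dir: str, output_dir: str) -> None:
    """Merge a PEFT (LoRA/DoRA) adapter into its base model and save full weights."""
    # Imported lazily so the helper can be defined without these packages installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_model_id, dtype=torch.bfloat16)
    peft_model = PeftModel.from_pretrained(base, adapter_dir)
    # merge_and_unload() folds the adapter into the base weights and returns a
    # plain transformers model without PEFT wrappers.
    merged = peft_model.merge_and_unload()
    merged.save_pretrained(output_dir)  # saves config and safetensors weights
    AutoTokenizer.from_pretrained(base_model_id).save_pretrained(output_dir)
```

Example call with hypothetical paths: merge_dora_adapter("Qwen/Qwen3.5-4B", "outputs/checkpoint-1790", "model").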

Notes

Thinking mode: Qwen3.5 can emit a reasoning trace before the final answer. Pass enable_thinking=False in apply_chat_template unless you want that behavior.
CPU vs GPU: On CPU, load with dtype=torch.float32 and device_map="cpu".
vLLM / TGI: Because the adapter is already merged, the checkpoint can be served like a regular model by inference servers such as vLLM or TGI, provided the installed version supports the Qwen3.5 architecture.
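The CPU vs GPU note can be folded into a small helper that picks from_pretrained arguments based on whether CUDA is available. A minimal sketch; choose_load_kwargs and load_model are hypothetical helpers, and dtype is kept as a string name so the selection logic itself does not need torch.

```python
def choose_load_kwargs(cuda_available: bool) -> dict:
    """Pick from_pretrained keyword arguments per the CPU vs GPU note (dtype as a torch attribute name)."""
    if cuda_available:
        return {"dtype": "bfloat16", "device_map": "auto"}
    return {"dtype": "float32", "device_map": "cpu"}


def load_model(model_path: str = "NDIJayant/scores-slm"):
    """Load the merged model with device-appropriate settings and switch it to eval mode."""
    # Imported lazily so choose_load_kwargs can be used without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM

    kwargs = choose_load_kwargs(torch.cuda.is_available())
    kwargs["dtype"] = getattr(torch, kwargs["dtype"])  # "bfloat16" -> torch.bfloat16
    model = AutoModelForCausalLM.from_pretrained(model_path, **kwargs)
    return model.eval()  # eval() returns the module itself
```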


Model tree for NDIJayant/scores-slm

Qwen/Qwen3.5-4B-Base (base model) → Qwen/Qwen3.5-4B (fine-tune) → NDIJayant/scores-slm (this model)