Gemma 3 4B IT Triplet Extractor LoRA

This repository contains a google/gemma-3-4b-it adapter tuned for paragraph-level factual triplet extraction.

The release is designed to turn short passages into strict JSON knowledge triples, with optional qualifiers when the source text clearly provides useful context such as dates, locations, roles, scores, or units.

Highlights

Built on top of google/gemma-3-4b-it
Trained as a lightweight LoRA adapter with peft
Tuned for strict JSON (subject, predicate, object) extraction
Includes quantized inference variants in the gptq-4bit/ and gptq-8bit/ subfolders of this repository

What This Release Is

This is the strongest supervised checkpoint produced in the project's final large-data pass:

base model: google/gemma-3-4b-it
adaptation method: LoRA via peft
task: paragraph-level factual triplet extraction
release checkpoint: gemma3_4b_it_qlora_triplets_20260531_doubled_from_continued2

This release was initialized from a smaller earlier triplet-extraction adapter, then continued on a substantially larger matched paragraph-to-triples corpus.

Repository Layout

This repository is organized as a single release point:

root: LoRA adapter files
gptq-4bit/: standalone 4-bit GPTQ export
gptq-8bit/: standalone 8-bit GPTQ export

If you want the smallest deployment artifact, use one of the GPTQ folders. If you want the most flexible setup for continued experimentation, use the adapter at the repository root.

Target Output Format

The intended output is a single JSON object shaped like:

{
  "triples": [
    {
      "subject": "...",
      "predicate": "...",
      "object": "...",
      "qualifiers": {}
    }
  ]
}
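
Because downstream code depends on this shape, it can help to validate each response before using it. The following is a minimal sketch using only the standard library; the function name `parse_triples` is hypothetical and not part of this release, and treating a missing `qualifiers` key as an empty object is an assumption based on the card describing qualifiers as optional.

```python
import json


def parse_triples(raw: str) -> list[dict]:
    """Parse model output and return the validated list of triples.

    Raises ValueError if the text is not a JSON object of the expected shape.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or not isinstance(data.get("triples"), list):
        raise ValueError("expected a JSON object with a 'triples' list")

    triples = []
    for i, item in enumerate(data["triples"]):
        if not isinstance(item, dict):
            raise ValueError(f"triple {i} is not an object")
        # The three core fields must be non-empty strings.
        for field in ("subject", "predicate", "object"):
            if not isinstance(item.get(field), str) or not item[field].strip():
                raise ValueError(f"triple {i} has a missing or empty '{field}'")
        # Assumption: an absent 'qualifiers' key means "no qualifiers".
        qualifiers = item.get("qualifiers", {})
        if not isinstance(qualifiers, dict):
            raise ValueError(f"triple {i} has a non-object 'qualifiers'")
        triples.append({
            "subject": item["subject"],
            "predicate": item["predicate"],
            "object": item["object"],
            "qualifiers": qualifiers,
        })
    return triples
```

A caller can catch `ValueError` to retry or skip a chunk when the model returns malformed output.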

The training prompts asked the model to generate triples from short passages using slightly varied extraction instructions rather than one frozen instruction string.

In practice, the model is most comfortable when asked to extract grounded factual triples from a single paragraph or short chunk at a time.
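
For longer documents, one way to get paragraph-sized inputs is to split on blank lines and pack sentences when a paragraph is too long. This is only a sketch: the helper name `split_into_paragraphs` and the 1500-character limit are assumptions for illustration, not values taken from the training pipeline.

```python
import re


def split_into_paragraphs(text: str, max_chars: int = 1500) -> list[str]:
    """Split text into paragraph-scale chunks of at most max_chars each.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks = []
    for para in paragraphs:
        if len(para) <= max_chars:
            chunks.append(para)
            continue
        # Overlong paragraph: pack whole sentences into chunks up to max_chars.
        current = ""
        for sentence in re.split(r"(?<=[.!?])\s+", para):
            if current and len(current) + 1 + len(sentence) > max_chars:
                chunks.append(current)
                current = sentence
            else:
                current = f"{current} {sentence}".strip()
        if current:
            chunks.append(current)
    return chunks
```

Each returned chunk can then be sent to the model as its own extraction request.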

Training Data

The final continuation run behind this release used a project-specific matched text-to-triples corpus with:

4577 total examples
4119 training examples
458 validation examples

The data came from preprocessed Wikipedia-like and Dolma-like chunks already assembled in the project. The largest part of the final expansion was a new Dolma batch, deduplicated so that the run did not simply recycle previously used documents.

Labels were teacher-generated structured triples from the project's extraction pipeline. The labels are useful, but they should still be thought of as model-generated supervision rather than human gold annotations.
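
The split sizes listed above are consistent with a 90/10 split of the 4577 examples. The card does not say how the split was actually made, so the shuffled split below, including the `split_examples` name, the seed, and the 0.1 validation fraction, is an assumption shown only to make the arithmetic concrete.

```python
import random


def split_examples(examples, val_fraction=0.1, seed=0):
    """Shuffle deterministically and hold out a fraction for validation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Stand-in for the 4577 paragraph-to-triples examples.
train, val = split_examples(range(4577))
print(len(train), len(val))  # 4119 458
```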

Training Recipe

base model: google/gemma-3-4b-it
fine-tuning style: standard LoRA, not QLoRA
rank r: 16
alpha: 32
dropout: 0.05
target modules:
- q_proj
- k_proj
- v_proj
- o_proj
- gate_proj
- up_proj
- down_proj
learning rate for this continuation: 5e-5
epochs in the final large-data continuation: 1
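
For readers who want to reproduce a similar setup, the adapter hyperparameters above map onto a `peft` `LoraConfig` roughly as follows. This is a sketch, not the project's training script: `task_type="CAUSAL_LM"` and any setting not listed in the recipe (for example bias handling) are assumptions, and the learning rate and epoch count belong to the trainer rather than to this config.

```python
# Values copied from the recipe above.
LORA_HPARAMS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}


def build_lora_config():
    """Build a peft LoraConfig from the recipe (requires peft to be installed)."""
    from peft import LoraConfig  # imported lazily so the dict is usable without peft

    # Assumption: causal language modelling is the task type used for training.
    return LoraConfig(task_type="CAUSAL_LM", **LORA_HPARAMS)
```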

Only a small fraction of parameters are trainable relative to the base model, which keeps the adapter practical to store and reuse.
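
To see why the trainable fraction is small, recall that LoRA replaces the update to a `d_out x d_in` weight matrix with two low-rank factors of rank `r`, which adds `r * (d_in + d_out)` parameters. The sketch below uses a hypothetical 2048 x 2048 projection purely for illustration; these are not Gemma 3 dimensions.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# Hypothetical layer size, chosen only to make the ratio easy to read.
d_in = d_out = 2048
full = d_in * d_out
lora = lora_param_count(d_in, d_out, r=16)
print(full, lora, f"{lora / full:.2%}")
```

With rank 16, the adapter for this illustrative layer holds roughly 1.6 percent as many parameters as the frozen weight it modifies.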

Validation Signal

For this exact release checkpoint, the directly measured validation metric available from training is:

validation loss: 0.3998

This is a token-level imitation metric on the held-out validation split, not the same thing as extraction F1.
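
If the reported loss is the usual mean cross-entropy in nats per token (an assumption; the card does not state the unit), it can be converted to a per-token perplexity for easier comparison across runs:

```python
import math

val_loss = 0.3998  # validation loss reported above
perplexity = math.exp(val_loss)  # assumes mean natural-log cross-entropy per token
print(f"{perplexity:.2f}")  # about 1.49
```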

For historical context, an earlier smaller-data checkpoint in the same project improved strict held-out micro exact SPO F1 from roughly 0.056 on base Gemma to roughly 0.111 after fine-tuning. That earlier exact-match benchmark is useful context for the project direction, but it is not claimed as the exact benchmark number for this final larger-data checkpoint.
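
For reference, a strict micro-averaged exact-match SPO F1 can be computed along the following lines. This is a sketch of the general metric, not the project's evaluation script; the function name and the absence of any string normalization are assumptions.

```python
def micro_exact_spo_f1(predicted, gold):
    """Micro F1 over exact (subject, predicate, object) matches.

    predicted and gold are parallel lists with one collection of triples per document.
    """
    tp = fp = fn = 0
    for pred_triples, gold_triples in zip(predicted, gold):
        pred_set, gold_set = set(pred_triples), set(gold_triples)
        tp += len(pred_set & gold_set)   # triples matched exactly
        fp += len(pred_set - gold_set)   # predicted but not in gold
        fn += len(gold_set - pred_set)   # in gold but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [[("Marie Curie", "discovered", "polonium"),
         ("Marie Curie", "discovered", "radium")]]
pred = [[("Marie Curie", "discovered", "polonium"),
         ("Marie Curie", "found", "radium")]]
print(micro_exact_spo_f1(pred, gold))  # 0.5
```

In the example, the paraphrase "found" counts as both a false positive and a false negative, which illustrates how harsh exact matching is on reasonable rewordings.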

Recommended Use Pattern

This release works best when:

the input is already chunked to paragraph scale
the prompt asks for JSON and nothing else
the downstream consumer can tolerate some redundancy or lightly post-process duplicates
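
A light post-processing pass for duplicates might look like the sketch below. The helper name `dedupe_triples` is hypothetical, and the choice to compare triples case-insensitively, to ignore qualifiers, and to keep the first occurrence is an assumption that may not suit every consumer.

```python
def dedupe_triples(triples: list[dict]) -> list[dict]:
    """Drop repeated triples, comparing subject/predicate/object loosely."""
    seen = set()
    unique = []
    for triple in triples:
        # Normalize case and whitespace so trivial variants collapse together.
        key = tuple(
            " ".join(triple[field].lower().split())
            for field in ("subject", "predicate", "object")
        )
        if key not in seen:
            seen.add(key)
            unique.append(triple)  # keep the first occurrence, qualifiers included
    return unique
```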

It is a good fit for:

knowledge graph bootstrapping
extraction demos
structured IE baselines
small open-model comparison work

Intended Use

This adapter is intended for:

knowledge graph bootstrapping experiments
structured information extraction demos
paragraph-level relation extraction research
testing how well a compact open model can be adapted for JSON factual extraction

It is especially useful when you want a lightweight adapter release rather than a fully merged full-precision checkpoint.

Limitations

This model is a research artifact and still has important failure modes:

it can duplicate facts
it can miss qualifiers
it can compress nuanced facts into flatter triples
it can sometimes overfit to extraction framing
strict exact-match evaluation can underrate semantically reasonable paraphrases

Because the supervision comes from a teacher model pipeline, some annotation artifacts and teacher biases may also be reflected here.

Usage

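import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "google/gemma-3-4b-it"
# The LoRA adapter files sit at the root of this repository.
adapter_id = "dams2005/gemma-3-4b-it-triplet-extractor"

# Load the tokenizer from the base model.
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
# Fall back to EOS as the padding token if none is defined.
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id

# Load the base model in bf16, then attach the LoRA weights on top of it.
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()  # inference mode: disables dropout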

Suggested system prompt:

You extract factual knowledge triples from source text.
Return exactly one JSON object with key 'triples'.
Each triple must have string fields 'subject', 'predicate', and 'object'.
You may include a 'qualifiers' object for useful context like dates, locations, scores, roles, conditions, or event names.
Only include facts explicitly grounded in the text.
Do not include commentary before or after the JSON.

Example user request:

Extract up to 20 factual (subject, predicate, object) triples from the text below.
Use strict JSON in this shape:
{"triples":[{"subject":"...","predicate":"...","object":"...","qualifiers":{}}]}

Title: Marie Curie

Text:
Marie Curie discovered polonium and radium. She was born in Warsaw. She won the Nobel Prize in Physics in 1903 and the Nobel Prize in Chemistry in 1911.

Expected style of output:

{
  "triples": [
    {
      "subject": "Marie Curie",
      "predicate": "discovered",
      "object": "polonium",
      "qualifiers": {}
    },
    {
      "subject": "Marie Curie",
      "predicate": "won",
      "object": "Nobel Prize in Physics",
      "qualifiers": {
        "year": "1903"
      }
    }
  ]
}
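
Putting the pieces above together, a generation call could be sketched as follows. It assumes `model` and `tokenizer` were loaded as in the Usage snippet and that the tokenizer's chat template accepts a system message; the helper names, `max_new_tokens=512`, and greedy decoding are illustrative choices, not settings documented for this release.

```python
def build_user_prompt(title: str, text: str, max_triples: int = 20) -> str:
    """Format a request in the same style as the example user request."""
    return (
        f"Extract up to {max_triples} factual (subject, predicate, object) "
        "triples from the text below.\n"
        "Use strict JSON in this shape:\n"
        '{"triples":[{"subject":"...","predicate":"...","object":"...","qualifiers":{}}]}\n\n'
        f"Title: {title}\n\n"
        f"Text:\n{text}"
    )


def build_messages(system_prompt: str, user_prompt: str) -> list[dict]:
    """Chat messages in the format expected by tokenizer.apply_chat_template."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def generate_json(model, tokenizer, messages, max_new_tokens: int = 512) -> str:
    """Run one greedy generation and return only the newly generated text."""
    import torch  # imported lazily so the prompt helpers work without torch

    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    # Strip the prompt tokens so only the model's answer is decoded.
    prompt_len = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

The returned string is expected to be a single JSON object, but it should still be parsed defensively because the model can produce malformed output.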

Quantized Variants

This repository also contains two standalone GPTQ exports:

gptq-4bit/: smallest deployment option
gptq-8bit/: less aggressive compression, still much smaller than the merged bf16 checkpoint

Those folders are intended for inference and distribution convenience. The root adapter remains the best entry point if you want to keep working in the standard transformers + peft workflow.
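
One plausible way to load a quantized variant is to point `from_pretrained` at the relevant subfolder. This sketch assumes that each GPTQ folder contains a complete model config and weights, that `REPO_ID` below is the correct repository id, and that a GPTQ backend supported by transformers is installed; none of this is confirmed by the card, so treat it as a starting point.

```python
# Assumption: the repository id under which this release is published.
REPO_ID = "dams2005/gemma-3-4b-it-triplet-extractor"


def gptq_load_kwargs(bits: int) -> dict:
    """Keyword arguments for loading the gptq-4bit/ or gptq-8bit/ export."""
    if bits not in (4, 8):
        raise ValueError("only the gptq-4bit/ and gptq-8bit/ folders are provided")
    return {
        "pretrained_model_name_or_path": REPO_ID,
        "subfolder": f"gptq-{bits}bit",
        "device_map": "auto",
    }


def load_gptq(bits: int = 4):
    """Load a standalone GPTQ export (no peft adapter step is needed)."""
    from transformers import AutoModelForCausalLM  # imported lazily

    return AutoModelForCausalLM.from_pretrained(**gptq_load_kwargs(bits))
```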

Bias and Risk Notes

This release inherits biases and error patterns from:

the Gemma 3 base model
the teacher-generated labels
project-specific chunking and preprocessing

It should be treated as a helpful extractor for research and prototyping, not as a ground-truth fact engine for high-stakes decisions.
