Gemma 3 4B IT Triplet Extractor LoRA
This repository contains a google/gemma-3-4b-it adapter tuned for paragraph-level factual triplet extraction.
The release is designed to turn short passages into strict JSON knowledge triples, with optional qualifiers when the source text clearly provides useful context such as dates, locations, roles, scores, or units.
Highlights
- Built on top of google/gemma-3-4b-it
- Trained as a lightweight LoRA adapter with peft
- Tuned for strict JSON (subject, predicate, object) extraction
- Includes quantized inference variants in the gptq-4bit/ and gptq-8bit/ subfolders of this repository
What This Release Is
This is the strongest supervised checkpoint produced in the project's final large-data pass:
- base model: google/gemma-3-4b-it
- adaptation method: LoRA via peft
- task: paragraph-level factual triplet extraction
- release checkpoint: gemma3_4b_it_qlora_triplets_20260531_doubled_from_continued2
This release was initialized from a smaller earlier triplet-extraction adapter, then continued on a substantially larger matched paragraph-to-triples corpus.
Repository Layout
This repository is organized as a single release point:
- root: LoRA adapter files
- gptq-4bit/: standalone 4-bit GPTQ export
- gptq-8bit/: standalone 8-bit GPTQ export
If you want the smallest deployment artifact, use one of the GPTQ folders. If you want the most flexible setup for continued experimentation, use the adapter at the repository root.
Target Output Format
The intended output is a single JSON object shaped like:
{
  "triples": [
    {
      "subject": "...",
      "predicate": "...",
      "object": "...",
      "qualifiers": {}
    }
  ]
}
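Because downstream code usually depends on this shape, it can help to check each response before using it. The following is a minimal sketch, not part of the release: the function name `parse_triples` is hypothetical, and treating a missing `qualifiers` key as an empty object is an assumption based on qualifiers being described as optional.

```python
import json


def parse_triples(raw: str) -> list[dict]:
    """Parse model output and return the validated list of triples.

    Raises ValueError if the text is not a JSON object in the expected shape.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc

    if not isinstance(payload, dict) or not isinstance(payload.get("triples"), list):
        raise ValueError("expected a JSON object with a 'triples' list")

    validated = []
    for i, triple in enumerate(payload["triples"]):
        if not isinstance(triple, dict):
            raise ValueError(f"triple {i} is not an object")
        for field in ("subject", "predicate", "object"):
            if not isinstance(triple.get(field), str):
                raise ValueError(f"triple {i} has a missing or non-string '{field}'")
        # Assumption: qualifiers are optional, so a missing key becomes {}.
        qualifiers = triple.get("qualifiers", {})
        if not isinstance(qualifiers, dict):
            raise ValueError(f"triple {i} has a non-object 'qualifiers'")
        validated.append({
            "subject": triple["subject"],
            "predicate": triple["predicate"],
            "object": triple["object"],
            "qualifiers": qualifiers,
        })
    return validated
```

Raising on the first malformed triple keeps the sketch short; a more lenient consumer might skip bad entries instead.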
The training prompts asked the model to generate triples from short passages using slightly varied extraction instructions rather than one frozen instruction string.
In practice, the model is most comfortable when asked to extract grounded factual triples from a single paragraph or short chunk at a time.
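One way to feed the model paragraph-sized inputs is to split longer documents on blank lines first. The sketch below is an illustration only: the helper name `split_paragraphs` and the `max_chars` budget of 1500 are hypothetical choices, not values taken from the training setup.

```python
import re


def split_paragraphs(text: str, max_chars: int = 1500) -> list[str]:
    """Split text on blank lines, then pack paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole rather than cut mid-sentence.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would exceed the budget: close the chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Setting `max_chars` to a small value yields one paragraph per chunk, which matches the single-paragraph case most closely.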
Training Data
The final continuation run behind this release used a project-specific matched text-to-triples corpus with:
- 4577 total examples
- 4119 training examples
- 458 validation examples
The source data consisted of preprocessed Wikipedia-like and Dolma-like chunks already assembled in the project. The largest final expansion came from a new, deduplicated Dolma batch, added so that the run did not simply recycle previously used documents.
Labels were teacher-generated structured triples from the project's extraction pipeline. The labels are useful, but they should still be thought of as model-generated supervision rather than human gold annotations.
Training Recipe
- base model: google/gemma-3-4b-it
- fine-tuning style: standard LoRA, not QLoRA
- rank r: 16
- alpha: 32
- dropout: 0.05
- target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- learning rate for this continuation: 5e-5
- epochs in the final large-data continuation: 1
Only a small fraction of parameters are trainable relative to the base model, which keeps the adapter practical to store and reuse.
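The arithmetic behind that claim can be sketched as follows. For one linear layer of shape d_in x d_out, LoRA trains two low-rank factors with r * (d_in + d_out) parameters in total, while the original d_in * d_out weights stay frozen. The layer size used below is a hypothetical round number chosen for illustration, not the actual shape of any Gemma 3 projection.

```python
def lora_param_count(d_in: int, d_out: int, r: int = 16) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out linear layer.

    LoRA learns two low-rank factors, A (r x d_in) and B (d_out x r).
    """
    return r * (d_in + d_out)


def trainable_fraction(d_in: int, d_out: int, r: int = 16) -> float:
    """Ratio of LoRA parameters to the frozen weight matrix they adapt."""
    return lora_param_count(d_in, d_out, r) / (d_in * d_out)


# Hypothetical square projection, NOT the real Gemma 3 layer shape.
d = 2048
print(lora_param_count(d, d))    # 16 * (2048 + 2048) = 65536
print(trainable_fraction(d, d))  # 65536 / 4194304 = 0.015625
```

With rank 16 and alpha 32, the standard LoRA scaling factor alpha / r applied to the low-rank update is 2.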
Validation Signal
For this exact release checkpoint, the directly measured validation metric available from training is:
- validation loss: 0.3998
This is a token-level imitation metric on the held-out validation split, not the same thing as extraction F1.
For historical context, an earlier smaller-data checkpoint in the same project improved strict held-out micro exact SPO F1 from roughly 0.056 on base Gemma to roughly 0.111 after fine-tuning. That earlier exact-match benchmark is useful context for the project direction, but it is not claimed as the exact benchmark number for this final larger-data checkpoint.
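For readers who want to run a comparable check, the following is a minimal sketch of one common definition of strict micro exact SPO F1. The project's own scorer and any normalization it applies are not shown in this card, so the function name `micro_spo_f1` and the choice to compare raw strings without normalization are assumptions.

```python
def micro_spo_f1(predicted: list[list[dict]], gold: list[list[dict]]) -> dict:
    """Strict micro-averaged F1 over exact (subject, predicate, object) matches.

    predicted and gold hold one list of triples per passage, aligned by index.
    Counts are pooled across passages before precision and recall are computed.
    """
    def as_set(triples):
        # Qualifiers are ignored; only the exact SPO strings are compared.
        return {(t["subject"], t["predicate"], t["object"]) for t in triples}

    tp = n_pred = n_gold = 0
    for pred_triples, gold_triples in zip(predicted, gold):
        p, g = as_set(pred_triples), as_set(gold_triples)
        tp += len(p & g)
        n_pred += len(p)
        n_gold += len(g)

    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Because matching is exact, a paraphrased predicate such as "was born in" versus "born in" counts as a miss, which is one reason strict scores of this kind tend to look low.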
Recommended Use Pattern
This release works best when:
- the input is already chunked to paragraph scale
- the prompt asks for JSON and nothing else
- the downstream consumer can tolerate some redundancy or lightly post-process duplicates
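The light post-processing mentioned in the last point could look like the sketch below. It is an illustration under stated assumptions: the helper name `dedupe_triples` is hypothetical, duplicates are defined as triples whose subject, predicate, and object match after whitespace and case normalization, and qualifiers from later duplicates are merged into the first occurrence.

```python
def dedupe_triples(triples: list[dict]) -> list[dict]:
    """Drop repeated triples, keeping the first occurrence of each.

    Qualifiers found only on later duplicates are merged into the kept triple;
    existing qualifier values are never overwritten.
    """
    seen: dict[tuple, dict] = {}
    for t in triples:
        key = tuple(
            " ".join(t[field].split()).casefold()
            for field in ("subject", "predicate", "object")
        )
        if key not in seen:
            # Copy so the caller's input is left untouched.
            seen[key] = {**t, "qualifiers": dict(t.get("qualifiers") or {})}
        else:
            for name, value in (t.get("qualifiers") or {}).items():
                seen[key]["qualifiers"].setdefault(name, value)
    # Dicts preserve insertion order, so output order follows first appearance.
    return list(seen.values())
```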
It is a good fit for:
- knowledge graph bootstrapping
- extraction demos
- structured IE baselines
- small open-model comparison work
Intended Use
This adapter is intended for:
- knowledge graph bootstrapping experiments
- structured information extraction demos
- paragraph-level relation extraction research
- testing how well a compact open model can be adapted for JSON factual extraction
It is especially useful when you want a lightweight adapter release rather than a fully merged full-precision checkpoint.
Limitations
This model is a research artifact and still has important failure modes:
- it can duplicate facts
- it can miss qualifiers
- it can compress nuanced facts into flatter triples
- it can sometimes overfit to extraction framing
- strict exact-match evaluation can underrate semantically reasonable paraphrases
Because the supervision comes from a teacher model pipeline, some annotation artifacts and teacher biases may also be reflected here.
Usage
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "google/gemma-3-4b-it"
adapter_id = "dams2005/gemma-3-4b-it-triplet-extractor"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
# Fall back to EOS for padding if the tokenizer defines no pad token.
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id

# Load the base model in bfloat16, then attach the LoRA adapter.
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()
Suggested system prompt:
You extract factual knowledge triples from source text.
Return exactly one JSON object with key 'triples'.
Each triple must have string fields 'subject', 'predicate', and 'object'.
You may include a 'qualifiers' object for useful context like dates, locations, scores, roles, conditions, or event names.
Only include facts explicitly grounded in the text.
Do not include commentary before or after the JSON.
Example user request:
Extract up to 20 factual (subject, predicate, object) triples from the text below.
Use strict JSON in this shape:
{"triples":[{"subject":"...","predicate":"...","object":"...","qualifiers":{}}]}
Title: Marie Curie
Text:
Marie Curie discovered polonium and radium. She was born in Warsaw. She won the Nobel Prize in Physics in 1903 and the Nobel Prize in Chemistry in 1911.
Expected style of output:
{
  "triples": [
    {
      "subject": "Marie Curie",
      "predicate": "discovered",
      "object": "polonium",
      "qualifiers": {}
    },
    {
      "subject": "Marie Curie",
      "predicate": "won",
      "object": "Nobel Prize in Physics",
      "qualifiers": {
        "year": "1903"
      }
    }
  ]
}
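The system prompt and user request above can be assembled into chat messages and passed to the loaded model roughly as follows. This is a sketch under stated assumptions: the helper names `build_messages` and `extract`, the greedy decoding setting, and the 512-token generation budget are illustrative choices, not settings documented for this release.

```python
SYSTEM_PROMPT = (
    "You extract factual knowledge triples from source text.\n"
    "Return exactly one JSON object with key 'triples'.\n"
    "Each triple must have string fields 'subject', 'predicate', and 'object'.\n"
    "You may include a 'qualifiers' object for useful context like dates, "
    "locations, scores, roles, conditions, or event names.\n"
    "Only include facts explicitly grounded in the text.\n"
    "Do not include commentary before or after the JSON."
)


def build_messages(title: str, text: str, max_triples: int = 20) -> list[dict]:
    """Build the system and user messages in the style shown above."""
    user = (
        f"Extract up to {max_triples} factual (subject, predicate, object) "
        "triples from the text below.\n"
        "Use strict JSON in this shape:\n"
        '{"triples":[{"subject":"...","predicate":"...","object":"...","qualifiers":{}}]}\n'
        f"Title: {title}\n"
        f"Text:\n{text}"
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user},
    ]


def extract(model, tokenizer, messages, max_new_tokens: int = 512) -> str:
    """Generate a response and return only the newly generated text."""
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    # Greedy decoding (assumption) keeps the JSON output deterministic.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

With `model` and `tokenizer` loaded as in the snippet above, `extract(model, tokenizer, build_messages("Marie Curie", passage))` should return a JSON string in the target shape, which can then be parsed with `json.loads`.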
Quantized Variants
This repository also contains two standalone GPTQ exports:
- gptq-4bit/: smallest deployment option
- gptq-8bit/: less aggressive compression, still much smaller than the merged bf16 checkpoint
Those folders are intended for inference and distribution convenience. The root adapter remains the best entry point if you want to keep working in the standard transformers + peft workflow.
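Loading one of the quantized folders might look like the sketch below. Several things here are assumptions rather than documented facts about this repository: that each GPTQ folder contains its own tokenizer files, that a GPTQ backend supported by transformers is installed, and that the export loads through the generic `subfolder` argument of `from_pretrained`. The helper names are hypothetical, and `repo_id` is whatever identifier this repository is published under.

```python
def gptq_subfolder(bits: int) -> str:
    """Map a bit width to the matching folder name in this repository."""
    if bits not in (4, 8):
        raise ValueError("this repository ships only 4-bit and 8-bit GPTQ exports")
    return f"gptq-{bits}bit"


def load_gptq(repo_id: str, bits: int = 4):
    """Load a standalone GPTQ export and its tokenizer from a repository subfolder."""
    # Imported lazily so gptq_subfolder works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    subfolder = gptq_subfolder(bits)
    # Assumption: tokenizer files live alongside the quantized weights.
    tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        subfolder=subfolder,
        device_map="auto",
    )
    return model, tokenizer
```

If the tokenizer is not present in the subfolder, loading it from the base model google/gemma-3-4b-it is a reasonable fallback.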
Bias and Risk Notes
This release inherits biases and error patterns from:
- the Gemma 3 base model
- the teacher-generated labels
- project-specific chunking and preprocessing
It should be treated as a helpful extractor for research and prototyping, not as a ground-truth fact engine for high-stakes decisions.