A Text-to-Triple Model

Base Model: Flan-T5-Large by Google

Base Dataset: WikiOFGraph (5.85M high-quality text-triple pairs)

Trained by Patrick Jiang @ UIUC

Wandb Training Report (Dec 5, 2024)

Example Input:

"William Gerald Standridge (November 27, 1953 – April 12, 2014) was an American stock car racing driver. He was a competitor in the NASCAR Winston Cup Series and Busch Series."

Output:

(S> William gerald standridge| P> Nationality| O> American),
(S> William gerald standridge| P> Occupation| O> Stock car racing driver),
(S> William gerald standridge| P> Competitor| O> Busch series),
(S> William gerald standridge| P> Competitor| O> Nascar winston cup series),
(S> William gerald standridge| P> Birth date| O> November 27, 1953),
(S> William gerald standridge| P> Death date| O> April 12, 2014)
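
Each triple is serialized as (S> subject| P> predicate| O> object), with triples separated by commas. Below is a minimal parsing sketch (illustrative, not part of the model's released tooling) that recovers (subject, predicate, object) tuples with a regular expression:

import re

def parse_triples(output: str):
    # Match each '(S> ...| P> ...| O> ...)' span and capture the three fields
    pattern = r"\(S>\s*(.*?)\s*\|\s*P>\s*(.*?)\s*\|\s*O>\s*(.*?)\s*\)"
    return re.findall(pattern, output)

parse_triples("(S> William gerald standridge| P> Nationality| O> American)")
# [('William gerald standridge', 'Nationality', 'American')]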

How to Run?

from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

def generate_triples(input_text: str, model_path: str = "pat-jj/text2triple-flan-t5"):
    # Initialize tokenizer and model
    tokenizer = T5Tokenizer.from_pretrained(model_path)
    model = T5ForConditionalGeneration.from_pretrained(
        model_path,
        device_map="auto",
        torch_dtype=torch.bfloat16  # Use bfloat16 for efficiency
    )
    
    # Tokenize input with proper padding and attention mask
    inputs = tokenizer(
        input_text,
        max_length=512,
        padding='max_length',
        truncation=True,
        return_tensors="pt"
    )
    
    # Move inputs to the same device as model
    input_ids = inputs['input_ids'].to(model.device)
    attention_mask = inputs['attention_mask'].to(model.device)

    # Generate with beam search and a mild length penalty
    with torch.no_grad():
        outputs = model.generate(
            input_ids=input_ids,
            attention_mask=attention_mask,
            max_length=512,
            num_beams=4,  # Use beam search
            early_stopping=True,
            length_penalty=0.6,  # Penalize very long outputs
            use_cache=True  # Use KV cache for faster generation
        )
    
    # Decode and return the generated triples
    generated_triples = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return generated_triples

Example Usage:

input_text = """Albert Einstein was born in Ulm, Germany in 1879. He developed the theory of relativity and won the Nobel Prize in Physics in 1921.
Einstein worked as a professor at Princeton University until his death in 1955."""

generated_triples = generate_triples(input_text)
print("Generated triples:", generated_triples)

Output:

Generated triples: (S> Albert einstein| P> Birth place| O> Ulm, germany), (S> Albert einstein| P> Birth year| O> 1879), (S> Albert einstein| P> Award| O> Nobel prize in physics), (S> Albert einstein| P> Death year| O> 1955), (S> Albert einstein| P> Occupation| O> Professor), (S> Albert einstein| P> Workplace| O> Princeton university)
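
Note that generate_triples reloads the tokenizer and model on every call. For repeated or bulk use, it is cheaper to load once and batch inputs. A minimal sketch under the same model path (generate_triples_batch is illustrative, not part of the released code):

from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

model_path = "pat-jj/text2triple-flan-t5"
tokenizer = T5Tokenizer.from_pretrained(model_path)
model = T5ForConditionalGeneration.from_pretrained(
    model_path, device_map="auto", torch_dtype=torch.bfloat16
)

def generate_triples_batch(texts: list[str]) -> list[str]:
    # Pad to the longest sequence in the batch rather than a fixed 512 tokens
    inputs = tokenizer(
        texts, padding=True, truncation=True, max_length=512, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs, max_length=512, num_beams=4, early_stopping=True
        )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)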

Paper for the WikiOFGraph dataset:

Daehee Kim et al., "Ontology-Free General-Domain Knowledge Graph-to-Text Generation Dataset Synthesis using Large Language Model", 2024.

Cite This Model

@misc{patrick_jiang_2024,
    author    = {Patrick Jiang},
    title     = {text2triple-flan-t5 (Revision df1323c)},
    year      = 2024,
    url       = {https://huggingface.co/pat-jj/text2triple-flan-t5},
    doi       = {10.57967/hf/3722},
    publisher = {Hugging Face}
}

Model size: 783M parameters (F32, Safetensors)
