This is a version of google/flan-t5-xl fine-tuned on the KELM Corpus. It takes a sentence as input and outputs (subject, relation, object) triplets that can be used to build a knowledge graph.
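Each triplet can be read as a directed, labeled edge from the subject to the object. As a minimal illustration in plain Python (not code from this model's repo), the hypothetical helper `build_graph` groups triplets into an adjacency map:

```python
from collections import defaultdict


def build_graph(triplets):
    """Group (subject, relation, object) triplets into an adjacency map:
    subject -> list of (relation, object) edges, in insertion order."""
    graph = defaultdict(list)
    for subject, relation, obj in triplets:
        graph[subject].append((relation, obj))
    return dict(graph)


triplets = [
    ("Hugging Face", "instance of", "Business"),
    ("Hugging Face", "country", "United States"),  # hypothetical extra triplet
]
print(build_graph(triplets))
# {'Hugging Face': [('instance of', 'Business'), ('country', 'United States')]}
```

A dict of lists keeps the sketch dependency-free; a graph library could be swapped in if traversal or querying is needed.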

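The model uses four custom tokens to delimit triplets. Each triplet is wrapped in `<triplet>` ... `</triplet>`, with the subject first, the relation after `<relation>`, and the object after `<object>`. These tokens are added to the tokenizer:

```python
# Register the triplet delimiters as new tokens (their order determines their ids)
special_tokens = ['<triplet>', '</triplet>', '<relation>', '<object>']
tokenizer.add_tokens(special_tokens)
```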
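The card does not show how `model` and `tokenizer` are created. Below is a minimal sketch. It assumes the adapter loads with peft's `PeftModel` on top of `google/flan-t5-xl`, and that the tokenizer is the base tokenizer plus the four delimiter tokens, added in the order listed above. `load_triplet_model` is a hypothetical helper name, and none of this is confirmed by the author:

```python
# Hypothetical loader: the repo ids come from this card; the loading procedure
# is an assumption about how the adapter was trained and saved.
SPECIAL_TOKENS = ["<triplet>", "</triplet>", "<relation>", "<object>"]
BASE_MODEL = "google/flan-t5-xl"
ADAPTER_ID = "bew/t5_sentence_to_triplet_xl"


def load_triplet_model():
    """Return (model, tokenizer) with the PEFT adapter attached to flan-t5-xl."""
    # Imported inside the function so this module can be imported without them
    from peft import PeftModel
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    # Assumed to match the training order, so the new tokens get the same ids
    tokenizer.add_tokens(SPECIAL_TOKENS)

    base = AutoModelForSeq2SeqLM.from_pretrained(BASE_MODEL)
    # flan-t5's embedding matrix has 32128 rows but its tokenizer has 32100
    # entries, so the 4 new ids (32100-32103) fit without resizing. If attaching
    # the adapter reports a size mismatch, try
    # base.resize_token_embeddings(len(tokenizer)) before this call.
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    model.eval()
    return model, tokenizer
```

Usage: `model, tokenizer = load_triplet_model()`. This downloads the full flan-t5-xl weights (about 3B parameters), so expect a large download and significant memory use.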

You can use it like this:

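```python
import torch

# Assumes `model` and `tokenizer` are already loaded, with the special tokens added
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()

new_input = "Hugging Face, Inc. is an American company that develops tools for building applications using machine learning."
inputs = tokenizer(new_input, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model.generate(input_ids=inputs["input_ids"])

# skip_special_tokens=False also keeps <pad> and </s>, as in the sample output
print(tokenizer.batch_decode(outputs, skip_special_tokens=False)[0])
```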

Output:

```
<pad><triplet> Hugging Face <relation> instance of <object> Business </triplet></s>
```
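To turn the decoded string into structured data, the delimiters can be matched with a regular expression. This is a minimal sketch, assuming that several triplets appear as consecutive `<triplet>` ... `</triplet>` spans and that entity and relation text never contains `<`. `parse_triplets` is a hypothetical helper, not part of the model's repo:

```python
import re

# One <triplet> ... </triplet> span. [^<]* stops at the next tag, so a malformed
# triplet cannot swallow text from the following one.
TRIPLET_RE = re.compile(r"<triplet>([^<]*)<relation>([^<]*)<object>([^<]*)</triplet>")


def parse_triplets(decoded: str) -> list[tuple[str, str, str]]:
    """Extract (subject, relation, object) tuples from the model's decoded output."""
    return [
        (subj.strip(), rel.strip(), obj.strip())
        for subj, rel, obj in TRIPLET_RE.findall(decoded)
    ]


output = "<pad><triplet> Hugging Face <relation> instance of <object> Business </triplet></s>"
print(parse_triplets(output))
# [('Hugging Face', 'instance of', 'Business')]
```

Spans that do not match the full pattern are silently skipped. Because the model can make mistakes, this behavior is arguably safer than guessing at a partial triplet.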

This model still isn't perfect and may make mistakes! I'm working on fine-tuning it for longer and on a more diverse set of data.

Model tree: bew/t5_sentence_to_triplet_xl is a PEFT adapter for the base model google/flan-t5-xl.