Overview

The model is a LoRA adapter based on LLaMA-3-8B-Instruct. The model has been trained on a re-annotated version of the CaRB dataset.

The model produces multi-valent Open IE tuples, i.e. relations with various numbers of arguments (1, 2, or more). We provide an example below:

Consider the following sentence (taken from the CaRB dev set):

Earlier this year , President Bush made a final `` take - it - or - leave it '' offer on the minimum wage

Our model would extract the following relation from the sentence:

<President Bush, made, a final "take-it-or-leave-it" offer, on the minimum wage, earlier this year>

where we include President Bush as the subject, made as the predicate, a final "take-it-or-leave-it" offer as the direct object, and on the minimum wage and earlier this year as salient complements.

We briefly describe how to use our model below, and provide further details in our MulVOIEL repository on GitHub.

Getting Started

Model Output Format

Given a sentence, the model produces textual predictions in the following format:

<subj> ,, (<auxi> ###) <predicate> ,, (<prep1> ###) <obj1>, (<prep2> ###) <obj2>, ...
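
For the running example from the Overview, this serialization would look roughly as follows (an illustration of the format; the model's exact surface output may differ slightly):

    President Bush ,, made ,, a final "take-it-or-leave-it" offer, on ### the minimum wage, earlier this year

If you just want a sense of the format, a minimal illustrative parser is sketched below. The repository's parse_outstr_to_triples is the authoritative implementation; this sketch assumes a single relation per output string:

    def parse_relation(outstr):
        """Sketch: parse one serialized relation into (subject, predicate, arguments).

        Assumes the format above: fields separated by ',,', with optional
        auxiliaries/prepositions attached to their field via '###'.
        """
        fields = [f.strip() for f in outstr.split(",,")]
        if len(fields) < 2:
            return None
        subj, pred = fields[0], fields[1]
        # Re-join an optional auxiliary serialized as "<auxi> ### <predicate>"
        if "###" in pred:
            auxi, rest = (p.strip() for p in pred.split("###", 1))
            pred = f"{auxi} {rest}"
        args = []
        if len(fields) > 2:
            for arg in fields[2].split(","):
                # An optional preposition is serialized as "<prep> ### <obj>"
                if "###" in arg:
                    prep, obj = (p.strip() for p in arg.split("###", 1))
                    args.append(f"{prep} {obj}")
                else:
                    args.append(arg.strip())
        return subj, pred, args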

How to Use

  1. Install the required libraries and clone the MulVOIEL repository:

    pip install transformers datasets peft torch
    git clone https://github.com/Teddy-Li/MulVOIEL
    cd MulVOIEL
    
  2. Load the model and perform inference (example):

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel
    import torch
    # Run from inside the cloned MulVOIEL directory so these local modules are importable
    from llamaOIE import parse_outstr_to_triples
    from llamaOIE_dataset import prepare_input
    
    base_model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
    peft_adapter_name = "Teddy487/LLaMA3-8b-for-OpenIE"
    
    # Load the base model, then attach the LoRA adapter on top of it
    model = AutoModelForCausalLM.from_pretrained(base_model_name)
    model = PeftModel.from_pretrained(model, peft_adapter_name)
    tokenizer = AutoTokenizer.from_pretrained(base_model_name)
    
    input_text = "Earlier this year , President Bush made a final `` take - it - or - leave it '' offer on the minimum wage"
    input_text, _ = prepare_input({'s': input_text}, tokenizer, has_labels=False)
    
    input_ids = tokenizer(input_text, return_tensors="pt").input_ids
    
    outputs = model.generate(input_ids, max_new_tokens=256)  # cap on generated tokens; adjust as needed
    # Slice off the prompt tokens before decoding (input_ids has shape [1, seq_len])
    outstr = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
    triples = parse_outstr_to_triples(outstr)
    
    for tpl in triples:
        print(tpl)
    
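For the example sentence above, the printed triples should correspond to the relation shown in the Overview: the subject President Bush, the predicate made, the direct object a final "take-it-or-leave-it" offer, and the complements on the minimum wage and earlier this year.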


Model Performance

The primary benefit of our model is its ability to extract finer-grained information for predicates. We also report performance on a roughly comparable basis with prior state-of-the-art Open IE models: our method matches, and in some cases surpasses, prior models while producing finer-grained and more complex outputs. We report evaluation results using the macro F1 metric, as well as the average Levenshtein distance between gold and predicted relations:

Model             Levenshtein Distance    Macro F1
LoRA LLaMA2-7b    5.85                    50.2
LoRA LLaMA3-8b    5.04                    55.3
RNN OIE *         -                       49.0
IMOJIE *          -                       53.5
Open IE 6 *       -                       54.0/52.7

Note that the precision and recall values are not directly comparable, because we evaluate model predictions at a finer granularity and use a different train/dev/test arrangement from the original CaRB dataset; hence the asterisks.
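
For reference, the Levenshtein distance reported above is an edit distance averaged over gold/predicted relation pairs. Below is a minimal sketch of the standard dynamic-programming computation, assuming relations are compared as serialized strings (our assumption; see the repository for the official evaluation script):

    def levenshtein(a, b):
        """Standard dynamic-programming edit distance between two strings."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            curr = [i]
            for j, cb in enumerate(b, start=1):
                curr.append(min(prev[j] + 1,                 # deletion
                                curr[j - 1] + 1,             # insertion
                                prev[j - 1] + (ca != cb)))   # substitution
            prev = curr
        return prev[-1]

    # Average over a list of (gold, predicted) relation strings:
    # avg_dist = sum(levenshtein(g, p) for g, p in pairs) / len(pairs)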

Framework versions

  • PEFT 0.10.0
