Model Card for text2tech/REBEL_NER_RE_001_1000docs

A REBEL model fine-tuned on 1,000 GPT-labeled documents to perform named entity extraction and relation extraction.

Model Details

Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. The card was generated automatically.

Developed by: [More Information Needed]
Funded by [optional]: [More Information Needed]
Shared by [optional]: [More Information Needed]
Model type: Sequence-to-sequence (encoder-decoder) transformer, loaded with AutoModelForSeq2SeqLM
Language(s) (NLP): [More Information Needed]
License: [More Information Needed]
Finetuned from model [optional]: https://huggingface.co/Babelscape/rebel-large

Model Sources [optional]

Repository: [More Information Needed]
Paper [optional]: https://aclanthology.org/2021.findings-emnlp.204/
Demo [optional]: [More Information Needed]

Uses

Direct Use

[More Information Needed]

Downstream Use [optional]

[More Information Needed]

Out-of-Scope Use

[More Information Needed]

Bias, Risks, and Limitations

[More Information Needed]

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

How to Get Started with the Model

The code below loads the model, generates the linearized output for a sample text, and parses that output into typed triplets.

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch

gen_kwargs = {
    "max_length": 128,
    "length_penalty": 0,
    "num_beams": 3,
    "no_repeat_ngram_size": 0,
    "early_stopping": False
}
MODEL = "text2tech/REBEL_NER_RE_001_1000docs"

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)
if torch.cuda.is_available():
    _ = model.to("cuda")  # use the default GPU; the model stays on CPU otherwise
_ = model.eval()

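def extract_triplets(text, mapping_types={"<mat>": "material", "<meth>": "method", "<tsys>": "technological_system", "<org>": "organization", "<tfield>": "technical_field"}):
    """Parse the model's linearized output into a list of typed triplets.

    Each triplet is expected in the form
        <triplet> head <head_type> tail <tail_type> relation
    where the type tokens are the keys of mapping_types.
    """
    triplets = []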
    text = text.strip()
    current = 'x'
    subject, relation, object_, object_type, subject_type = '','','','',''

    for token in text.replace("<s>", "").replace("<pad>", "").replace("</s>", "").split():
        if token == "<triplet>":
            current = 't'
            if relation != '':
                triplets.append({'head': subject.strip(), 'head_type': subject_type, 'type': relation.strip(),'tail': object_.strip(), 'tail_type': object_type})
                relation = ''
            subject = ''
        elif token in mapping_types:
            if current == 't' or current == 'o':
                current = 's'
                if relation != '':
                    triplets.append({'head': subject.strip(), 'head_type': subject_type, 'type': relation.strip(),'tail': object_.strip(), 'tail_type': object_type})
                object_ = ''
                subject_type = mapping_types[token]
            else:
                current = 'o'
                object_type = mapping_types[token]
                relation = ''
        else:
            if current == 't':
                subject += ' ' + token
            elif current == 's':
                object_ += ' ' + token
            elif current == 'o':
                relation += ' ' + token
    if subject != '' and relation != '' and object_ != '' and object_type != '' and subject_type != '':
        triplets.append({'head': subject.strip(), 'head_type': subject_type, 'type': relation.strip(),'tail': object_.strip(), 'tail_type': object_type})
    return triplets

def generate(text):
    """Run beam search on a single text and return the decoded output, special tokens included."""
    model_inputs = tokenizer(text, max_length=1024, padding=True, truncation=True, return_tensors='pt')
    generated_tokens = model.generate(
        model_inputs["input_ids"].to(model.device),
        attention_mask=model_inputs["attention_mask"].to(model.device),
        **gen_kwargs,
    )
    decoded_preds = tokenizer.batch_decode(generated_tokens, skip_special_tokens=False)
    return decoded_preds[0]

text = """As much mystique as they've been given by companies with vested interest, diamonds are little more than lumps of carbon. In science applications, diamond is useful as a tough protective coating and for optical devices, but its relative rarity on Earth makes it difficult to get. Now, researchers at North Carolina State University have demonstrated a new way to convert carbon nanofibers and nanotubes into diamond fibers that can be performed in a lab more easily than existing techniques. In nature, diamonds are forged deep in the Earth, where carbon is subjected to high pressure and temperatures, so it makes sense that artificial methods of manufacturing them requires similar conditions. And the equipment involved in that process can be quite cumbersome and energy - intensive. By contrast, the new technique developed by the NCSU team can apparently be done at room temperature and normal pressure levels. First, carbon nanofibers are hit by a laser pulse lasting just 100 nanoseconds, which instantly heats the carbon to about 3,727°C (6,740°F) and melts it. Normally that heat would be enough to vaporize the carbon, which obviously is n't the desired outcome. To stop that, the team uses a substrate of sapphire, glass or plastic polymer, which restricts the heat flow enough to prevent the phase change. Then the material is quickly cooled, causing it to crystallize into diamond. This process can create diamond nanofibers for use in electronics and even quantum computers, or to seed carbon nanofibers with tiny diamonds. In the case of the latter, larger diamond structures can then be made using more traditional techniques like chemical vapor deposition. The structures created this way could end up as coatings to toughen up tools or for jewelry. The research was published in the journal Nanoscale. Source : North Carolina State UniversityView gallery - 2 images"""

output = generate(text)
encoded = extract_triplets(output)
print(encoded)
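
The model is described as performing both entity extraction and relation extraction, but extract_triplets returns only relation triplets. One way to recover the distinct typed entities from that list might look like the minimal sketch below. The helper name entities_from_triplets and the sample triplets (including the relation label used_in) are hypothetical and are not actual model output; only the dictionary keys are taken from the code above.

```python
def entities_from_triplets(triplets):
    """Collect unique (text, type) entities from the head and tail of each triplet."""
    seen = set()
    entities = []
    for t in triplets:
        for text, etype in ((t["head"], t["head_type"]), (t["tail"], t["tail_type"])):
            # Deduplicate case-insensitively, keeping the first spelling encountered.
            key = (text.lower(), etype)
            if text and key not in seen:
                seen.add(key)
                entities.append({"text": text, "type": etype})
    return entities


# Hypothetical triplets in the shape returned by extract_triplets (not real model output).
sample_triplets = [
    {"head": "diamond nanofibers", "head_type": "material",
     "type": "used_in", "tail": "electronics", "tail_type": "technical_field"},
    {"head": "Diamond nanofibers", "head_type": "material",
     "type": "used_in", "tail": "quantum computers", "tail_type": "technological_system"},
]

entities = entities_from_triplets(sample_triplets)
for entity in entities:
    print(entity["type"], "|", entity["text"])
```

In practice the same function could be applied to the encoded list printed above. Whether case-insensitive deduplication is appropriate depends on the downstream use, so treat that choice as an assumption.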

Training Details

Training Data

[More Information Needed]

Training Procedure

Preprocessing [optional]

[More Information Needed]

Training Hyperparameters

Training regime: [More Information Needed]

Speeds, Sizes, Times [optional]

[More Information Needed]

Evaluation

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Factors

[More Information Needed]

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

Model Examination [optional]

[More Information Needed]

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: [More Information Needed]
Hours used: [More Information Needed]
Cloud Provider: [More Information Needed]
Compute Region: [More Information Needed]
Carbon Emitted: [More Information Needed]
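
Once the fields above are known, the estimate is roughly hardware power draw × hours used × the carbon intensity of the local grid. The sketch below illustrates that arithmetic only. The function name estimate_co2_kg and all input values are placeholders chosen for illustration; they are not figures for this model.

```python
def estimate_co2_kg(power_watts, hours, grid_kg_co2_per_kwh):
    """Estimate emissions as energy used (kWh) times grid carbon intensity (kg CO2eq/kWh)."""
    energy_kwh = power_watts / 1000.0 * hours
    return energy_kwh * grid_kg_co2_per_kwh


# Placeholder inputs for illustration only; these are NOT this model's training figures.
example_kg = estimate_co2_kg(power_watts=300, hours=10, grid_kg_co2_per_kwh=0.4)
print(f"{example_kg:.2f} kg CO2eq")
```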

Technical Specifications [optional]

Model Architecture and Objective

[More Information Needed]

Compute Infrastructure

[More Information Needed]

Hardware

[More Information Needed]

Software

[More Information Needed]

Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors [optional]

[More Information Needed]

Model Card Contact

[More Information Needed]