Phi-2 model fine-tuned for the named entity recognition task

The model was fine-tuned on one quarter of the CoNLL-2012 OntoNotes v5 dataset.

The prompts and expected outputs were constructed as described in [1].

Example input:

Instruct: I am an excelent linquist. The task is to label organization entities in the given sentence. Below are some examples

Input: A spokesman for B. A. T said of the amended filings that,`` It would appear that nothing substantive has changed.
Output: A spokesman for @@B. A. T## said of the amended filings that,`` It would appear that nothing substantive has changed.

Input: Since NBC's interest in the Qintex bid for MGM / UA was disclosed, Mr. Wright has n't been available for comment.
Output: Since @@NBC##'s interest in the @@Qintex## bid for @@MGM / UA## was disclosed, Mr. Wright has n't been available for comment.

Input: You know news organizations demand total transparency whether you're General Motors or United States government /.
Output: You know news organizations demand total transparency whether you're @@General Motors## or United States government /.

Input: We respectfully invite you to watch a special edition of Across China.
Output:

Expected output:

We respectfully invite you to watch a special edition of @@Across China##.
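
The prompt layout and the `@@`/`##` markers shown above can be assembled and parsed with a few lines of Python. This is only a sketch: the helper names `build_prompt` and `extract_entities` are hypothetical, and the exact whitespace between blocks is assumed from the example rather than confirmed against the training data.

```python
import re


def build_prompt(category, demonstrations, sentence):
    """Assemble a few-shot prompt in the layout of the example above.

    The instruction wording is copied verbatim from the example, spelling
    included, on the assumption that the model saw it in this form.
    """
    header = (
        "Instruct: I am an excelent linquist. "
        f"The task is to label {category} entities in the given sentence. "
        "Below are some examples"
    )
    blocks = [header]
    for demo_input, demo_output in demonstrations:
        blocks.append(f"Input: {demo_input}\nOutput: {demo_output}")
    # The final block leaves "Output:" open for the model to complete.
    blocks.append(f"Input: {sentence}\nOutput:")
    return "\n\n".join(blocks)


# Non-greedy match so that several entities in one sentence stay separate.
ENTITY_PATTERN = re.compile(r"@@(.+?)##")


def extract_entities(labeled_sentence):
    """Return the spans wrapped in @@ ... ## in a labeled sentence."""
    return ENTITY_PATTERN.findall(labeled_sentence)


labeled = "We respectfully invite you to watch a special edition of @@Across China##."
print(extract_entities(labeled))
```

Applied to the expected output above, `extract_entities` returns the single span `Across China`.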

This model is trained to recognize the following named entity categories:

  • person
  • nationalities or religious or political groups
  • facility
  • organization
  • geopolitical entity
  • location
  • product
  • date
  • time expression
  • percentage
  • monetary value
  • quantity
  • event
  • work of art
  • law/legal reference
  • language name
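
Because each prompt asks for a single category, tagging a sentence for every category takes one prompt per category. The sketch below maps the standard OntoNotes v5 tag names to the category phrases listed above and builds one instruction per tag. Only the "organization" and "person" phrasings appear in this card; the wording for the other categories, and the names `CATEGORY_PHRASES` and `instruction_for`, are assumptions for illustration.

```python
# OntoNotes v5 tag names mapped to the category phrases listed above.
# Apart from "organization" and "person", the exact phrasing used in the
# training prompts is an assumption.
CATEGORY_PHRASES = {
    "PERSON": "person",
    "NORP": "nationalities or religious or political groups",
    "FAC": "facility",
    "ORG": "organization",
    "GPE": "geopolitical entity",
    "LOC": "location",
    "PRODUCT": "product",
    "DATE": "date",
    "TIME": "time expression",
    "PERCENT": "percentage",
    "MONEY": "monetary value",
    "QUANTITY": "quantity",
    "EVENT": "event",
    "WORK_OF_ART": "work of art",
    "LAW": "law/legal reference",
    "LANGUAGE": "language name",
}


def instruction_for(tag):
    """Build the task sentence of the prompt for one entity category."""
    return f"The task is to label {CATEGORY_PHRASES[tag]} entities in the given sentence."


# One instruction per category; each would head its own prompt.
instructions = {tag: instruction_for(tag) for tag in CATEGORY_PHRASES}
print(instructions["ORG"])
```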

Model Trained Using AutoTrain

This model was trained using the AutoTrain SFT trainer. For more information, please visit AutoTrain.

Hyperparameters:

{
    "model": "microsoft/phi-2",
    "valid_split": null,
    "add_eos_token": false,
    "block_size": 1024,
    "model_max_length": 1024,
    "padding": "right",
    "trainer": "sft",
    "use_flash_attention_2": false,
    "disable_gradient_checkpointing": false,
    "evaluation_strategy": "epoch",
    "save_total_limit": 1,
    "save_strategy": "epoch",
    "auto_find_batch_size": false,
    "mixed_precision": "bf16",
    "lr": 0.0002,
    "epochs": 1,
    "batch_size": 1,
    "warmup_ratio": 0.1,
    "gradient_accumulation": 4,
    "optimizer": "adamw_torch",
    "scheduler": "linear",
    "weight_decay": 0.01,
    "max_grad_norm": 1.0,
    "seed": 42,
    "apply_chat_template": false,
    "quantization": "int4",
    "target_modules": null,
    "merge_adapter": false,
    "peft": true,
    "lora_r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "dpo_beta": 0.1
}
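
Two quantities implied by these settings are worth spelling out. The sketch below derives them from a hand-copied subset of the configuration; it assumes a single training device (the card does not state the device count) and the standard LoRA scaling of `lora_alpha / lora_r`.

```python
# Subset of the hyperparameters above, copied by hand for illustration.
config = {
    "batch_size": 1,
    "gradient_accumulation": 4,
    "lora_r": 16,
    "lora_alpha": 32,
}

# Sequences contributing to each optimizer step, per device.
effective_batch_size = config["batch_size"] * config["gradient_accumulation"]

# Standard LoRA multiplies the low-rank update by alpha / r.
lora_scaling = config["lora_alpha"] / config["lora_r"]

print(effective_batch_size)  # 4
print(lora_scaling)          # 2.0
```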

Usage


from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "pahautelman/phi2-ner-v1"

# Load the tokenizer and the fine-tuned model, then switch to inference mode.
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path).eval()

prompt = 'Label the person entities in the given sentence: Russian President Vladimir Putin is due to arrive in Havana a few hours from now to become the first post-Soviet leader to visit Cuba.'

# Tokenize the prompt without special tokens and decode greedily.
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors='pt')
outputs = model.generate(
    inputs.to(model.device),
    max_new_tokens=9,
    do_sample=False,
)
# The decoded string contains the prompt followed by the generated continuation.
output = tokenizer.batch_decode(outputs)[0]

# Model response: "Output: Russian President, Vladimir Putin"
print(output)

References:

[1] Wang et al., "GPT-NER: Named Entity Recognition via Large Language Models", 2023.
