billm-mistral-7b-conll03-ner

https://arxiv.org/abs/2310.01208 https://arxiv.org/abs/2311.05296

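This model is a fine-tuned version of mistralai/Mistral-7B-v0.1. The card lists the dataset as unknown, although the model name and the label set in the inference example suggest CoNLL-2003 NER. It achieves the following results on the evaluation set: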

  • Loss: 0.2046
  • Precision: 0.9273
  • Recall: 0.9393
  • F1: 0.9333
  • Accuracy: 0.9864
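As a quick consistency check, the reported F1 can be recomputed from the precision and recall above, since F1 is their harmonic mean. This is only a sketch on the rounded figures from the list, not the evaluation code used for the model.

```python
# Reported evaluation metrics, copied from the list above (already rounded).
precision = 0.9273
recall = 0.9393

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.4f}")  # prints "F1 = 0.9333"
```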

Inference

python -m pip install -U billm==0.1.1

from transformers import AutoTokenizer, pipeline
from peft import PeftModel, PeftConfig
from billm import MistralForTokenClassification


label2id = {'O': 0, 'B-PER': 1, 'I-PER': 2, 'B-ORG': 3, 'I-ORG': 4, 'B-LOC': 5, 'I-LOC': 6, 'B-MISC': 7, 'I-MISC': 8}
id2label = {v: k for k, v in label2id.items()}
model_id = 'WhereIsAI/billm-mistral-7b-conll03-ner'
tokenizer = AutoTokenizer.from_pretrained(model_id)
peft_config = PeftConfig.from_pretrained(model_id)
model = MistralForTokenClassification.from_pretrained(
    peft_config.base_model_name_or_path,
    num_labels=len(label2id), id2label=id2label, label2id=label2id
)
model = PeftModel.from_pretrained(model, model_id)
# merge_and_unload is necessary for inference
model = model.merge_and_unload()

token_classifier = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
sentence = "I live in Hong Kong. I am a student at Hong Kong PolyU."
tokens = token_classifier(sentence)
print(tokens)
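With aggregation_strategy="simple", the pipeline returns a list of dictionaries with entity_group, score, word, start and end keys. The sketch below shows one way to filter such a result by confidence and group the spans by entity type. The entities list is made-up sample data in that shape (not actual model output), and group_entities and min_score are hypothetical names introduced for illustration.

```python
from collections import defaultdict

sentence = "I live in Hong Kong. I am a student at Hong Kong PolyU."

# Hypothetical sample in the shape the pipeline returns; the scores are
# invented for illustration and are NOT real predictions from this model.
entities = [
    {"entity_group": "LOC", "score": 0.99, "word": "Hong Kong", "start": 10, "end": 19},
    {"entity_group": "ORG", "score": 0.62, "word": "Hong Kong PolyU", "start": 39, "end": 54},
]


def group_entities(text, entities, min_score=0.5):
    """Group entity surface forms by type, keeping only confident spans."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            # Slice the original text by character offsets instead of using
            # "word", which is rebuilt from tokens and may differ in spacing.
            grouped[ent["entity_group"]].append(text[ent["start"]:ent["end"]])
    return dict(grouped)


print(group_entities(sentence, entities))
print(group_entities(sentence, entities, min_score=0.9))
```

Raising min_score trades recall for precision on the extracted spans; a suitable threshold would need to be chosen on held-out data.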

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 10
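To make the linear scheduler concrete, the sketch below computes the learning rate at a given optimizer step for a schedule that decays linearly to zero. It assumes no warmup, which the card does not state, and takes the total of 17560 steps from the training results table. linear_lr is a hypothetical helper, not a function from Transformers.

```python
def linear_lr(step, base_lr=2e-4, total_steps=17560, warmup_steps=0):
    """Learning rate under linear warmup followed by linear decay to zero.

    warmup_steps=0 is an assumption; the card lists no warmup setting.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 8780, 17560):
    print(step, linear_lr(step))
```

Under these assumptions the rate is halved by the end of epoch 5 and reaches zero at the final step.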

Training results

| Training Loss | Epoch | Step  | Validation Loss | Precision | Recall | F1     | Accuracy |
|---------------|-------|-------|-----------------|-----------|--------|--------|----------|
| 0.0499        | 1.0   | 1756  | 0.1085          | 0.9196    | 0.9287 | 0.9241 | 0.9845   |
| 0.0233        | 2.0   | 3512  | 0.0997          | 0.9249    | 0.9226 | 0.9237 | 0.9845   |
| 0.0097        | 3.0   | 5268  | 0.1343          | 0.9292    | 0.9386 | 0.9339 | 0.9870   |
| 0.0036        | 4.0   | 7024  | 0.1651          | 0.9245    | 0.9386 | 0.9315 | 0.9864   |
| 0.0012        | 5.0   | 8780  | 0.1839          | 0.9257    | 0.9373 | 0.9315 | 0.9863   |
| 0.0005        | 6.0   | 10536 | 0.2027          | 0.9258    | 0.9386 | 0.9321 | 0.9864   |
| 0.0002        | 7.0   | 12292 | 0.2022          | 0.9276    | 0.9384 | 0.9330 | 0.9864   |
| 0.0002        | 8.0   | 14048 | 0.2040          | 0.9274    | 0.9388 | 0.9331 | 0.9864   |
| 0.0001        | 9.0   | 15804 | 0.2048          | 0.9270    | 0.9393 | 0.9331 | 0.9864   |
| 0.0001        | 10.0  | 17560 | 0.2046          | 0.9273    | 0.9393 | 0.9333 | 0.9864   |
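The per-epoch results can also be scanned programmatically, for example to see which epoch scores best on each metric. The sketch below copies the epoch, validation loss and F1 columns from the results above. It is only an illustration of reading the table and does not describe how the published checkpoint was selected.

```python
# (epoch, validation_loss, f1), copied from the training results above.
results = [
    (1, 0.1085, 0.9241),
    (2, 0.0997, 0.9237),
    (3, 0.1343, 0.9339),
    (4, 0.1651, 0.9315),
    (5, 0.1839, 0.9315),
    (6, 0.2027, 0.9321),
    (7, 0.2022, 0.9330),
    (8, 0.2040, 0.9331),
    (9, 0.2048, 0.9331),
    (10, 0.2046, 0.9333),
]

# Epoch with the highest F1 and epoch with the lowest validation loss.
best_f1_epoch = max(results, key=lambda row: row[2])[0]
lowest_loss_epoch = min(results, key=lambda row: row[1])[0]

print("best F1 at epoch", best_f1_epoch)
print("lowest validation loss at epoch", lowest_loss_epoch)
```

In these numbers the validation loss is lowest at epoch 2 and roughly doubles afterwards while the training loss keeps falling, yet F1 stays within about 0.01 across all epochs, so the two criteria would pick different checkpoints.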

Framework versions

  • PEFT 0.9.0
  • Transformers 4.38.2
  • PyTorch 2.0.1
  • Datasets 2.16.0
  • Tokenizers 0.15.0

Citation

@inproceedings{li2024bellm,
    title = "BeLLM: Backward Dependency Enhanced Large Language Model for Sentence Embeddings",
    author = "Li, Xianming and Li, Jing",
    booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics",
    year = "2024",
    publisher = "Association for Computational Linguistics"
}

@article{li2023label,
  title={Label supervised llama finetuning},
  author={Li, Zongxi and Li, Xianming and Liu, Yuzhang and Xie, Haoran and Li, Jing and Wang, Fu-lee and Li, Qing and Zhong, Xiaoqin},
  journal={arXiv preprint arXiv:2310.01208},
  year={2023}
}