ganchengguang/OIELLM-8B-Instruction

metadata

license: cc-by-nc-4.0
language:
  - en
  - ja
  - zh
metrics:
  - f1
tags:
  - Information Extraction
  - NER

This is the model released with the MMM paper (cited below). It is based on Meta's LLaMA3-8B-Instruction: https://huggingface.co/meta-llama/Meta-Llama-3-8B

You must use the following format when using OIELLM to extract information from an input text or sentence.

OIELLM supports three languages (English, Chinese, and Japanese). You must append a task instruction word to the input to specify the kind of task.

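The following example shows the input and output format. The input is the text followed by the task instruction word "/NER". The output starts with a category for the text ("Literature") and the task word, followed by the extracted entities as ":type;entity" pairs:

{
  "input": "In 1953, filming of \"On the Waterfront\" starring Marlon Brando began, and Kazan struggled with Spiegel's persistent budget cuts and managed to complete the film, which was released the following year in 1954 and became a huge hit with support from the laborer class./NER",
  "output": "Literature/NER/:Person;Marlon Brando:Product Name;On the Waterfront:Person;Kazan:Person;Spiegel"
}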
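The example input/output pair suggests a simple convention: the input is the text with "/NER" appended, and the output is "<category>/<task>/" followed by ":<type>;<entity>" pairs. Below is a minimal Python sketch for building inputs and parsing outputs under that assumption. The function names are hypothetical, the structure is inferred from this single example rather than from a published specification, and the parser will mis-split any category or entity whose text itself contains "/", ":" or ";".

```python
from typing import List, Tuple


def build_input(text: str, task: str = "NER") -> str:
    # Append the task instruction word after a slash, as in the example input.
    return f"{text}/{task}"


def parse_output(output: str) -> Tuple[str, str, List[Tuple[str, str]]]:
    # Assumed layout: "<category>/<task>/" followed by ":<type>;<entity>" pairs.
    category, task, rest = output.split("/", 2)
    entities: List[Tuple[str, str]] = []
    # The leading ":" produces an empty first chunk, so skip it.
    for chunk in rest.split(":")[1:]:
        entity_type, _, span = chunk.partition(";")
        entities.append((entity_type, span))
    return category, task, entities
```

For the example output, parse_output returns the category "Literature", the task "NER", and four (type, entity) pairs, starting with ("Person", "Marlon Brando").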

Load the model with from_pretrained, using the AutoTokenizer and AutoModelForCausalLM classes from the transformers library.
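Below is a minimal loading and inference sketch using Hugging Face transformers. It makes several assumptions. The repository id is taken from this page's title. The input is fed as a plain prompt (text + "/NER") without a chat template, which this README does not confirm. Greedy decoding and max_new_tokens=256 are arbitrary choices. device_map="auto" requires the accelerate package. The helper names build_prompt, load_oiellm and extract are hypothetical.

```python
MODEL_ID = "ganchengguang/OIELLM-8B-Instruction"


def build_prompt(text: str, task: str = "NER") -> str:
    # Append the task instruction word after a slash, e.g. "...text./NER".
    return f"{text}/{task}"


def load_oiellm(model_id: str = MODEL_ID):
    # Imported inside the function so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # torch_dtype="auto" keeps the checkpoint's dtype; device_map="auto" needs accelerate.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    return tokenizer, model


def extract(tokenizer, model, text: str, task: str = "NER", max_new_tokens: int = 256) -> str:
    # Assumption: the prompt is passed as plain text, without a chat template.
    inputs = tokenizer(build_prompt(text, task), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Usage: tokenizer, model = load_oiellm(), then print(extract(tokenizer, model, "In 1953, filming of \"On the Waterfront\" starring Marlon Brando began.")). Slicing off the prompt tokens means the returned string should contain only the model's answer, in the output format shown in the example.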

If you have any questions, you can leave a message in this community or contact me directly at the e-mail address given in the paper.

Finally, I would like to thank the contributors of the fundamental datasets on which the MMM dataset is built, as well as the pioneering researchers who selflessly shared their work:

1. Japanese Wikipedia NER dataset Takahiro Omi https://github.com/stockmarkteam/ner-wikipedia-dataset

2. JGLUE: Japanese General Language Understanding Evaluation Kentaro Kurihara, Daisuke Kawahara, Tomohide Shibata https://github.com/yahoojapan/JGLUE?tab=readme-ov-file

3. livedoor news corpus 関口宏司 https://www.rondhuit.com/download.html

4. UniversalNER Wenxuan Zhou https://arxiv.org/abs/2308.03279

Paper and citation information: https://arxiv.org/abs/2407.10953

@misc{gan2024mmmmultilingualmutualreinforcement,
  title={MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models},
  author={Chengguang Gan and Qingyu Yin and Xinyang He and Hanjun Wei and Yunhao Liang and Younghun Lim and Shijian Wang and Hexiang Huang and Qinghao Zhang and Shiwen Ni and Tatsunori Mori},
  year={2024},
  eprint={2407.10953},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2407.10953},
}