metadata
title: Mirror
emoji: 🪞
colorFrom: blue
colorTo: yellow
sdk: docker
pinned: true
license: apache-2.0
🪞 Mirror: A Universal Framework for Various Information Extraction Tasks
Image generated by DALLE 3
[Paper] | [Demo]
Our paper has been accepted to the EMNLP 2023 main conference. Check it out!
This is the official implementation of 🪞 Mirror, which supports almost all information extraction (IE) tasks.
The name, Mirror, comes from the classical story Snow White and the Seven Dwarfs, where a magic mirror knows everything in the world. We aim to build such a powerful tool for the IE community.
🔥 Supported Tasks
- Named Entity Recognition
- Entity Relationship Extraction (Triplet Extraction)
- Event Extraction
- Aspect-based Sentiment Analysis
- Multi-span Extraction (e.g. Discontinuous NER)
- N-ary Extraction (e.g. Hyper Relation Extraction)
- Extractive Machine Reading Comprehension (MRC) and Question Answering
- Classification & Multi-choice MRC
🌴 Dependencies
Python>=3.10
pip install -r requirements.txt
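Since the requirement is Python>=3.10, it can help to check the interpreter version before installing. The following is a minimal sketch, not part of the repository; `version_ok` is a hypothetical helper name, and the commented usage assumes `python` on your PATH is the interpreter you intend to use.

```shell
# Hypothetical helper (not part of Mirror): succeeds if the given
# "major.minor[.patch]" version string is at least 3.10.
version_ok() {
  vo_major="${1%%.*}"       # text before the first dot
  vo_minor="${1#*.}"        # text after the first dot
  vo_minor="${vo_minor%%.*}" # drop any patch component
  [ "$vo_major" -gt 3 ] || { [ "$vo_major" -eq 3 ] && [ "$vo_minor" -ge 10 ]; }
}

# Usage (commented out so that sourcing this snippet has no side effects):
# py_version="$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
# version_ok "$py_version" && pip install -r requirements.txt
```

The comparison is done on integers rather than strings, so 3.10 is correctly treated as newer than 3.9.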
QuickStart
Pretrained Model Weights & Datasets
Download the pretrained model weights & datasets from [OSF].
No worries, it's an anonymous link used only for double-blind peer review.
Pretraining
- Download and unzip the pretraining corpus into
resources/Mirror/v1.4_sampled_v3/merged/all_excluded
- Start pretraining:
CUDA_VISIBLE_DEVICES=0 rex train -m src.task -dc conf/Pretrain_excluded.yaml
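A missing or half-unzipped corpus is an easy mistake to make before a long run. The sketch below checks that the corpus directory from the step above exists and is not empty before launching. `require_nonempty_dir` is a hypothetical helper, not something shipped with the repository, and the check says nothing about whether the files inside are complete.

```shell
# Corpus location taken from the pretraining step above.
CORPUS_DIR="resources/Mirror/v1.4_sampled_v3/merged/all_excluded"

# Hypothetical helper (not part of Mirror): succeeds only if the
# directory exists and contains at least one entry.
require_nonempty_dir() {
  if [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]; then
    return 0
  fi
  echo "Missing or empty directory: $1" >&2
  return 1
}

# Usage (commented out so that sourcing this snippet starts nothing):
# require_nonempty_dir "$CORPUS_DIR" && \
#   CUDA_VISIBLE_DEVICES=0 rex train -m src.task -dc conf/Pretrain_excluded.yaml
```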
Fine-tuning
⚠️ Due to data license constraints, some datasets (e.g. ACE04, ACE05) cannot be provided directly.
- Download and unzip the pretraining corpus into
resources/Mirror/v1.4_sampled_v3/merged/all_excluded
- Download and unzip the fine-tuning datasets into
resources/Mirror/uie/
- Start fine-tuning:
# UIE tasks
CUDA_VISIBLE_DEVICES=0 bash scripts/single_task_wPTAllExcluded_wInstruction/run1.sh
CUDA_VISIBLE_DEVICES=1 bash scripts/single_task_wPTAllExcluded_wInstruction/run2.sh
CUDA_VISIBLE_DEVICES=2 bash scripts/single_task_wPTAllExcluded_wInstruction/run3.sh
CUDA_VISIBLE_DEVICES=3 bash scripts/single_task_wPTAllExcluded_wInstruction/run4.sh
# Multi-span and N-ary extraction
CUDA_VISIBLE_DEVICES=4 bash scripts/single_task_wPTAllExcluded_wInstruction/run_new_tasks.sh
# GLUE datasets
CUDA_VISIBLE_DEVICES=5 bash scripts/single_task_wPTAllExcluded_wInstruction/glue.sh
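The six commands above each pin one script to one GPU. If you have six GPUs and want to start them together, the pairing can be generated in a loop. This is a sketch under that assumption; `finetune_jobs` is a hypothetical function name, and it only prints the commands so you can inspect them first.

```shell
# Script directory taken from the commands above.
SCRIPT_DIR="scripts/single_task_wPTAllExcluded_wInstruction"

# Hypothetical helper (not part of Mirror): prints one background job
# per script, assigning GPUs 0..5 in order, followed by a final `wait`.
finetune_jobs() {
  fj_gpu=0
  for fj_script in run1.sh run2.sh run3.sh run4.sh run_new_tasks.sh glue.sh; do
    echo "CUDA_VISIBLE_DEVICES=$fj_gpu bash $SCRIPT_DIR/$fj_script &"
    fj_gpu=$((fj_gpu + 1))
  done
  echo "wait"
}

# Usage (commented out so that sourcing this snippet starts nothing):
# finetune_jobs          # inspect the generated commands
# finetune_jobs | bash   # launch all six jobs and wait for them to finish
```

With fewer GPUs, run the original commands one at a time instead, adjusting `CUDA_VISIBLE_DEVICES` as needed.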
Analysis Experiments
- Few-shot experiments: scripts/run_fewshot.sh. To collect the results, run: python mirror_fewshot_outputs/get_avg_results.py
- Mirror w/ PT w/o Inst.: scripts/single_task_wPTAllExcluded_woInstruction
- Mirror w/o PT w/ Inst.: scripts/single_task_wo_pretrain
- Mirror w/o PT w/o Inst.: scripts/single_task_wo_pretrain_wo_instruction
Evaluation
- Change task_dir and data_pairs to match what you want to evaluate. The default setting produces the results of Mirror (direct) on all downstream tasks.
CUDA_VISIBLE_DEVICES=0 python -m src.eval
Demo
- Download and unzip the pretrained task dump into
mirror_outputs/Mirror_Pretrain_AllExcluded_2
- Try our demo:
CUDA_VISIBLE_DEVICES=0 python -m src.app.api_backend
Citation
@misc{zhu_mirror_2023,
shorttitle = {Mirror},
title = {Mirror: A Universal Framework for Various Information Extraction Tasks},
author = {Zhu, Tong and Ren, Junfei and Yu, Zijian and Wu, Mengsong and Zhang, Guoliang and Qu, Xiaoye and Chen, Wenliang and Wang, Zhefeng and Huai, Baoxing and Zhang, Min},
url = {http://arxiv.org/abs/2311.05419},
doi = {10.48550/arXiv.2311.05419},
urldate = {2023-11-10},
publisher = {arXiv},
month = nov,
year = {2023},
note = {arXiv:2311.05419 [cs]},
keywords = {Computer Science - Artificial Intelligence, Computer Science - Computation and Language},
}
🛣️ Roadmap
- Convert the current model into a Huggingface version that supports loading from transformers like other newly released LLMs.
- Remove the Background area and merge TL and TP into a single T token.
- Add more task data: keyword extraction, coreference resolution, FrameNet, WikiNER, T-Rex relation extraction dataset, etc.
- Pre-train on all the data (including benchmarks) to build a nice out-of-the-box toolkit for universal IE.
Yours sincerely
This project is licensed under Apache-2.0. We hope you enjoy it ~
Mirror Team