---
license: mit
language:
- en
tags:
- finance
- ContextNER
- language models
datasets:
- him1411/EDGAR10-Q
metrics:
- rouge
---
EDGAR-flan-t5-base
=============
Flan-T5 base model fine-tuned on the [EDGAR10-Q dataset](https://huggingface.co/datasets/him1411/EDGAR10-Q).
You may want to check out:
* Our paper: [CONTEXT-NER: Contextual Phrase Generation at Scale](https://arxiv.org/abs/2109.08079)
* GitHub: [him1411/edgar10q-dataset](https://github.com/him1411/edgar10q-dataset)
Direct Use
=============
This model can be used to generate text, which is useful for experimentation and for understanding its capabilities. **It should not be used directly in production or in work that may directly impact people.**
How to Use
=============
You can load the model directly with Transformers instead of downloading it manually. The [flan-t5-base model](https://huggingface.co/google/flan-t5-base) is the backbone of our model. Here is how to load the model in PyTorch:
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("him1411/EDGAR-flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("him1411/EDGAR-flan-t5-base")
```
Alternatively, clone the model repository:
```
git lfs install
git clone https://huggingface.co/him1411/EDGAR-flan-t5-base
```
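Once the repository is cloned, the tokenizer and model can be loaded from the local directory rather than from the Hub. The following is a minimal sketch, assuming the clone lives in a folder named `EDGAR-flan-t5-base` in the current working directory; the helper name `load_from_clone` is hypothetical and not part of this repository.

```python
from pathlib import Path


def load_from_clone(repo_dir: str = "EDGAR-flan-t5-base"):
    """Load the tokenizer and model from a local clone (hypothetical helper).

    `repo_dir` is an assumed path; point it at wherever the repo was cloned.
    """
    path = Path(repo_dir)
    # A Transformers model directory contains a config.json; fail early
    # with a clear message if the clone is missing or incomplete.
    if not (path / "config.json").is_file():
        raise FileNotFoundError(f"No config.json found in {path.resolve()}")

    # Imported here so the path check above works even before
    # transformers is installed.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(str(path))
    model = AutoModelForSeq2SeqLM.from_pretrained(str(path))
    return tokenizer, model
```

Calling `tokenizer, model = load_from_clone()` then gives the same two objects as the Hub-based snippet above, without a network request at load time.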
Inference Example
=============
Below is an example of running the model on a single instance of the "ContextNER" task.
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("him1411/EDGAR-flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("him1411/EDGAR-flan-t5-base")

# In this instance the input is the entity ("14.5 years"), then " . ", then the sentence containing it.
input_text = "14.5 years . The definite lived intangible assets related to the contracts and trade names had estimated weighted average useful lives of 5.9 years and 14.5 years, respectively, at acquisition."
# Return PyTorch tensors so the encoding can be passed to generate().
tokenized_input = tokenizer(input_text, return_tensors="pt")
# Ideal output for this input is 'Definite lived intangible assets weighted average remaining useful life'
# max_new_tokens=32 is an assumed limit; adjust it as needed.
output_ids = model.generate(**tokenized_input, max_new_tokens=32)
output = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(output)
```
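To run several instances at once, the inputs can be tokenized as a padded batch and decoded together. The sketch below is illustrative only: the helper names `build_input` and `generate_phrases` are hypothetical, the `"<entity> . <sentence>"` layout is inferred from the single example above rather than from a documented specification, and `max_new_tokens=32` is an assumed limit.

```python
from typing import List


def build_input(entity: str, sentence: str) -> str:
    """Join an entity and its sentence in the layout seen in the example above.

    The " . " separator is an assumption inferred from that one instance.
    """
    return f"{entity} . {sentence}"


def generate_phrases(inputs: List[str], tokenizer, model,
                     max_new_tokens: int = 32) -> List[str]:
    """Generate one phrase per input string using a padded batch."""
    # Pad to the longest input and truncate anything beyond the model limit.
    batch = tokenizer(inputs, return_tensors="pt", padding=True, truncation=True)
    # generate() runs autoregressive decoding and returns token ids.
    output_ids = model.generate(**batch, max_new_tokens=max_new_tokens)
    # Convert token ids back to text, dropping padding and end-of-sequence tokens.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

With `tokenizer` and `model` loaded as shown earlier, `generate_phrases([build_input("14.5 years", sentence)], tokenizer, model)` returns a list containing one generated phrase per input.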
BibTeX Entry and Citation Info
===============
If you use our model, please cite our paper:
```bibtex
@article{gupta2021context,
title={Context-NER: Contextual Phrase Generation at Scale},
author={Gupta, Himanshu and Verma, Shreyas and Kumar, Tarun and Mishra, Swaroop and Agrawal, Tamanna and Badugu, Amogh and Bhatt, Himanshu Sharad},
journal={arXiv preprint arXiv:2109.08079},
year={2021}
}
```