---
license: mit
language:
- en
tags:
- finance
- ContextNER
- language models
datasets:
- him1411/EDGAR10-Q
metrics:
- rouge
---

EDGAR-flan-t5-base
=============

Flan-T5 base model fine-tuned on the [EDGAR10-Q dataset](https://huggingface.co/datasets/him1411/EDGAR10-Q).

You may want to check out:
* Our paper: [CONTEXT-NER: Contextual Phrase Generation at Scale](https://arxiv.org/abs/2109.08079/)
* GitHub: [Click Here](https://github.com/him1411/edgar10q-dataset)

Direct Use
=============

This model can be used to generate text, which is useful for experimentation and for understanding its capabilities. **It should not be used directly in production or in work that may directly impact people.**

How to Use
=============

You can load the model with Transformers instead of downloading it manually. The [flan-t5-base model](https://huggingface.co/google/flan-t5-base) is the backbone of our model. Here is how to load the model in PyTorch:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("him1411/EDGAR-flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("him1411/EDGAR-flan-t5-base")
```

Or clone the model repo:

```
git lfs install
git clone https://huggingface.co/him1411/EDGAR-flan-t5-base
```

Inference Example
=============

Here is an example of one instance of the "ContextNER" task.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("him1411/EDGAR-flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("him1411/EDGAR-flan-t5-base")

# Input for one instance: here, the entity ("14.5 years") followed by the sentence it appears in.
text = "14.5 years . The definite lived intangible assets related to the contracts and trade names had estimated weighted average useful lives of 5.9 years and 14.5 years, respectively, at acquisition."
# Return PyTorch tensors so the encoding can be passed straight to generate().
tokenized_input = tokenizer(text, return_tensors="pt")

# Ideal output for this input is 'Definite lived intangible assets weighted average remaining useful life'
# max_new_tokens=32 is an illustrative cap on the generated phrase length, not a tuned value.
output_ids = model.generate(**tokenized_input, max_new_tokens=32)
output = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(output)
```

BibTeX Entry and Citation Info
===============

If you use our model, please cite our paper:

```bibtex
@article{gupta2021context,
  title={Context-NER: Contextual Phrase Generation at Scale},
  author={Gupta, Himanshu and Verma, Shreyas and Kumar, Tarun and Mishra, Swaroop and Agrawal, Tamanna and Badugu, Amogh and Bhatt, Himanshu Sharad},
  journal={arXiv preprint arXiv:2109.08079},
  year={2021}
}
```
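
Batched Inference Sketch
=============

The inference example handles a single instance. A minimal sketch of how several instances might be processed in one batch is given below. It assumes the input format is the entity, then ` . `, then the sentence containing it; this is inferred from the single example in this card and is not confirmed elsewhere. The helper names `build_input` and `describe_entities` and the `max_new_tokens` default are hypothetical and chosen only for illustration.

```python
def build_input(entity: str, sentence: str) -> str:
    """Join an entity and its sentence as '<entity> . <sentence>'.

    Assumption: this mirrors the format of the single example in the model card.
    """
    return f"{entity.strip()} . {sentence.strip()}"


def describe_entities(tokenizer, model, pairs, max_new_tokens=32):
    """Generate one phrase per (entity, sentence) pair in a single batch.

    `tokenizer` and `model` are expected to be the objects returned by
    AutoTokenizer.from_pretrained and AutoModelForSeq2SeqLM.from_pretrained.
    """
    texts = [build_input(entity, sentence) for entity, sentence in pairs]
    # Pad to the longest input in the batch and truncate over-long inputs.
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # max_new_tokens is an illustrative cap, not a tuned value.
    output_ids = model.generate(**batch, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

With the `tokenizer` and `model` loaded as in the inference example, a call such as `describe_entities(tokenizer, model, [("14.5 years", sentence)])` would return a list with one generated phrase per pair.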