	---
	license: mit
	language:
	- en
	tags:
	- finance
	- ContextNER
	- language models
	datasets:
	- him1411/EDGAR10-Q
	metrics:
	- rouge
	---

	EDGAR-BART-Base
	=============

	BART base model fine-tuned on the [EDGAR10-Q dataset](https://huggingface.co/datasets/him1411/EDGAR10-Q).

	You may also want to check out:
	* Our paper: [CONTEXT-NER: Contextual Phrase Generation at Scale](https://arxiv.org/abs/2109.08079/)
	* GitHub: [him1411/edgar10q-dataset](https://github.com/him1411/edgar10q-dataset)



	Direct Use
	=============

	This model can be used to generate text, which is useful for experimentation and for understanding its capabilities. It should not be used directly in production or in work that may directly affect people.

	How to Use
	=============

	The model can be loaded directly with Transformers, without downloading the files manually. Its backbone is the [bart-base model](https://huggingface.co/facebook/bart-base). Here is how to load the model in PyTorch:

	```python
	from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
	tokenizer = AutoTokenizer.from_pretrained("him1411/EDGAR-BART-Base")
	model = AutoModelForSeq2SeqLM.from_pretrained("him1411/EDGAR-BART-Base")
	```
	Alternatively, clone the model repository:
	```bash
	git lfs install
	git clone https://huggingface.co/him1411/EDGAR-BART-Base
	```
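
The two loading routes above can be combined. The sketch below is only an illustration: `resolve_model_source` is a hypothetical helper, not part of Transformers or of this repository, and it assumes that a cloned copy contains a top-level `config.json`, as Transformers model repositories normally do. It returns the local directory when such a clone is present and the Hub ID otherwise; either value can then be passed to `from_pretrained`.

```python
from pathlib import Path

HUB_ID = "him1411/EDGAR-BART-Base"


def resolve_model_source(local_dir: str, hub_id: str = HUB_ID) -> str:
    """Return local_dir if it looks like a cloned model repo, else the Hub ID.

    Hypothetical helper: the config.json check is an assumption about
    what a cloned Transformers model repository contains.
    """
    if (Path(local_dir) / "config.json").is_file():
        return str(Path(local_dir))
    return hub_id


# Usage (requires transformers; downloads the weights when the Hub ID is returned):
# from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
# source = resolve_model_source("EDGAR-BART-Base")
# tokenizer = AutoTokenizer.from_pretrained(source)
# model = AutoModelForSeq2SeqLM.from_pretrained(source)
```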

	Inference Example
	=============

	The example below runs the "ContextNER" task on a single instance.

	```python
	from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

	tokenizer = AutoTokenizer.from_pretrained("him1411/EDGAR-BART-Base")
	model = AutoModelForSeq2SeqLM.from_pretrained("him1411/EDGAR-BART-Base")

	# In this instance the entity ("14.5 years") is prepended to the sentence that contains it.
	input_text = "14.5 years . The definite lived intangible assets related to the contracts and trade names had estimated weighted average useful lives of 5.9 years and 14.5 years, respectively, at acquisition."
	# return_tensors="pt" yields PyTorch tensors, which generate() expects.
	tokenized_input = tokenizer(input_text, return_tensors="pt")
	# Ideal output for this input is 'Definite lived intangible assets weighted average remaining useful life'
	output_ids = model.generate(**tokenized_input)
	# Decode the generated token IDs back into text.
	output = tokenizer.decode(output_ids[0], skip_special_tokens=True)
	print(output)
	```
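
To reuse the example on other sentences and to compare a generated phrase with its reference, a minimal sketch follows. Both functions are hypothetical helpers introduced here for illustration. `build_input` assumes the `entity . sentence` layout seen in the single instance above. `rouge1_f1` is a simplified unigram-overlap score (lowercased, whitespace-split, no stemming); it is not the ROUGE implementation used to evaluate this model, and the candidate phrase scored at the end is made up rather than actual model output.

```python
from collections import Counter


def build_input(entity: str, sentence: str) -> str:
    """Join an entity and its sentence as 'entity . sentence' (assumed layout)."""
    return f"{entity.strip()} . {sentence.strip()}"


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1 on lowercased, whitespace-split tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped unigram overlap: each token counts at most as often as in both texts.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


sentence = (
    "The definite lived intangible assets related to the contracts and trade "
    "names had estimated weighted average useful lives of 5.9 years and "
    "14.5 years, respectively, at acquisition."
)
reference = "Definite lived intangible assets weighted average remaining useful life"

prompt = build_input("14.5 years", sentence)
print(prompt)
# Made-up candidate phrase, scored against the ideal output from the example.
print(rouge1_f1("intangible assets useful life", reference))
```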


	BibTeX Entry and Citation Info
	===============
	If you are using our model, please cite our paper:

	```bibtex
	@article{gupta2021context,
	title={Context-NER: Contextual Phrase Generation at Scale},
	author={Gupta, Himanshu and Verma, Shreyas and Kumar, Tarun and Mishra, Swaroop and Agrawal, Tamanna and Badugu, Amogh and Bhatt, Himanshu Sharad},
	journal={arXiv preprint arXiv:2109.08079},
	year={2021}
	}
	```