# AraT5-base-title-generation

<img src="https://raw.githubusercontent.com/UBC-NLP/araT5/main/AraT5_logo.jpg" alt="drawing" width="30%" height="30%" align="right"/>

We introduce News Title Generation (NTG) as a new task for Arabic language generation. Given an article, a title generation model must output a short, grammatical sequence of words suited to the article's content. For this task we create a novel dataset, **ARGEN<sub>NTG</sub>**, from an existing news corpus: we extract 120K articles along with their titles from [AraNews (Nagoudi et al., 2020)](https://arxiv.org/abs/2011.03092), keeping only titles with at least three words. We split ARGEN<sub>NTG</sub> into 80% (93.3K) training, 10% (11.7K) development, and 10% (11.7K) test sets.

We fine-tune [**AraT5-base**](https://huggingface.co/UBC-NLP/AraT5-base) on ARGEN<sub>NTG</sub>; more details are described in our paper [**AraT5: Text-to-Text Transformers for Arabic Language Understanding and Generation**](https://arxiv.org/abs/2109.12068).
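
The fine-tuned model can be used for inference with the `transformers` library. Below is a minimal sketch; it assumes this checkpoint is published on the Hugging Face Hub as `UBC-NLP/AraT5-base-title-generation`, and the sample article text is purely illustrative:

```python
# Minimal usage sketch. The checkpoint ID is assumed to be
# "UBC-NLP/AraT5-base-title-generation"; adjust it if the repository name differs.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "UBC-NLP/AraT5-base-title-generation"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# An illustrative (made-up) Arabic article body.
article = "أعلنت وزارة النقل اليوم عن افتتاح خط جديد للقطارات يربط بين المدينتين، في خطوة تهدف إلى تخفيف الازدحام المروري."

# Encode the article and generate a short title with beam search.
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=512)
outputs = model.generate(
    **inputs,
    max_length=32,          # titles are short word sequences
    num_beams=5,
    no_repeat_ngram_size=2,
    early_stopping=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```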

# AraT5 Models Checkpoints

AraT5 PyTorch and TensorFlow checkpoints are available on the Hugging Face website for direct download and use **exclusively for research**. **For commercial use, please contact the authors via email (muhammad.mageed[at]ubc[dot]ca).**

| **Model** | **Link** |
|---------|:------------------:|
| **AraT5-base** | [https://huggingface.co/UBC-NLP/AraT5-base](https://huggingface.co/UBC-NLP/AraT5-base) |
| **AraT5-msa-base** | [https://huggingface.co/UBC-NLP/AraT5-msa-base](https://huggingface.co/UBC-NLP/AraT5-msa-base) |
| **AraT5-tweet-base** | [https://huggingface.co/UBC-NLP/AraT5-tweet-base](https://huggingface.co/UBC-NLP/AraT5-tweet-base) |
| **AraT5-msa-small** | [https://huggingface.co/UBC-NLP/AraT5-msa-small](https://huggingface.co/UBC-NLP/AraT5-msa-small) |
| **AraT5-tweet-small** | [https://huggingface.co/UBC-NLP/AraT5-tweet-small](https://huggingface.co/UBC-NLP/AraT5-tweet-small) |
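
As a quick illustration of loading one of the pretrained checkpoints above, here is a minimal sketch with `transformers`; it assumes the standard T5 classes apply, since AraT5 models are T5-style text-to-text transformers:

```python
# Minimal sketch of loading a pretrained AraT5 checkpoint from the table above.
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("UBC-NLP/AraT5-base")
model = T5ForConditionalGeneration.from_pretrained("UBC-NLP/AraT5-base")  # PyTorch weights
# For TensorFlow, TFT5ForConditionalGeneration can be used instead.
```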

# BibTex

If you use our models (AraT5-base, AraT5-msa-base, AraT5-tweet-base, AraT5-msa-small, or AraT5-tweet-small) for your scientific publication, or if you find the resources in this repository useful, please cite our paper as follows (to be updated):
```bibtex
@inproceedings{araT5-2021,
    title = "{AraT5: Text-to-Text Transformers for Arabic Language Understanding and Generation}",
    author = "Nagoudi, El Moatez Billah and
      Elmadany, AbdelRahim and
      Abdul-Mageed, Muhammad",
    booktitle = "https://arxiv.org/abs/2109.12068",
    month = aug,
    year = "2021"}
```

## Acknowledgments

We gratefully acknowledge support from the Natural Sciences and Engineering Research Council of Canada, the Social Sciences and Humanities Research Council of Canada, the Canada Foundation for Innovation, [Compute Canada](https://www.computecanada.ca), and [UBC ARC-Sockeye](https://doi.org/10.14288/SOCKEYE). We also thank the [Google TensorFlow Research Cloud (TFRC)](https://www.tensorflow.org/tfrc) program for providing us with free TPU access.