# AraT5-base-title-generation

<img src="https://raw.githubusercontent.com/UBC-NLP/araT5/main/AraT5_logo.jpg" alt="drawing" width="30%" height="30%" align="right"/>

We introduce News Title Generation (NTG) as a new task for Arabic language generation. Given an article, a title generation model must output a short, grammatical sequence of words suited to the article's content. For this task we create a novel dataset, **ARGEN<sub>NTG</sub>**, from an existing news corpus: we extract 120K articles along with their titles from [AraNews (Nagoudi et al., 2020)](https://arxiv.org/abs/2011.03092), keeping only titles with at least three words. We split ARGEN<sub>NTG</sub> into 80% (93.3K) training, 10% (11.7K) development, and 10% (11.7K) test sets.

We fine-tune [**AraT5-base**](https://huggingface.co/UBC-NLP/AraT5-base) on ARGEN<sub>NTG</sub>; more details are described in our paper [**AraT5: Text-to-Text Transformers for Arabic Language Understanding and Generation**](https://arxiv.org/abs/2109.12068).
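
The fine-tuned model can be used for inference with the `transformers` library. Below is a minimal sketch; it assumes this checkpoint is published on the Hugging Face Hub as `UBC-NLP/AraT5-base-title-generation`, and the sample article text is purely illustrative:

```python
# Minimal usage sketch. The checkpoint ID is assumed to be
# "UBC-NLP/AraT5-base-title-generation"; adjust it if the repository name differs.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "UBC-NLP/AraT5-base-title-generation"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# An illustrative (made-up) Arabic article body.
article = "أعلنت وزارة النقل اليوم عن افتتاح خط جديد للقطارات يربط بين المدينتين، في خطوة تهدف إلى تخفيف الازدحام المروري."

# Encode the article and generate a short title with beam search.
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=512)
outputs = model.generate(
    **inputs,
    max_length=32,          # titles are short word sequences
    num_beams=5,
    no_repeat_ngram_size=2,
    early_stopping=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```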

# AraT5 Models Checkpoints

AraT5 PyTorch and TensorFlow checkpoints are available on the Hugging Face website for direct download and use **exclusively for research**. **For commercial use, please contact the authors via email (muhammad.mageed[at]ubc[dot]ca).**

| **Model** | **Link** |
|---------|:------------------:|
| **AraT5-base** | [https://huggingface.co/UBC-NLP/AraT5-base](https://huggingface.co/UBC-NLP/AraT5-base) |
| **AraT5-msa-base** | [https://huggingface.co/UBC-NLP/AraT5-msa-base](https://huggingface.co/UBC-NLP/AraT5-msa-base) |
| **AraT5-tweet-base** | [https://huggingface.co/UBC-NLP/AraT5-tweet-base](https://huggingface.co/UBC-NLP/AraT5-tweet-base) |
| **AraT5-msa-small** | [https://huggingface.co/UBC-NLP/AraT5-msa-small](https://huggingface.co/UBC-NLP/AraT5-msa-small) |
| **AraT5-tweet-small** | [https://huggingface.co/UBC-NLP/AraT5-tweet-small](https://huggingface.co/UBC-NLP/AraT5-tweet-small) |
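
As a quick illustration of loading one of the pretrained checkpoints above, here is a minimal sketch with `transformers`; it assumes the standard T5 classes apply, since AraT5 models are T5-style text-to-text transformers:

```python
# Minimal sketch of loading a pretrained AraT5 checkpoint from the table above.
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("UBC-NLP/AraT5-base")
model = T5ForConditionalGeneration.from_pretrained("UBC-NLP/AraT5-base")  # PyTorch weights
# For TensorFlow, TFT5ForConditionalGeneration can be used instead.
```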

# BibTex

If you use our models (AraT5-base, AraT5-msa-base, AraT5-tweet-base, AraT5-msa-small, or AraT5-tweet-small) for your scientific publication, or if you find the resources in this repository useful, please cite our paper as follows (to be updated):
```bibtex
@inproceedings{araT5-2021,
    title = "{AraT5: Text-to-Text Transformers for Arabic Language Understanding and Generation}",
    author = "Nagoudi, El Moatez Billah and
      Elmadany, AbdelRahim and
      Abdul-Mageed, Muhammad",
    booktitle = "https://arxiv.org/abs/2109.12068",
    month = aug,
    year = "2021"}
```

## Acknowledgments

We gratefully acknowledge support from the Natural Sciences and Engineering Research Council of Canada, the Social Sciences and Humanities Research Council of Canada, the Canada Foundation for Innovation, [Compute Canada](https://www.computecanada.ca), and [UBC ARC-Sockeye](https://doi.org/10.14288/SOCKEYE). We also thank the [Google TensorFlow Research Cloud (TFRC)](https://www.tensorflow.org/tfrc) program for providing us with free TPU access.