# AraT5-base-title-generation
<img src="https://raw.githubusercontent.com/UBC-NLP/araT5/main/AraT5_logo.jpg" alt="drawing" width="30%" height="30%" align="right"/>
We introduce the News Title Generation (NTG) task as a new task for Arabic language generation. Given an article, a title-generation model must output a short, grammatical sequence of words suited to the article's content. For NTG, we create a novel dataset, **ARGEN<sub>NTG</sub>**, from an existing news corpus: we extract 120K articles along with their titles from [AraNews (Nagoudi et al., 2020)](https://arxiv.org/abs/2011.03092) and keep only titles with at least three words. We split the ARGEN<sub>NTG</sub> data into 80% (93.3K), 10% (11.7K), and 10% (11.7K) for training, development, and test, respectively.
We fine-tune [**AraT5-base**](https://huggingface.co/UBC-NLP/AraT5-base) on ARGEN<sub>NTG</sub>. More details are described in our paper, [**AraT5: Text-to-Text Transformers for Arabic Language Understanding and Generation**](https://arxiv.org/abs/2109.12068).
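
For illustration, here is a minimal inference sketch using the `transformers` library. It assumes this model is hosted under the hub id `UBC-NLP/AraT5-base-title-generation` (inferred from this card's title); the article text and generation settings are placeholders to adapt.

```python
# Minimal title-generation sketch with Hugging Face transformers.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "UBC-NLP/AraT5-base-title-generation"  # assumption: repo id matches the card title
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

article = "..."  # paste the body of an Arabic news article here

# Encode the article, truncating long inputs to the encoder window.
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=512)

# Titles are short sequences, so a small max_length plus beam search is a reasonable default.
outputs = model.generate(
    **inputs,
    max_length=30,
    num_beams=5,
    no_repeat_ngram_size=2,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The decoding settings above are illustrative defaults, not the configuration used in the paper.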
# AraT5 Models Checkpoints
AraT5 PyTorch and TensorFlow checkpoints are available on the Hugging Face website for direct download and use `exclusively for research`. `For commercial use, please contact the authors via email (muhammad.mageed[at]ubc[dot]ca).`
| **Model** | **Link** |
|---------|:------------------:|
| **AraT5-base** | [https://huggingface.co/UBC-NLP/AraT5-base](https://huggingface.co/UBC-NLP/AraT5-base) |
| **AraT5-msa-base** | [https://huggingface.co/UBC-NLP/AraT5-msa-base](https://huggingface.co/UBC-NLP/AraT5-msa-base) |
| **AraT5-tweet-base** | [https://huggingface.co/UBC-NLP/AraT5-tweet-base](https://huggingface.co/UBC-NLP/AraT5-tweet-base) |
| **AraT5-msa-small** | [https://huggingface.co/UBC-NLP/AraT5-msa-small](https://huggingface.co/UBC-NLP/AraT5-msa-small) |
| **AraT5-tweet-small**| [https://huggingface.co/UBC-NLP/AraT5-tweet-small](https://huggingface.co/UBC-NLP/AraT5-tweet-small) |
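
As a brief sketch of loading any checkpoint from the table above for research use (the repo ids come from the table; whether a given repo ships native TensorFlow weights may vary, so the TF lines are a hedged alternative):

```python
# Load a base checkpoint from the table for inference or further fine-tuning (PyTorch).
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("UBC-NLP/AraT5-base")
model = T5ForConditionalGeneration.from_pretrained("UBC-NLP/AraT5-base")

# TensorFlow alternative (the card states TF checkpoints are provided):
# from transformers import TFT5ForConditionalGeneration
# tf_model = TFT5ForConditionalGeneration.from_pretrained("UBC-NLP/AraT5-base")
```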
# BibTeX
If you use our models (AraT5-base, AraT5-msa-base, AraT5-tweet-base, AraT5-msa-small, or AraT5-tweet-small) for your scientific publication, or if you find the resources in this repository useful, please cite our paper as follows (to be updated):
```bibtex
@inproceedings{araT5-2021,
    title = "{AraT5}: Text-to-Text Transformers for Arabic Language Understanding and Generation",
    author = "Nagoudi, El Moatez Billah and
      Elmadany, AbdelRahim and
      Abdul-Mageed, Muhammad",
    booktitle = "https://arxiv.org/abs/2109.12068",
    month = aug,
    year = "2021"
}
```
## Acknowledgments
We gratefully acknowledge support from the Natural Sciences and Engineering Research Council of Canada, the Social Sciences and Humanities Research Council of Canada, the Canada Foundation for Innovation, [Compute Canada](https://www.computecanada.ca), and [UBC ARC-Sockeye](https://doi.org/10.14288/SOCKEYE). We also thank the [Google TensorFlow Research Cloud (TFRC)](https://www.tensorflow.org/tfrc) program for providing us with free TPU access.