Update README.md
README.md
CHANGED
@@ -13,31 +13,17 @@ Tagalog ELECTRA model pretrained with a large corpus scraped from the internet.
 
 This is the generator model used to sample synthetic text and pretrain the discriminator. Only use this model for retraining and mask-filling. For the actual model for downstream tasks, please refer to the discriminator models.
 
-## Usage
-The model can be loaded and used in both PyTorch and TensorFlow through the HuggingFace Transformers package.
-
-```python
-from transformers import TFAutoModel, AutoModel, AutoTokenizer
-
-# TensorFlow
-model = TFAutoModel.from_pretrained('jcblaise/electra-tagalog-small-uncased-generator', from_pt=True)
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/electra-tagalog-small-uncased-generator', do_lower_case=False)
-
-# PyTorch
-model = AutoModel.from_pretrained('jcblaise/electra-tagalog-small-uncased-generator')
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/electra-tagalog-small-uncased-generator', do_lower_case=False)
-```
-Finetuning scripts and other utilities we use for our projects can be found in our centralized repository at https://github.com/jcblaisecruz02/Filipino-Text-Benchmarks
-
 ## Citations
 All model details and training setups can be found in our papers. If you use our model or find it useful in your projects, please cite our work:
 
 ```
-@
-title={
-author={Jan Christian Blaise
-
-
+@inproceedings{cruz2021exploiting,
+title={Exploiting News Article Structure for Automatic Corpus Generation of Entailment Datasets},
+author={Cruz, Jan Christian Blaise and Resabal, Jose Kristian and Lin, James and Velasco, Dan John and Cheng, Charibeth},
+booktitle={Pacific Rim International Conference on Artificial Intelligence},
+pages={86--99},
+year={2021},
+organization={Springer}
 }
 ```
 
@@ -45,4 +31,4 @@ All model details and training setups can be found in our papers. If you use our
 Data used to train this model as well as other benchmark datasets in Filipino can be found in my website at https://blaisecruz.com
 
 ## Contact
-If you have questions, concerns, or if you just want to chat about NLP and low-resource languages in general, you may reach me through my work email at
+If you have questions, concerns, or if you just want to chat about NLP and low-resource languages in general, you may reach me through my work email at me@blaisecruz.com