jcblaise committed on
Commit
67b78e8
1 Parent(s): e0d87ec

Update README.md

Files changed (1)
  1. README.md +8 -22
README.md CHANGED
@@ -13,31 +13,17 @@ Tagalog ELECTRA model pretrained with a large corpus scraped from the internet.
 
 This is the generator model used to sample synthetic text and pretrain the discriminator. Only use this model for retraining and mask-filling. For the actual model for downstream tasks, please refer to the discriminator models.
 
-## Usage
-The model can be loaded and used in both PyTorch and TensorFlow through the HuggingFace Transformers package.
-
-```python
-from transformers import TFAutoModel, AutoModel, AutoTokenizer
-
-# TensorFlow
-model = TFAutoModel.from_pretrained('jcblaise/electra-tagalog-small-uncased-generator', from_pt=True)
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/electra-tagalog-small-uncased-generator', do_lower_case=False)
-
-# PyTorch
-model = AutoModel.from_pretrained('jcblaise/electra-tagalog-small-uncased-generator')
-tokenizer = AutoTokenizer.from_pretrained('jcblaise/electra-tagalog-small-uncased-generator', do_lower_case=False)
-```
-Finetuning scripts and other utilities we use for our projects can be found in our centralized repository at https://github.com/jcblaisecruz02/Filipino-Text-Benchmarks
-
 ## Citations
 All model details and training setups can be found in our papers. If you use our model or find it useful in your projects, please cite our work:
 
 ```
-@article{cruz2020investigating,
-  title={Investigating the True Performance of Transformers in Low-Resource Languages: A Case Study in Automatic Corpus Creation},
-  author={Jan Christian Blaise Cruz and Jose Kristian Resabal and James Lin and Dan John Velasco and Charibeth Cheng},
-  journal={arXiv preprint arXiv:2010.11574},
-  year={2020}
+@inproceedings{cruz2021exploiting,
+  title={Exploiting News Article Structure for Automatic Corpus Generation of Entailment Datasets},
+  author={Cruz, Jan Christian Blaise and Resabal, Jose Kristian and Lin, James and Velasco, Dan John and Cheng, Charibeth},
+  booktitle={Pacific Rim International Conference on Artificial Intelligence},
+  pages={86--99},
+  year={2021},
+  organization={Springer}
 }
 ```
@@ -45,4 +31,4 @@ All model details and training setups can be found in our papers. If you use our
 Data used to train this model as well as other benchmark datasets in Filipino can be found in my website at https://blaisecruz.com
 
 ## Contact
-If you have questions, concerns, or if you just want to chat about NLP and low-resource languages in general, you may reach me through my work email at jan_christian_cruz@dlsu.edu.ph
+If you have questions, concerns, or if you just want to chat about NLP and low-resource languages in general, you may reach me through my work email at me@blaisecruz.com
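The README text in this diff notes that the generator model is used "to sample synthetic text and pretrain the discriminator." That ELECTRA-style relationship can be illustrated with a toy, library-free sketch; everything here (the `electra_step` function, the tiny vocabulary, the token values) is hypothetical and is not the actual pretraining code:

```python
import random

def electra_step(tokens, mask_prob=0.3, vocab=("a", "b", "c", "d"), seed=0):
    """Toy ELECTRA pretraining step: randomly mask positions, let a
    'generator' sample replacement tokens, and derive the discriminator's
    per-token replaced-token-detection labels (True = token was replaced)."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            sampled = rng.choice(vocab)    # generator's sample for the masked slot
            corrupted.append(sampled)
            labels.append(sampled != tok)  # discriminator target at this position
        else:
            corrupted.append(tok)
            labels.append(False)           # untouched tokens are labeled 'original'
    return corrupted, labels

corrupted, labels = electra_step(["a", "b", "c", "d", "a"])
print(corrupted, labels)
```

A real generator is a small masked language model rather than a uniform sampler, which is why, as the README stresses, this checkpoint is only suitable for retraining and mask-filling while downstream tasks use the discriminator.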