fathan committed
Commit c485c41 · 1 Parent(s): 3147e5f

Update README.md

Files changed (1)
  1. README.md +8 -8
README.md CHANGED
@@ -16,9 +16,9 @@ widget:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
-# Code-mixed IJERoBERTa
+# IJERoBERTa: RoBERTa-base
 
-Code-mixed IJERoBERTa is a pre-trained masked language model for code-mixed Indonesian-Javanese-English tweets data.
+This is a pre-trained masked language model for code-mixed Indonesian-Javanese-English tweets data.
 This model is trained based on the [RoBERTa](https://arxiv.org/abs/1907.11692) model using
 Hugging Face's [Transformers](https://huggingface.co/transformers) library.
 
@@ -50,9 +50,9 @@ Finally, we have 28,121,693 sentences for the training process.
 The pretraining data will not be released to the public due to Twitter's policy.
 
 ## Model
-| Model name              | Architecture | Size of training data | Size of validation data |
-|-------------------------|--------------|-----------------------|-------------------------|
-| `code-mixed-ijeroberta` | RoBERTa      | 2.24 GB of text       | 249 MB of text          |
+| Model name                          | Base model | Size of training data | Size of validation data |
+|-------------------------------------|------------|-----------------------|-------------------------|
+| `ijeroberta-codemixed-roberta-base` | RoBERTa    | 2.24 GB of text       | 249 MB of text          |
 
 ## Evaluation Results
 We train the model for 3 epochs (296K total steps), which took 16 days.
@@ -66,15 +66,15 @@ The following are the results obtained from the training:
 ### Load model and tokenizer
 ```python
 from transformers import AutoTokenizer, AutoModel
-tokenizer = AutoTokenizer.from_pretrained("fathan/code-mixed-ijeroberta")
-model = AutoModel.from_pretrained("fathan/code-mixed-ijeroberta")
+tokenizer = AutoTokenizer.from_pretrained("fathan/ijeroberta-codemixed-roberta-base")
+model = AutoModel.from_pretrained("fathan/ijeroberta-codemixed-roberta-base")
 
 ```
 ### Masked language model
 ```python
 from transformers import pipeline
 
-pretrained_model = "fathan/code-mixed-ijeroberta"
+pretrained_model = "fathan/ijeroberta-codemixed-roberta-base"
 
 fill_mask = pipeline(
     "fill-mask",