julien-c (HF staff) committed on
Commit
000e6cd
1 Parent(s): c3205eb

Migrate model card from transformers-repo


Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/monologg/koelectra-base-generator/README.md

Files changed (1)
  1. README.md +45 -0
README.md ADDED
@@ -0,0 +1,45 @@
---
language: ko
---

# KoELECTRA (Base Generator)

Pretrained ELECTRA Language Model for Korean (`koelectra-base-generator`)

For more details, please see the [original repository](https://github.com/monologg/KoELECTRA/blob/master/README_EN.md).

## Usage

### Load model and tokenizer

```python
>>> from transformers import ElectraModel, ElectraTokenizer

>>> model = ElectraModel.from_pretrained("monologg/koelectra-base-generator")
>>> tokenizer = ElectraTokenizer.from_pretrained("monologg/koelectra-base-generator")
```
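
Once loaded, the model maps tokenized text to contextual hidden states. A minimal forward-pass sketch, assuming PyTorch is installed and a transformers version whose model output exposes `last_hidden_state` (older versions return a plain tuple instead):

```python
>>> import torch

>>> # "한국어 ELECTRA를 공유합니다." means "Sharing Korean ELECTRA."
>>> inputs = tokenizer("한국어 ELECTRA를 공유합니다.", return_tensors="pt")
>>> with torch.no_grad():
...     outputs = model(**inputs)
>>> outputs.last_hidden_state.shape  # (batch_size, sequence_length, hidden_size)
```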

### Tokenizer example

```python
>>> from transformers import ElectraTokenizer
>>> tokenizer = ElectraTokenizer.from_pretrained("monologg/koelectra-base-generator")
>>> tokenizer.tokenize("[CLS] 한국어 ELECTRA를 공유합니다. [SEP]")
['[CLS]', '한국어', 'E', '##L', '##EC', '##T', '##RA', '##를', '공유', '##합니다', '.', '[SEP]']
>>> tokenizer.convert_tokens_to_ids(['[CLS]', '한국어', 'E', '##L', '##EC', '##T', '##RA', '##를', '공유', '##합니다', '.', '[SEP]'])
[2, 18429, 41, 6240, 15229, 6204, 20894, 5689, 12622, 10690, 18, 3]
```
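
Note that the `[CLS]` and `[SEP]` markers are written out above only for illustration; calling the tokenizer directly adds them automatically. A minimal sketch, where the expected ids are taken from the example above and should match under the same vocabulary:

```python
>>> tokenizer("한국어 ELECTRA를 공유합니다.")["input_ids"]
[2, 18429, 41, 6240, 15229, 6204, 20894, 5689, 12622, 10690, 18, 3]
```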

## Example using ElectraForMaskedLM

The generator half of ELECTRA is trained with masked language modeling, so it can be used directly for fill-mask prediction:

```python
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="monologg/koelectra-base-generator",
    tokenizer="monologg/koelectra-base-generator"
)

# "나는 [MASK] 밥을 먹었다." means "I ate [MASK] rice."
print(fill_mask("나는 {} 밥을 먹었다.".format(fill_mask.tokenizer.mask_token)))
```
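
The same prediction can also be made without the pipeline by calling `ElectraForMaskedLM` directly. A minimal sketch, assuming PyTorch and a transformers version whose masked-LM output exposes `.logits`; the top-5 selection logic is illustrative and not part of the original card:

```python
import torch
from transformers import ElectraForMaskedLM, ElectraTokenizer

tokenizer = ElectraTokenizer.from_pretrained("monologg/koelectra-base-generator")
model = ElectraForMaskedLM.from_pretrained("monologg/koelectra-base-generator")

# "나는 [MASK] 밥을 먹었다." means "I ate [MASK] rice."
text = "나는 {} 밥을 먹었다.".format(tokenizer.mask_token)
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the 5 most likely token ids there
mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top5_ids = logits[0, mask_positions[0]].topk(5).indices
print(tokenizer.convert_ids_to_tokens(top5_ids.tolist()))
```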