---
language: ko
---

# KoELECTRA v2 (Base Generator)

Pretrained ELECTRA language model for Korean (`koelectra-base-v2-generator`)

For more details, please see the [original repository](https://github.com/monologg/KoELECTRA/blob/master/README_EN.md).

## Usage

### Load model and tokenizer

```python
>>> from transformers import ElectraModel, ElectraTokenizer

>>> model = ElectraModel.from_pretrained("monologg/koelectra-base-v2-generator")
>>> tokenizer = ElectraTokenizer.from_pretrained("monologg/koelectra-base-v2-generator")
```

### Tokenizer example

```python
>>> from transformers import ElectraTokenizer
>>> tokenizer = ElectraTokenizer.from_pretrained("monologg/koelectra-base-v2-generator")
>>> tokenizer.tokenize("[CLS] 한국어 ELECTRA를 공유합니다. [SEP]")
['[CLS]', '한국어', 'EL', '##EC', '##TRA', '##를', '공유', '##합니다', '.', '[SEP]']
>>> tokenizer.convert_tokens_to_ids(['[CLS]', '한국어', 'EL', '##EC', '##TRA', '##를', '공유', '##합니다', '.', '[SEP]'])
[2, 5084, 16248, 3770, 19059, 29965, 2259, 10431, 5, 3]
```

## Example using ElectraForMaskedLM

```python
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="monologg/koelectra-base-v2-generator",
    tokenizer="monologg/koelectra-base-v2-generator"
)

print(fill_mask("나는 {} 밥을 먹었다.".format(fill_mask.tokenizer.mask_token)))
```