Commit 68481c6 by kykim
1 parent: cbfdd09

Create README.md

Files changed (1)
  1. README.md +16 -0
README.md ADDED
@@ -0,0 +1,16 @@
+ ---
+ language: ko
+ ---
+
+ # Bert base model for Korean
+
+ * 70GB Korean text dataset and 42000 lower-cased subwords are used
+ * Check the model performance and other language models for Korean in [github](https://github.com/kiyoungkim1/LM-kor)
+
+ ```python
+ from transformers import BertTokenizerFast, GPT2LMHeadModel
+ tokenizer_gpt3 = BertTokenizerFast.from_pretrained("kykim/gpt3-kor-small_based_on_gpt2")
+ input_ids = tokenizer_gpt3.encode("text to tokenize")[1:] # remove cls token
+
+ model_gpt3 = GPT2LMHeadModel.from_pretrained("kykim/gpt3-kor-small_based_on_gpt2")
+ ```
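
The `[1:]` slice in the added snippet drops the first id returned by `encode`, which for a BERT-style tokenizer is the `[CLS]` token; the trailing `[SEP]` id stays in place. A minimal sketch of that step in plain Python follows. The numeric ids and the `strip_cls` helper are hypothetical and only stand in for real tokenizer output; they are not part of the committed README.

```python
# Hypothetical special-token ids (assumption for illustration only).
# A real tokenizer exposes its own values, e.g. via `cls_token_id`
# and `sep_token_id`.
CLS_ID = 2
SEP_ID = 3


def strip_cls(input_ids, cls_id=CLS_ID):
    """Return a copy of input_ids without a leading [CLS] id, if present."""
    if input_ids and input_ids[0] == cls_id:
        return input_ids[1:]
    return list(input_ids)


# Stand-in for tokenizer.encode("text to tokenize"):
# [CLS] token token token [SEP]
encoded = [CLS_ID, 1045, 2078, 311, SEP_ID]
trimmed = strip_cls(encoded)
print(trimmed)  # [1045, 2078, 311, 3]
```

Checking the first id before slicing avoids dropping a real token when the ids were produced without special tokens, for example with `add_special_tokens=False`.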