wangyulong committed
Commit 4d9849b
1 parent: 0e617ca

Update README.md

Files changed (1)
  1. README.md +30 -0
README.md CHANGED
@@ -1,3 +1,33 @@
  ---
+ language:
+ - zh
+ tags:
+ - text generation
+ - pytorch
+ - causal-lm
  license: apache-2.0
  ---
+
+ # Mengzi-GPT-neo model (Chinese)
+ Pretrained model on 300G Chinese corpus.
+
+ ## Usage
+ ```python
+ import torch
+ import sentencepiece as spm
+ from transformers import GPTNeoForCausalLM
+
+ # Run on GPU when available; the model and its inputs must share a device.
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ tokenizer = spm.SentencePieceProcessor(model_file="mengzi_gpt.model")
+ model = GPTNeoForCausalLM.from_pretrained("Langboat/mengzi-gpt-neo-base").to(device)
+
+ def lm(prompt, top_k, top_p, max_length, repetition_penalty):
+     # encode() on a list returns one list of token ids per prompt (a batch of 1 here).
+     input_ids = torch.tensor(tokenizer.encode([prompt]), dtype=torch.long, device=device)
+     gen_tokens = model.generate(
+         input_ids,
+         do_sample=True,
+         top_k=top_k,
+         top_p=top_p,
+         # generate() counts prompt tokens toward max_length, so add the prompt's token count.
+         max_length=max_length + input_ids.shape[1],
+         repetition_penalty=repetition_penalty)
+     # decode() on a batch returns a list of strings; take the only entry.
+     result = tokenizer.decode(gen_tokens.tolist())[0]
+     return result
+ ```
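
The `top_k` and `top_p` arguments of `lm` control which candidate tokens may be sampled at each step. The following is a minimal pure-Python sketch of that idea on a toy distribution, not the library's implementation: the function name `top_k_top_p_filter` and the toy tokens are hypothetical, and the real `generate` call operates on logits tensors with details that may differ.

```python
def top_k_top_p_filter(probs, top_k, top_p):
    """Illustrative top-k then top-p (nucleus) filtering.

    probs: dict mapping token -> probability (assumed to sum to 1).
    Returns a renormalized dict containing only the tokens that survive.
    """
    # Rank tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)

    # top-k: keep only the k most probable tokens, then renormalize.
    ranked = ranked[:top_k]
    total = sum(p for _, p in ranked)
    ranked = [(token, p / total) for token, p in ranked]

    # top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize the survivors so they form a distribution again.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Toy next-token distribution (hypothetical values).
toy = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
filtered = top_k_top_p_filter(toy, top_k=3, top_p=0.7)
print(filtered)  # only "a" and "b" remain
```

Smaller `top_k` or `top_p` values narrow the candidate set and make output more conservative; larger values allow more varied text.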