bigmoyan committed on
Commit
05581a1
1 Parent(s): acff406

Update README.md

  1. README.md +27 -19
README.md CHANGED
@@ -23,14 +23,11 @@ Among its main features are:
 ### test environment
 
 - device: Nvidia A100 40G
-- img size: 512x512
-- precision: fp16
-- steps: 30
-- solver: LMSD
-
-### text2img
-
 
+|version|speed|
+|:-:|:-:|
+|original|30 tokens/s|
+|lyraChatGLM|310 tokens/s|
 
 
 ## Model Sources
@@ -40,35 +37,46 @@ Among its main features are:
 ## Uses
 
 ```python
+from transformers import AutoTokenizer
 from faster_chat_glm import GLM6B, FasterChatGLM
 
+
+tokenizer = AutoTokenizer.from_pretrained(chatglm6b_dir, trust_remote_code=True)
+
+BATCH_SIZE = 8
+MAX_OUT_LEN = 50
+
+# prepare input
+input_str = ["音乐推荐应该考虑哪些因素?帮我写一篇不少于800字的方案。 ", ] * BATCH_SIZE
+inputs = tokenizer(input_str, return_tensors="pt", padding=True)
+input_ids = inputs.input_ids.to('cuda:0')
+
+
 # kernel for chat model.
-kernel = GLM6B(plan_path=plan_path,
-               batch_size=BATCH_SIZE,
+kernel = GLM6B(plan_path=f"./models/glm6b-bs{BATCH_SIZE}.ftm",
+               batch_size=1,
                num_beams=1,
-               use_cache=USE_CACHE,
+               use_cache=True,
                num_heads=32,
                emb_size_per_heads=128,
                decoder_layers=28,
                vocab_size=150528,
                max_seq_len=MAX_OUT_LEN)
-
 chat = FasterChatGLM(model_dir=chatglm6b_dir, kernel=kernel).half().cuda()
 
 # generate
 sample_output = chat.generate(inputs=input_ids, max_length=MAX_OUT_LEN)
-
+# de-tokenize model output to text
+res = tokenizer.decode(sample_output[0], skip_special_tokens=True)
+print(res)
 ```
 ## Demo output
 
-### text2img
-![text2img_demo](./output/text2img_demo.jpg)
-
-### img2img
-
-![text2img_demo](./output/img2img_input.jpg)
+### input
+音乐推荐应该考虑哪些因素?帮我写一篇不少于800字的方案。
 
-![text2img_demo](./output/img2img_demo.jpg)
+### output
+音乐推荐是音乐爱好者们经常面临的问题。一个好的音乐推荐应该能够根据用户的需求和喜好,推荐出符合他们口味的音乐。本文将探讨音乐
 
 
 
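
The new README table reports throughput in tokens/s (30 for the original model, 310 for lyraChatGLM) without showing how it was measured. The following is a minimal sketch of one way such a figure could be computed, not the authors' benchmark script. The helpers `count_new_tokens` and `tokens_per_second`, the stub `fake_generate`, and the assumption that the generated sequences include the prompt tokens are all hypothetical.

```python
import time
from typing import Callable, Sequence

TokenBatch = Sequence[Sequence[int]]


def count_new_tokens(output: TokenBatch, prompt_len: int) -> int:
    """Count tokens beyond the prompt in every sequence of a batch.

    Assumption: each output sequence starts with the prompt tokens.
    """
    return sum(max(len(seq) - prompt_len, 0) for seq in output)


def tokens_per_second(generate: Callable[[], TokenBatch],
                      prompt_len: int,
                      warmup: int = 1,
                      runs: int = 3) -> float:
    """Average newly generated tokens per second over `runs` timed calls."""
    for _ in range(warmup):
        generate()  # untimed warm-up calls
    new_tokens = 0
    elapsed = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        output = generate()
        elapsed += time.perf_counter() - start
        new_tokens += count_new_tokens(output, prompt_len)
    return new_tokens / elapsed


def fake_generate() -> TokenBatch:
    """Hypothetical stand-in for a real generate call: 8 sequences of 50 tokens."""
    time.sleep(0.01)
    return [[0] * 50 for _ in range(8)]


speed = tokens_per_second(fake_generate, prompt_len=10)
print(f"{speed:.0f} tokens/s")

# Speedup implied by the README table.
speedup = 310 / 30
print(f"{speedup:.1f}x")
```

With a real model, `generate` would wrap something like the `chat.generate(...)` call from the README. GPU work is asynchronous, so a realistic measurement would also synchronize the device before reading the clock.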