bigmoyan committed on
Commit 3be3aca
1 Parent(s): 3f70f85

Update README.md

Files changed (1)
  1. README.md +21 -55
README.md CHANGED
@@ -9,14 +9,14 @@ tags:
 ---
 ## Model Card for lyraBELLE
 
-lyraBelle is currently the **fastest BELLE model** available. To the best of our knowledge, it is the **first accelerated version of ChatGLM-6B**.
+lyraBelle is currently the **fastest BELLE model** available. To the best of our knowledge, it is the **first accelerated version of BELLE**.
 
 The inference speed of lyraBelle has achieved **10x** acceleration over the early original version. We are still working hard to further improve the performance.
 
 Among its main features are:
 
 - weights: original BELLE-7B-2M weights released by BelleGroup.
-- device: Any
+- device: Nvidia Ampere architecture or newer (e.g. A100); a quick capability check is sketched below.
 - batch_size: compiled with dynamic batch size, max batch_size = 8
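
A quick way to confirm the device requirement before loading any weights is to query the GPU's CUDA compute capability: Ampere corresponds to 8.x. A minimal sketch, assuming PyTorch is installed (lyraBelle itself is not needed for this check):

```python
# Minimal sketch: verify the visible GPU is Ampere (compute capability 8.x) or newer.
import torch

major, minor = torch.cuda.get_device_capability(0)
if major < 8:
    raise RuntimeError(f"lyraBelle expects an Ampere-or-newer GPU (sm_80+), found sm_{major}{minor}")
print(f"GPU OK: {torch.cuda.get_device_name(0)} (sm_{major}{minor})")
```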
 
 ## Speed
@@ -27,72 +27,38 @@ Among its main features are:
 - batch size: 8
 
 
-|version|speed|
-|:-:|:-:|
-|original|30 tokens/s|
-|lyraBelle|310 tokens/s|
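
Throughput figures like those in the table can be estimated by timing a fixed-length generation. A minimal sketch, assuming the `LyraBelle` API shown in the Uses section below, and assuming that `generate` emits exactly `output_length` tokens:

```python
# Rough tokens/s estimate: time one fixed-length generation after a warm-up call.
import time

from lyraBelle import LyraBelle  # API as shown in the Uses section below

model = LyraBelle("./model", "1-gpu-fp16.h5", "fp16", 0)
prompt = "为什么我们需要对深度学习模型加速?"  # "Why do we need to accelerate deep learning models?"
sampling = dict(top_k=30, top_p=0.85, temperature=0.35, repetition_penalty=1.2, do_sample=True)

model.generate(prompt, output_length=32, **sampling)  # warm-up, excluded from timing

output_length = 256
start = time.perf_counter()
model.generate(prompt, output_length=output_length, **sampling)
elapsed = time.perf_counter() - start
print(f"~{output_length / elapsed:.0f} tokens/s")  # assumes exactly output_length tokens were generated
```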
 
 
 ## Model Sources
 
 - **Repository:** [https://huggingface.co/BelleGroup/BELLE-7B-2M?clone=true]
 
-## Try Demo in 2 fast steps
-
-``` bash
-# step 1
-git clone https://huggingface.co/TMElyralab/lyraChatGLM
-cd lyraChatGLM
-
-# step 2
-docker run --gpus=1 --rm --net=host -v ${PWD}:/workdir yibolu96/lyra-chatglm-env:0.0.1 python3 /workdir/demo.py
-```
 
 ## Uses
 
 ```python
-from transformers import AutoTokenizer
-from faster_chat_glm import GLM6B, FasterChatGLM
-
-MAX_OUT_LEN = 100
-tokenizer = AutoTokenizer.from_pretrained('./models', trust_remote_code=True)
-input_str = ["为什么我们需要对深度学习模型加速?", ]
-inputs = tokenizer(input_str, return_tensors="pt", padding=True)
-input_ids = inputs.input_ids.to('cuda:0')
-
-plan_path = './models/glm6b-bs8.ftm'
-# kernel for chat model.
-kernel = GLM6B(plan_path=plan_path,
-               batch_size=1,
-               num_beams=1,
-               use_cache=True,
-               num_heads=32,
-               emb_size_per_heads=128,
-               decoder_layers=28,
-               vocab_size=150528,
-               max_seq_len=MAX_OUT_LEN)
-
-chat = FasterChatGLM(model_dir="./models", kernel=kernel).half().cuda()
-
-# generate
-sample_output = chat.generate(inputs=input_ids, max_length=MAX_OUT_LEN)
-# de-tokenize model output to text
-res = tokenizer.decode(sample_output[0], skip_special_tokens=True)
-print(res)
+from lyraBelle import LyraBelle
+
+data_type = "fp16"            # run the weights in half precision
+prompts = "今天天气大概 25度,有点小雨,吹着风,我想去户外散步,应该穿什么样的衣服裤子鞋子搭配。"
+model_dir = "./model"         # directory holding the converted weights
+model_name = "1-gpu-fp16.h5"  # single-GPU fp16 weight file
+max_output_length = 512
+
+# Load the model on GPU 0 and sample a completion.
+model = LyraBelle(model_dir, model_name, data_type, 0)
+output_texts = model.generate(prompts, output_length=max_output_length,
+                              top_k=30, top_p=0.85, temperature=0.35,
+                              repetition_penalty=1.2, do_sample=True)
+print(output_texts)
 ```
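
The engine is compiled with a dynamic batch size (up to 8), but the call above shows only a single prompt string; whether `generate` also accepts a list of prompts is not documented here. A minimal sketch that serves several prompts by reusing one engine instance and only the constructor and `generate` call shown above:

```python
from lyraBelle import LyraBelle

# One engine instance, reused across prompts (fp16 weights on GPU 0).
model = LyraBelle("./model", "1-gpu-fp16.h5", "fp16", 0)

# The two demo prompts used elsewhere in this card.
prompts = [
    "今天天气大概 25度,有点小雨,吹着风,我想去户外散步,应该穿什么样的衣服裤子鞋子搭配。",
    "为什么我们需要对深度学习模型加速?",
]

for prompt in prompts:
    # Same sampling settings as the example above.
    text = model.generate(prompt, output_length=512, top_k=30, top_p=0.85,
                          temperature=0.35, repetition_penalty=1.2, do_sample=True)
    print(text)
```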
 ## Demo output
 
 ### input
-为什么我们需要对深度学习模型加速?
+今天天气大概 25度,有点小雨,吹着风,我想去户外散步,应该穿什么样的衣服裤子鞋子搭配。 (English: It's about 25°C today, lightly raining and windy; I want to take a walk outdoors. What clothes, trousers and shoes should I wear?)
 
 ### output
-为什么我们需要对深度学习模型加速? 深度学习模型的训练需要大量计算资源,特别是在训练模型时,需要大量的内存、GPU(图形处理器)和其他计算资源。因此,训练深度学习模型需要一定的时间,并且如果模型不能快速训练,则可能会导致训练进度缓慢或无法训练。
-
-以下是一些原因我们需要对深度学习模型加速:
-
-1. 训练深度神经网络需要大量的计算资源,特别是在训练深度神经网络时,需要更多的计算资源,因此需要更快的训练速度。
+建议穿着一件轻便的衬衫或T恤、一条牛仔裤和一双运动鞋或休闲鞋。如果下雨了可以带上一把伞。 (English: I suggest a light shirt or T-shirt, jeans, and sneakers or casual shoes. If it rains, take an umbrella.)
 
 ### TODO:
@@ -100,14 +66,14 @@ We plan to implement a FasterTransformer version to publish a much faster release.
 
 ## Citation
 ``` bibtex
-@Misc{lyraChatGLM2023,
+@Misc{lyraBelle2023,
 author = {Kangjian Wu and Zhengtao Wang and Bin Wu},
-title = {lyraChatGLM: Accelerating ChatGLM by 10x+},
-howpublished = {\url{https://huggingface.co/TMElyralab/lyraChatGLM}},
+title = {lyraBelle: Accelerating BELLE by 10x+},
+howpublished = {\url{https://huggingface.co/TMElyralab/lyraBelle}},
 year = {2023}
 }
 ```
 
 ## Report bug
-- start a discussion to report any bugs! --> https://huggingface.co/TMElyralab/lyraChatGLM/discussions
+- start a discussion to report any bugs! --> https://huggingface.co/TMElyralab/lyraBELLE/discussions
 - report bug with a `[bug]` mark in the title.
 