bigmoyan committed on
Commit 3be3aca
1 Parent(s): 3f70f85

Update README.md

Files changed (1)
  1. README.md +21 -55
README.md CHANGED
@@ -9,14 +9,14 @@ tags:
 ---
 ## Model Card for lyraBELLE
 
-lyraBelle is currently the **fastest BELLE model** available. To the best of our knowledge, it is the **first accelerated version of ChatGLM-6B**.
+lyraBelle is currently the **fastest BELLE model** available. To the best of our knowledge, it is the **first accelerated version of BELLE**.
 
 The inference speed of lyraBelle has achieved **10x** acceleration over the early original version. We are still working hard to further improve the performance.
 
 Among its main features are:
 
 - weights: original BELLE-7B-2M weights released by BelleGroup.
-- device: Any
+- device: Nvidia Ampere architecture or newer (e.g. A100); a quick capability check is sketched below.
 - batch_size: compiled with dynamic batch size, max batch_size = 8
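
A quick way to confirm the device requirement before loading any weights is to query the GPU's CUDA compute capability: Ampere corresponds to 8.x. A minimal sketch, assuming PyTorch is installed (lyraBelle itself is not needed for this check):

```python
# Minimal sketch: verify the visible GPU is Ampere (compute capability 8.x) or newer.
import torch

major, minor = torch.cuda.get_device_capability(0)
if major < 8:
    raise RuntimeError(f"lyraBelle expects an Ampere-or-newer GPU (sm_80+), found sm_{major}{minor}")
print(f"GPU OK: {torch.cuda.get_device_name(0)} (sm_{major}{minor})")
```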
 
 ## Speed
@@ -27,72 +27,38 @@ Among its main features are:
 - batch size: 8
 
 
-|version|speed|
-|:-:|:-:|
-|original|30 tokens/s|
-|lyraBelle|310 tokens/s|
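
Throughput figures like those in the table can be estimated by timing a fixed-length generation. A minimal sketch, assuming the `LyraBelle` API shown in the Uses section below, and assuming that `generate` emits exactly `output_length` tokens:

```python
# Rough tokens/s estimate: time one fixed-length generation after a warm-up call.
import time

from lyraBelle import LyraBelle  # API as shown in the Uses section below

model = LyraBelle("./model", "1-gpu-fp16.h5", "fp16", 0)
prompt = "为什么我们需要对深度学习模型加速?"  # "Why do we need to accelerate deep learning models?"
sampling = dict(top_k=30, top_p=0.85, temperature=0.35, repetition_penalty=1.2, do_sample=True)

model.generate(prompt, output_length=32, **sampling)  # warm-up, excluded from timing

output_length = 256
start = time.perf_counter()
model.generate(prompt, output_length=output_length, **sampling)
elapsed = time.perf_counter() - start
print(f"~{output_length / elapsed:.0f} tokens/s")  # assumes exactly output_length tokens were generated
```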
 
 
 ## Model Sources
 
 - **Repository:** [https://huggingface.co/BelleGroup/BELLE-7B-2M?clone=true]
 
-## Try Demo in 2 fast steps
-
-``` bash
-# step 1
-git clone https://huggingface.co/TMElyralab/lyraChatGLM
-cd lyraChatGLM
-
-# step 2
-docker run --gpus=1 --rm --net=host -v ${PWD}:/workdir yibolu96/lyra-chatglm-env:0.0.1 python3 /workdir/demo.py
-```
 
 ## Uses
 
 ```python
-from transformers import AutoTokenizer
-from faster_chat_glm import GLM6B, FasterChatGLM
-
-MAX_OUT_LEN = 100
-tokenizer = AutoTokenizer.from_pretrained('./models', trust_remote_code=True)
-input_str = ["为什么我们需要对深度学习模型加速?", ]
-inputs = tokenizer(input_str, return_tensors="pt", padding=True)
-input_ids = inputs.input_ids.to('cuda:0')
-
-plan_path = './models/glm6b-bs8.ftm'
-# kernel for chat model.
-kernel = GLM6B(plan_path=plan_path,
-               batch_size=1,
-               num_beams=1,
-               use_cache=True,
-               num_heads=32,
-               emb_size_per_heads=128,
-               decoder_layers=28,
-               vocab_size=150528,
-               max_seq_len=MAX_OUT_LEN)
-
-chat = FasterChatGLM(model_dir="./models", kernel=kernel).half().cuda()
-
-# generate
-sample_output = chat.generate(inputs=input_ids, max_length=MAX_OUT_LEN)
-# de-tokenize model output to text
-res = tokenizer.decode(sample_output[0], skip_special_tokens=True)
-print(res)
+from lyraBelle import LyraBelle
+
+data_type = "fp16"            # run the weights in half precision
+prompts = "今天天气大概 25度,有点小雨,吹着风,我想去户外散步,应该穿什么样的衣服裤子鞋子搭配。"
+model_dir = "./model"         # directory holding the converted weights
+model_name = "1-gpu-fp16.h5"  # single-GPU fp16 weight file
+max_output_length = 512
+
+# Load the model on GPU 0 and sample a completion.
+model = LyraBelle(model_dir, model_name, data_type, 0)
+output_texts = model.generate(prompts, output_length=max_output_length,
+                              top_k=30, top_p=0.85, temperature=0.35,
+                              repetition_penalty=1.2, do_sample=True)
+print(output_texts)
 ```
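
The engine is compiled with a dynamic batch size (up to 8), but the call above shows only a single prompt string; whether `generate` also accepts a list of prompts is not documented here. A minimal sketch that serves several prompts by reusing one engine instance and only the constructor and `generate` call shown above:

```python
from lyraBelle import LyraBelle

# One engine instance, reused across prompts (fp16 weights on GPU 0).
model = LyraBelle("./model", "1-gpu-fp16.h5", "fp16", 0)

# The two demo prompts used elsewhere in this card.
prompts = [
    "今天天气大概 25度,有点小雨,吹着风,我想去户外散步,应该穿什么样的衣服裤子鞋子搭配。",
    "为什么我们需要对深度学习模型加速?",
]

for prompt in prompts:
    # Same sampling settings as the example above.
    text = model.generate(prompt, output_length=512, top_k=30, top_p=0.85,
                          temperature=0.35, repetition_penalty=1.2, do_sample=True)
    print(text)
```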
 ## Demo output
 
 ### input
-为什么我们需要对深度学习模型加速?
+今天天气大概 25度,有点小雨,吹着风,我想去户外散步,应该穿什么样的衣服裤子鞋子搭配。 (English: It's about 25°C today, lightly raining and windy; I want to take a walk outdoors. What clothes, trousers and shoes should I wear?)
 
 ### output
-为什么我们需要对深度学习模型加速? 深度学习模型的训练需要大量计算资源,特别是在训练模型时,需要大量的内存、GPU(图形处理器)和其他计算资源。因此,训练深度学习模型需要一定的时间,并且如果模型不能快速训练,则可能会导致训练进度缓慢或无法训练。
-
-以下是一些原因我们需要对深度学习模型加速:
-
-1. 训练深度神经网络需要大量的计算资源,特别是在训练深度神经网络时,需要更多的计算资源,因此需要更快的训练速度。
+建议穿着一件轻便的衬衫或T恤、一条牛仔裤和一双运动鞋或休闲鞋。如果下雨了可以带上一把伞。 (English: I suggest a light shirt or T-shirt, jeans, and sneakers or casual shoes. If it rains, take an umbrella.)
 
 ### TODO:
@@ -100,14 +66,14 @@ We plan to implement a FasterTransformer version to publish a much faster release.
 
 ## Citation
 ``` bibtex
-@Misc{lyraChatGLM2023,
+@Misc{lyraBelle2023,
 author = {Kangjian Wu and Zhengtao Wang and Bin Wu},
-title = {lyraChatGLM: Accelerating ChatGLM by 10x+},
-howpublished = {\url{https://huggingface.co/TMElyralab/lyraChatGLM}},
+title = {lyraBelle: Accelerating BELLE by 10x+},
+howpublished = {\url{https://huggingface.co/TMElyralab/lyraBelle}},
 year = {2023}
 }
 ```
 
 ## Report bug
-- start a discussion to report any bugs! --> https://huggingface.co/TMElyralab/lyraChatGLM/discussions
+- start a discussion to report any bugs! --> https://huggingface.co/TMElyralab/lyraBELLE/discussions
 - report bug with a `[bug]` mark in the title.
 