---
license: creativeml-openrail-m
language:
- en
tags:
- LLM
- tensorRT
- Belle
---
# Model Card for lyraBELLE

lyraBelle is currently the fastest BELLE model available. To the best of our knowledge, it is the first accelerated version of BELLE.

The inference speed of lyraBelle has achieved a 10x speedup over the early original version, and we are still working hard to improve the performance further.
Among its main features are:

- weights: the original BELLE-7B-2M weights released by BelleGroup
- device: Nvidia Ampere architecture or newer (e.g., A100)
- batch_size: compiled with dynamic batch size, max batch_size = 8 (see the batched-call sketch after the usage example below)
## Speed

### Test environment
- device: Nvidia A100 40G
- batch size: 8
## Model Sources

- Repository: https://huggingface.co/TMElyralab/lyraBelle
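If the compiled engine files are distributed through that repository, they can presumably be fetched with `huggingface_hub`; the exact file layout is not described in this card, so the snippet below is only a sketch.

```python
# Sketch: download the repository contents locally (assumes the engine file,
# e.g. 1-gpu-fp16.h5, is hosted in TMElyralab/lyraBelle).
from huggingface_hub import snapshot_download

model_dir = snapshot_download("TMElyralab/lyraBelle")  # returns the local path
```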
## Uses

```python
from lyraBelle import LyraBelle

# Generation settings: fp16 engine, up to 512 new tokens.
data_type = "fp16"
prompts = "今天天气大概 25度,有点小雨,吹着风,我想去户外散步,应该穿什么样的衣服裤子鞋子搭配。"
model_dir = "./model"          # directory holding the compiled model files
model_name = "1-gpu-fp16.h5"   # single-GPU fp16 engine file
max_output_length = 512

# Load the engine and sample a response (constructor arguments follow the released example).
model = LyraBelle(model_dir, model_name, data_type, 0)
output_texts = model.generate(prompts, output_length=max_output_length, top_k=30, top_p=0.85, temperature=0.35, repetition_penalty=1.2, do_sample=True)
print(output_texts)
```
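Because the engine is compiled with a dynamic batch size (max batch_size = 8), several prompts can in principle be handled in one call. The sketch below continues the example above and assumes `generate` accepts a list of prompt strings; that calling convention is not shown in the released example, so treat it as a hypothetical illustration rather than the confirmed API.

```python
# Hypothetical batched call: assumes `generate` accepts a list of prompts
# (up to the compiled max batch_size of 8). Verify against the actual API.
batch_prompts = [prompts] * 4  # reuse the demo prompt; any list of <= 8 prompts
batch_outputs = model.generate(batch_prompts, output_length=max_output_length, top_k=30, top_p=0.85, temperature=0.35, repetition_penalty=1.2, do_sample=True)
for text in batch_outputs:
    print(text)
```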
## Demo output

### input

今天天气大概 25度,有点小雨,吹着风,我想去户外散步,应该穿什么样的衣服裤子鞋子搭配。

(The weather today is about 25°C, with light rain and some wind. I want to go for a walk outdoors; what combination of clothes, pants, and shoes should I wear?)

### output

建议穿着一件轻便的衬衫或T恤、一条牛仔裤和一双运动鞋或休闲鞋。如果下雨了可以带上一把伞。

(It is recommended to wear a light shirt or T-shirt, jeans, and a pair of sneakers or casual shoes. If it rains, you can bring an umbrella.)
## TODO

We plan to release a FasterTransformer version for an even faster model. Stay tuned!
## Citation

```bibtex
@Misc{lyraBelle2023,
  author       = {Kangjian Wu, Zhengtao Wang, Bin Wu},
  title        = {lyraBelle: Accelerating Belle by 10x+},
  howpublished = {\url{https://huggingface.co/TMElyralab/lyraBelle}},
  year         = {2023}
}
```
## Report bug

- Start a discussion to report any bugs: https://huggingface.co/TMElyralab/lyraBelle/discussions
- Report bugs with a [bug] mark in the title.