---
license: apache-2.0
language:
- en
tags:
- LLM
- BELLE
---
## Model Card for lyraBELLE

lyraBELLE is currently the **fastest BELLE model** available. To the best of our knowledge, it is the **first accelerated version of BELLE**.

lyraBELLE delivers a **3.3x+** inference speedup over the original version.

Among its main features are:

- weights: the original BELLE-7B-2M weights released by BelleGroup.
- device: Nvidia Ampere architecture or newer (e.g., A100).

Note that **some interfaces and arguments are reserved for future use (see the demo below)**:

- **int8 mode**: not supported yet; always set it to 0.
- **data type**: only `fp16` is available.



## Speed

### test environment

- device: Nvidia A100 40G
- warmup: 10 rounds
- precision: fp16
- batch size: 64
- language: Chinese (kept the same within a batch)
- do_sample: True (the model may generate slightly different answers to the same question)


|version|speed|
|:-:|:-:|
|original|826.34 tokens/sec|
|lyraBELLE|2701.71 tokens/sec|
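
As a sanity check, the speedup claim follows directly from the table (a throwaway calculation, not part of the lyraBELLE API):

```python
# Throughput figures copied from the table above.
original_tps = 826.34    # tokens/sec, original BELLE-7B-2M
lyra_tps = 2701.71       # tokens/sec, lyraBELLE

speedup = lyra_tps / original_tps
print(f"speedup: {speedup:.1f}x")  # speedup: 3.3x
```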




## Model Sources

- **Repository:** <https://huggingface.co/BelleGroup/BELLE-7B-2M?clone=true>

## Environment

- A **Docker image is available** at <https://hub.docker.com/repository/docker/bigmoyan/lyrallm/general>; pull it with:

```
docker pull bigmoyan/lyrallm:v0.1
```

## Uses

```python
from lyraBelle import LyraBelle

data_type = "fp16"    # only fp16 is supported
# "It's about 25°C today, with light rain and some wind. I'd like to take a
# walk outdoors; what clothes, pants, and shoes should I wear?"
prompts = "今天天气大概 25度,有点小雨,吹着风,我想去户外散步,应该穿什么样的衣服裤子鞋子搭配。"
model_dir = "./model"
model_name = "1-gpu-fp16.h5"
max_output_length = 512

# The last argument is int8 mode: not supported yet, so always pass 0.
model = LyraBelle(model_dir, model_name, data_type, 0)
output_texts = model.generate(
    prompts,
    output_length=max_output_length,
    top_k=30,
    top_p=0.85,
    temperature=0.35,
    repetition_penalty=1.2,
    do_sample=True,
)
print(output_texts)
```
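
The `generate` call exposes the usual sampling knobs. As a rough illustration of what `top_p` means (a self-contained sketch, not lyraBELLE's actual implementation), nucleus sampling keeps the smallest set of highest-probability tokens whose cumulative mass reaches `top_p`, then renormalizes before drawing:

```python
def top_p_filter(probs, top_p=0.85):
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p, then renormalize (nucleus / top-p sampling)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

# Toy 4-token distribution: tokens 0 and 1 cover 0.8 < 0.85,
# so token 2 is also kept and token 3 is cut off.
print(top_p_filter([0.5, 0.3, 0.15, 0.05], top_p=0.85))
```

Lower `top_p` (or `temperature`) makes decoding more conservative; `repetition_penalty` above 1.0 discourages repeating tokens already generated.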
## Demo output

### input
今天天气大概 25度,有点小雨,吹着风,我想去户外散步,应该穿什么样的衣服裤子鞋子搭配。
(It's about 25°C today, with light rain and some wind. I'd like to take a walk outdoors; what clothes, pants, and shoes should I wear?)

### output
建议穿着一件轻便的衬衫或T恤、一条牛仔裤和一双运动鞋或休闲鞋。如果下雨了可以带上一把伞。
(Suggested: a light shirt or T-shirt, jeans, and a pair of sneakers or casual shoes. Bring an umbrella in case it rains.)


## Citation
```bibtex
@Misc{lyraBELLE2023,
  author =       {Kangjian Wu, Zhengtao Wang, Bin Wu},
  title =        {lyraBELLE: Accelerating BELLE by 3x+},
  howpublished = {\url{https://huggingface.co/TMElyralab/lyraBELLE}},
  year =         {2023}
}
```

## Report bug
- Start a discussion at https://huggingface.co/TMElyralab/lyraBELLE/discussions to report any bugs.
- Include a `[bug]` mark in the title of the report.