TMElyralab/lyraBELLE

bigmoyan committed on May 22, 2023

Commit cc3990f • 1 Parent(s): 57951e0

Update README.md

Files changed (1)

README.md +16 -9

@@ -8,13 +8,13 @@ tags:
 ---
 ## Model Card for lyraBELLE
-lyraBELLE is currently the **fastest BELLE model** available. To the best of our knowledge, it is the **first accelerated version of BELLE**.
-The inference speed of lyraBELLE has achieved **100x** acceleration upon the original version.
 Among its main features are:
-- weights: the original BELLE-7B-2M weights released by BelleGroup.
 - device: Nvidia Ampere architecture or newer (e.g., A100)
 Note that:
@@ -23,20 +23,27 @@ Note that:
 - **int8 mode**: not supported yet, please always set it to 0
 - **data type**: only `fp16` available.
 ## Speed
 ### test environment
 - device: Nvidia A100 40G
 - warmup: 10 rounds
 - precision: fp16
-- batch size for our version: 64 (maximum under A100 40G)
-- batch size for original: xx (maximum under A100 40G)
 |version|batch size|speed|
-|:-:|:-:|
-|original|xxx|
-|lyraBELLE|80|3030.36 tokens/sec|
@@ -81,7 +88,7 @@ print(output_texts)
 ## Citation
 ``` bibtex
-@Misc{lyraBELLE2023,
   author =       {Kangjian Wu and Zhengtao Wang and Bin Wu},
   title =        {lyraBELLE: Accelerating BELLE by 100x+},
   howpublished = {\url{https://huggingface.co/TMElyralab/lyraBELLE}},

 ---
 ## Model Card for lyraBELLE
+lyraBelle is currently the **fastest BELLE model** available. To the best of our knowledge, it is the **first accelerated version of BELLE**.
+The inference speed of lyraBelle has achieved a **100x** acceleration over the early original version. We are still working to improve performance further.
 Among its main features are:
+- weights: original BELLE-7B-2M weights released by BelleGroup.
 - device: Nvidia Ampere architecture or newer (e.g., A100)
 Note that:
 - **int8 mode**: not supported yet, please always set it to 0
 - **data type**: only `fp16` available.
 ## Speed
 ### test environment
+**Loading the model takes a few minutes, and much longer with the original version. Please be patient.**
 - device: Nvidia A100 40G
 - warmup: 10 rounds
 - precision: fp16
+- batch size for our version: 96 (almost maximum under A100 40G)
+- batch size for the original: xx (almost maximum under A100 40G)
+- language: Chinese, kept the same within a batch.
 |version|batch size|speed|
+|:-:|:-:|:-:|
+|original|50|xx|
+|lyraBELLE|50|xx|
+|lyraBELLE|96|3507.00 tokens/sec|
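
The test environment above (a fixed batch size, 10 warmup rounds, speed reported per second) can be illustrated with a small timing harness. This is a minimal sketch, not the authors' benchmark script: `measure_throughput`, `generate_fn`, and `timed_rounds` are hypothetical names introduced here, and `generate_fn` is assumed to take a batch of prompts and return one list of generated token ids per prompt.

```python
import time
from typing import Callable, List, Sequence


def measure_throughput(
    generate_fn: Callable[[Sequence[str]], List[List[int]]],
    prompts: Sequence[str],
    warmup_rounds: int = 10,
    timed_rounds: int = 5,
) -> float:
    """Return generated tokens per second over the timed rounds.

    generate_fn is a hypothetical stand-in for the model's batch
    generation call; len(prompts) plays the role of the batch size.
    """
    # Warm-up calls are discarded so one-time costs (memory allocation,
    # kernel selection, caches) do not skew the measurement.
    for _ in range(warmup_rounds):
        generate_fn(prompts)

    total_tokens = 0
    start = time.perf_counter()
    for _ in range(timed_rounds):
        outputs = generate_fn(prompts)
        total_tokens += sum(len(tokens) for tokens in outputs)
    # Assumption: generate_fn blocks until its results are ready. For an
    # asynchronous GPU call, synchronize the device before reading the clock.
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed
```

Comparing two versions fairly means passing the same prompts and batch size to both, which is why the table reports the batch size alongside each speed.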
 ## Citation
 ``` bibtex
+@Misc{lyraBelle2023,
   author =       {Kangjian Wu and Zhengtao Wang and Bin Wu},
   title =        {lyraBELLE: Accelerating BELLE by 100x+},
   howpublished = {\url{https://huggingface.co/TMElyralab/lyraBELLE}},