TMElyralab/lyraXVERSE

English

LLM

XVERSE-13B-Chat


carsonhxsu committed on Sep 14, 2023

Commit e9adcd1 • 2 Parent(s): 0b2ea18 44dfb53

Merge branch 'main' of https://huggingface.co/TMElyralab/lyraXVERSE into main


Files changed (1)

README.md +3 -8


@@ -10,7 +10,7 @@ tags:
 lyraXVERSE is currently the **fastest XVERSE-13b** available. The inference speed of lyraXVERSE has achieved up to **3900+ tokens/s** on A100, up to **2.7x** acceleration upon the torch version.
 Among its main features are:
-- device: Nvidia GPU with Amperer architecture or Volta architecture (A100 or higher, V100).
 - batch_size: compiled with dynamic batch size, maximum depends on device.
 - MEMOPT mode: significantly optimized VRAM usage and increased speed
@@ -27,7 +27,7 @@ We use the XVERSE-13B-Chat model for measurement, but this optimized inference i
 | Version | Batch Size 1 | Batch Size 8 | Batch Size 16 | Batch Size 32 | Batch Size 64 |
 | --- | --- | --- | --- | --- | --- |
 | Torch | 34.8 | 249.2 | 470.1 | 878.6 | 1478.9 |
-| lyraXVERSE MEMOPT | 96.6 | 725.5 | 1359.3 | 2415.6 | 3923.2 |
 ## Docker Environment Recommendation
@@ -90,11 +90,6 @@ print(output_texts)
 这个故事告诉我们,画家的价值不只是他们的绘画技巧,而是他们的画作带给人们的感动和希望。画家的价值并不在于他们的画有多么昂贵,有多么独特,而在于他们能用画作打开人们的心扉,让人们看见希望,看见生活的美好。
-## TODO
-1. Support for int4
-2. Inference for longer context situations
-3. Streaming inference mode.
 ## Citation
 ``` bibtex
 @Misc{lyraXVERSE2023,
@@ -105,6 +100,6 @@ print(output_texts)
 }
 ```
-## Report bug
 - start a discussion to report any bugs!--> https://huggingface.co/TMElyralab/lyraXVERSE
 - report bug with a `[bug]` mark in the title.

 lyraXVERSE is currently the **fastest XVERSE-13b** available. The inference speed of lyraXVERSE has achieved up to **3900+ tokens/s** on A100, up to **2.7x** acceleration upon the torch version.
 Among its main features are:
+- device: Nvidia GPU with Amperer architecture or Volta architecture (A10, A100 or higher, V100).
 - batch_size: compiled with dynamic batch size, maximum depends on device.
 - MEMOPT mode: significantly optimized VRAM usage and increased speed
 | Version | Batch Size 1 | Batch Size 8 | Batch Size 16 | Batch Size 32 | Batch Size 64 |
 | --- | --- | --- | --- | --- | --- |
 | Torch | 34.8 | 249.2 | 470.1 | 878.6 | 1478.9 |
+| lyraXVERSE | 96.6 | 725.5 | 1359.3 | 2415.6 | 3923.2 |
 ## Docker Environment Recommendation
 这个故事告诉我们,画家的价值不只是他们的绘画技巧,而是他们的画作带给人们的感动和希望。画家的价值并不在于他们的画有多么昂贵,有多么独特,而在于他们能用画作打开人们的心扉,让人们看见希望,看见生活的美好。
 ## Citation
 ``` bibtex
 @Misc{lyraXVERSE2023,
 }
 ```
+## Report bugs
 - start a discussion to report any bugs!--> https://huggingface.co/TMElyralab/lyraXVERSE
 - report bug with a `[bug]` mark in the title.
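
The README's headline claim of "up to **2.7x** acceleration" can be checked against the throughput table in the diff. The sketch below is not part of the commit; it only divides the lyraXVERSE row by the Torch row for each batch size. The variable and function names are illustrative, and the numbers are copied from the table as-is (tokens/s on A100, per the README).

```python
# Throughput in tokens/s, copied from the benchmark table in the README diff.
batch_sizes = [1, 8, 16, 32, 64]
torch_tps = [34.8, 249.2, 470.1, 878.6, 1478.9]
lyra_tps = [96.6, 725.5, 1359.3, 2415.6, 3923.2]


def speedups(baseline, optimized):
    """Return the per-column ratio optimized / baseline."""
    return [o / b for b, o in zip(baseline, optimized)]


ratios = speedups(torch_tps, lyra_tps)

# Print one line per batch size, e.g. "batch 64: 2.65x".
for bs, ratio in zip(batch_sizes, ratios):
    print(f"batch {bs:>2}: {ratio:.2f}x")
```

With these numbers the ratio stays between roughly 2.65x and 2.91x across all listed batch sizes. At batch size 64, where the 3900+ tokens/s peak is measured, it is about 2.65x, which rounds to the quoted 2.7x.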