# MPT-7b-8k-chat

This model was originally released under CC-BY-NC-SA-4.0; the AWQ framework itself is MIT-licensed.

The original model can be found at [https://huggingface.co/mosaicml/mpt-7b-8k-chat](https://huggingface.co/mosaicml/mpt-7b-8k-chat).

## ⚡ 4-bit Inference Speed

Benchmarks were run on machines rented from RunPod; speed may vary depending on both the GPU and the CPU.

H100:
- CUDA 12.0, Driver 525.105.17: 92 tokens/s (10.82 ms/token)

RTX 4090 + Intel i9 13900K (2 different VMs):
- CUDA 12.0, Driver 525.125.06: 134 tokens/s (7.46 ms/token)
- CUDA 12.0, Driver 525.125.06: 117 tokens/s (8.52 ms/token)

RTX 4090 + AMD EPYC 7-Series (3 different VMs):
- CUDA 12.2, Driver 535.54.03: 53 tokens/s (18.6 ms/token)
- CUDA 12.2, Driver 535.54.03: 56 tokens/s (17.71 ms/token)
- CUDA 12.0, Driver 525.125.06: 55 tokens/s (18.15 ms/token)

A6000 (2 different VMs):
- CUDA 12.0, Driver 525.105.17: 61 tokens/s (16.31 ms/token)
- CUDA 12.1, Driver 530.30.02: 46 tokens/s (21.79 ms/token)

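The two figures for each machine are reciprocals of one another: ms/token = 1000 / (tokens/s). As a quick sketch of that cross-check, the snippet below (my own illustration, with the tokens/s values hard-coded from the table above) rederives the latency column; small mismatches against the table are just rounding in the reported numbers:

```sh
# Derive ms/token from each reported tokens/s figure in the table above.
python3 - <<'EOF'
for tok_s in (92, 134, 117, 53, 56, 55, 61, 46):
    print(f"{tok_s:4d} tokens/s -> {1000 / tok_s:.2f} ms/token")
EOF
```
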
## How to run

Install [AWQ](https://github.com/mit-han-lab/llm-awq):

```sh
# Clone the AWQ repo, install the Python package, then build its CUDA kernels.
git clone https://github.com/mit-han-lab/llm-awq && \
cd llm-awq && \
pip3 install -e . && \
cd awq/kernels && \
python3 setup.py install && \
cd ../.. && \
pip3 install einops
```

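Before moving on, it can be worth confirming the install succeeded. The check below is my own suggestion rather than part of the AWQ docs; it only verifies that the `awq` package imports and that PyTorch can see a CUDA device:

```sh
# Sanity check (not from the AWQ docs): the awq package should import
# cleanly, and PyTorch should report at least one visible CUDA device.
python3 -c "import awq, torch; print('CUDA available:', torch.cuda.is_available())"
```
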
Run:

```sh
# Paths assume the llm-awq checkout lives at /workspace/llm-awq and that you
# run this from inside it; adjust to your own checkout location.
hfuser="casperhansen"
model_name="mpt-7b-8k-chat-awq"
group_size=128
repo_path="$hfuser/$model_name"
model_path="/workspace/llm-awq/$model_name"
quantized_model_path="/workspace/llm-awq/$model_name/$model_name-w4-g$group_size.pt"

# Download the quantized weights (requires git-lfs, otherwise the .pt
# checkpoint arrives as a small pointer file instead of the real weights).
git clone https://huggingface.co/$repo_path

python3 tinychat/demo.py --model_type mpt \
    --model_path $model_path \
    --q_group_size $group_size \
    --load_quant $quantized_model_path \
    --precision W4A16
```

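For context, `--precision W4A16` selects 4-bit weights with 16-bit (fp16) activations, and `--q_group_size` must match the group size the checkpoint was quantized with (encoded in its filename as `w4-g128`). As a rough back-of-the-envelope figure of my own, not from the model card: at 4 bits per weight, the 7B model needs about 7e9 parameters × 0.5 bytes ≈ 3.5 GB for weights alone, which is why it fits comfortably on every GPU in the benchmark table above.
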
## Citation

Please cite this model using the following format:

```
@online{MosaicML2023Introducing,
    author  = {MosaicML NLP Team},
    title   = {Introducing MPT-30B: Raising the bar for open-source foundation models},
    year    = {2023},
    url     = {https://www.mosaicml.com/blog/mpt-30b},
    note    = {Accessed: 2023-06-22},
    urldate = {2023-06-22}
}
```