---
language:
- zh
- en
pipeline_tag: text-generation
---
# Qwen/Qwen-14B-Chat

Despite the repo name, this is the chat version of the model.

After the release of Mistral, I realized that Chinese models were underappreciated.

This monster needed 60 GB of peak memory for quantization.
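A rough back-of-envelope suggests why the peak is so high: the FP16 source weights alone are sizeable, and the quantizer may hold both the input tensors and the Q8_0 output in memory at once. The figures below are my own illustration (assuming ~14.2B parameters and GGML's Q8_0 layout of 32 int8 weights plus an fp16 scale per block), not a measured profile.

```python
# Back-of-envelope memory estimate for quantizing a ~14B-parameter model.
# All figures are assumptions for illustration, not a measured breakdown.
params = 14.2e9
fp16_bytes = params * 2          # source weights at 2 bytes/param
q8_0_bytes = params * (34 / 32)  # Q8_0: 34 bytes per block of 32 weights

print(f"FP16 source:   {fp16_bytes / 2**30:.0f} GiB")
print(f"Q8_0 output:   {q8_0_bytes / 2**30:.0f} GiB")
print(f"Both resident: {(fp16_bytes + q8_0_bytes) / 2**30:.0f} GiB, "
      f"before any working buffers")
```

With intermediate buffers and allocator overhead on top of both copies, a 60 GB peak is plausible.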

## Credits

[Alibaba Cloud Qwen-14B-Chat](https://huggingface.co/Qwen/Qwen-14B-Chat)

## Usage

Start an interactive chat session by running the following command on Linux:

```sh
./main -m ./Qwen-14b-Q8_0.bin --tiktoken ./qwen.tiktoken -i
```

## Evaluation Results

| Model | MMLU | GSM8K | HumanEval | MBPP |
|--|--|--|--|--|
| Qwen 14B Chat | 64% | 61% | 32% | 41% |
| Llama 2 13B | 56% | 34% | 19% | 35% |
| Phi 1.5 | 37% | 40% | 34% | 38% |
| Code Llama 7B | 37% | 21% | 31% | 53% |
| Mistral 7B | 60% | 52% | 31% | 48% |

**Benchmarks:** MMLU (English knowledge), GSM8K (mathematics), HumanEval (coding), MBPP (basic Python programming)

## Architecture

| Parameter | Value |
|--|--|
| Layers | 40 |
| Heads | 40 |
| Embedding size | 5120 |
| Vocabulary size | 151,851 |
| Sequence length | 2048 |
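As a sanity check, the table above is roughly consistent with a 14B parameter count. The sketch below assumes a standard decoder-only transformer with a SwiGLU feed-forward block and an untied output head; the intermediate size of 13,696 is my assumption for Qwen-14B, not stated in the table.

```python
# Rough parameter-count estimate from the architecture table.
# ffn_hidden and the untied output head are assumptions for illustration.
layers = 40
d_model = 5120
vocab = 151851
ffn_hidden = 13696  # assumed SwiGLU intermediate size

embed = vocab * d_model               # token embedding matrix
attn = 4 * d_model * d_model          # Q, K, V, O projections per layer
ffn = 3 * d_model * ffn_hidden        # SwiGLU: gate, up, down projections
per_layer = attn + ffn
total = embed + layers * per_layer + vocab * d_model  # last term: output head

print(f"~{total / 1e9:.1f}B parameters")
```

This lands at roughly 14.2B, in line with the model's name.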

## Find me on

[Sh-it-just-works](https://sh.itjust.works/c/localllama) and Patreon