---
license: llama3
datasets:
- yuyijiong/Long-Instruction-with-Paraphrasing
language:
- zh
- en
pipeline_tag: text-generation
---


# Llama3-8b-chinese-chat-32k 

* 📄[Paper](https://arxiv.org/abs/2312.11193)
* 📚[Dataset Download](https://huggingface.co/datasets/yuyijiong/Long-Instruction-with-Paraphrasing)
* ✨[GitHub](https://github.com/yuyijiong/train_with_paraphrasing)

## Training Method

* The context length is extended to **32k** with the NTK-aware RoPE scaling method (see the sketch below).

* Starting from [shenzhi-wang/Llama3-8B-Chinese-Chat](https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat), the model was fine-tuned with QLoRA for 1 epoch on the [Long-Instruction-with-Paraphrasing](https://huggingface.co/datasets/yuyijiong/Long-Instruction-with-Paraphrasing) dataset.
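
The snippet below is a minimal sketch of those two ingredients using Hugging Face `transformers` and `peft`: enabling NTK-aware ("dynamic") RoPE scaling on the base model and attaching 4-bit QLoRA adapters. It is not the exact training script; the scaling factor, LoRA rank/alpha, and target modules are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_id = "shenzhi-wang/Llama3-8B-Chinese-Chat"

# 4-bit quantization, the "Q" in QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    # "dynamic" is the NTK-aware RoPE scaling implemented in transformers;
    # a factor of 4.0 (assumed value) stretches Llama-3's native 8k window to ~32k.
    rope_scaling={"type": "dynamic", "factor": 4.0},
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters on the attention projections (hyperparameters are illustrative)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
# ...then fine-tune for 1 epoch on Long-Instruction-with-Paraphrasing
# with a standard Trainer/SFTTrainer loop.
```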

## Usage
Same as the original model.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "yuyijiong/Llama3-8B-Chinese-Chat-32k"

# Load the tokenizer and model (dtype and device placement chosen automatically)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Build a chat prompt ("写一首诗吧" = "Write a poem")
messages = [
    {"role": "user", "content": "写一首诗吧"},
]

# Apply the Llama-3 chat template and move the inputs to the model's device
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sample a response and decode only the newly generated tokens
outputs = model.generate(
    input_ids,
    max_new_tokens=32768,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```

## Long-Context Performance
Compared with the original model, it has stronger long-context capability.

### LongBench (en)
| model                      | hotpotqa  | multifieldqa_en | passage_retrieval_en | qmsum     | trec      |
|----------------------------|-----------|-----------------|----------------------|-----------|-----------|
| llama3-8b-chinese-chat     | 45.88     | 50.56           | 68.00                | 22.52     | 73.00     |
| llama3-8b-chinese-chat-32k | **47.64** | 49.98           | **100.00**           | **25.13** | **75.00** |

### LongBench (zh)

| model                      | dureader  | multifieldqa_zh | passage_retrieval_zh | vcsum     | lsht     |
|----------------------------|-----------|-----------------|----------------------|-----------|----------|
| llama3-8b-chinese-chat     | 29.08     | 58.4            | 93.5                 | 14.61     | 28.25    |
| llama3-8b-chinese-chat-32k | **32.31** | **58.66**       | 82.5                 | **16.15** | **38.5** |