---
license: llama3
datasets:
- yuyijiong/Long-Instruction-with-Paraphrasing
language:
- zh
- en
pipeline_tag: text-generation
---
|
|
|
|
|
# Llama3-8b-chinese-chat-32k |
|
|
|
* 📄[Paper](https://arxiv.org/abs/2312.11193) |
|
* 📚[Dataset Download](https://huggingface.co/datasets/yuyijiong/Long-Instruction-with-Paraphrasing) |
|
* ✨[GitHub](https://github.com/yuyijiong/train_with_paraphrasing)
|
|
|
## Training Method
|
|
|
* Context length extended to **32k** with the NTK-aware scaling method.
|
|
|
* Fine-tuned from [shenzhi-wang/Llama3-8B-Chinese-Chat](https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat) on the [Long-Instruction-with-Paraphrasing](https://huggingface.co/datasets/yuyijiong/Long-Instruction-with-Paraphrasing) dataset using QLoRA for 1 epoch.
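NTK-aware scaling extends the context window by raising the RoPE base frequency rather than retraining position embeddings, so high-frequency components stay nearly unchanged while low frequencies are interpolated. A minimal sketch of the base adjustment, assuming Llama-3's `rope_theta` of 500000 and head dimension 128 (the exact scale factor used for this checkpoint is not stated here; 4x, i.e. roughly 8k → 32k, is illustrative):

```python
def ntk_scaled_base(base: float, scale: float, head_dim: int) -> float:
    """NTK-aware RoPE scaling: raise the rotary base so low-frequency
    dimensions are stretched while high-frequency ones barely move."""
    return base * scale ** (head_dim / (head_dim - 2))

# Llama-3 uses rope_theta=500000 and head_dim=128; a 4x scale is assumed.
new_base = ntk_scaled_base(500000.0, 4.0, 128)
```

With these numbers the adjusted base comes out slightly above 2 million, i.e. a bit more than a plain 4x multiplier because of the `head_dim / (head_dim - 2)` exponent.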
|
|
|
## Usage

Same as the original model.
|
|
|
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "yuyijiong/Llama3-8B-Chinese-Chat-32k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "写一首诗吧"},
]

# Build the Llama-3 chat prompt and move it to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=32768,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Strip the prompt tokens and decode only the newly generated text.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```
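Note that `max_new_tokens=32768` in the snippet above equals the full extended window, so with a long prompt the usable generation budget is smaller: prompt tokens and new tokens share the same 32k context. A hypothetical helper to compute the remaining budget (the 32k figure comes from the training notes above; the function name is illustrative):

```python
CONTEXT_LEN = 32768  # extended context window, per the training notes above

def output_budget(input_len: int, context_len: int = CONTEXT_LEN) -> int:
    """Return how many new tokens fit before prompt + output
    would exceed the context window."""
    return max(context_len - input_len, 0)
```

For example, a 30,000-token prompt leaves 2,768 tokens for generation.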
|
|
|
## Long-Context Performance

Compared with the original model, this version has stronger long-context capability.
|
|
|
### LongBench (en) |
|
| model                      | hotpotqa  | multifieldqa_en | passage_retrieval_en | qmsum     | trec      |
|----------------------------|-----------|-----------------|----------------------|-----------|-----------|
| llama3-8b-chinese-chat     | 45.88     | 50.56           | 68.00                | 22.52     | 73.00     |
| llama3-8b-chinese-chat-32k | **47.64** | 49.98           | **100.00**           | **25.13** | **75.00** |
|
|
|
### LongBench (zh)
|
|
|
| model                      | dureader  | multifieldqa_zh | passage_retrieval_zh | vcsum     | lsht      |
|----------------------------|-----------|-----------------|----------------------|-----------|-----------|
| llama3-8b-chinese-chat     | 29.08     | 58.40           | 93.50                | 14.61     | 28.25     |
| llama3-8b-chinese-chat-32k | **32.31** | **58.66**       | 82.50                | **16.15** | **38.50** |