shibing624 committed
Commit e19dad0
1 Parent(s): 0ea4dd5

Update README.md

Files changed (1)
  1. README.md +100 -1
README.md CHANGED
@@ -1,3 +1,102 @@
  ---
- license: llama3
+ language:
+ - zh
+ - en
+ pipeline_tag: text-generation
+ license: other
+ license_name: llama3
+ license_link: LICENSE
+ tags:
+ - llama3
+ - chinese
+ - meta
  ---
+
+
+ # llama-3-8b-instruct-262k-chinese-lora
+
+
+ llama-3-8b-instruct-262k-chinese is a chat model obtained by fine-tuning [Llama-3-8B-Instruct-262k](https://huggingface.co/gradientai/Llama-3-8B-Instruct-262k) with the ORPO method on the Chinese-English preference dataset [shibing624/DPO-En-Zh-20k-Preference](https://huggingface.co/datasets/shibing624/DPO-En-Zh-20k-Preference).
+
+ For model deployment, training, and related instructions, see the MedicalGPT GitHub repository: [https://github.com/shibing624/MedicalGPT](https://github.com/shibing624/MedicalGPT)
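+
+ The actual training recipe lives in the MedicalGPT repository above; purely as a hedged sketch, ORPO preference tuning of the base model could look roughly like this with TRL's `ORPOTrainer` (LoRA rank, batch size, learning rate, and the dataset's column layout are illustrative assumptions, not the settings actually used for this model):
+
+ ```python
+ from datasets import load_dataset
+ from peft import LoraConfig
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from trl import ORPOConfig, ORPOTrainer
+
+ base_id = "gradientai/Llama-3-8B-Instruct-262k"
+ model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
+ tokenizer = AutoTokenizer.from_pretrained(base_id)
+
+ # ORPOTrainer expects prompt / chosen / rejected columns; rename the dataset's
+ # fields to that format first if they differ (the column layout is assumed here).
+ train_dataset = load_dataset("shibing624/DPO-En-Zh-20k-Preference", split="train")
+
+ peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
+ args = ORPOConfig(
+     output_dir="outputs-orpo",
+     per_device_train_batch_size=1,
+     gradient_accumulation_steps=8,
+     learning_rate=5e-6,
+     beta=0.1,              # weight of the odds-ratio preference term
+     max_length=2048,
+     max_prompt_length=1024,
+ )
+
+ trainer = ORPOTrainer(
+     model=model,
+     args=args,
+     train_dataset=train_dataset,
+     tokenizer=tokenizer,
+     peft_config=peft_config,
+ )
+ trainer.train()
+ ```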
+ ## Related models
+ - Full model weights: https://huggingface.co/shibing624/llama-3-8b-instruct-262k-chinese
+ - LoRA weights (this repo; see the loading sketch below): https://huggingface.co/shibing624/llama-3-8b-instruct-262k-chinese-lora
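+
+ A minimal sketch of using the LoRA weights in this repo, assuming the standard `peft` adapter-loading flow (the merged full-weight model above can also be used directly):
+
+ ```python
+ import torch
+ from peft import PeftModel
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ base_id = "gradientai/Llama-3-8B-Instruct-262k"
+ lora_id = "shibing624/llama-3-8b-instruct-262k-chinese-lora"
+
+ # Load the 262k-context base model, then attach the Chinese ORPO LoRA adapter
+ tokenizer = AutoTokenizer.from_pretrained(base_id)
+ model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16, device_map="auto")
+ model = PeftModel.from_pretrained(model, lora_id)
+
+ # Optionally merge the adapter into the base weights for faster inference
+ model = model.merge_and_unload()
+ ```
+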
+ ## How to use
+
+ ```python
+ import transformers
+ import torch
+
+ model_id = "shibing624/llama-3-8b-instruct-262k-chinese"
+ pipeline = transformers.pipeline(
+     "text-generation",
+     model=model_id,
+     model_kwargs={"torch_dtype": torch.float16},
+     device="cuda",
+ )
+
+ # Build a Llama-3 chat prompt; the user message asks for an introduction to machine learning
+ messages = [{"role": "system", "content": ""}]
+ messages.append({"role": "user", "content": "介绍一下机器学习"})
+ prompt = pipeline.tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+ # Stop on either the regular EOS token or Llama-3's end-of-turn token
+ terminators = [
+     pipeline.tokenizer.eos_token_id,
+     pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
+ ]
+ outputs = pipeline(
+     prompt,
+     max_new_tokens=512,
+     eos_token_id=terminators,
+     do_sample=True,
+     temperature=0.6,
+     top_p=0.9
+ )
+ # Drop the echoed prompt and keep only the newly generated reply
+ content = outputs[0]["generated_text"][len(prompt):]
+ print(content)
+ ```
+
+
+ ## About Llama-3-8B-Instruct-262k
+ Gradient incorporates your data to deploy autonomous assistants that power critical operations across your business. To learn more or collaborate on a custom model, reach out to Gradient.
+
+ This model extends Llama-3 8B's context length from 8k to over 160k tokens. It was developed by Gradient, with compute sponsored by [Crusoe Energy](https://huggingface.co/crusoeai), and demonstrates that SOTA LLMs can learn to operate on long context with minimal training (< 200M tokens) by appropriately adjusting RoPE theta.
+
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/6585dc9be92bc5f258156bd6/hiHWva3CbsrnPvZTp5-lu.png" width="600">
+
+ **Approach:**
+
+ - [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) as the base
+ - NTK-aware interpolation [1] to initialize an optimal schedule for RoPE theta, followed by a new data-driven RoPE theta optimization technique (see the sketch after this list)
+ - Progressive training on increasing context lengths, similar to the [Large World Model](https://huggingface.co/LargeWorldModel) [2] (see details below)
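+
+ As a rough, illustrative sketch of the RoPE theta idea (not Gradient's actual data-driven optimization): raising the rotary base stretches the wavelengths of the low-frequency RoPE channels, so positional rotation stays gentle at distances far beyond the original 8k window. The 500,000 value below is stock Llama-3's `rope_theta`; the other two are the per-stage values from the training table further down.
+
+ ```python
+ import numpy as np
+
+ def rope_wavelengths(head_dim: int, theta: float) -> np.ndarray:
+     # RoPE rotates each channel pair at frequency theta**(-2i/d); the
+     # wavelength 2*pi/freq is the number of positions before that pair's
+     # rotation wraps around.
+     i = np.arange(0, head_dim, 2)
+     freqs = theta ** (-i / head_dim)
+     return 2 * np.pi / freqs
+
+ head_dim = 128  # Llama-3-8B attention head dimension
+ for theta in (500_000.0, 15.3e6, 207.1e6):  # stock Llama-3, 65K stage, 262K stage
+     wl = rope_wavelengths(head_dim, theta)
+     print(f"theta={theta:>13,.0f}  longest wavelength = {wl[-1]:,.0f} positions")
+ ```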
+
+ **Infra:**
+
+ We build on top of the EasyContext Blockwise RingAttention library [3] to scalably and efficiently train on contexts up to 262,144 tokens on [Crusoe Energy](https://huggingface.co/crusoeai)'s high-performance L40S cluster.
+
+ **Data:**
+
+ For training data, we generate long contexts by augmenting [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B).
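+
+ One simple way to realize this kind of augmentation (an assumption on our part; the exact recipe is not described here) is to stream SlimPajama documents and concatenate them until a full long-context window is filled:
+
+ ```python
+ from datasets import load_dataset
+ from transformers import AutoTokenizer
+
+ TARGET_LEN = 262_144   # tokens per training sequence (the 262K stage)
+ NUM_SEQUENCES = 4      # keep the sketch small
+
+ tokenizer = AutoTokenizer.from_pretrained("gradientai/Llama-3-8B-Instruct-262k")
+ stream = load_dataset("cerebras/SlimPajama-627B", split="train", streaming=True)
+
+ sequences, buffer = [], []
+ for example in stream:
+     # Concatenate documents back to back until a 262,144-token window is filled
+     buffer.extend(tokenizer(example["text"])["input_ids"])
+     while len(buffer) >= TARGET_LEN:
+         sequences.append(buffer[:TARGET_LEN])
+         buffer = buffer[TARGET_LEN:]
+     if len(sequences) >= NUM_SEQUENCES:
+         break
+
+ print(f"prepared {len(sequences)} long-context training sequences")
+ ```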
+
+ **Progressive Training Details:**
+
+ | Parameter                  | 65K             | 262K        |
+ |----------------------------|-----------------|-------------|
+ | Initialize From            | LLaMA-3-8B-Inst | 65K         |
+ | Sequence Length            | 2^16            | 2^18        |
+ | RoPE theta                 | 15.3 M          | 207.1 M     |
+ | Batch Size (Tokens / Step) | 2.097 M         | 4.192 M     |
+ | Steps                      | 30              | 24          |
+ | Total Tokens               | 63 M            | 101 M       |
+ | Learning Rate              | 2.00E-05        | 2.00E-05    |
+ | # GPUs                     | 32              | 32          |
+ | GPU Type                   | NVIDIA L40S     | NVIDIA L40S |
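+
+ As a quick consistency check on the table: 2.097 M tokens/step × 30 steps ≈ 63 M tokens for the 65K stage and 4.192 M tokens/step × 24 steps ≈ 101 M tokens for the 262K stage, roughly 164 M tokens in total, in line with the "< 200M tokens" figure quoted above.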