---
base_model: winglian/Llama-3-8b-64k-PoSE
library_name: transformers
tags:
- axolotl
- finetune
- dpo
- facebook
- meta
- pytorch
- llama
- llama-3
- 64k
- pose
language:
- en
pipeline_tag: text-generation
license: llama3
license_name: llama3
license_link: LICENSE
inference: false
model_creator: MaziyarPanahi
model_name: Llama-3-8B-Instruct-64k
quantized_by: MaziyarPanahi
datasets:
- Intel/orca_dpo_pairs
---

<img src="./llama-3-merges.webp" alt="Llama-3 DPO Logo" width="500" style="margin-left:auto; margin-right:auto; display:block"/>


# MaziyarPanahi/Llama-3-8B-Instruct-64k

This model builds on the great work of [@winglian](https://huggingface.co/winglian/) and his model [winglian/Llama-3-8b-64k-PoSE](https://huggingface.co/winglian/Llama-3-8b-64k-PoSE/):

> This model uses [PoSE](https://huggingface.co/papers/2309.10400) to extend Llama's context length from 8k to 64k @ rope_theta: 500000.0.
> We used PoSE with continued pretraining on 300M tokens from the RedPajama V1 dataset using data between 6k-8k tokens.
> We have further set rope_theta to 2M after continued pre-training to potentially further extend the context past 64k.
> This was trained on a subset of the RedPajama v1 dataset with text between 6k-8k context. We trained a rank-stabilized LoRA of rank 256. [WandB](https://wandb.ai/oaaic/llama-3-64k/runs/tkcyjt37)

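As a rough illustration of what the quoted settings mean in practice, the sketch below loads the model config and checks (or overrides) `rope_theta` and `max_position_embeddings` before loading weights. The exact values shipped in this repository's `config.json` are not restated here; treat the numbers in the comments as assumptions and inspect the config yourself.

```python
from transformers import AutoConfig, AutoModelForCausalLM
import torch

model_id = "MaziyarPanahi/Llama-3-8B-Instruct-64k"

# Inspect the RoPE settings that control the extended context window.
config = AutoConfig.from_pretrained(model_id)
print(config.rope_theta)               # large base frequency per the PoSE notes above (assumption: 500k-2M range)
print(config.max_position_embeddings)  # expected to reflect the 64k window (assumption)

# Optionally override rope_theta before loading, e.g. to experiment with
# pushing the usable context further; quality past 64k is not guaranteed.
config.rope_theta = 2_000_000.0

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```
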
# Quantized GGUF

All GGUF models come with a context length of `64000`: [MaziyarPanahi/Llama-3-8B-Instruct-64k-GGUF](https://huggingface.co/MaziyarPanahi/Llama-3-8B-Instruct-64k-GGUF)

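For the GGUF files, a minimal way to run them locally is through `llama-cpp-python`. The snippet below is a sketch rather than part of this repository: the quantization filename is hypothetical, so pick whichever `.gguf` file from the linked repo fits your hardware.

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download one quantization from the GGUF repo (filename is illustrative; check the repo listing).
gguf_path = hf_hub_download(
    repo_id="MaziyarPanahi/Llama-3-8B-Instruct-64k-GGUF",
    filename="Llama-3-8B-Instruct-64k.Q4_K_M.gguf",  # hypothetical filename
)

# n_ctx mirrors the 64k context advertised for these GGUF files.
llm = Llama(model_path=gguf_path, n_ctx=64000, n_gpu_layers=-1)

out = llm("Q: What is PoSE in one sentence? A:", max_tokens=128)
print(out["choices"][0]["text"])
```
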
# How to use

You can use this model by passing `MaziyarPanahi/Llama-3-8B-Instruct-64k` as the model name to Hugging Face's
transformers library.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
from transformers import pipeline
import torch

model_id = "MaziyarPanahi/Llama-3-8B-Instruct-64k"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    # attn_implementation="flash_attention_2"
)

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)

streamer = TextStreamer(tokenizer)

# Name the pipeline object "pipe" so it does not shadow the imported pipeline() factory.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    model_kwargs={"torch_dtype": torch.bfloat16},
    streamer=streamer
)

# Then you can use the pipeline to generate text.

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

terminators = [
    tokenizer.eos_token_id,
    # Depending on the chat template this tokenizer ships, the end-of-turn token
    # may be "<|eot_id|>" (Llama-3 style) rather than "<|im_end|>".
    tokenizer.convert_tokens_to_ids("<|im_end|>")
]

outputs = pipe(
    prompt,
    max_new_tokens=8192,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)
print(outputs[0]["generated_text"][len(prompt):])
```
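
Since the extended window is the point of this model, it can help to check how much of the 64k context a long prompt actually consumes before generating. The helper below is a minimal sketch on top of the objects created above (`tokenizer`, `pipe`); the input file and the hard-coded `64000` limit are assumptions, and the limit should be confirmed against the model's `max_position_embeddings`.

```python
# Sketch: verify a long document fits inside the extended context before generating.
long_document = open("report.txt").read()  # hypothetical long input file

messages = [
    {"role": "system", "content": "You summarize documents concisely."},
    {"role": "user", "content": f"Summarize the following document:\n\n{long_document}"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

n_tokens = len(tokenizer(prompt)["input_ids"])
assert n_tokens < 64000, f"Prompt uses {n_tokens} tokens, which exceeds the 64k window."

summary = pipe(prompt, max_new_tokens=1024, do_sample=False)
print(summary[0]["generated_text"][len(prompt):])
```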