Zardos committed
Commit e2af11d
1 Parent(s): e5b85ff

Update README.md

Files changed (1)
  1. README.md +98 -6
README.md CHANGED
@@ -5,18 +5,110 @@ license: apache-2.0
  tags:
  - text-generation-inference
  - transformers
- - unsloth
+ - llama3
  - llama
  - trl
- base_model: unsloth/llama-3-8b-Instruct-bnb-4bit
+ base_model: unsloth/llama-3-8b-Instruct
  ---

+
  # Uploaded model

- - **Developed by:** Zardos
+ - **Finetuned by:** Zardos
  - **License:** apache-2.0
- - **Finetuned from model :** unsloth/llama-3-8b-Instruct-bnb-4bit

- This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+ ## How to use
+
+ This repository contains two versions of Meta-Llama-3-8B-Instruct, for use with transformers and with the original `llama3` codebase.
+
+ ### Use with transformers
+
+ You can run conversational inference using the Transformers pipeline abstraction, or by leveraging the Auto classes with the `generate()` function. Let's see examples of both.
+
+ #### Transformers pipeline
+
+ ```python
+ import transformers
+ import torch
+
+ model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
+
+ pipeline = transformers.pipeline(
+     "text-generation",
+     model=model_id,
+     model_kwargs={"torch_dtype": torch.bfloat16},
+     device_map="auto",
+ )
+
+ messages = [
+     {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
+     {"role": "user", "content": "Who are you?"},
+ ]
+
+ prompt = pipeline.tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+
+ terminators = [
+     pipeline.tokenizer.eos_token_id,
+     pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
+ ]
+
+ outputs = pipeline(
+     prompt,
+     max_new_tokens=256,
+     eos_token_id=terminators,
+     do_sample=True,
+     temperature=0.6,
+     top_p=0.9,
+ )
+ print(outputs[0]["generated_text"][len(prompt):])
+ ```
+
+ #### Transformers AutoModelForCausalLM
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch
+
+ model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+
+ messages = [
+     {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
+     {"role": "user", "content": "Who are you?"},
+ ]
+
+ input_ids = tokenizer.apply_chat_template(
+     messages,
+     add_generation_prompt=True,
+     return_tensors="pt"
+ ).to(model.device)
+
+ terminators = [
+     tokenizer.eos_token_id,
+     tokenizer.convert_tokens_to_ids("<|eot_id|>")
+ ]
+
+ outputs = model.generate(
+     input_ids,
+     max_new_tokens=256,
+     eos_token_id=terminators,
+     do_sample=True,
+     temperature=0.6,
+     top_p=0.9,
+ )
+ response = outputs[0][input_ids.shape[-1]:]
+ print(tokenizer.decode(response, skip_special_tokens=True))
+ ```
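
The added card says the repository also carries a version of the weights for the original `llama3` codebase, but it gives no example of fetching them. A minimal sketch using `huggingface_hub` is below; it assumes the repository mirrors Meta's layout with an `original/` folder, and the repo id shown is a placeholder rather than this model's actual id.

```python
# Minimal sketch: download only the original llama3-format checkpoint files.
# Assumption: the repository keeps them under an `original/` folder, as Meta's
# upload does. The repo_id below is a placeholder, not the actual model id.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="your-namespace/llama-3-8b-instruct-finetune",  # placeholder repo id
    allow_patterns=["original/*"],  # fetch only the consolidated llama3-format files
    local_dir="Meta-Llama-3-8B-Instruct",
)
```

If those files are present, the downloaded `original/` directory can then be consumed by the reference `llama3` example scripts instead of transformers.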