dfurman committed
Commit 79a5c2b
1 Parent(s): a9a7ab8

Update README.md

Files changed (1)
  1. README.md +48 -11
README.md CHANGED
@@ -16,9 +16,9 @@ base_model:
 
 # dfurman/Llama-3-8B-Orpo-v0.1
 
- ![](https://i.imgur.com/ZHwzQvI.png)
+ ![](https://raw.githubusercontent.com/daniel-furman/sft-demos/main/assets/llama_3.jpeg)
 
- This is an ORPO fine-tune of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on 2k samples of [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k).
+ This is an ORPO fine-tune of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on 4k samples of [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k).
 
 It's a successful fine-tune that follows the ChatML template!
 
@@ -36,28 +36,65 @@ TBD.
 
 You can find the experiment on W&B at [this address](https://wandb.ai/dryanfurman/huggingface/runs/rlytsd0k?nw=nwuserdryanfurman).
 
-
 ## 💻 Usage
 
+ <details>
+
+ <summary>Setup</summary>
+
 ```python
- !pip install -qU transformers accelerate
+ !pip install -qU transformers accelerate bitsandbytes
 
- from transformers import AutoTokenizer
+ from transformers import AutoTokenizer, BitsAndBytesConfig
 import transformers
 import torch
 
+ if torch.cuda.get_device_capability()[0] >= 8:
+     !pip install -qqq flash-attn
+     attn_implementation = "flash_attention_2"
+     torch_dtype = torch.bfloat16
+ else:
+     attn_implementation = "eager"
+     torch_dtype = torch.float16
+
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch_dtype,
+     bnb_4bit_use_double_quant=True,
+ )
+
 model = "dfurman/Llama-3-8B-Orpo-v0.1"
- messages = [{"role": "user", "content": "What is a large language model?"}]
 
 tokenizer = AutoTokenizer.from_pretrained(model)
- prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
 pipeline = transformers.pipeline(
     "text-generation",
     model=model,
-     torch_dtype=torch.float16,
-     device_map="auto",
+     model_kwargs={
+         "torch_dtype": torch_dtype,
+         "quantization_config": bnb_config,
+         "device_map": "auto",
+         "attn_implementation": attn_implementation,
+     }
 )
+ ```
+
+ </details>
+
+ ### Run
+
+ ```python
+ messages = [
+     {"role": "system", "content": "You are a helpful assistant."},
+     {"role": "user", "content": "What is a large language model?"},
+ ]
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ print("***Prompt:\n", prompt)
 
 outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
- print(outputs[0]["generated_text"])
- ```
+ print("***Generation:\n", outputs[0]["generated_text"])
+ ```
+
+ ### Output
+
+ coming
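
For convenience, here is the Setup and Run code added in this commit assembled into a single script. This is a minimal sketch under two assumptions that are not in the diff itself: it targets a plain Python environment rather than a notebook, so the `!pip` magics are moved into shell comments, and a CUDA GPU is present for the `torch.cuda.get_device_capability()` check.

```python
# Dependencies (from the Setup snippet; install before running):
#   pip install -qU transformers accelerate bitsandbytes
#   pip install -qqq flash-attn   # only needed on GPUs with compute capability >= 8

import torch
import transformers
from transformers import AutoTokenizer, BitsAndBytesConfig

# Ampere (SM 8.x) and newer GPUs support bfloat16 and FlashAttention-2.
if torch.cuda.get_device_capability()[0] >= 8:
    attn_implementation = "flash_attention_2"
    torch_dtype = torch.bfloat16
else:
    attn_implementation = "eager"
    torch_dtype = torch.float16

# 4-bit NF4 quantization with nested (double) quantization, as added in this diff.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch_dtype,
    bnb_4bit_use_double_quant=True,
)

model = "dfurman/Llama-3-8B-Orpo-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    model_kwargs={
        "torch_dtype": torch_dtype,
        "quantization_config": bnb_config,
        "device_map": "auto",
        "attn_implementation": attn_implementation,
    },
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is a large language model?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print("***Prompt:\n", prompt)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print("***Generation:\n", outputs[0]["generated_text"])
```

The switch from plain `torch_dtype`/`device_map` arguments to a `BitsAndBytesConfig` in `model_kwargs` loads the 8B weights in 4-bit NF4, which should bring them down to roughly 5 GB of VRAM versus ~16 GB in fp16.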
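
Since the card states the model follows the ChatML template, the `***Prompt` printout from the Run snippet should look roughly like this, assuming the tokenizer ships a standard ChatML chat template (expected shape only, not captured output):

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is a large language model?<|im_end|>
<|im_start|>assistant
```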