ShirinYamani committed on
Commit 069bd91
1 Parent(s): b3fc640

Update README.md

Files changed (1)
  1. README.md +78 -20
README.md CHANGED
@@ -1,34 +1,95 @@
 ---
 license: llama3
-library_name: peft
+library_name: transformers
 tags:
-- generated_from_trainer
+- text-generation-inference
+- Dora
+- Qdora
+- peft
 base_model: meta-llama/Meta-Llama-3-8B
 model-index:
 - name: llama-3-8B-fine-tuned-dora
   results: []
+datasets:
+- timdettmers/openassistant-guanaco
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
 # llama-3-8B-fine-tuned-dora
+<img src="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e32b89a-9146-4004-81b4-18c20a913df0_1920x1080.jpeg" alt="im" width="700" />
+
+This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the [openassistant-guanaco](https://huggingface.co/datasets/timdettmers/openassistant-guanaco) dataset.
+In the `LoraConfig` we set `use_dora=True` to enable the DoRA weight decomposition and allow a comparison with plain LoRA.
+
+## Inference
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, GenerationConfig
+from peft import PeftModel
+
+# Generation settings -- update as needed
+max_new_tokens = 100
+top_p = 0.9
+temperature = 0.7
+user_question = "What is central limit theorem?"
+
+# Base model and fine-tuned DoRA adapter
+model_name_or_path = 'meta-llama/Meta-Llama-3-8B'         # change to 'YOUR_BASE_MODEL'
+adapter_path = 'ShirinYamani/llama-3-8B-fine-tuned-dora'  # change to 'YOUR_ADAPTER_PATH'
+
+tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
+# Workaround for early LLaMA HF conversion issues with the BOS token
+tokenizer.bos_token_id = 1
+
+# Load the base model in 4-bit (NF4) with bf16 compute for faster inference
+model = AutoModelForCausalLM.from_pretrained(
+    model_name_or_path,
+    torch_dtype=torch.bfloat16,
+    device_map={"": 0},
+    quantization_config=BitsAndBytesConfig(
+        load_in_4bit=True,
+        bnb_4bit_compute_dtype=torch.bfloat16,
+        bnb_4bit_use_double_quant=True,
+        bnb_4bit_quant_type='nf4',
+    ),
+)
+
+# Attach the DoRA adapter on top of the quantized base model
+model = PeftModel.from_pretrained(model, adapter_path)
+model.eval()
+
+prompt = (
+    "A chat between a curious human and an artificial intelligence assistant. "
+    "The assistant gives helpful, detailed, and polite answers to the user's questions. "
+    "### Human: {user_question} "
+    "### Assistant: "
+)
+
+def generate(model, user_question, max_new_tokens=max_new_tokens, top_p=top_p, temperature=temperature):
+    inputs = tokenizer(prompt.format(user_question=user_question), return_tensors="pt").to('cuda')
+    outputs = model.generate(
+        **inputs,
+        generation_config=GenerationConfig(
+            do_sample=True,
+            max_new_tokens=max_new_tokens,
+            top_p=top_p,
+            temperature=temperature,
+        ),
+    )
+    return tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+print(generate(model, user_question))
+```
 
-This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on an unknown dataset.
-
-## Model description
-
-More information needed
-
-## Intended uses & limitations
-
-More information needed
-
-## Training and evaluation data
-
-More information needed
-
-## Training procedure
 
 ### Training hyperparameters
 
@@ -45,9 +106,6 @@ The following hyperparameters were used during training:
 - training_steps: 10
 - mixed_precision_training: Native AMP
 
-### Training results
-
-
 
 ### Framework versions
 
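The updated card states that the adapter was trained with `use_dora=True` in the `LoraConfig`, on a base model that is loaded in 4-bit NF4 at inference time (the Dora/Qdora/peft tags point the same way). Below is a minimal sketch of what that training-side setup could look like. The card does not record the adapter hyperparameters, so the rank, alpha, dropout, `target_modules`, and the use of `prepare_model_for_kbit_training` here are illustrative assumptions, not the values actually used for this checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the frozen base model in 4-bit NF4, mirroring the inference code in the card.
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    torch_dtype=torch.bfloat16,
    device_map={"": 0},
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
    ),
)
base_model = prepare_model_for_kbit_training(base_model)

# use_dora=True is the setting named in the card; the rank, alpha, dropout and
# target_modules below are placeholder assumptions for illustration only.
dora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
    use_dora=True,
)

model = get_peft_model(base_model, dora_config)
model.print_trainable_parameters()  # only the DoRA adapter weights are trainable
```

Training the small DoRA adapter on top of a 4-bit quantized, frozen base appears to be what the Qdora tag refers to; at inference, only this repo's adapter weights need to be loaded on top of the base model, as the card's inference snippet does with `PeftModel.from_pretrained`.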