ShirinYamani committed
Commit 13fb0e3
1 Parent(s): 3cb3051

Update README.md

Files changed (1):
  1. README.md +73 -4

README.md CHANGED
@@ -32,9 +32,77 @@ This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggin

  Please refer to [this notebook](https://github.com/shirinyamani/mistral7b-lora-finetuning/blob/main/misral_7B_updated.ipynb) for a complete demo including notes regarding cloud deployment

+ ## Inference
+
+ ```python
+ import os
+ from os.path import exists, join, isdir
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, GenerationConfig
+ from peft import PeftModel
+ from peft.tuners.lora import LoraLayer
+
+ # Update these variables!
+ max_new_tokens = 100
+ top_p = 0.9
+ temperature = 0.7
+ user_question = "What is central limit theorem?"
+
+ # Base model
+ model_name_or_path = 'mistralai/Mistral-7B-v0.1'  # Change it to 'YOUR_BASE_MODEL'
+ adapter_path = 'ShirinYamani/mistral7b-fine-tuned-qlora'  # Change it to 'YOUR_ADAPTER_PATH'
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
+ # If you use a LLaMA HF checkpoint, fix the early conversion issues by setting the BOS token id explicitly.
+ tokenizer.bos_token_id = 1
+
+ # Load the model (use bf16 for faster inference)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name_or_path,
+     torch_dtype=torch.bfloat16,
+     device_map={"": 0},
+     # QLoRA -- 4-bit config
+     quantization_config=BitsAndBytesConfig(
+         load_in_4bit=True,
+         bnb_4bit_compute_dtype=torch.bfloat16,
+         bnb_4bit_use_double_quant=True,
+         bnb_4bit_quant_type='nf4',
+     )
+ )
+
+ model = PeftModel.from_pretrained(model, adapter_path)
+ model.eval()
+
+ prompt = (
+     "A chat between a curious human and an artificial intelligence assistant. "
+     "The assistant gives helpful, detailed, and polite answers to the user's questions. "
+     "### Human: {user_question}"
+     "### Assistant: "
+ )
+
+ def generate(model, user_question, max_new_tokens=max_new_tokens, top_p=top_p, temperature=temperature):
+     inputs = tokenizer(prompt.format(user_question=user_question), return_tensors="pt").to('cuda')
+
+     outputs = model.generate(
+         **inputs,
+         generation_config=GenerationConfig(
+             do_sample=True,
+             max_new_tokens=max_new_tokens,
+             top_p=top_p,
+             temperature=temperature,
+         )
+     )
+
+     text = tokenizer.decode(outputs[0], skip_special_tokens=True)
+     print(text)
+     return text
+
+ generate(model, user_question)
+ ```
+
  ### Training hyperparameters

- The following hyperparameters were used during training:
+ ```python
  - learning_rate: 0.0002
  - train_batch_size: 1
  - eval_batch_size: 8
@@ -46,12 +114,13 @@ The following hyperparameters were used during training:
  - lr_scheduler_warmup_steps: 2
  - training_steps: 10
  - mixed_precision_training: Native AMP
-
+ ```

  ### Framework versions
-
+ ```python
  - PEFT 0.11.2.dev0
  - Transformers 4.42.0.dev0
  - Pytorch 2.3.0+cu121
  - Datasets 2.19.2
- - Tokenizers 0.19.1
+ - Tokenizers 0.19.1
+ ```
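
The hyperparameters listed in the diff above map directly onto `transformers.TrainingArguments`. Below is a minimal sketch of that mapping, assuming defaults for everything the card does not list; `output_dir` and the choice of a `Trainer`-style setup are illustrative assumptions, not taken from the training notebook.

```python
# Minimal sketch (not part of the commit): the hyperparameters from the card
# expressed as a transformers.TrainingArguments object. Anything not listed in
# the card (output_dir, optimizer, scheduler type, gradient accumulation, ...)
# is an assumption left at its default or given an illustrative value.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mistral7b-qlora-out",  # illustrative path, not from the card
    learning_rate=2e-4,                # learning_rate: 0.0002
    per_device_train_batch_size=1,     # train_batch_size: 1
    per_device_eval_batch_size=8,      # eval_batch_size: 8
    warmup_steps=2,                    # lr_scheduler_warmup_steps: 2
    max_steps=10,                      # training_steps: 10
    fp16=True,                         # mixed_precision_training: Native AMP
)
```

Here `fp16=True` stands in for the card's "Native AMP" mixed precision; if the target GPU supports bfloat16, `bf16=True` is the usual substitute.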