---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- quantization
- LLM
- Dolly
---

Requirements:

<pre>
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git 
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
</pre>

Load this model using:

<pre>
<code>
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = "AhmedBou/databricks-dolly-v2-3b_on_NCSS"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, return_dict=True, load_in_8bit=True, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, peft_model_id)
</code>
</pre>
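
If bitsandbytes 8-bit loading is not an option on your hardware, the base model can instead be loaded in half precision before attaching the adapter. A minimal sketch, assuming a CUDA device with enough free memory for the ~3B-parameter base model (roughly 6 GB in fp16):

<pre>
<code>
# Assumption: fp16 loading as an alternative to the 8-bit quantization above.
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map='auto',
)
model = PeftModel.from_pretrained(model, peft_model_id)
</code>
</pre>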


Run inference using:

<pre>
<code>
# Tokenize the prompt and move the tensors to the model's device
batch = tokenizer("Multiple Regression for Appraisal --&gt;: ", return_tensors='pt').to(model.device)

with torch.cuda.amp.autocast():
    output_tokens = model.generate(**batch, max_new_tokens=50)

print('\n', tokenizer.decode(output_tokens[0], skip_special_tokens=True))
</code>
</pre>
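
For repeated prompts, the steps above can be wrapped in a small helper; a hedged sketch (the `complete` name and its default arguments are illustrative, not part of this model card):

<pre>
<code>
def complete(topic, max_new_tokens=50):
    # Build the "&lt;topic&gt; --&gt;: " prompt this adapter was trained on
    # and move the tensors to the same device as the model.
    batch = tokenizer(f"{topic} --&gt;: ", return_tensors='pt').to(model.device)
    with torch.cuda.amp.autocast():
        output_tokens = model.generate(**batch, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_tokens[0], skip_special_tokens=True)

print(complete("Multiple Regression for Appraisal"))
</code>
</pre>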


Output:

<pre>
<code>
“Multiple Regression for Appraisal” --&gt;: Multiple Regression for Appraisal (MRA) --&gt;: Multiple Regression for Appraisal (MRA) (with Covariates) --&gt;: Multiple Regression for Appraisal (MRA) (with Covariates)
</code>
</pre>
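
The greedy decoding used above tends to loop on the same phrase, as the sample output shows. Enabling sampling and a repetition penalty usually produces more varied text; a minimal sketch with illustrative parameter values (not tuned for this model):

<pre>
<code>
with torch.cuda.amp.autocast():
    output_tokens = model.generate(
        **batch,
        max_new_tokens=50,
        do_sample=True,          # sample instead of greedy decoding
        top_p=0.9,               # nucleus sampling
        temperature=0.7,
        repetition_penalty=1.2,  # discourage the looping seen above
    )
print(tokenizer.decode(output_tokens[0], skip_special_tokens=True))
</code>
</pre>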