---
library_name: transformers
tags:
- trl
- sft
license: apache-2.0
datasets:
- Mike0307/alpaca-en-zhtw
language:
- zh
pipeline_tag: text-generation
---


## Download the Model

The base model [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) currently relies on 
the latest development versions of transformers and torch.<br>
It also requires *trust_remote_code=True* as an argument to the from_pretrained() function.
```
pip install git+https://github.com/huggingface/transformers accelerate
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
```

Additionally, loading the LoRA model requires the peft package.
```
pip install peft
```

Now, let's download the model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Mike0307/Phi-3-mini-4k-instruct-chinese-lora"
model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    device_map="mps",           # change "mps" to "cuda" or "cpu" if not on macOS
    torch_dtype=torch.float32,  # try torch.float16 on Apple M1 chips
    trust_remote_code=True,
    attn_implementation="eager",  # avoids the optional flash_attn dependency
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```
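
Alternatively, you can attach the adapter to the base model explicitly with peft. This is a minimal sketch, assuming the adapter repository follows the standard peft layout; it should be equivalent to the direct load above.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "microsoft/Phi-3-mini-4k-instruct"
adapter_id = "Mike0307/Phi-3-mini-4k-instruct-chinese-lora"

# Load the base model first, then apply the LoRA adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    device_map="mps",           # change "mps" if not on macOS
    torch_dtype=torch.float32,
    trust_remote_code=True,
    attn_implementation="eager",
)
model = PeftModel.from_pretrained(base_model, adapter_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)
```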

## Inference Example

```python
# An M2 Pro takes about 3 seconds on this example.
# Prompt: "Divide these five animals into two groups.\nTiger, shark, elephant, whale, kangaroo"
input_text = "<|user|>將這五種動物分成兩組。\n老虎、鯊魚、大象、鯨魚、袋鼠 <|end|>\n<|assistant|>"

inputs = tokenizer(
    input_text, 
    return_tensors="pt"
).to(torch.device("mps"))  # change "mps" if not on macOS

outputs = model.generate(
    **inputs,
    max_length=500,
    do_sample=False,  # greedy decoding, so no temperature is needed
)

generated_text = tokenizer.decode(
    outputs[0], 
    skip_special_tokens=True
)
print(generated_text)
```
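
Instead of writing the special tokens by hand, you can let the tokenizer build the prompt. This is a minimal sketch, assuming the tokenizer ships Phi-3's chat template (apply_chat_template with return_dict needs a recent transformers release):

```python
# Same animal-grouping prompt as above, expressed as a chat message.
messages = [
    {"role": "user", "content": "將這五種動物分成兩組。\n老虎、鯊魚、大象、鯨魚、袋鼠"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # appends the <|assistant|> tag for the reply
    return_dict=True,
    return_tensors="pt",
).to(torch.device("mps"))  # change "mps" if not on macOS

outputs = model.generate(**inputs, max_length=500, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```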


## Streaming Example
```python
from transformers import TextStreamer
streamer = TextStreamer(tokenizer)

input_text = "<|user|>將這五種動物分成兩組。\n老虎、鯊魚、大象、鯨魚、袋鼠 <|end|>\n<|assistant|>"

inputs = tokenizer(
    input_text, 
    return_tensors="pt"
).to(torch.device("mps"))  # change "mps" if not on macOS

outputs = model.generate(
    **inputs,
    max_length=500,
    do_sample=False,
    streamer=streamer,  # prints tokens to stdout as they are generated
)

# The streamer has already printed the text; decode only if you also need the string.
generated_text = tokenizer.decode(
    outputs[0], 
    skip_special_tokens=True
)
```
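
TextStreamer prints directly to stdout. If you want to consume tokens as an iterator instead (e.g. for a web UI), transformers also provides TextIteratorStreamer, which requires running generate() in a background thread. A minimal sketch, reusing the inputs from above:

```python
from threading import Thread
from transformers import TextIteratorStreamer

streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Run generation in a background thread so the main thread can consume tokens.
thread = Thread(
    target=model.generate,
    kwargs=dict(**inputs, streamer=streamer, max_length=500, do_sample=False),
)
thread.start()
for piece in streamer:
    print(piece, end="", flush=True)
thread.join()
```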

## Example of RAG with LangChain

[This reference](https://huggingface.co/Mike0307/text2vec-base-chinese-rag#example-of-langchain-rag) shows how to build a custom LangChain LLM around this Phi-3 LoRA model.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6414866f1cbd604c9217c7d0/RrBoHJINfrSWtCNkePs7g.png)
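
If you just need a quick starting point, the wrapper boils down to subclassing LangChain's LLM base class. This is a minimal sketch with a hypothetical class name (Phi3LoraLLM is not from the linked page), assuming langchain-core is installed and the model and tokenizer from above are in scope:

```python
from typing import Any, List, Optional
from langchain_core.language_models.llms import LLM

class Phi3LoraLLM(LLM):  # hypothetical wrapper name, not from the linked page
    @property
    def _llm_type(self) -> str:
        return "phi3-lora-chinese"

    def _call(self, prompt: str, stop: Optional[List[str]] = None, **kwargs: Any) -> str:
        text = f"<|user|>{prompt} <|end|>\n<|assistant|>"
        inputs = tokenizer(text, return_tensors="pt").to(model.device)
        outputs = model.generate(**inputs, max_length=500, do_sample=False)
        # Return only the newly generated tokens, without the prompt.
        new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
        return tokenizer.decode(new_tokens, skip_special_tokens=True)

llm = Phi3LoraLLM()
print(llm.invoke("將這五種動物分成兩組。\n老虎、鯊魚、大象、鯨魚、袋鼠"))  # same prompt as above
```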