---
library_name: transformers
tags:
- trl
- sft
license: apache-2.0
datasets:
- Mike0307/alpaca-en-zhtw
language:
- zh
pipeline_tag: text-generation
---


## Download the Model

The base model [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) currently relies on the development versions of transformers and torch, and it requires `trust_remote_code=True` as an argument to `from_pretrained()`.
```bash
pip install git+https://github.com/huggingface/transformers accelerate
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
```
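
To confirm that the development builds were actually picked up, you can print the installed versions (the exact version strings will vary):

```python
# Sanity check: both packages should report pre-release / dev builds.
import torch
import transformers

print(transformers.__version__)  # e.g. a dev build such as 4.xx.0.dev0
print(torch.__version__)         # e.g. a nightly build string
```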

Additionally, the LoRA adapter requires the `peft` package.
```bash
pip install peft
```
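
With `peft` installed, transformers can resolve this adapter repo directly via `AutoModelForCausalLM`, as shown in the next snippet. If you prefer to attach the adapter to the base model explicitly, here is a minimal sketch using peft's `PeftModel` (assuming this repo's adapter targets microsoft/Phi-3-mini-4k-instruct):

```python
# A sketch of loading the LoRA adapter explicitly on top of the base model.
# Assumes the adapter targets microsoft/Phi-3-mini-4k-instruct.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    torch_dtype=torch.float32,
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, "Mike0307/Phi-3-mini-4k-instruct-chinese-lora")
# Optionally fold the LoRA weights into the base model for plain inference:
# model = model.merge_and_unload()
```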

Now, let's download the model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Mike0307/Phi-3-mini-4k-instruct-chinese-lora"
model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    device_map="mps",           # change to "cuda" or "cpu" if not on macOS
    torch_dtype=torch.float32,  # try torch.float16 on an Apple M1 chip
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```
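
If you run this on more than one kind of machine, a small sketch that picks the device at runtime instead of hard-coding `"mps"`:

```python
# Choose the best available device instead of hard-coding "mps".
import torch

if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map=device,
    torch_dtype=torch.float32,
    trust_remote_code=True,
)
```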

## Inference Example

```python
# Prompt: "Divide these five animals into two groups.\n tiger, shark, elephant, whale, kangaroo"
input_text = "<|user|>將這五種動物分成兩組。\n老虎、鯊魚、大象、鯨魚、袋鼠 <|end|>\n<|assistant|>"

inputs = tokenizer(
    input_text, 
    return_tensors="pt"
).to(torch.device("mps"))  # change "mps" to "cuda" or "cpu" if not on macOS

outputs = model.generate(
    **inputs,
    max_length=500,
    do_sample=False,  # greedy decoding; temperature is ignored when sampling is off
)

generated_text = tokenizer.decode(
    outputs[0], 
    skip_special_tokens=True
)
print(generated_text)
```
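
The prompt above hand-writes Phi-3's chat markers (`<|user|>`, `<|end|>`, `<|assistant|>`). If the tokenizer ships a matching chat template, the same prompt can be built with `apply_chat_template`; a sketch, assuming the template reproduces the hand-written format:

```python
# Build the prompt from the tokenizer's chat template instead of raw markers.
# Assumes the repo ships a template matching the hand-written format above.
messages = [
    {"role": "user", "content": "將這五種動物分成兩組。\n老虎、鯊魚、大象、鯨魚、袋鼠"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # appends the assistant turn marker
    return_tensors="pt",
).to(torch.device("mps"))  # change "mps" if not on macOS

outputs = model.generate(input_ids, max_length=500, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```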


## Streaming Example
```python
from transformers import TextStreamer
streamer = TextStreamer(tokenizer)

input_text = "<|user|>將這五種動物分成兩組。\n老虎、鯊魚、大象、鯨魚、袋鼠 <|end|>\n<|assistant|>"

inputs = tokenizer(
    input_text, 
    return_tensors="pt"
).to(torch.device("mps"))  # change "mps" to "cuda" or "cpu" if not on macOS

outputs = model.generate(
    **inputs,
    streamer=streamer,
    max_length=500,
    do_sample=False,  # greedy decoding; temperature is ignored when sampling is off
)

# The streamer already printed the tokens; decode only if you also need the full string.
generated_text = tokenizer.decode(
    outputs[0],
    skip_special_tokens=True,
)
```
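
`TextStreamer` prints tokens to stdout as they are generated. To consume the stream programmatically instead (for example in a web handler), transformers also provides `TextIteratorStreamer`, which lets you run generation in a background thread; a sketch:

```python
# Iterate over generated text chunks while generation runs in another thread.
from threading import Thread
from transformers import TextIteratorStreamer

iter_streamer = TextIteratorStreamer(tokenizer, skip_special_tokens=True)
thread = Thread(
    target=model.generate,
    kwargs=dict(**inputs, streamer=iter_streamer, max_length=500, do_sample=False),
)
thread.start()

for chunk in iter_streamer:
    print(chunk, end="", flush=True)
thread.join()
```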