Mike0307
/

Phi-3-mini-4k-instruct-chinese-lora

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

Phi-3-mini-4k-instruct-chinese-lora / README.md

Mike0307's picture

Update README.md

fc69a83 verified 4 months ago

|

history blame contribute delete

No virus

2.63 kB

	---
	library_name: transformers
	tags:
	- trl
	- sft
	license: apache-2.0
	datasets:
	- Mike0307/alpaca-en-zhtw
	language:
	- zh
	pipeline_tag: text-generation
	---


	## Download Model

	The base-model [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) currently relies on
	the latest dev-version transformers and torch.<br>
	Also, it needs trust_remote_code=True as an argument of the from_pretrained() function.
	```
	pip install git+https://github.com/huggingface/transformers accelerate
	pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
	```

	Additionally, LoRA model requires the peft package.
	```
	pip install peft
	```

	Now, let's start to download the model.

	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_id = "Mike0307/Phi-3-mini-4k-instruct-chinese-lora"
	model = AutoModelForCausalLM.from_pretrained(
	model_id,
	device_map="mps", # Change mps if not MacOS
	torch_dtype=torch.float32, # try float16 for M1 chip
	trust_remote_code=True,
	attn_implementation="eager", # without flash_attn
	)
	tokenizer = AutoTokenizer.from_pretrained(model_id)
	```

	## Inference Example

	```python
	# M2 pro takes about 3 seconds in this example.
	input_text = "<\|user\|>將這五種動物分成兩組。\n老虎、鯊魚、大象、鯨魚、袋鼠 <\|end\|>\n<\|assistant\|>"

	inputs = tokenizer(
	input_text,
	return_tensors="pt"
	).to(torch.device("mps")) # Change mps if not MacOS

	outputs = model.generate(
	**inputs,
	temperature = 0.0,
	max_length = 500,
	do_sample = False
	)

	generated_text = tokenizer.decode(
	outputs[0],
	skip_special_tokens=True
	)
	print(generated_text)
	```


	## Streaming Example
	```python
	from transformers import TextStreamer
	streamer = TextStreamer(tokenizer)

	input_text = "<\|user\|>將這五種動物分成兩組。\n老虎、鯊魚、大象、鯨魚、袋鼠 <\|end\|>\n<\|assistant\|>"

	inputs = tokenizer(
	input_text,
	return_tensors="pt"
	).to(torch.device("mps")) # Change mps if not MacOS

	outputs = model.generate(
	**inputs,
	temperature = 0.0,
	do_sample = False,
	streamer=streamer,
	max_length=500,
	)

	generated_text = tokenizer.decode(
	outputs[0],
	skip_special_tokens=True
	)
	```

	## Example of RAG with Langchain

	[This reference](https://huggingface.co/Mike0307/text2vec-base-chinese-rag#example-of-langchain-rag) shows how to customize langchain llm with this phi-3 lora model.

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/6414866f1cbd604c9217c7d0/RrBoHJINfrSWtCNkePs7g.png)