---
tags:
- sft
- it
- mistral
- chatml
model-index:
- name: maestrale-chat-v0.3-alpha
  results: []
license: cc-by-nc-4.0
language:
- it
prompt_template: >-
  <|im_start|>system {system_message}<|im_end|> <|im_start|>user
  {prompt}<|im_end|> <|im_start|>assistant
---

<div style="width: auto; margin-left: auto; margin-right: auto">
<img src="https://i.imgur.com/3XRfTOq.jpg" alt="Mii-LLM" style="width: 100%; min-width: 400px; display: block; margin: auto;">
</div>
<div style="display: flex; justify-content: space-between; width: 100%;">
<div style="display: flex; flex-direction: column; align-items: flex-end;">
<p style="margin-top: 0.5em; margin-bottom: 0em;"><a href="https://buy.stripe.com/8wM00Sf3vb3H3pmfYY">Want to contribute? Please donate! This will let us work on better datasets and models!</a></p>
</div>
</div>
<hr style="margin-top: 1.0em; margin-bottom: 1.0em;">
<!-- header end -->

# Maestrale chat alpha ༄

By @efederici and @mferraretto

## Model description

- Language model: Mistral-7B adapted to Italian through continued pre-training on a curated, large-scale, high-quality Italian corpus.
- Fine-tuning: supervised fine-tuning (SFT) on conversations and instructions for two epochs.

Changes in v0.3:
- Function calling
- Shorter default system prompt, to avoid wasting tokens (pre-alignment)

This model uses the ChatML prompt format, shown here with the default Italian system prompt ("You are a helpful assistant."):
```
<|im_start|>system
Sei un assistente utile.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
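The layout above can also be rendered by hand, which is useful for checking what `apply_chat_template` should produce or for serving stacks that expect a raw prompt string. This is a minimal sketch: `build_chatml_prompt` is a hypothetical helper, and it assumes every turn ends with a newline exactly as in the block above. If it ever disagrees with the chat template shipped with the tokenizer, the tokenizer's template is authoritative.

```python
# Hypothetical helper: renders chat messages in the ChatML layout shown above.
# Assumption: each turn is "<|im_start|>{role}\n{content}<|im_end|>\n".
def build_chatml_prompt(messages, add_generation_prompt=True):
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "Sei un assistente utile."},  # "You are a helpful assistant."
    {"role": "user", "content": "Ciao!"},
]
print(build_chatml_prompt(messages))
```

Multi-turn conversations work the same way: append each assistant reply as a `{"role": "assistant", ...}` message before the next user turn.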

## Usage
```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    GenerationConfig,
    TextStreamer,
)

model_id = "mii-llm/maestrale-chat-v0.3-alpha"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# 8-bit loading needs `bitsandbytes`; device_map="auto" needs `accelerate`.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

gen = GenerationConfig(
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.2,
    top_k=50,
    top_p=0.95,
    max_new_tokens=500,
    pad_token_id=tokenizer.eos_token_id,
    # Stop generating at the end of the assistant turn.
    eos_token_id=tokenizer.convert_tokens_to_ids("<|im_end|>"),
)

messages = [
    {"role": "system", "content": "Sei un assistente utile."},  # "You are a helpful assistant."
    {"role": "user", "content": "{prompt}"},  # replace {prompt} with your message
]

with torch.no_grad(), torch.backends.cuda.sdp_kernel(
    enable_flash=True,
    enable_math=False,
    enable_mem_efficient=False,
):
    # Render the conversation with the chat template and open the assistant turn.
    temp = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(temp, return_tensors="pt").to(model.device)

    # Print tokens as they are generated, without echoing the prompt.
    streamer = TextStreamer(tokenizer, skip_prompt=True)

    _ = model.generate(
        **inputs,
        streamer=streamer,
        generation_config=gen,
    )
```
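To make the sampling settings in the `GenerationConfig` above more concrete, here is a simplified pure-Python sketch of how `repetition_penalty`, `temperature`, `top_k` and `top_p` reshape a single next-token distribution before a token is drawn. It is loosely modeled on the logits processors in `transformers`; the function names are hypothetical, and the library's exact processor order and tie-breaking may differ.

```python
import math
import random


# Hypothetical helpers: a simplified model of what the GenerationConfig
# above does to one next-token distribution (not the transformers code).

def apply_repetition_penalty(logits, seen_ids, penalty):
    """Penalize token ids already present in the context (prompt + output)."""
    out = list(logits)
    for tok in set(seen_ids):
        # Negative scores are pushed further down, positive ones shrunk.
        out[tok] = out[tok] * penalty if out[tok] < 0 else out[tok] / penalty
    return out


def filter_logits(logits, temperature, top_k, top_p):
    """Return {token_id: probability} after temperature, top-k and top-p."""
    # Temperature < 1.0 sharpens the distribution, > 1.0 flattens it.
    scaled = [x / temperature for x in logits]
    # Top-k: keep the k highest-scoring ids, best first.
    kept = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Numerically stable softmax over the surviving ids.
    best = scaled[kept[0]]
    exps = {i: math.exp(scaled[i] - best) for i in kept}
    total = sum(exps.values())
    # Top-p (nucleus): keep the smallest prefix whose mass reaches top_p.
    nucleus, mass = {}, 0.0
    for i in kept:
        p = exps[i] / total
        nucleus[i] = p
        mass += p
        if mass >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    return {i: p / mass for i, p in nucleus.items()}


def sample_next(logits, seen_ids, temperature=0.7, repetition_penalty=1.2,
                top_k=50, top_p=0.95, rng=random):
    """Draw one token id using the same settings as the usage example."""
    logits = apply_repetition_penalty(logits, seen_ids, repetition_penalty)
    probs = filter_logits(logits, temperature, top_k, top_p)
    ids, weights = zip(*probs.items())
    return rng.choices(ids, weights=weights, k=1)[0]
```

Lowering `top_k` or `top_p` narrows the candidate set, while `repetition_penalty=1.2` mildly discourages tokens that already appear in the context.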

## Intended uses & limitations

This is an alpha version and it is not yet aligned; it is a first test. We are working on alignment data and evaluations.