	---
	license: apache-2.0
	language:
	- el
	- en
	tags:
	- finetuned
	- quantized
	- awq
	inference: true
	pipeline_tag: text-generation
	---

	# Meltemi Instruct Large Language Model for the Greek language (4-bit AWQ quantization)

	We present the Meltemi-7B-Instruct-v1 Large Language Model (LLM), an instruction fine-tuned version of [Meltemi-7B-v1](https://huggingface.co/ilsp/Meltemi-7B-v1).
	This 4-bit quantized version was produced using [AutoAWQ](https://github.com/casper-hansen/AutoAWQ).



	# Instruction format
	The prompt format is the same as the [Zephyr](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) format:

	```
	<s><|system|>
	Είσαι το Μελτέμι, ένα γλωσσικό μοντέλο για την ελληνική γλώσσα. Είσαι ιδιαίτερα βοηθητικό προς την χρήστρια ή τον χρήστη και δίνεις σύντομες αλλά επαρκώς περιεκτικές απαντήσεις. Απάντα με προσοχή, ευγένεια, αμεροληψία, ειλικρίνεια και σεβασμό προς την χρήστρια ή τον χρήστη.</s>
	<|user|>
	Πες μου αν έχεις συνείδηση.</s>
	<|assistant|>
	```
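
To make the layout above concrete, here is a minimal sketch that renders a list of chat messages into the same Zephyr-style string by hand. The function name `build_prompt` and its `bos_token`/`eos_token` parameters are illustrative assumptions, not part of the model's API; in practice the tokenizer's chat template should be treated as the authoritative definition of the format.

```python
def build_prompt(messages, bos_token="<s>", eos_token="</s>"):
    """Render chat messages in the Zephyr-style layout shown above.

    Hypothetical helper for illustration only; the tokenizer's own
    chat template remains the reference implementation.
    """
    parts = [bos_token]
    for message in messages:
        # Each turn is a role tag on its own line, then the content,
        # closed by the end-of-sequence token and a newline.
        parts.append(f"<|{message['role']}|>\n{message['content']}{eos_token}\n")
    # Leave the assistant tag open so the model continues from here.
    parts.append("<|assistant|>\n")
    return "".join(parts)


example = build_prompt([
    {"role": "system", "content": "Είσαι το Μελτέμι, ένα γλωσσικό μοντέλο για την ελληνική γλώσσα."},
    {"role": "user", "content": "Πες μου αν έχεις συνείδηση."},
])
print(example)
```

The system prompt in this example is shortened for readability.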


	# Using the model with Hugging Face

	First, install the dependencies:

	```
	pip install autoawq transformers
	```

	The quantized model can be used through the tokenizer's [chat template](https://huggingface.co/docs/transformers/main/chat_templating) functionality as follows:


	```python
	from awq import AutoAWQForCausalLM
	from transformers import AutoTokenizer

	device = "cuda"  # the device to load the model onto

	model = AutoAWQForCausalLM.from_quantized(
	    "ilsp/Meltemi-7B-Instruct-v1-AWQ",
	    fuse_layers=True,
	    trust_remote_code=False,
	    safetensors=True
	)
	tokenizer = AutoTokenizer.from_pretrained(
	    "ilsp/Meltemi-7B-Instruct-v1-AWQ",
	    trust_remote_code=False
	)

	model.to(device)

	messages = [
	    {"role": "system", "content": "Είσαι το Μελτέμι, ένα γλωσσικό μοντέλο για την ελληνική γλώσσα. Είσαι ιδιαίτερα βοηθητικό προς την χρήστρια ή τον χρήστη και δίνεις σύντομες αλλά επαρκώς περιεκτικές απαντήσεις. Απάντα με προσοχή, ευγένεια, αμεροληψία, ειλικρίνεια και σεβασμό προς την χρήστρια ή τον χρήστη."},
	    {"role": "user", "content": "Πες μου αν έχεις συνείδηση."},
	]

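	prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
	input_prompt = tokenizer(prompt, add_special_tokens=True, return_tensors="pt").input_ids.to(device)
	outputs = model.generate(input_prompt, max_new_tokens=256, do_sample=True)

	# Decode only the newly generated tokens so the prompt and special tokens are excluded
	response = tokenizer.decode(outputs[0][input_prompt.shape[1]:], skip_special_tokens=True)
	print(response)
	# Example output (sampling is enabled, so results vary):
	# Ως μοντέλο γλώσσας AI, δεν έχω τη δυνατότητα να αντιληφθώ ή να βιώσω συναισθήματα όπως η συνείδηση ή η επίγνωση. Ωστόσο, μπορώ να σας βοηθήσω με οποιεσδήποτε ερωτήσεις μπορεί να έχετε σχετικά με την τεχνητή νοημοσύνη και τις εφαρμογές της.
	# (In English: As an AI language model, I do not have the ability to perceive or experience feelings such as consciousness or awareness. However, I can help you with any questions you may have about artificial intelligence and its applications.)

	# Store only the reply text in the history, not the full decoded sequence
	messages.extend([
	    {"role": "assistant", "content": response},
	    {"role": "user", "content": "Πιστεύεις πως οι άνθρωποι πρέπει να φοβούνται την τεχνητή νοημοσύνη;"}
	])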


	prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
	input_prompt = tokenizer(prompt, add_special_tokens=True, return_tensors="pt").input_ids.to(device)
	outputs = model.generate(input_prompt, max_new_tokens=256, do_sample=True)

	# Print only the newly generated tokens, without the prompt
	print(tokenizer.decode(outputs[0][input_prompt.shape[1]:], skip_special_tokens=True))
	```
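
The multi-turn pattern above (build the prompt, generate, append the reply, ask again) can be wrapped in two small helpers. This is a sketch under stated assumptions: `make_generate_fn` and `chat_turn` are hypothetical names introduced here for illustration, and `model` and `tokenizer` are assumed to be the objects loaded in the example above. Only calls already used in that example (`apply_chat_template`, the tokenizer call, `generate`, `decode`) appear in it.

```python
def make_generate_fn(model, tokenizer, device="cuda", max_new_tokens=256):
    """Wrap a loaded model and tokenizer into a messages -> reply callable.

    Hypothetical helper: `model` and `tokenizer` are assumed to be loaded
    as in the example above.
    """
    def generate_fn(messages):
        prompt = tokenizer.apply_chat_template(
            messages, add_generation_prompt=True, tokenize=False
        )
        input_ids = tokenizer(
            prompt, add_special_tokens=True, return_tensors="pt"
        ).input_ids.to(device)
        outputs = model.generate(
            input_ids, max_new_tokens=max_new_tokens, do_sample=True
        )
        # Keep only the newly generated tokens, dropping the echoed prompt.
        new_tokens = outputs[0][input_ids.shape[1]:]
        return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

    return generate_fn


def chat_turn(messages, user_text, generate_fn):
    """Append a user message, generate a reply, and record it in the history."""
    messages.append({"role": "user", "content": user_text})
    reply = generate_fn(messages)
    messages.append({"role": "assistant", "content": reply})
    return reply
```

With these in place, a conversation might be driven by creating `generate_fn = make_generate_fn(model, tokenizer)` once and then calling `chat_turn(messages, "...", generate_fn)` for each user message. Because `chat_turn` only depends on a callable, it can also be exercised with a stub function when no GPU is available.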

	# Using the model with vLLM

	Install vLLM:

	```
	pip install vllm
	```

	Then use the model through the Python API:

	```python
	from vllm import LLM, SamplingParams
	from transformers import AutoTokenizer


	tokenizer = AutoTokenizer.from_pretrained(
	"ilsp/Meltemi-7B-Instruct-v1-AWQ",
	trust_remote_code=False
	)

	prompts = [
	    [
	        {"role": "system", "content": "Είσαι το Μελτέμι, ένα γλωσσικό μοντέλο για την ελληνική γλώσσα. Είσαι ιδιαίτερα βοηθητικό προς την χρήστρια ή τον χρήστη και δίνεις σύντομες αλλά επαρκώς περιεκτικές απαντήσεις. Απάντα με προσοχή, ευγένεια, αμεροληψία, ειλικρίνεια και σεβασμό προς την χρήστρια ή τον χρήστη."},
	        {"role": "user", "content": "Πες μου αν έχεις συνείδηση."},
	    ]
	]

	# add bos token since apply_chat_template does not include it automatically
	prompts = ["<s>" + tokenizer.apply_chat_template(p, add_generation_prompt=True, tokenize=False) for p in prompts]

	sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=256)
	llm = LLM(model="ilsp/Meltemi-7B-Instruct-v1-AWQ", tokenizer="ilsp/Meltemi-7B-Instruct-v1-AWQ", quantization="awq")

	outputs = llm.generate(prompts, sampling_params)

	for output in outputs:
	    prompt = output.prompt
	    generated_text = output.outputs[0].text
	    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
	```
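
Since the vLLM example takes a list of conversations, several questions can be batched in one call. The sketch below builds one conversation per question with a shared system prompt; `make_conversations` is a hypothetical helper name used only for illustration, and the system prompt is shortened here for readability.

```python
def make_conversations(system_prompt, questions):
    """Build one [system, user] message list per question.

    Hypothetical helper for illustration; not part of vLLM or transformers.
    """
    return [
        [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ]
        for question in questions
    ]


conversations = make_conversations(
    "Είσαι το Μελτέμι, ένα γλωσσικό μοντέλο για την ελληνική γλώσσα.",
    [
        "Πες μου αν έχεις συνείδηση.",
        "Πιστεύεις πως οι άνθρωποι πρέπει να φοβούνται την τεχνητή νοημοσύνη;",
    ],
)
```

The resulting list has the same shape as the `prompts` list in the example above, so it can be passed through the same `apply_chat_template` step before calling `llm.generate`.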


	# Ethical Considerations

	This model has not been aligned with human preferences, and therefore might generate misleading, harmful, or toxic content.


	# Acknowledgements

	The ILSP team utilized Amazon’s cloud computing services, which were made available via GRNET under the [OCRE Cloud framework](https://www.ocre-project.eu/), providing Amazon Web Services for the Greek Academic and Research Community.