Model Card for Loquace-7B-Mistral (Italian version translated by Loquace)

🇮🇹 Loquace-7B-Mistral v0.1 🇮🇹

Loquace is an Italian-speaking, instruction-finetuned Large Language Model. 🇮🇹

Loquace-7B-Mistral's distinctive features:

It is pretty good at following instructions in Italian.
Responds well to prompt-engineering.
Works well in a RAG (Retrieval-Augmented Generation) setup.
It has been trained on a relatively raw dataset, Loquace-102K, using QLoRA and Mistral-7B-Instruct as the base model.
Training took only 4 hours on a 3090 on Genesis Cloud, costing a little more than 1 euro!
It is Truly Open Source: Model, Dataset and Code to replicate the results are completely released.
Created in a garage in the south of Italy.
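
To illustrate the RAG point above, here is a minimal sketch of how retrieved passages might be packed into a prompt. It assumes the `### Instruction:` / `### Response:` template from the Inference section of this card; the function name `build_rag_prompt`, the Italian wrapper sentence and the sample passages are hypothetical and not taken from the Loquace code. Retrieval itself (embedding, search) is out of scope here.

```python
def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Pack retrieved passages and a question into a single instruction prompt."""
    # Number the passages so the model can tell them apart.
    context = "\n".join(f"[{i}] {p.strip()}" for i, p in enumerate(passages, start=1))
    # Hypothetical wrapper text: ask the model to answer from the context only.
    instruction = (
        "Rispondi alla domanda usando solo il contesto fornito.\n"
        f"Contesto:\n{context}\n"
        f"Domanda: {question}"
    )
    # Assumed template, mirroring the Inference section of this card.
    return f"### Instruction: {instruction}\n\n### Response:\n"


# Hypothetical passages; in a real setup they would come from a retriever.
passages = [
    "Dante Alighieri nacque a Firenze nel 1265.",
    "La Divina Commedia è divisa in tre cantiche.",
]
prompt = build_rag_prompt("Dove nacque Dante?", passages)
print(prompt)
```

The resulting string can be tokenized and passed to `model.generate` exactly like the plain prompt in the Inference section.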

The Loquace Italian LLM models are created with the goal of democratizing AI and LLMs in the Italian landscape.

No more need for expensive GPUs, large funding, Big Corporations or Ivory Tower Institutions: just download the code and train on your own dataset on your own PC (or on a cheap and reliable cloud provider like Genesis Cloud).

Fine-tuning Instructions:

The related code can be found at: https://github.com/cosimoiaia/Loquace
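
The repository above holds the actual training script. For orientation only, a QLoRA setup with the `transformers` and `peft` libraries usually combines a 4-bit quantization config with a LoRA adapter config, roughly as sketched below. Every hyperparameter value, the choice of target modules and the helper name `load_for_qlora` are illustrative assumptions, not the settings used to train Loquace.

```python
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization of the frozen base weights (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Small trainable low-rank adapters on the attention projections.
# All values here are placeholders, not the author's settings.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)


def load_for_qlora(base_model_name: str):
    """Load a 4-bit quantized base model and wrap it with LoRA adapters."""
    model = AutoModelForCausalLM.from_pretrained(
        base_model_name,
        quantization_config=bnb_config,
        device_map="auto",
    )
    # Prepares the quantized model for training (e.g. casts norm layers).
    model = prepare_model_for_kbit_training(model)
    return get_peft_model(model, lora_config)
```

Only the adapter weights are trained in such a setup, which is what makes a single consumer GPU sufficient for a 7B model.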

GGUF Version for CPU Inference:

An 8-bit quantized version of Loquace can be found here

Here is an incomplete list of clients and libraries that are known to support GGUF (thanks to TheBloke for this list and his awesome work):

llama.cpp. The source project for GGUF. Offers a CLI and a server option.
text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration.
KoboldCpp, a fully featured web UI, with GPU accel across all platforms and GPU architectures. Especially good for story telling.
LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration.
LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection.
Faraday.dev, an attractive and easy to use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration.
ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server.
llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server.
candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use.
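
As one example from the list above, a GGUF file can be loaded with llama-cpp-python. This is a minimal sketch: the file path passed to `generate_gguf` is a placeholder for wherever you saved the download, the helper names are hypothetical, and reusing the `### Instruction:` / `### Response:` template from the Inference section for the GGUF build is an assumption.

```python
def build_prompt(instruction: str) -> str:
    """Format an instruction with the template assumed from the Inference section."""
    return f"### Instruction: {instruction}\n\n### Response:\n"


def generate_gguf(instruction: str, model_path: str, max_tokens: int = 256) -> str:
    """Run one completion on the CPU with llama-cpp-python."""
    # Imported lazily so build_prompt works without llama-cpp-python installed.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, n_ctx=2048)
    result = llm(
        build_prompt(instruction),
        max_tokens=max_tokens,
        # Stop if the model starts writing a new instruction block.
        stop=["### Instruction:"],
    )
    return result["choices"][0]["text"].strip()


# Usage (placeholder path, replace with your local GGUF file):
# print(generate_gguf("Chi era Dante Alighieri?", "./path/to/loquace.gguf"))
```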

Previous releases of the Loquace family:

The Loquace family started at the beginning of 2023 to prove that it was possible to fine-tune a Large Language Model in a different language. You can find the other members of the family here:

https://huggingface.co/cosimoiaia/Loquace-70m - Based on pythia-70m
https://huggingface.co/cosimoiaia/Loquace-410m - Based on pythia-410m
https://huggingface.co/cosimoiaia/Loquace-7B - Based on Falcon-7B
https://huggingface.co/cosimoiaia/Loquace-12B - Based on pythia-12B
https://huggingface.co/cosimoiaia/Loquace-20B - Based on gpt-neox-20B

These models, although not producing good results, are proof that cheap fine-tuning is actually possible in a short amount of time, with a relatively raw dataset and extremely low resources.

Inference:

import torch
from transformers import MistralForCausalLM, AutoTokenizer


# Wrap an instruction in the prompt template the model expects.
def generate_prompt(instruction):
   prompt = f"""### Instruction: {instruction}
   
### Response:
"""
   return prompt

model_name = "cosimoiaia/Loquace-7B-Mistral"

# Load the weights in bfloat16; device_map="auto" places them on the available device(s).
model = MistralForCausalLM.from_pretrained(
   model_name,
   device_map="auto",
   torch_dtype=torch.bfloat16
)

# Reuse cached key/value states during generation.
model.config.use_cache = True

tokenizer = AutoTokenizer.from_pretrained(model_name, add_eos_token=False)

prompt = generate_prompt("Chi era Dante Alighieri?")
# Move the inputs to the same device as the model.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, do_sample=True, num_beams=2, top_k=50, top_p=0.95, max_new_tokens=2046, early_stopping=True)
# Keep only the text that follows the "Response:" marker.
print(tokenizer.decode(outputs[0], skip_special_tokens=True).split("Response:")[1].strip())

Examples:

> Pretendi di essere Dante Alighieri e raccontami la tua vita.
 Sono Dante Alighieri, un poeta e scrittore italiano del XIV secolo. Nacqui a Firenze nel 1265 e ho trascorso gran parte della mia vita viaggiando in Europa e scrivendo poesie. Ho anche scritto la Divina Commedia, una delle opere letterarie più famose di tutti i tempi.

> Quanto fa 7 x 14 - 98 ?
 7 x 14 = 98, quindi la risposta è 0.

> La radice quadrata di 144 ?
 La radice quadrata di 144 è 12.

> Inventa una ricetta pugliese con la polenta.
 La ricetta è chiamata "Polenta al burro di arachidi e formaggio". Per prepararla, inizia bollendo la polenta secondo le istruzioni sulla confezione. Mentre la polenta si cucina, mescola insieme 1/2 tazza di burro di arachidi, 1/4 di tazza di formaggio grattugiato e 1/4 di tazza di parmigiano grattugiato. Una volta che la polenta è cotta, versala su una teglia da forno e distribuire il composto di burro di arachidi e formaggio sopra la polenta. Metti in forno a 350 gradi per 15 minuti o fino a quando la parte superiore è dorata. Servi caldo con un'insalata di pomodoro e basilico fresco.

Limitations

Loquace-7B-Mistral may not handle complex or nuanced queries well and may struggle with ambiguous or poorly formatted inputs.
The model may generate responses that are factually incorrect or nonsensical. It should be used with caution, and outputs should be carefully verified.

Model Author:

Cosimo Iaia cosimo.iaia@gmail.com
