Draig-Fach-v0.1

This is a proof of concept model*

Model Details

Model Description

Draig-Fach-v0.1 is an instruction fine-tuned small language model based on the Mistral-7b architecture, specifically developed to understand and generate the Welsh language. This model represents an effort to support and preserve the Welsh language, leveraging AI and machine learning technologies. It has been trained on a bespoke dataset compiled from a variety of sources, including literature, websites, and conversational transcripts in Welsh.

Developed by: Eryrilabs.com
License: cc-by-nc-4.0
Finetuned from model: mistralai/Mistral-7B-Instruct-v0.2

How to use

You can use this model directly with a Hugging Face pipeline:


from transformers import pipeline, Conversation
import torch

base_model_name = "EryriLabs/Draig-Fach-v0.1"
chatbot = pipeline("conversational", model=base_model_name, torch_dtype=torch.float16, device_map="auto")
conversation = Conversation("Sut wyt ti?")
conversation = chatbot(conversation)
print(conversation.messages[-1]["content"])

Uses

Draig-Fach-v0.1 is intended for:

Natural language understanding and generation in Welsh
Supporting developers and researchers interested in the Welsh language
Serving as a tool for education and language preservation

Bias, Risks, and Limitations

As a proof of concept, Draig-Fach-v0.1 has several limitations:

The model's understanding and generation capabilities in Welsh are basic and may not accurately reflect complex nuances.
Performance may vary across different types of Welsh text, especially with colloquialisms or regional dialects.
Some Welsh sentences might not make complete sense and the model does hallucinate at times.

Training Details

Training Data

The small set of training data for Draig-Fach-v0.1 was sourced from a variety of Welsh language materials, including but not limited to:

Published literature
Online articles
Conversational transcripts

Training

The model was fine-tuned on a Mistral-7b base, utilizing a custom dataset specifically curated for this project. Fine-tuning was conducted with an emphasis on understanding and generating conversational Welsh.

About EryriLabs

EryriLabs® is a dynamic tech startup located in the picturesque heart of Snowdonia, also known as Eryri in Welsh. At EryriLabs, our specialisation lies in creating tailor-made LLM models that cater to the unique requirements of our clients

Let us know if you use our model. Also, if you need any help or more information, feel free to contact us at queries@eryrilabs.com