---
license: apache-2.0
datasets:
- irlab-udc/alpaca_data_galician
language:
- gl
- en
---

# Galician Fine-Tuned LLM Model

This repository contains a large language model (LLM) fine-tuned with the LLaMA Factory library on the Finisterrae III supercomputer at CESGA. The base model used for fine-tuning was Meta's `LLaMA 3`.

## Model Description

This LLM has been fine-tuned specifically to understand and generate text in Galician. It was fine-tuned on a modified version of the [irlab-udc/alpaca_data_galician](https://huggingface.co/datasets/irlab-udc/alpaca_data_galician) dataset, enriched with synthetic data to enhance its text generation and comprehension capabilities in specific contexts.

### Technical Details

- **Base Model**: Meta's LLaMA 3
- **Fine-Tuning Platform**: LLaMA Factory
- **Infrastructure**: Finisterrae III, CESGA
- **Dataset**: [irlab-udc/alpaca_data_galician](https://huggingface.co/datasets/irlab-udc/alpaca_data_galician) (with modifications)
- **Fine-Tuning Objective**: To improve text comprehension and generation in Galician.

## How to Use the Model

To use this model, follow the example code provided below. Ensure you have the necessary libraries installed (e.g., Hugging Face's `transformers`).

### Installation

```bash
pip install transformers
```

### Example code

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "abrahammg/Llama3-8B-Galician-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

text = "Enter some text in Galician here."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
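
### Chat-style usage (sketch)

Since the checkpoint is named `Llama3-8B-Galician-Chat`, a conversational prompt will likely work better than raw text completion. The following is a minimal sketch, assuming the tokenizer ships the standard Llama 3 chat template and that `accelerate` is installed for `device_map="auto"`; the example question and generation parameters are illustrative, not part of the official usage.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "abrahammg/Llama3-8B-Galician-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,   # assumption: half precision to fit an 8B model on one GPU
    device_map="auto",            # requires the `accelerate` package
)

# Build a chat-style prompt; the question below is only an illustrative example.
messages = [{"role": "user", "content": "Cal é a capital de Galicia?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```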