---
license: apache-2.0
language:
- es
library_name: transformers
tags:
- falcon
- alpaca
- Transformers
- gpt
- PyTorch
- llm
- llm spanish
pipeline_tag: text-generation
datasets:
- bertin-project/alpaca-spanish
---
# FALCON 7B Spanish Fine-tuned 8bit 🤗

## Dataset
The dataset is a Spanish translation of alpaca_data_cleaned.json (a cleaned version of the Stanford Alpaca dataset), produced by bertin-project using OpenAI's gpt-3.5-turbo model. It was translated with a full-sample prompt rather than string by string, which yields more coherent (instruction, input, output) tuples. Dataset: bertin-project/alpaca-spanish
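Since the translated data keeps the Alpaca (instruction, input, output) structure, each record can be rendered into a single training prompt before tokenization. A minimal sketch — the field names come from the dataset schema, while the template wording is the common Alpaca one and is an assumption, not taken from this card:

```python
def format_alpaca_prompt(example: dict) -> str:
    """Render one (instruction, input, output) record into a training prompt,
    following the common Alpaca template (assumed here, not specified by the card)."""
    if example.get("input"):
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n"
            f"### Response:\n{example['output']}"
        )
    # Records with an empty input use the shorter variant of the template.
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Response:\n{example['output']}"
    )

# Hypothetical record in the shape of the translated dataset:
example = {
    "instruction": "Traduce la frase al inglés.",
    "input": "Hola, mundo.",
    "output": "Hello, world.",
}
print(format_alpaca_prompt(example))
```

Applying this function over the dataset (e.g. with `datasets.Dataset.map`) produces the text column used for causal-LM fine-tuning.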
## Finetuning details
To fine-tune the FALCON-7B model we used the following code, which runs on a distributed cluster on AWS. Feel free to use it as a blueprint for fine-tuning any model you like, as it is easily customizable.