---
license: apache-2.0
language:
  - es
library_name: transformers
tags:
  - falcon
  - alpaca
  - Transformers
  - gpt
  - PyTorch
  - llm
  - llm spanish
pipeline_tag: text-generation
datasets:
  - bertin-project/alpaca-spanish
---

# FALCON 7B Spanish Fine-tuned 8bit 🤗

## Dataset

The dataset is a Spanish translation of `alpaca_data_cleaned.json` (a cleaned version of the Stanford Alpaca dataset), produced by bertin-project using OpenAI's gpt-3.5-turbo model. The translation used a full-sample prompt rather than translating each string separately, which yielded more coherent (instruction, input, output) tuples. The dataset is available on the Hugging Face Hub as `bertin-project/alpaca-spanish`.
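As a minimal sketch of working with the dataset, the snippet below loads `bertin-project/alpaca-spanish` with the `datasets` library and renders one record as an Alpaca-style prompt. The prompt template itself is a common convention, not the exact one used for this model, so treat it as an assumption:

```python
def load_spanish_alpaca(split: str = "train"):
    """Fetch the translated dataset from the Hugging Face Hub
    (requires `pip install datasets`)."""
    from datasets import load_dataset
    return load_dataset("bertin-project/alpaca-spanish", split=split)


def format_example(example: dict) -> str:
    """Render one (instruction, input, output) record as an
    Alpaca-style prompt. The template is illustrative only."""
    if example.get("input"):
        return (
            f"### Instrucción:\n{example['instruction']}\n\n"
            f"### Entrada:\n{example['input']}\n\n"
            f"### Respuesta:\n{example['output']}"
        )
    return (
        f"### Instrucción:\n{example['instruction']}\n\n"
        f"### Respuesta:\n{example['output']}"
    )


if __name__ == "__main__":
    data = load_spanish_alpaca()
    print(format_example(data[0]))
```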

## Finetuning details

To fine-tune the FALCON-7B model, we used the following code, run on a distributed cluster on AWS. You are free to use this code as a template to fine-tune any model you like, as it is easily customizable.
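A hedged sketch of what such a setup can look like: loading Falcon-7B in 8-bit (via `bitsandbytes`) and attaching LoRA adapters with `peft` so the model fits on commodity GPUs. The base checkpoint, hyperparameters, and prompt formatting below are assumptions for illustration, not the exact configuration used to produce this model:

```python
# Illustrative 8-bit + LoRA fine-tuning sketch; all hyperparameters
# and the base checkpoint name are assumptions, not the card's values.
BASE_MODEL = "tiiuae/falcon-7b"
DATASET = "bertin-project/alpaca-spanish"

TRAIN_CONFIG = {
    "lr": 2e-4,        # illustrative learning rate
    "epochs": 3,
    "batch_size": 4,
    "lora_r": 16,      # LoRA rank
    "lora_alpha": 32,
}


def main() -> None:
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    tokenizer.pad_token = tokenizer.eos_token

    # Load the base weights quantized to 8 bits to reduce GPU memory.
    model = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL,
        load_in_8bit=True,
        device_map="auto",
        trust_remote_code=True,  # Falcon ships custom modeling code
    )

    # Train only small LoRA adapters instead of the full 7B parameters.
    model = get_peft_model(
        model,
        LoraConfig(
            r=TRAIN_CONFIG["lora_r"],
            lora_alpha=TRAIN_CONFIG["lora_alpha"],
            target_modules=["query_key_value"],  # Falcon attention proj
            task_type="CAUSAL_LM",
        ),
    )

    def tokenize(example):
        text = (
            f"{example['instruction']}\n"
            f"{example['input']}\n"
            f"{example['output']}"
        )
        return tokenizer(text, truncation=True, max_length=512)

    data = load_dataset(DATASET, split="train").map(tokenize)

    Trainer(
        model=model,
        train_dataset=data,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
        args=TrainingArguments(
            output_dir="falcon7b-spanish-lora",
            per_device_train_batch_size=TRAIN_CONFIG["batch_size"],
            num_train_epochs=TRAIN_CONFIG["epochs"],
            learning_rate=TRAIN_CONFIG["lr"],
            fp16=True,
        ),
    ).train()


if __name__ == "__main__":
    main()
```

For multi-GPU or multi-node runs on AWS, the same script can be launched under a distributed launcher such as `accelerate launch` or `torchrun` without code changes, since `device_map` and `Trainer` handle device placement.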