---
language:
  - et
widget:
  - text: 'Mida sa tead Juhan Liivi kohta? Vastus:'
---

Llama-2-7B finetuned in three stages:

  1. 1B tokens of CulturaX (75% Estonian, 25% English)
  2. 1M English->Estonian sentence-pairs from CCMatrix (500000), WikiMatrix (400000), Europarl (50000), and OpenSubtitles (50000) as Alpaca-style translation instructions
  3. Alpaca-cleaned and Alpaca-est (both ~50000 instructions)
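
The data mix in the stages above can be sketched in Python. This is a minimal illustration, not the training code used for this model: the record fields follow the public Alpaca format (`instruction`, `input`, `output`), while the instruction wording and the helper names `stage1_token_budget` and `to_translation_instruction` are assumptions made for the example.

```python
# Hypothetical sketch of the data mix described above; not the authors' pipeline.

# Stage 1: total token budget and language split (from the list above).
STAGE1_TOKENS = 1_000_000_000
STAGE1_SPLIT = {"et": 0.75, "en": 0.25}

# Stage 2: sentence-pair counts per parallel corpus (from the list above).
STAGE2_PAIRS = {
    "CCMatrix": 500_000,
    "WikiMatrix": 400_000,
    "Europarl": 50_000,
    "OpenSubtitles": 50_000,
}


def stage1_token_budget():
    """Return the number of stage-1 tokens per language."""
    return {lang: int(STAGE1_TOKENS * share) for lang, share in STAGE1_SPLIT.items()}


def to_translation_instruction(english, estonian):
    """Wrap one English->Estonian sentence pair as an Alpaca-style record.

    The instruction text is an assumed wording, not the one used in training.
    """
    return {
        "instruction": "Translate the following text into Estonian.",
        "input": english,
        "output": estonian,
    }


if __name__ == "__main__":
    print(stage1_token_budget())
    print(sum(STAGE2_PAIRS.values()))
    print(to_translation_instruction("Good morning.", "Tere hommikust."))
```

The per-corpus counts sum to the 1M pairs stated for stage 2, and the 75/25 split of stage 1 corresponds to 750M Estonian and 250M English tokens.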

Alpaca-est is an instruction dataset generated for Estonian with gpt-3.5-turbo-0613, following Alpaca.