---
language:
- et
widget:
- text: "Mida sa tead Juhan Liivi kohta? Vastus:"
---

Llama-2-7B fine-tuned in three stages:
1. 1B tokens of CulturaX (75% Estonian, 25% English)
2. 1M English->Estonian sentence pairs from CCMatrix (500,000), WikiMatrix (400,000), Europarl (50,000), and OpenSubtitles (50,000), formatted as Alpaca-style translation instructions
3. Alpaca-cleaned and Alpaca-est (~50,000 instructions each)
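
The stage 2 data can be pictured with a minimal sketch like the one below. The per-corpus counts come from the list above; the instruction wording, the `to_alpaca_record` helper, and the sample sentence pair are illustrative assumptions, since the exact template used for training is not documented here.

```python
# Stage 2 parallel-corpus mixture; counts are taken from the stage list.
STAGE2_SOURCES = {
    "CCMatrix": 500_000,
    "WikiMatrix": 400_000,
    "Europarl": 50_000,
    "OpenSubtitles": 50_000,
}

# Hypothetical instruction wording: the real training template is not given.
TRANSLATION_INSTRUCTION = "Translate the following sentence from English to Estonian."


def to_alpaca_record(english: str, estonian: str) -> dict:
    """Wrap one sentence pair in Alpaca's instruction/input/output fields."""
    return {
        "instruction": TRANSLATION_INSTRUCTION,
        "input": english,
        "output": estonian,
    }


# Total number of pairs and each corpus's share of the mixture.
total_pairs = sum(STAGE2_SOURCES.values())
shares = {name: count / total_pairs for name, count in STAGE2_SOURCES.items()}

# Illustrative sentence pair, not drawn from any of the corpora.
example = to_alpaca_record("Good morning!", "Tere hommikust!")
```

Under these assumptions, CCMatrix and WikiMatrix together supply 90% of the translation pairs, with Europarl and OpenSubtitles contributing 5% each.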

Alpaca-est is an instruction dataset generated for Estonian with *gpt-3.5-turbo-0613*, following Alpaca.