taidopurason committed on
Commit
3a83d99
1 Parent(s): 46f5c34

Create README.md

Files changed (1)
  1. README.md +11 -0
README.md ADDED
@@ -0,0 +1,11 @@
+ ---
+ language:
+ - et
+ ---
+
+ Llama-2-7B finetuned in three stages:
+ 1. 1B tokens of CulturaX (75% Estonian, 25% English)
+ 2. 1M English->Estonian sentence pairs from CCMatrix (500,000), WikiMatrix (400,000), Europarl (50,000), and OpenSubtitles (50,000), formatted as Alpaca-style translation instructions
+ 3. Alpaca-cleaned and Alpaca-est (~50,000 instructions each)
+
+ Alpaca-est is an instruction dataset generated for Estonian with *gpt-3.5-turbo-0613*, following Alpaca.
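
The README does not show how the stage-2 sentence pairs were turned into instructions, so the following is only a minimal sketch of what an Alpaca-style translation record could look like. The `instruction`/`input`/`output` field names follow the original Alpaca data format; the instruction wording, the function name `to_alpaca_translation`, and the constant names are assumptions made for illustration, not the template used to train this model. The counts are the ones listed in the README.

```python
# Hypothetical sketch: prompt wording and names are assumptions,
# not the template actually used for this model.

# Stage 1: roughly 1B CulturaX tokens, split 75% Estonian / 25% English.
STAGE1_TOKENS = 1_000_000_000
estonian_tokens = STAGE1_TOKENS * 75 // 100
english_tokens = STAGE1_TOKENS - estonian_tokens

# Stage 2: sentence pairs per corpus, as listed in the README.
STAGE2_PAIRS = {
    "CCMatrix": 500_000,
    "WikiMatrix": 400_000,
    "Europarl": 50_000,
    "OpenSubtitles": 50_000,
}
total_pairs = sum(STAGE2_PAIRS.values())


def to_alpaca_translation(english: str, estonian: str) -> dict:
    """Wrap one English->Estonian sentence pair in an Alpaca-style record."""
    return {
        "instruction": "Translate the following text into Estonian.",  # assumed wording
        "input": english,
        "output": estonian,
    }


example = to_alpaca_translation("Good morning!", "Tere hommikust!")
```

Summing the per-corpus counts gives the 1M pairs stated for stage 2, and the stage-1 split works out to about 750M Estonian and 250M English tokens.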