taidopurason committed on
Commit 3902038
1 parent: 1dae7b0

Update README.md

Files changed (1)
  1. README.md +2 -3
README.md CHANGED
@@ -7,10 +7,9 @@ pipeline_tag: conversational
 
 # LLammas 🐑
 
-Llama-2-7B finetuned in three stages:
+Llama-2-7B finetuned in two stages:
 1. 5B tokens of CulturaX (75% Estonian, 25% English)
-2. 1M English->Estonian sentence-pairs from CCMatrix (500000), WikiMatrix (400000), Europarl (50000), and OpenSubtitles (50000) as Alpaca-style translation instructions, 25% of the examples are given in the opposite direction (Estonian->English)
-3. Alpaca-cleaned, Alpaca-est, OASST1 top-1 English conversations, CoT and FLAN-V2 following open-instruct (both 10,000), WMT18 English-Estonian translation development data (as documents), general MTee validation English-Estonian held-out data
+2. Alpaca-cleaned, Alpaca-est, OASST1 top-1 English conversations, CoT and FLAN-V2 following open-instruct (both 10,000), WMT18 English-Estonian translation development data (as documents), general MTee validation English-Estonian held-out data
 
 Alpaca-est is an instruction dataset generated for Estonian with *gpt-3.5-turbo-0613*, following Alpaca.
 
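The data-mixture figures quoted in this diff can be checked with a little arithmetic. The sketch below is a minimal Python illustration, assuming the stated percentages and counts are exact; the variable names are hypothetical and nothing here comes from the LLammas training code. It splits the stage-1 CulturaX token budget by language and confirms that the per-corpus counts in the translation stage removed by this commit sum to the stated 1M pairs, with 25% of them reversed.

```python
# Hypothetical names; assumes the README's percentages and counts are exact
# (actual shard sizes used in training may differ).

# Stage 1 (unchanged by this commit): 5B CulturaX tokens, 75% Estonian / 25% English.
STAGE1_TOKENS = 5_000_000_000
stage1_shares = {"Estonian": 0.75, "English": 0.25}
stage1_tokens = {lang: int(STAGE1_TOKENS * share) for lang, share in stage1_shares.items()}

# Stage removed by this commit: English->Estonian sentence pairs per corpus.
pairs = {
    "CCMatrix": 500_000,
    "WikiMatrix": 400_000,
    "Europarl": 50_000,
    "OpenSubtitles": 50_000,
}
total_pairs = sum(pairs.values())

# 25% of the examples were presented in the opposite direction (Estonian->English).
reversed_pairs = total_pairs // 4
forward_pairs = total_pairs - reversed_pairs

if __name__ == "__main__":
    print(stage1_tokens)
    print(total_pairs, forward_pairs, reversed_pairs)
```

Running it gives 3.75B Estonian and 1.25B English tokens for stage 1, and 750,000 forward plus 250,000 reversed pairs for the removed translation stage.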