Llama-2-7B finetuned in three stages:

1. 1B tokens of CulturaX (75% Estonian, 25% English)
2. 1M English-to-Estonian sentence pairs from CCMatrix (500,000), WikiMatrix (400,000), Europarl (50,000), and OpenSubtitles (50,000), formatted as Alpaca-style translation instructions
3. Alpaca-cleaned and Alpaca-est (~50,000 instructions each)

Alpaca-est is an Estonian instruction dataset generated with gpt-3.5-turbo-0613, following the Alpaca approach.
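The second stage turns parallel sentence pairs into instruction-following examples. Below is a minimal Python sketch of that conversion, assuming the standard Stanford Alpaca prompt template (with-input variant). The instruction wording, the seeded random sampling per source, and the helper names (`to_alpaca_record`, `build_mixture`, `render_prompt`) are hypothetical: the card gives only the per-source counts, not how pairs were selected or phrased.

```python
import random

# Standard Stanford Alpaca prompt template (with-input variant).
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

# Per-source pair counts from the model card (total: 1,000,000 pairs).
SOURCE_COUNTS = {
    "CCMatrix": 500_000,
    "WikiMatrix": 400_000,
    "Europarl": 50_000,
    "OpenSubtitles": 50_000,
}

# Hypothetical instruction wording; the phrasing used in training is not stated.
INSTRUCTION = "Translate the following text from English to Estonian."


def to_alpaca_record(src_en: str, tgt_et: str) -> dict:
    """Wrap one English->Estonian pair as an Alpaca-style record."""
    return {"instruction": INSTRUCTION, "input": src_en, "output": tgt_et}


def build_mixture(pairs_by_source: dict, counts: dict, seed: int = 0) -> list:
    """Randomly sample up to counts[source] pairs per source (assumed strategy)
    and convert each sampled pair into an Alpaca-style record."""
    rng = random.Random(seed)
    records = []
    for source, n in counts.items():
        pairs = pairs_by_source.get(source, [])
        chosen = rng.sample(pairs, min(n, len(pairs)))
        records.extend(to_alpaca_record(en, et) for en, et in chosen)
    rng.shuffle(records)  # interleave sources
    return records


def render_prompt(record: dict) -> str:
    """Training text: the filled-in template followed by the target translation.
    str.format ignores the unused 'output' key."""
    return ALPACA_TEMPLATE.format(**record) + record["output"]


if __name__ == "__main__":
    toy = {"Europarl": [("Good morning.", "Tere hommikust.")]}
    records = build_mixture(toy, SOURCE_COUNTS)
    print(render_prompt(records[0]))
```

Sources missing from the input simply contribute zero records, so the same function works on a toy subset or on the full corpora.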


Model size: 6.74B params (Safetensors, tensor type BF16)
