---
language:
- et
- en
pipeline_tag: text-generation
base_model:
- meta-llama/Llama-2-7b-hf
---
# LLammas-base 🐑

Llama-2-7B with continued pre-training on 5B tokens of CulturaX (75% Estonian and 25% English documents).

An instruction-tuned version of this model is available as [Llammas](https://huggingface.co/tartuNLP/Llammas).

More details are available in our [paper](https://arxiv.org/abs/2404.04042).
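
As a base model, it is presumably meant for plain text continuation rather than chat. Below is a minimal sketch of loading it with the `transformers` library. The repository ID `tartuNLP/Llammas-base` is an assumption inferred from the Llammas link above, and `generate` is a hypothetical helper, not part of any released code.

```python
# Assumed repository ID (not stated in this card); adjust if the model lives elsewhere.
MODEL_ID = "tartuNLP/Llammas-base"


def generate(prompt: str, max_new_tokens: int = 50) -> str:
    """Continue `prompt` as plain text; no chat template is applied."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Tokenize the prompt into PyTorch tensors and generate a continuation.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # The decoded string contains the prompt followed by the generated text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Tallinn on")` downloads the weights on first use and returns the prompt followed by the model's continuation. For instruction following, the instruction-tuned Llammas model is likely the better choice.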

### Citation
```
@misc{kuulmets2024teaching,
      title={Teaching Llama a New Language Through Cross-Lingual Knowledge Transfer}, 
      author={Hele-Andra Kuulmets and Taido Purason and Agnes Luhtaru and Mark Fishel},
      year={2024},
      eprint={2404.04042},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```