---
license: apache-2.0
language:
- ja
---

# Leia-Swallow-7B

LEIA is a training technique for autoregressive LLMs that improves their performance in languages other than English by enhancing cross-lingual knowledge transfer from English to a target language.

This model was built by applying LEIA to Swallow, a Japanese-English bilingual LLM based on LLaMA 2.

Compared with the original Swallow model, it achieves higher scores on six Japanese question-answering benchmarks, as reported below.
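
The paper's title refers to entity-based data augmentation. According to the paper, LEIA augments the target-language training corpus with the English names of the entities it mentions (aligned across languages via Wikipedia) and then trains the model with ordinary left-to-right language modeling. A minimal sketch of the augmentation step follows. The `EntityLink` type, the `augment_with_english_names` helper, the `<translate>`/`</translate>` marker strings, and placing the English name directly after each mention are assumptions for illustration, not the released training code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EntityLink:
    """A hyperlinked entity mention in target-language (e.g. Japanese) text. Hypothetical type."""
    start: int         # character offset where the mention begins
    end: int           # character offset just past the end of the mention
    english_name: str  # English name of the linked entity (e.g. from Wikipedia interlanguage links)


def augment_with_english_names(
    text: str,
    links: list[EntityLink],
    open_marker: str = "<translate>",   # assumed marker string
    close_marker: str = "</translate>",  # assumed marker string
) -> str:
    """Insert each entity's English name, wrapped in markers, right after its mention.

    Assumes the mentions do not overlap.
    """
    pieces = []
    cursor = 0
    for link in sorted(links, key=lambda link: link.start):
        # Copy the text up to and including the target-language mention.
        pieces.append(text[cursor:link.end])
        # Place the English entity name next to the mention.
        pieces.append(f"{open_marker}{link.english_name}{close_marker}")
        cursor = link.end
    # Copy whatever follows the last mention.
    pieces.append(text[cursor:])
    return "".join(pieces)
```

For example, applying it to 東京は日本の首都である。 ("Tokyo is the capital of Japan.") with `[EntityLink(0, 2, "Tokyo"), EntityLink(3, 5, "Japan")]` returns `東京<translate>Tokyo</translate>は日本<translate>Japan</translate>の首都である。`. The augmented text is then used as ordinary training data for next-token prediction.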

For further technical details, see our paper or our blog post (in Japanese):

- [LEIA: Facilitating Cross-Lingual Knowledge Transfer in Language Models with Entity-based Data Augmentation](https://arxiv.org/abs/2402.11485) (arxiv.org)
- [LEIA: 言語間転移学習でLLMを賢くする新しい方法](#) (zenn.dev; English title: "LEIA: A New Method for Making LLMs Smarter with Cross-Lingual Transfer Learning")

## Model List

- [Leia-Swallow-7b](https://huggingface.co/leia-llm/Leia-Swallow-7b/)
- [Leia-Swallow-13b](https://huggingface.co/leia-llm/Leia-Swallow-13b/)
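
Both checkpoints are hosted on the Hugging Face Hub. A minimal sketch for plain-text completion follows, assuming the checkpoints load with the standard `transformers` `AutoTokenizer` and `AutoModelForCausalLM` classes like other LLaMA 2-based models and are used as base models without a chat template. `MODEL_IDS` and `generate` are hypothetical names for illustration; this has not been checked against the published checkpoints.

```python
# Hub repository IDs taken from the model list above.
MODEL_IDS = {
    "7b": "leia-llm/Leia-Swallow-7b",
    "13b": "leia-llm/Leia-Swallow-13b",
}


def generate(prompt: str, size: str = "7b", max_new_tokens: int = 64) -> str:
    """Greedily complete `prompt` with a Leia-Swallow checkpoint (hypothetical helper).

    Assumes the checkpoint loads with the standard transformers Auto classes and
    is used as a plain base model (no chat template).
    """
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = MODEL_IDS[size]
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # torch_dtype="auto" uses the dtype recorded in the checkpoint config.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    prompt_len = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0, prompt_len:], skip_special_tokens=True)
```

For example, `generate("日本で一番高い山は")` ("The highest mountain in Japan is") downloads the 7B weights on first use and returns the model's continuation. In 16-bit precision, 7B parameters take roughly 14 GB for the weights alone.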

## Empirical Results

The model is evaluated on the following six Japanese question-answering benchmarks:

- X-CODAH
- X-CSQA
- JCommonsenseQA
- NIILC
- JEMHopQA
- JAQKET v2

| Model | X-CODAH | X-CSQA | JCommonsenseQA | NIILC | JEMHopQA | JAQKET v2 |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| Swallow (baseline) | 42.0 | 41.0 | 80.3 | 59.5 | 50.8 | 86.2 |
| Swallow + LEIA | **42.7** | **42.4** | **80.6** | **60.3** | **54.7** | **86.5** |

Bold marks the higher score on each benchmark. For details of the experimental setup, see [our paper](https://arxiv.org/abs/2402.11485).
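
To make the size of the improvements explicit, the per-benchmark gains can be computed directly from the table above. This short script only redoes that arithmetic on the reported scores and adds no new results; `score_gains` is a hypothetical helper name.

```python
# Scores copied from the results table (Swallow baseline vs. Swallow + LEIA).
BENCHMARKS = ["X-CODAH", "X-CSQA", "JCommonsenseQA", "NIILC", "JEMHopQA", "JAQKET v2"]
SWALLOW = [42.0, 41.0, 80.3, 59.5, 50.8, 86.2]
LEIA = [42.7, 42.4, 80.6, 60.3, 54.7, 86.5]


def score_gains(baseline: list[float], improved: list[float]) -> list[float]:
    """Absolute per-benchmark gain, rounded to one decimal like the table."""
    return [round(new - old, 1) for old, new in zip(baseline, improved)]


gains = dict(zip(BENCHMARKS, score_gains(SWALLOW, LEIA)))
# Unweighted mean gain across the six benchmarks.
mean_gain = round(sum(gains.values()) / len(gains), 2)
largest = max(gains, key=gains.get)
```

The gains range from +0.3 to +3.9 points, with the largest on JEMHopQA and an unweighted mean of about +1.23 points across the six benchmarks.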

## Contributors

Ikuya Yamada (Studio Ousia, RIKEN)

Ryokan Ri (LY Corporation, SB Intuitions)