---
language:
- en
- ja
library_name: transformers
pipeline_tag: text-generation
license: apache-2.0
model_type: mamba
---
|
|
|
# Kotomamba |
|
|
|
Kotomamba is a family of language models built on the Mamba state space model (SSM) architecture.

The kotomamba model comes in two versions.
|
|
|
1. Bilingual Pre-training (Japanese and English):

   The first variant is pre-trained on about 200B tokens of Japanese and English text.

2. Continual Pre-training (Mainly Japanese):

   The second variant is produced by continual pre-training on Japanese-centric data.
|
|
|
## Kotomamba Model Index |
|
|Model|kotomamba-hf|
|---|---|
|kotomamba-2.8B-v1.0| [Link](https://huggingface.co/kotoba-tech/kotomamba-2.8B-v1.0) |
|kotomamba-2.8B-CL-v1.0| [Link](https://huggingface.co/kotoba-tech/kotomamba-2.8B-CL-v1.0) |
|
|
|
|
|
![logo](./logo.webp) |
|
|
|
This repository provides large language models developed by [Kotoba Technologies](https://www.kotoba.tech/), the [TohokuNLP group](https://www.nlp.ecei.tohoku.ac.jp/) at Tohoku University, and the [Okazaki Lab](https://www.nlp.c.titech.ac.jp/index.en.html) and [Yokota Lab](https://www.rio.gsic.titech.ac.jp/en/index.html) at Tokyo Institute of Technology.
|
Read our [blog post](https://zenn.dev/kotoba_tech/articles/f15b2495d44c4f) or our technical paper (preprint coming soon) for more details! |
|
|
|
|
|
## Model Details |
|
|
|
* **Model type**: Please refer to the [Mamba technical paper](https://arxiv.org/abs/2312.00752) for details on the model architecture.

* **Language(s)**: Japanese, English
|
* **Library**: [kotomamba](https://github.com/kotoba-tech/kotomamba) |
|
* **Tokenizer**: kotomamba-2.8B uses [llm-jp-tokenizer 100K](https://github.com/llm-jp/llm-jp-tokenizer) and kotomamba-2.8B-CL uses [GPT-NeoX Tokenizer](https://huggingface.co/EleutherAI/gpt-neox-20b). |
|
* **Contact**: |
|
|
|
## Base Model Performance |
|
|
|
### Japanese version |
|
|
|
|Model|Size|JCommonsenseQA|JEMHopQA|NIILC|JSQuAD|
|---|---|---|---|---|---|
| | |4-shot|4-shot|4-shot|4-shot|
| [state-spaces/mamba-2.8b-slimpj](https://huggingface.co/state-spaces/mamba-2.8b-slimpj) | 2.8B |0.1796|0.2825|0.0998|0.3301|
| kotomamba-2.8B | 2.8B |0.185|0.4532|0.3871|0.4685|
| kotomamba-2.8B-CL | 2.8B |0.185|0.3758|0.2393|0.5929|
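
To make the comparison with the baseline easier to read, the per-task differences can be computed directly from the table. The sketch below only reuses the scores listed above; the helper names `gains_over_baseline` and `best_models` are illustrative and not part of any kotomamba tooling.

```python
# Scores copied verbatim from the table above (all 4-shot).
TASKS = ["JCommonsenseQA", "JEMHopQA", "NIILC", "JSQuAD"]
BASELINE = "state-spaces/mamba-2.8b-slimpj"
SCORES = {
    "state-spaces/mamba-2.8b-slimpj": [0.1796, 0.2825, 0.0998, 0.3301],
    "kotomamba-2.8B": [0.185, 0.4532, 0.3871, 0.4685],
    "kotomamba-2.8B-CL": [0.185, 0.3758, 0.2393, 0.5929],
}


def gains_over_baseline(model: str) -> dict:
    """Absolute score difference versus the baseline, per task."""
    return {
        task: round(score - ref, 4)
        for task, score, ref in zip(TASKS, SCORES[model], SCORES[BASELINE])
    }


def best_models(task: str) -> list:
    """Models that reach the top score on a task (ties are all returned)."""
    idx = TASKS.index(task)
    top = max(row[idx] for row in SCORES.values())
    return [name for name, row in SCORES.items() if row[idx] == top]


for name in ("kotomamba-2.8B", "kotomamba-2.8B-CL"):
    print(name, gains_over_baseline(name))
```

Read this way, kotomamba-2.8B shows its largest gains over the baseline on NIILC (+0.2873) and JEMHopQA (+0.1707), while kotomamba-2.8B-CL leads on JSQuAD (+0.2628). The two variants tie on JCommonsenseQA.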
|
|
|
|
|
## Usage |
|
|
|
First, install additional dependencies in [requirements.txt](./requirements.txt): |
|
|
|
```sh
pip install -r requirements.txt
```
|
|
|
### Use the base model |
|
|
|
Clone the repository with `git clone https://github.com/kotoba-tech/kotomamba` and follow the installation section of its README.

**WARNING**: Hugging Face Transformers' `AutoModelForCausalLM` **doesn't support** the Mamba model, so please use `kotomamba/benchmarks/benchmark_generation_mamba_simple.py` instead.

An inference sample script is available at `scripts/abci/inference/inference_sample.sh`.
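
For readers who prefer calling the model from Python rather than through the benchmark script, the following is a minimal sketch and not the project's official entry point. It assumes that the kotomamba installation exposes the same `mamba_ssm` package as the upstream Mamba repository, that the checkpoint can be loaded with `MambaLMHeadModel.from_pretrained`, and that a CUDA GPU is available. The function names `total_length` and `generate_text` are hypothetical.

```python
def total_length(prompt_tokens: int, max_new_tokens: int) -> int:
    """mamba_ssm's generate() takes a total length (prompt + new tokens)."""
    if prompt_tokens < 1 or max_new_tokens < 1:
        raise ValueError("both lengths must be positive")
    return prompt_tokens + max_new_tokens


def generate_text(model_name: str, tokenizer_name: str, prompt: str,
                  max_new_tokens: int = 64) -> str:
    """Generate a continuation of `prompt` (requires a CUDA GPU)."""
    # Heavy imports are kept local so the helper above works without a GPU.
    import torch
    from transformers import AutoTokenizer
    from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    # Assumption: the checkpoint is loadable by the upstream mamba_ssm loader.
    model = MambaLMHeadModel.from_pretrained(
        model_name, device="cuda", dtype=torch.float16
    )
    model.eval()

    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
    with torch.inference_mode():
        # Sampling arguments are left at the library defaults.
        out = model.generate(
            input_ids=input_ids,
            max_length=total_length(input_ids.shape[1], max_new_tokens),
            cg=True,
            return_dict_in_generate=True,
        )
    return tokenizer.batch_decode(out.sequences, skip_special_tokens=True)[0]
```

Under these assumptions, a call for the continually pre-trained model might look like `generate_text("kotoba-tech/kotomamba-2.8B-CL-v1.0", "EleutherAI/gpt-neox-20b", "東京工業大学の主なキャンパスは、")`, pairing the model with the GPT-NeoX tokenizer listed under Model Details. If loading fails, fall back to the benchmark script named above.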
|
|
|
## Training Datasets |
|
|
|
### Pre-Training & Continual Pre-Training |
|
The following datasets were used for training. |
|
|
|
- [Japanese Wikipedia](https://dumps.wikimedia.org/other/cirrussearch) |
|
- Swallow Corpus |
|
- [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B) |
|
|
|
|
|
## Risks and Limitations |
|
|
|
The models released here are still in the early stages of our research and development and have not been tuned to ensure outputs align with human intent and safety considerations. |
|
|
|
## Acknowledgements |
|
|
|
We thank Albert Gu and Tri Dao for releasing the original mamba model and implementation on GitHub. |
|
|
|
Our project is supported by the [ABCI Grand Challenge](https://abci.ai/en/link/grandchallenge.html) of the National Institute of Advanced Industrial Science and Technology. |
|
|
|
## License |
|
|
|
Apache License, Version 2.0, January 2004
|
|
|
## Authors |
|
|
|
Here are the team members: |
|
- From [Kotoba Technologies](https://www.kotoba.tech/)
    - [Noriyuki Kojima](https://twitter.com/noriyuki_kojima)
    - [Jungo Kasai](https://twitter.com/jungokasai)
    - [Hiroto Kurita](https://twitter.com/hiroto_kurita)
    - [Kazuki Fujii](https://twitter.com/okoge_kaz)
- From [TohokuNLP group at Tohoku University](https://www.nlp.ecei.tohoku.ac.jp/)
    - [Keisuke Sakaguchi](https://twitter.com/KeisukeS_)
- From Tokyo Institute of Technology
    - From [Okazaki Laboratory](https://www.nlp.c.titech.ac.jp/index.en.html), the following members:
        - [Naoaki Okazaki](https://www.chokkan.org/index.ja.html)
        - [Sakae Mizuki](https://s-mizuki-nlp.github.io/)
        - [Hiroki Iida](https://meshidenn.github.io/)
        - [Mengsay Loem](https://loem-ms.github.io/)
        - [Shota Hirai](https://huggingface.co/Kotemo428)
        - [Kakeru Hattori](https://aya-se.vercel.app/)
        - [Masanari Ohi](https://twitter.com/stjohn2007)
    - From [YOKOTA Laboratory](https://www.rio.gsic.titech.ac.jp/en/index.html), the following members:
        - [Rio Yokota](https://twitter.com/rioyokota)
        - [Taishi Nakamura](https://twitter.com/Setuna7777_2)