metadata
language:
  - en
  - ja
library_name: transformers
pipeline_tag: text-generation
license: apache-2.0
model_type: mamba

Kotomamba

The kotomamba models are language models built on Mamba, a State Space Model (SSM) architecture. They come in two versions.

  1. Bilingual Pre-training (Japanese and English): The first variant is pre-trained on a dataset of about 200B tokens comprising both Japanese and English text.
  2. Continual Pre-training (Mainly Japanese): The second variant is produced by continual pre-training on data that is mainly Japanese.

Kotomamba Model Index

| Model | kotomamba-hf |
| --- | --- |
| kotomamba-2.8B-v1.0 | Link |
| kotomamba-2.8B-CL-v1.0 | Link |


This repository provides large language models developed by Kotoba Technologies, the TohokuNLP group at Tohoku University, and the Okazaki Lab and Yokota Lab at Tokyo Institute of Technology. Read our blog post or our technical paper (preprint coming soon) for more details.

Model Details

Base Model Performance

Japanese version

| Model | Size | JCommonsenseQA (4-shot) | JEMHopQA (4-shot) | NIILC (4-shot) | JSQuAD (4-shot) |
| --- | --- | --- | --- | --- | --- |
| state-spaces/mamba-2.8b-slimpj | 2.8B | 0.1796 | 0.2825 | 0.0998 | 0.3301 |
| kotomamba-2.8B | 2.8B | 0.185 | 0.4532 | 0.3871 | 0.4685 |
| kotomamba-2.8B-CL | 2.8B | 0.185 | 0.3758 | 0.2393 | 0.5929 |
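
To make the comparison against the state-spaces/mamba-2.8b-slimpj row easier to read, the following minimal Python sketch computes per-task score differences from the numbers in the table above. The helper names (`deltas_vs_baseline`, `mean_score`) are hypothetical, and the unweighted mean over four different tasks is only an illustrative summary, not a metric reported by the authors.

```python
# Scores copied from the table above (4-shot evaluation, higher is better).
TASKS = ["JCommonsenseQA", "JEMHopQA", "NIILC", "JSQuAD"]
SCORES = {
    "state-spaces/mamba-2.8b-slimpj": [0.1796, 0.2825, 0.0998, 0.3301],
    "kotomamba-2.8B": [0.185, 0.4532, 0.3871, 0.4685],
    "kotomamba-2.8B-CL": [0.185, 0.3758, 0.2393, 0.5929],
}
BASELINE = "state-spaces/mamba-2.8b-slimpj"


def deltas_vs_baseline(model: str) -> dict:
    """Per-task score difference between `model` and the baseline row."""
    base = SCORES[BASELINE]
    return {
        task: round(score - ref, 4)
        for task, score, ref in zip(TASKS, SCORES[model], base)
    }


def mean_score(model: str) -> float:
    """Unweighted mean over the four tasks (illustrative summary only)."""
    return sum(SCORES[model]) / len(TASKS)


if __name__ == "__main__":
    for name in SCORES:
        print(name, f"mean={mean_score(name):.4f}", deltas_vs_baseline(name))
```

On these numbers, both kotomamba variants score close to the baseline on JCommonsenseQA and higher on the other three tasks; kotomamba-2.8B has the larger gains on JEMHopQA and NIILC, while kotomamba-2.8B-CL has the larger gain on JSQuAD.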

Usage

Clone the repository (git clone https://github.com/kotoba-tech/kotomamba) and follow the installation section of its README.

WARNING: Hugging Face transformers' AutoModelForCausalLM does not support this Mamba model, so please use kotomamba/benchmarks/benchmark_generation_mamba_simple.py instead.

You can find a sample inference script at scripts/abci/inference/inference_sample.sh in the same repository.
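
As a rough illustration of how the benchmark script might be invoked, the Python sketch below assembles a command line for it. The flags (`--model-name`, `--prompt`, `--topp`, `--temperature`, `--repetition-penalty`) are those of the upstream state-spaces/mamba version of this script and are assumed, not confirmed, to be unchanged in the kotomamba fork; the model id `kotoba-tech/kotomamba-2.8B-v1.0` and the helper `build_generation_command` are likewise assumptions. Check the script's `--help` output and inference_sample.sh before relying on any of them.

```python
import shlex
import sys

# Path relative to the root of the cloned kotomamba repository.
SCRIPT = "benchmarks/benchmark_generation_mamba_simple.py"


def build_generation_command(
    model_name,
    prompt,
    temperature=0.7,
    topp=0.9,
    repetition_penalty=1.2,
    script=SCRIPT,
):
    """Return an argv list for the generation script.

    Flag names follow the upstream state-spaces/mamba script (assumption);
    the kotomamba fork may require different or additional arguments.
    """
    return [
        sys.executable,
        script,
        "--model-name", model_name,
        "--prompt", prompt,
        "--topp", str(topp),
        "--temperature", str(temperature),
        "--repetition-penalty", str(repetition_penalty),
    ]


if __name__ == "__main__":
    cmd = build_generation_command(
        "kotoba-tech/kotomamba-2.8B-v1.0",  # assumed model id
        "東京工業大学の主なキャンパスは、",
    )
    # Print a copy-pasteable shell command instead of running it here.
    print(shlex.join(cmd))
```

Once the flags are confirmed, the printed command can be run from the repository root, or the list can be passed to `subprocess.run(cmd, cwd="kotomamba", check=True)`.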

Training Datasets

Pre-Training & Continual Pre-Training

The following datasets were used for training.

Risks and Limitations

The models released here are still in the early stages of our research and development and have not been tuned to ensure outputs align with human intent and safety considerations.

Acknowledgements

We thank Albert Gu and Tri Dao for releasing the original mamba model and implementation on GitHub.

Our project is supported by the ABCI Grand Challenge of the National Institute of Advanced Industrial Science and Technology.

License

Apache License, Version 2.0, January 2004

Authors

Here are the team members: