okoge committed on
Commit d8948b2
1 Parent(s): bff0b42

Create README.md

---
language:
- en
- ja
library_name: transformers
pipeline_tag: text-generation
license: apache-2.0
model_type: mamba
---

# Kotomamba

The kotomamba model leverages the Mamba state space model (SSM) architecture, a recent alternative to Transformer-based language models.
The kotomamba model comes in two variants:

1. Bilingual Pre-training (Japanese and English): the first variant is pre-trained on a rich dataset of about 200B tokens comprising both Japanese and English text.
2. Continual Pre-training (mainly Japanese): the second variant focuses exclusively on Japanese-centric data during its continual pre-training phase.

## Kotomamba Model Index

|Model|Hugging Face|
|---|---|
|kotomamba-2.8B-v1.0| [Link](https://huggingface.co/kotoba-tech/kotomamba-2.8B-v1.0) |
|kotomamba-2.8B-CL-v1.0| [Link](https://huggingface.co/kotoba-tech/kotomamba-2.8B-CL-v1.0) |

![logo](./logo.webp)

This repository provides large language models developed by [Kotoba Technologies](https://www.kotoba.tech/), the [TohokuNLP group](https://www.nlp.ecei.tohoku.ac.jp/) at Tohoku University, and the [Okazaki Lab](https://www.nlp.c.titech.ac.jp/index.en.html) and [Yokota Lab](https://www.rio.gsic.titech.ac.jp/en/index.html) at Tokyo Institute of Technology.
Read our [blog post](https://zenn.dev/kotoba_tech/articles/f15b2495d44c4f) or our technical paper (preprint coming soon) for more details!

## Model Details

* **Model type**: Please refer to the [mamba technical paper](https://arxiv.org/abs/2312.00752) for details on the model architecture.
* **Language(s)**: Japanese and English
* **Library**: [kotomamba](https://github.com/kotoba-tech/kotomamba)
* **Tokenizer**: kotomamba-2.8B uses the [llm-jp-tokenizer 100K](https://github.com/llm-jp/llm-jp-tokenizer), and kotomamba-2.8B-CL uses the [GPT-NeoX Tokenizer](https://huggingface.co/EleutherAI/gpt-neox-20b) (see the loading sketch after this list).
* **Contact**:
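
The tokenizers can typically be loaded with `transformers.AutoTokenizer`. The snippet below is a minimal sketch for comparing the two; it assumes the tokenizer files are distributed alongside each checkpoint on the Hugging Face Hub.

```python
# Minimal sketch: load and compare the two tokenizers.
# Assumption: tokenizer files ship with each Hugging Face repository.
from transformers import AutoTokenizer

tok_pretrain = AutoTokenizer.from_pretrained("kotoba-tech/kotomamba-2.8B-v1.0")      # llm-jp-tokenizer 100K
tok_continual = AutoTokenizer.from_pretrained("kotoba-tech/kotomamba-2.8B-CL-v1.0")  # GPT-NeoX tokenizer

text = "自然言語処理の研究を行っています。"
print(len(tok_pretrain.tokenize(text)), len(tok_continual.tokenize(text)))
```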

## Base Model Performance

### Japanese version

|Model|Size|JCommonsenseQA (4-shot)|JEMHopQA (4-shot)|NIILC (4-shot)|JSQuAD (4-shot)|
|---|---|---|---|---|---|
| [state-spaces/mamba-2.8b-slimpj](https://huggingface.co/state-spaces/mamba-2.8b-slimpj) | 2.8B | 0.1796 | 0.2825 | 0.0998 | 0.3301 |
| kotomamba-2.8B | 2.8B | 0.1850 | 0.4532 | 0.3871 | 0.4685 |
| kotomamba-2.8B-CL | 2.8B | 0.1850 | 0.3758 | 0.2393 | 0.5929 |

## Usage

First, install the additional dependencies listed in [requirements.txt](./requirements.txt):

```sh
pip install -r requirements.txt
```

### Use the base model

Clone the [kotomamba](https://github.com/kotoba-tech/kotomamba) repository (`git clone https://github.com/kotoba-tech/kotomamba`) and follow the installation section of its README.

**WARNING**: Hugging Face Transformers' `AutoModelForCausalLM` **does not support** the mamba architecture, so please use `kotomamba/benchmarks/benchmark_generation_mamba_simple.py` for generation.

You can find a sample inference script at `scripts/abci/inference/inference_sample.sh`; a rough sketch of the generation call is shown below.
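
For reference, the generation call in the upstream [mamba](https://github.com/state-spaces/mamba) codebase, which `benchmark_generation_mamba_simple.py` follows, looks roughly like the sketch below. Exact arguments may differ in the kotomamba fork, so treat the repository's scripts as authoritative; the checkpoint name and prompt here are only illustrative.

```python
# Hedged sketch of mamba-style generation, modeled on the upstream
# benchmark_generation_mamba_simple.py; kotomamba's script is the supported path.
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

model_name = "kotoba-tech/kotomamba-2.8B-v1.0"  # or kotoba-tech/kotomamba-2.8B-CL-v1.0
device = "cuda"

# Assumption: tokenizer files ship with the checkpoint on the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MambaLMHeadModel.from_pretrained(model_name, device=device, dtype=torch.bfloat16)

prompt = "自然言語処理とは、"  # illustrative Japanese prompt ("Natural language processing is ...")
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

out = model.generate(
    input_ids=input_ids,
    max_length=input_ids.shape[1] + 100,
    temperature=0.8,
    top_k=50,
    top_p=0.95,
    return_dict_in_generate=True,
    output_scores=True,
)
print(tokenizer.decode(out.sequences[0], skip_special_tokens=True))
```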

## Training Datasets

### Pre-Training & Continual Pre-Training

The following datasets were used for training.

- [Japanese Wikipedia](https://dumps.wikimedia.org/other/cirrussearch)
- Swallow Corpus
- [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B)

## Risks and Limitations

The models released here are still in the early stages of our research and development and have not been tuned to ensure outputs align with human intent and safety considerations.

## Acknowledgements

We thank Albert Gu and Tri Dao for releasing the original mamba model and implementation on GitHub.

Our project is supported by the [ABCI Grand Challenge](https://abci.ai/en/link/grandchallenge.html) of the National Institute of Advanced Industrial Science and Technology.

## License

Apache License Version 2.0, January 2004

## Authors

Here are the team members:

- From [Kotoba Technologies](https://www.kotoba.tech/)
  - [Noriyuki Kojima](https://twitter.com/noriyuki_kojima)
  - [Jungo Kasai](https://twitter.com/jungokasai)
  - [Hiroto Kurita](https://twitter.com/hiroto_kurita)
  - [Kazuki Fujii](https://twitter.com/okoge_kaz)
- From the [TohokuNLP group at Tohoku University](https://www.nlp.ecei.tohoku.ac.jp/)
  - [Keisuke Sakaguchi](https://twitter.com/KeisukeS_)
- From Tokyo Institute of Technology
  - From the [Okazaki Laboratory](https://www.nlp.c.titech.ac.jp/index.en.html), the following members:
    - [Naoaki Okazaki](https://www.chokkan.org/index.ja.html)
    - [Sakae Mizuki](https://s-mizuki-nlp.github.io/)
    - [Hiroki Iida](https://meshidenn.github.io/)
    - [Mengsay Loem](https://loem-ms.github.io/)
    - [Shota Hirai](https://huggingface.co/Kotemo428)
    - [Kakeru Hattori](https://aya-se.vercel.app/)
    - [Masanari Ohi](https://twitter.com/stjohn2007)
  - From the [YOKOTA Laboratory](https://www.rio.gsic.titech.ac.jp/en/index.html), the following members:
    - [Rio Yokota](https://twitter.com/rioyokota)
    - [Taishi Nakamura](https://twitter.com/Setuna7777_2)