kuotient committed
Commit c13dfe5
1 Parent(s): bc6b6f8

Update README.md

Files changed (1)
  1. README.md +3 -5
README.md CHANGED
@@ -17,9 +17,8 @@ For a coffee chat or if you have any questions, please do not hesitate to contac
 
 I would like to thank Allganize Korea for their generosity in providing resources for this personal project. This project is not directly related to the company's goals or research.
 ## TODO
-- Complete training with korean_textbooks - 6B tokens down, 2B to go.
 - More training with publicly available Korean corpora
-- Instruct tuning
+- 🟡 Instruct tuning
 ## **What is Mamba?**
 Mamba is a new state space model architecture showing promising performance on information-dense data such as language modeling, where previous subquadratic models fall short of Transformers. It is based on the line of progress on structured state space models, with an efficient hardware-aware design and implementation in the spirit of FlashAttention.
 ## **License**
@@ -33,7 +32,7 @@ Jisoo Kim(kuotient)
 ### KoBEST
 | Model | boolq | copa | hellaswag | sentineg |
 | --- | --- | --- | --- | --- |
-| kuotient/mamba-ko-2.8b* | 0.5825 | 0.6166 | 0.4051 | 0.3383 |
+| kuotient/mamba-ko-2.8b | 0.6213 | 0.6150 | 0.4014 | 0.3383 |
 | state_spaces/mamba-2.8b-slimpj | 0.3343 | 0.4867 | 0.3452 | 0.3547 |
 | kuotient/mamba-ko-2.8b-old (2B trained only) | 0.4236 | 0.5896 | 0.4012 | 0.4348 |
 | kuotient/mamba-ko-2.8b-old-instruct | 0.4041 | 0.6505 | 0.4906 | 0.3348 |
@@ -41,7 +40,6 @@ Jisoo Kim(kuotient)
 | maywell/TinyWand-SFT | 0.3455 | 0.6142 | 0.3944 | N/A |
 | microsoft/phi-2 | 0.3343 | 0.4792 | 0.3235 | N/A |
 | TinyLlama/TinyLlama-1.1B | 0.3343 | 0.4784 | 0.3396 | N/A |
-*>6B tokens trained. Further up to 8B tokens.
 ### Thanks
 Many thanks to [maywell](https://huggingface.co/maywell) for his many contributions to, and continued motivation of, the Korean LLM community.
 ## Usage
@@ -55,7 +53,7 @@ from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
 
 device = "cuda" if torch.cuda.is_available() else "cpu"
 
-model_name = "kuotient/mamba-2.8b-ko"
+model_name = "kuotient/mamba-ko-2.8b"
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 tokenizer.pad_token = tokenizer.eos_token
 
 
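The README's usage snippet appears only partially in this diff. For context, here is a minimal end-to-end sketch of loading and sampling the renamed checkpoint, assuming the mamba_ssm package's `MambaLMHeadModel.from_pretrained` and `generate` APIs; the prompt and sampling parameters are illustrative, not taken from the model card:

```python
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "cuda" if torch.cuda.is_available() else "cpu"

model_name = "kuotient/mamba-ko-2.8b"  # repo id as updated in this commit
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Mamba checkpoints load through mamba_ssm rather than transformers;
# device and dtype are chosen at load time (fp16 assumes a CUDA device).
model = MambaLMHeadModel.from_pretrained(model_name, device=device, dtype=torch.float16)

prompt = "대한민국의 수도는"  # illustrative prompt ("The capital of South Korea is")
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# mamba_ssm's generate decodes greedily when top_k == 1 (the default),
# so top_k > 1 is needed for temperature/top-p sampling to take effect.
out = model.generate(
    input_ids=input_ids,
    max_length=128,
    temperature=0.7,
    top_k=50,
    top_p=0.9,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```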