---
license: apache-2.0
datasets:
- maywell/korean_textbooks
language:
- ko
pipeline_tag: text-generation
tags:
- mamba
---
# **Mamba-ko-2.8B🐍**
![Mamba-ko-2.8B](./Seagull-mamba.png)
**Mamba-ko-2.8B** is a state space model, further pretrained (continually trained) on the synthetically generated dataset [**korean_textbooks**](https://huggingface.co/datasets/maywell/korean_textbooks).

If you're interested in building large-scale language models to solve a wide variety of problems across a wide variety of domains, consider joining [Allganize](https://allganize.career.greetinghr.com/o/65146).
For a coffee chat, or if you have any questions, please don't hesitate to contact me! - jisoo.kim@allganize.ai
## TODO
- Complete training on korean_textbooks - 6B tokens done, 2B to go.
- Further training on publicly available Korean corpora
- Instruction tuning
## **What is Mamba?**
Mamba is a new state space model architecture that shows promising performance on information-dense data such as language modeling, where previous subquadratic models fall short of Transformers. It builds on the line of progress in structured state space models, with an efficient hardware-aware design and implementation in the spirit of FlashAttention.
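As a rough illustration only (not this model's actual implementation, which uses input-dependent parameters and a hardware-aware parallel scan in CUDA), the linear recurrence underlying state space models can be sketched in a few lines; the names here are hypothetical:

```py
def ssm_scan(xs, a, b, c):
    """Sequential reference for a scalar linear SSM:
    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t.
    Mamba applies this kind of recurrence per channel, with
    (a, b, c) made input-dependent ("selective")."""
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x  # state update
        ys.append(c * h)   # readout
    return ys

if __name__ == "__main__":
    # Impulse response: the state decays geometrically with factor a,
    # which is why the model can carry information over long contexts.
    print(ssm_scan([1.0, 0.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0))
```

Because the recurrence is linear in the state, it can be evaluated with a parallel scan at training time instead of the strictly sequential loop shown here.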
## **License**
Apache 2.0
## **Model Details**
#### **Developed by**
Jisoo Kim (kuotient)
#### **Base Model**
[state-spaces/mamba-2.8b-slimpj](https://huggingface.co/state-spaces/mamba-2.8b-slimpj)
## **Model Benchmark**
### Ko-LLM-Leaderboard
TBD - waiting for the polyglot branch refactoring...
### Thanks
Many thanks to [maywell](https://huggingface.co/maywell) for the many contributions and motivation he has given to the Korean LLM community.
## Usage
```sh
pip install torch==2.1.0 transformers==4.35.0 causal_conv1d>=1.1.0 mamba-ssm==1.1.1
```
```py
import torch
from transformers import AutoTokenizer, TextStreamer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "kuotient/mamba-2.8b-ko"

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Mamba checkpoints are loaded via mamba_ssm's MambaLMHeadModel,
# not transformers' AutoModelForCausalLM.
model = MambaLMHeadModel.from_pretrained(
    model_name, device=device, dtype=torch.float16)

# "Five examples of nutritious foods to serve children are as follows."
prompt = "아이들한테 제공할 영양가 있는 음식 5가지의 예시는 다음과 같다."
tokens = tokenizer(prompt, return_tensors="pt")
input_ids = tokens.input_ids.to(device)

# Stream generated tokens to stdout as they are produced.
streamer = TextStreamer(tokenizer)
out = model.generate(
    input_ids=input_ids,
    streamer=streamer,
    max_length=2000,
    temperature=0.7,
    top_p=0.7,
    eos_token_id=tokenizer.eos_token_id,
)
```