Committed by isemmanuelolowe
Commit 8fb5890 · verified · 1 Parent(s): 2045956

Update README.md

  1. README.md +42 -0
README.md CHANGED
@@ -1,3 +1,45 @@
  ---
  license: mit
  ---
+
+ # Jamba 8xMoE (SLERP Merge)
+
+ This model was merged from [Jamba](https://huggingface.co/ai21labs/Jamba-v0.1), a 52B-parameter model with 16 experts. An accumulative SLERP (spherical linear interpolation) merge reduced the number of experts from 16 to 8.
+
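The card does not spell out what "accumulative" means here, so the following is only a minimal sketch under stated assumptions: experts are treated as flat weight vectors, adjacent experts are grouped in pairs, and each group is folded into a running result with SLERP at weight `1/k`. The helper names `slerp`, `merge_group`, and `halve_experts` are hypothetical and are not taken from the author's merge script.

```python
import math


def slerp(a, b, t, eps=1e-6):
    """Spherical linear interpolation between two flat weight vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    lerp = [(1.0 - t) * x + t * y for x, y in zip(a, b)]
    if norm_a < eps or norm_b < eps:
        return lerp  # the angle is undefined for a zero vector
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # guard against rounding
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        return lerp  # (anti-)parallel vectors: fall back to linear interpolation
    weight_a = math.sin((1.0 - t) * omega) / sin_omega
    weight_b = math.sin(t * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


def merge_group(experts):
    """Assumed 'accumulative' merge: fold each expert into a running result."""
    merged = list(experts[0])
    for k, expert in enumerate(experts[1:], start=2):
        # Weight 1/k gives every expert an equal share, like a running mean.
        merged = slerp(merged, expert, 1.0 / k)
    return merged


def halve_experts(experts, group_size=2):
    """Merge adjacent groups of experts, e.g. 16 experts -> 8."""
    return [
        merge_group(experts[i:i + group_size])
        for i in range(0, len(experts), group_size)
    ]


# Toy stand-ins for expert weights (not real model parameters).
toy_experts = [[float(i + 1), 1.0, 0.5] for i in range(16)]
merged_experts = halve_experts(toy_experts)
print(len(merged_experts))  # 8
```

In a real merge the same interpolation would be applied per parameter tensor of each expert (for example with `torch` tensors flattened and reshaped), and the grouping of experts is a design choice the card leaves unspecified.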
+ 4-bit Inference Code
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+ import torch
+
+ model_id = "isemmanuelolowe/Jamba-8xMoE_slerp"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+ # 4-bit NF4 quantization with double quantization; the Mamba modules are left unquantized.
+ quantization_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     # load_in_8bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.bfloat16,
+     bnb_4bit_use_double_quant=True,
+     llm_int8_skip_modules=["mamba"],
+ )
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     trust_remote_code=True,
+     torch_dtype=torch.bfloat16,
+     attn_implementation="flash_attention_2",
+     quantization_config=quantization_config,
+ )
+
+ input_ids = tokenizer("Here is how to do bubble sort\n```python\n", return_tensors="pt")["input_ids"].to("cuda")
+
+ # Greedy decoding (the intent of temperature=0); transformers expects a strictly positive temperature when sampling.
+ out = model.generate(input_ids, max_new_tokens=256, do_sample=False, repetition_penalty=1.0)
+ print(tokenizer.batch_decode(out, skip_special_tokens=True))
+ ```
+
+ OUTPUT:
+ Here is how to do bubble sort
+ ```bash
+ ['Here is how to do bubble sort\n```python\ndef bubble_sort(array):\n for i in 0, len(array):\n for j in 0, len(array):\n if a[i] < a[j]\n a[i], a[j]\n\n```\n\n\n\n\n\n\n']
+ ```