---
base_model:
- Qwen/Qwen2-1.5B
- Replete-AI/Replete-Coder-Qwen2-1.5b
license: apache-2.0
tags:
- moe
- frankenmoe
- merge
- mergekit
- lazymergekit
- Qwen/Qwen2-1.5B
- Replete-AI/Replete-Coder-Qwen2-1.5b
---

# QwenMoEAriel

QwenMoEAriel is a Mixture of Experts (MoE) built from the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
* [Qwen/Qwen2-1.5B](https://huggingface.co/Qwen/Qwen2-1.5B)
* [Replete-AI/Replete-Coder-Qwen2-1.5b](https://huggingface.co/Replete-AI/Replete-Coder-Qwen2-1.5b)
## 🧩 Configuration

```yaml
base_model: Qwen/Qwen2-1.5B
architecture: qwen
experts:
  - source_model: Qwen/Qwen2-1.5B
    positive_prompts:
      - "chat"
      - "assistant"
      - "tell me"
      - "explain"
      - "I want"
  - source_model: Replete-AI/Replete-Coder-Qwen2-1.5b
    positive_prompts:
      - "code"
      - "python"
      - "javascript"
      - "programming"
      - "algorithm"
shared_experts:
  - source_model: Qwen/Qwen2-1.5B
    positive_prompts: # required by Qwen MoE for "hidden" gate mode, otherwise not allowed
      - "chat"
    # (optional, but recommended:)
    residual_scale: 0.1 # downweight output from shared expert to prevent overcooking the model
```

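To sanity-check the exported model, you can inspect its configuration. The sketch below assumes the merge produces a Qwen2-MoE-style config; the attribute names (`num_experts`, `num_experts_per_tok`, `shared_expert_intermediate_size`) come from the `transformers` Qwen2-MoE implementation and may not all be present, so they are read defensively:

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("femiari/Qwen2-1.5Moe")
print(cfg.model_type)  # "qwen2_moe" if the merge exported a Qwen2-MoE config

# Attribute names are assumptions based on transformers' Qwen2-MoE config;
# getattr with a default keeps this safe if the export differs.
for name in ("num_experts", "num_experts_per_tok", "shared_expert_intermediate_size"):
    print(name, "=", getattr(cfg, name, "not present"))
```
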
## 💻 Usage

```python
# pip install -qU transformers bitsandbytes accelerate einops
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

model = AutoModelForCausalLM.from_pretrained(
    "femiari/Qwen2-1.5Moe",
    torch_dtype=torch.float16,
    ignore_mismatched_sizes=True
).to(device)
tokenizer = AutoTokenizer.from_pretrained("femiari/Qwen2-1.5Moe")

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
# Drop the prompt tokens so only the newly generated text is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
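
The snippet above installs `bitsandbytes` but loads the model in fp16. If VRAM is tight, a 4-bit load is a reasonable alternative; this is a minimal sketch using the standard `BitsAndBytesConfig` API from `transformers`, not something the merge itself requires:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Example settings; adjust the compute dtype to your hardware.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "femiari/Qwen2-1.5Moe",
    quantization_config=bnb_config,
    device_map="auto",  # requires accelerate
    ignore_mismatched_sizes=True,
)
tokenizer = AutoTokenizer.from_pretrained("femiari/Qwen2-1.5Moe")
```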