xDAN2099 committed on
Commit
2a0bdf4
1 Parent(s): 31ea432

Update README.md

Files changed (1)
  1. README.md +28 -38
README.md CHANGED
@@ -12,47 +12,37 @@ further optimized with human-enhanced feedback algorithms to improve reasoning,
  For more comprehensive information, please visit our blog post and GitHub repository.
  https://github.com/shootime2021/APUS-xDAN-4.0-moe
 
- Model Details
-
- APUS-xDAN-4.0-MOE leverages the innovative Mixture of Experts (MoE) architecture, incorporating components from dense language models. Specifically, it inherits its capabilities from the highly performant xDAN-L2 Series. With a total of 136 billion parameters, of which 30 billion are activated during runtime, APUS-xDAN-4.0-MOE demonstrates unparalleled efficiency. Through advanced quantization techniques, our open-source version occupies a mere 42GB, making it seamlessly compatible with consumer-grade GPUs like the 4090 and 3090.
-
- Requirements
-
- The codebase for APUS-xDAN-4.0-MOE is integrated into the latest Hugging Face transformers library. We recommend building from source using the command pip install git+https://github.com/huggingface/transformers to ensure compatibility. Failure to do so may result in encountering the following error:
-
- Copy code
- Usage llama.cpp
-
  ## Usage
 
  ```python
- import torch
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- torch.set_default_dtype(torch.bfloat16)
-
- tokenizer = AutoTokenizer.from_pretrained("xDAN-AI/APUS-xDAN-4.0-MOE", trust_remote_code=True)
-
- model = AutoModelForCausalLM.from_pretrained(
-     "xDAN-AI/APUS-xDAN-4.0-MOE",
-     trust_remote_code=True,
-     device_map="auto",
-     torch_dtype=torch.bfloat16,
- )
- model.eval()
-
- text = "Hi, xDAN-APUS4.0, nice to meet you!"
- input_ids = tokenizer(text, return_tensors="pt").input_ids
- input_ids = input_ids.cuda()
- attention_mask = torch.ones_like(input_ids)
- generate_kwargs = {}  # Add any additional args if you want
- inputs = {
-     "input_ids": input_ids,
-     "attention_mask": attention_mask,
-     **generate_kwargs,
- }
- outputs = model.generate(**inputs)
- print(outputs)
  ```
  License
 
 
  For more comprehensive information, please visit our blog post and GitHub repository.
  https://github.com/shootime2021/APUS-xDAN-4.0-moe
 
+ # Model Details
+ APUS-xDAN-4.0-MOE leverages the innovative Mixture of Experts (MoE) architecture, incorporating components from dense language models. Specifically, it inherits its capabilities from the highly performant xDAN-L2 Series. With a total of 136 billion parameters, of which 30 billion are activated during runtime, APUS-xDAN-4.0-MOE demonstrates unparalleled efficiency.
+ Through advanced quantization techniques, our open-source version occupies a mere 42GB, making it seamlessly compatible with consumer-grade GPUs like the 4090 and 3090.
+ It has the following specifications (restated as a config-style sketch after this list):
+
+ - **Parameters:** 134B
+ - **Architecture:** Mixture of 4 Experts (MoE)
+ - **Expert Utilization:** 2 experts used per token
+ - **Layers:** 60
+ - **Attention Heads:** 56 for queries, 8 for keys/values
+ - **Embedding Size:** 7,168
+ - **Additional Features:**
+   - Rotary embeddings (RoPE)
+   - Supports activation sharding and 1.5bit~4bit quantization
+ - **Maximum Sequence Length (context):** 32,768 tokens
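The values in the list above can be gathered into a single config-style snapshot. This is purely an illustrative sketch: the field names below are assumptions borrowed from Mixtral-style MoE configs, not the model's actual `config.json` keys, and the values simply restate the specification list.

```python
# Illustrative only: hypothetical, Mixtral-style field names;
# the values restate the specification list above.
apus_xdan_4_moe_spec = {
    "total_params": "134B",             # roughly 30B active per token
    "num_local_experts": 4,             # Mixture of 4 Experts
    "num_experts_per_tok": 2,           # 2 experts routed per token
    "num_hidden_layers": 60,
    "num_attention_heads": 56,          # query heads
    "num_key_value_heads": 8,           # key/value heads
    "hidden_size": 7168,                # embedding size
    "max_position_embeddings": 32768,   # maximum context length
    "position_embedding_type": "rope",  # rotary embeddings
}

if __name__ == "__main__":
    for name, value in apus_xdan_4_moe_spec.items():
        print(f"{name}: {value}")
```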
  ## Usage
 
+ ### Initial Setup
  ```bash
+ git clone https://github.com/ggerganov/llama.cpp.git
+ cd llama.cpp
+ make LLAMA_CUDA=1
+ ```
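The interactive example below assumes the quantized GGUF file is already on disk. If it is published in the Hugging Face model repository (an assumption; check the repo's file listing and adjust `repo_id`/`filename` accordingly), it can be fetched with the `huggingface_hub` client, for example:

```python
# Sketch: fetch the quantized GGUF used in the chat example below.
# Assumption: the file is hosted in the xDAN-AI/APUS-xDAN-4.0-MOE repo
# under the same name as in the command below; adjust if it is not.
from huggingface_hub import hf_hub_download

gguf_path = hf_hub_download(
    repo_id="xDAN-AI/APUS-xDAN-4.0-MOE",
    filename="xDAN-L2-moe-4x34b-v4-0326.IQ3_S.gguf",
)
print(gguf_path)  # pass this local path to llama.cpp's -m flag
```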
+ ### Interactive Chat
+ ```bash
+ ./main -m xDAN-L2-moe-4x34b-v4-0326.IQ3_S.gguf \
+     --prompt "You are a helpful assistant." --chatml \
+     --interactive \
+     --temp 0.7 \
+     --ctx-size 4096
  ```
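If you prefer driving the model from Python rather than the interactive CLI, here is a minimal sketch using the third-party `llama-cpp-python` bindings; it assumes the bindings are installed (`pip install llama-cpp-python`), that your installed version supports the IQ3_S quantization, and that a ChatML-style chat template is applied.

```python
# Minimal sketch with llama-cpp-python; mirrors the CLI flags above.
# Assumptions: local GGUF path, IQ3_S support in your installed version.
from llama_cpp import Llama

llm = Llama(
    model_path="xDAN-L2-moe-4x34b-v4-0326.IQ3_S.gguf",
    n_ctx=4096,       # context window, matching --ctx-size above
    n_gpu_layers=-1,  # offload all layers to GPU when built with CUDA
)

result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi, xDAN-APUS4.0, nice to meet you!"},
    ],
    temperature=0.7,  # matches --temp above
)
print(result["choices"][0]["message"]["content"])
```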
  License