xDAN2099 committed on
Commit
29fde44
1 Parent(s): c5dbeba

Update README.md

Files changed (1)
  1. README.md +6 -4
README.md CHANGED
@@ -2,7 +2,9 @@
2
  license: apache-2.0
3
  ---
4
 
5
- Introduction
 
 
6
 
7
  APUS-xDAN-4.0-MOE is a transformer-based decoder-only language model, developed on a vast corpus of data to ensure robust performance.
8
 
@@ -17,7 +19,7 @@ APUS-xDAN-4.0-MOE leverages the innovative Mixture of Experts (MoE) architecture
17
  Through advanced quantization techniques, our open-source version occupies a mere 42GB, making it seamlessly compatible with consumer-grade GPUs like the 4090 and 3090.
18
  The following specifications:
19
 
20
- - **Parameters:** 134B
21
  - **Architecture:** Mixture of 4 Experts (MoE)
22
  - **Experts Utilization:** 2 experts used per token
23
  - **Layers:** 60
@@ -26,7 +28,7 @@ The following specifications:
26
  - **Additional Features:**
27
  - Rotary embeddings (RoPE)
28
  - Supports activation sharding and 1.5bit~4bit quantization
29
- - **Maximum Sequence Length (context):** 32,768 tokens
30
  ## Usage
31
 
32
  ### Initial
@@ -38,7 +40,7 @@ make LLAMA_CUDA=1
38
  ### Interactive Chat
39
  ```python
40
 
41
- ./main -m xDAN-L2-moe-4x34b-v4-0326.IQ3_S.gguf \
42
  --prompt "You are a helpful assistant." --chatml \
43
  --interactive \
44
  --temp 0.7 \
 
2
  license: apache-2.0
3
  ---
4
 
5
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/643197ac288c9775673a01e9/w-lgOpASM1DMl2PO0kdFy.png)
6
+
7
+ ## Introduction
8
 
9
  APUS-xDAN-4.0-MOE is a transformer-based decoder-only language model, developed on a vast corpus of data to ensure robust performance.
10
 
 
19
  Through advanced quantization techniques, our open-source version occupies a mere 42GB, making it seamlessly compatible with consumer-grade GPUs like the 4090 and 3090.
20
  The following specifications:
21
 
22
+ - **Parameters:** 136B
23
  - **Architecture:** Mixture of 4 Experts (MoE)
24
  - **Experts Utilization:** 2 experts used per token
25
  - **Layers:** 60
 
28
  - **Additional Features:**
29
  - Rotary embeddings (RoPE)
30
  - Supports activation sharding and 1.5bit~4bit quantization
31
+ - **Maximum Sequence Length (context):** 32,768 tokens
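As a rough sanity check on the figures above, the stated 42GB file size and 136B parameter count imply an average number of bits per parameter. The sketch below is only a back-of-the-envelope estimate. It assumes the 42GB covers the weights alone and that "GB" means either 10^9 bytes or 2^30 bytes, neither of which the README states. The helper name `bits_per_param` is hypothetical.

```python
# Back-of-the-envelope estimate: average bits stored per parameter.
# Assumptions (not stated in the README): "42GB" covers the weights only,
# and "GB" means either 10**9 bytes or 2**30 bytes.

PARAMS = 136e9  # parameter count from the specification list


def bits_per_param(size_bytes: float, params: float = PARAMS) -> float:
    """Average number of bits used per parameter for a file of size_bytes."""
    return size_bytes * 8 / params


decimal_gb = bits_per_param(42 * 10**9)  # 42 GB read as 42 * 10^9 bytes
binary_gib = bits_per_param(42 * 2**30)  # 42 GB read as 42 GiB

print(f"{decimal_gb:.2f} bits/param (decimal GB)")
print(f"{binary_gib:.2f} bits/param (binary GiB)")
```

Under either reading the result is about 2.5 to 2.7 bits per parameter, which falls inside the listed 1.5bit~4bit quantization range.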
32
  ## Usage
33
 
34
  ### Initial
 
40
  ### Interactive Chat
41
  ```python
42
 
43
+ ./main -m APUS-xDAN-4.0-quanzied_version.gguf \
44
  --prompt "You are a helpful assistant." --chatml \
45
  --interactive \
46
  --temp 0.7 \