Pacific-Prime commited on
Commit
b6d0af5
·
verified ·
1 Parent(s): 307ac07

Upload README.md with huggingface_hub

Browse files
Files changed (1) hide show
  1. README.md +187 -0
README.md ADDED
@@ -0,0 +1,187 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: cc-by-nc-4.0
3
+ language:
4
+ - en
5
+ - fr
6
+ tags:
7
+ - complexity-deep
8
+ - transformer
9
+ - moe
10
+ - token-routed
11
+ - inl-dynamics
12
+ - mu-guided
13
+ - causal-lm
14
+ - chat
15
+ - conversational
16
+ - sft
17
+ pipeline_tag: text-generation
18
+ library_name: complexity-deep
19
+ base_model: Pacific-Prime/pacific-prime
20
+ model-index:
21
+ - name: chat-node
22
+ results: []
23
+ ---
24
+
25
+ # Chat-Node 1.5B
26
+
27
+ > **Conversational chat model built on Pacific-Prime 1.5B with Mu-Guided Attention and Token-Routed MLP**
28
+
29
+ Chat-Node is a conversational variant of [Pacific-Prime 1.5B](https://huggingface.co/Pacific-Prime/pacific-prime), fine-tuned for general-purpose chat using the Alpaca-Cleaned dataset. Part of the Pacific-Prime node architecture for modular AI agents.
30
+
31
+ ## Generation Example (Epoch 350)
32
+
33
+ ![Generation at epoch 350](image.png)
34
+
35
+ ---
36
+
37
+ ## Model Details
38
+
39
+ | Attribute | Value |
40
+ |-----------|-------|
41
+ | Base Model | Pacific-Prime 1.5B v0.13.0 |
42
+ | Parameters | ~1.52B |
43
+ | Fine-tuning | SFT (Supervised Fine-Tuning) |
44
+ | Base Checkpoint | pacific-prime-python epoch 450 |
45
+ | Dataset | [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned) (20K samples) |
46
+ | Current Epoch | 350 |
47
+ | Precision | F32 |
48
+ | Hardware | H100 80GB |
49
+ | Context Length | 2048 tokens |
50
+
51
+ ### Training Hyperparameters
52
+
53
+ | Parameter | Value |
54
+ |-----------|-------|
55
+ | Learning Rate | 2e-5 |
56
+ | Batch Size | 4 |
57
+ | Gradient Accumulation | 8 (effective batch: 32) |
58
+ | Weight Decay | 0.01 |
59
+ | Warmup Ratio | 3% |
60
+ | Gradient Checkpointing | Enabled |
61
+
62
+ ---
63
+
64
+ ## Chat Format
65
+
66
+ Chat-Node uses a simple User / Assistant prompt format with an optional system message:
67
+
68
+ User: Give three tips for staying healthy.
69
+
70
+ Assistant:
71
+
72
+ ### Chat Template (Jinja)
73
+
74
+ The model includes a chat template compatible with HuggingFace's `apply_chat_template`:
75
+
76
+ {% if messages[0]['role'] == 'system' %}{{ messages[0]['content'] }}
77
+ {% set messages = messages[1:] %}{% endif %}
78
+ {% for message in messages %}
79
+ {% if message['role'] == 'user' %}User: {{ message['content'] }}
80
+ {% elif message['role'] == 'assistant' %}Assistant: {{ message['content'] }}
81
+ {% endif %}
82
+ {% endfor %}
83
+
84
+ ---
85
+
86
+ ## Architecture
87
+
88
+ | Parameter | Value |
89
+ |-----------|-------|
90
+ | Hidden Size | 2048 |
91
+ | Intermediate Size | 5632 |
92
+ | Layers | 24 |
93
+ | Attention Heads | 16 |
94
+ | KV Heads (GQA) | 8 |
95
+ | Max Position | 2048 |
96
+ | Vocab Size | 32,000 |
97
+ | Experts (Token-Routed MLP) | 4 |
98
+
99
+ ### Key Innovations (v0.13.0)
100
+
101
+ - **Mu-Guided KQV** - Learned equilibrium parameter biases K, Q, and V projections
102
+ - **Mu-Guided Expert Routing** - mu influences MLP expert selection
103
+ - **Mu Residual Highway** - Accumulated context across layers
104
+ - **Token-Routed MLP** - Deterministic 4-expert MoE with zero routing overhead
105
+ - **INL Dynamics** - Velocity tracking for temporal coherence (alpha=0.9, beta=0.1)
106
+ - **Grouped Query Attention** - 16 heads / 8 KV heads for efficient inference
107
+ - **QK Normalization** + **Flash Attention (SDPA)**
108
+ - **RoPE** positional embeddings
109
+
110
+ ---
111
+
112
+ ## Usage
113
+
114
+ ### CLI (generate.py)
115
+
116
+ ```bash
117
+ python generate.py -c ./checkpoints/pacific-prime-chat -m 300 -t 0.3 \
118
+ $'User: Give three tips for staying healthy.\n\nAssistant:'
119
+ ```
120
+
121
+ ### Python
122
+
123
+ ```python
124
+ from complexity_deep import DeepForCausalLM
125
+ from tokenizers import Tokenizer
126
+ import torch
127
+
128
+ model = DeepForCausalLM.from_pretrained("Pacific-Prime/chat-node")
129
+ tokenizer = Tokenizer.from_file("tokenizer.json")
130
+
131
+ prompt = "User: Explain what a neural network is.\n\nAssistant:"
132
+
133
+ input_ids = torch.tensor([tokenizer.encode(prompt).ids])
134
+ output = model.generate(input_ids, max_new_tokens=300, temperature=0.3)
135
+ print(tokenizer.decode(output[0].tolist()))
136
+ ```
137
+
138
+ ---
139
+
140
+ ## Files
141
+
142
+ | File | Description |
143
+ |------|-------------|
144
+ | `checkpoint_epoch350.pt` | Model weights (F32) |
145
+ | `config.json` | Architecture configuration |
146
+ | `tokenizer.json` | BPE tokenizer (32K vocab) |
147
+ | `tokenizer_config.json` | Tokenizer settings |
148
+ | `special_tokens_map.json` | Special tokens |
149
+ | `chat_template.jinja` | Chat prompt template |
150
+
151
+ ---
152
+
153
+ ## Limitations
154
+
155
+ - **In development**: Training ongoing, not yet production-ready
156
+ - **English-focused**: Alpaca dataset is primarily English
157
+ - **Instruction following**: May overshoot requested list lengths
158
+ - **Context window**: Limited to 2048 tokens
159
+
160
+ ---
161
+
162
+ ## Links
163
+
164
+ - [Paper - Zenodo](https://zenodo.org/records/18293026)
165
+ - [Base Model - Pacific-Prime 1.5B](https://huggingface.co/Pacific-Prime/pacific-prime)
166
+ - [GitHub - complexity-deep](https://github.com/Complexity-ML/complexity-deep)
167
+ - [PyPI - complexity-deep](https://pypi.org/project/complexity-deep/)
168
+ - [GitHub - mu-inference](https://github.com/Complexity-ML/mu-inference)
169
+
170
+ ---
171
+
172
+ ## License
173
+
174
+ **CC-BY-NC-4.0** (Creative Commons Attribution-NonCommercial 4.0)
175
+
176
+ ---
177
+
178
+ ## Citation
179
+
180
+ ```bibtex
181
+ @misc{chat-node-2025,
182
+ title={Chat-Node: A Conversational 1.5B Model with Mu-Guided Attention},
183
+ author={Boris Peyriguere},
184
+ year={2025},
185
+ url={https://huggingface.co/Pacific-Prime/chat-node}
186
+ }
187
+ ```