sade-adrien
committed on
Commit
•
ee7a8a3
1
Parent(s):
15e8d2f
Create README.md
README.md
ADDED
@@ -0,0 +1,24 @@
+---
+datasets:
+- sade-adrien/redpajama_v2_sample_10M
+language:
+- en
+---
+The exact MappingAdapter structure is available in representation_mapping.py
+
+Mapping the hidden representation of "sentence-transformers/stsb-roberta-large" to that of "mistralai/Mistral-7B-Instruct-v0.1".
+
+Training:
+* Steps: 114k
+* Gradient accumulation: 2
+* Batch size: 64
+* Warm-up steps: 100
+* Learning rate: 3e-5 with linear scheduling
+* Eval steps: every 8000 steps
+* Training hours: ~98h
+* Eval hours: ~10h
+
+* Gradient updates: 57k
+* Train examples: 7.3M
+* Eval examples: 106k
+* Adapter: Decoder_dim (4096) → 4096 → LeakyReLU(0.1) → Encoder_dim (1024)
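
The training figures listed above can be cross-checked with a little arithmetic. The sketch below is only an illustration, not the author's training code: it reads "114k" as 114,000 micro-steps, and it assumes that "linear scheduling" means a linear warm-up to the peak learning rate followed by a linear decay to zero, counted in gradient updates. The function name `linear_lr` and all constant names are hypothetical.

```python
# Values copied from the Training list; "114k" is read as 114_000 (assumption).
STEPS = 114_000        # forward/backward passes (micro-steps)
GRAD_ACCUM = 2         # micro-steps accumulated per optimizer update
BATCH_SIZE = 64        # examples per micro-step
WARMUP = 100           # warm-up length, assumed to be counted in updates
PEAK_LR = 3e-5

updates = STEPS // GRAD_ACCUM             # optimizer (gradient) updates
examples_seen = STEPS * BATCH_SIZE        # training examples consumed
effective_batch = BATCH_SIZE * GRAD_ACCUM # examples per optimizer update


def linear_lr(update: int, total: int = updates,
              warmup: int = WARMUP, peak: float = PEAK_LR) -> float:
    """Assumed schedule: linear warm-up to `peak`, then linear decay to 0."""
    if update < warmup:
        return peak * update / max(1, warmup)
    return peak * max(0.0, (total - update) / max(1, total - warmup))


if __name__ == "__main__":
    print(updates, examples_seen, effective_batch)
    for u in (0, 50, 100, updates // 2, updates):
        print(u, linear_lr(u))
```

Under these assumptions, 114,000 steps with accumulation 2 give the 57k gradient updates listed, and 114,000 × 64 = 7,296,000 examples, which matches the reported ~7.3M train examples. The effective batch per update would then be 128.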