tokenizer note
README.md CHANGED
@@ -11,9 +11,12 @@ convert_v2.py
 
 Training Notes:
 ```
-# dbrx trains like a much smaller model (~7B)
+# 1. dbrx trains like a much smaller model (~7B)
 # start with this as reference point and move up or down based on eval/train loss
 learning_rate = 1.5e-5
+
+# 2. due to BPE (tiktoken) nature, tokenizer expansion/resize is not very friendly to training
+# use text based special tokens if you need/use extra tokens to avoid bad train/eval losses
 ```
 
 Known Issues:
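
For note 1, a minimal sketch of where that reference learning rate would plug in, assuming a Hugging Face `Trainer`-style fine-tune; the README only gives the 1.5e-5 value and the advice to move it up or down against eval/train loss, so the rest of this config is an assumption:

```python
# Sketch only: wiring the README's reference learning rate into TrainingArguments.
# Everything except learning_rate=1.5e-5 is an assumed placeholder.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="dbrx-finetune",       # hypothetical output directory
    learning_rate=1.5e-5,             # reference point from the note; adjust per eval/train loss
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    bf16=True,
)
```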
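
For note 2, a minimal sketch of the text-based alternative to expanding the tokenizer, assuming the stock DBRX tokenizer loaded via `AutoTokenizer`; the `<|role|>` marker is a made-up example, not a token this repo defines:

```python
# Sketch only: keep extra markers as plain text the existing BPE (tiktoken) vocab
# already covers, instead of growing the vocab and resizing embeddings.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("databricks/dbrx-instruct", trust_remote_code=True)

# Avoided per the note: adding new token ids forces model.resize_token_embeddings(len(tok)),
# and the freshly initialized embedding rows tend to hurt train/eval loss.
# tok.add_special_tokens({"additional_special_tokens": ["<|role|>"]})
# model.resize_token_embeddings(len(tok))

# Preferred per the note: the marker is split into existing BPE sub-tokens,
# so the vocabulary and embedding matrix stay untouched.
marker = "<|role|>"   # hypothetical text-based special token
ids = tok(marker, add_special_tokens=False)["input_ids"]
print(len(ids), ids)  # a handful of existing ids, no vocab growth
```

Because the marker stays ordinary text, no embedding rows are added or re-initialized, which is what the note credits for keeping train/eval losses stable.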