Qubitium commited on
Commit
103afef
1 Parent(s): 3fd46fb

tokenizer note

Browse files
Files changed (1) hide show
  1. README.md +4 -1
README.md CHANGED
@@ -11,9 +11,12 @@ convert_v2.py
11
 
12
  Training Notes:
13
  ```
14
- # dbrx trains like a much smaller model (~7B)
15
  # start with this as reference point and move up or down based on eval/train loss
16
  learning_rate = 1.5e-5
 
 
 
17
  ```
18
 
19
  Known Issues:
 
11
 
12
  Training Notes:
13
  ```
14
+ # 1. dbrx trains like a much smaller model (~7B)
15
  # start with this as reference point and move up or down based on eval/train loss
16
  learning_rate = 1.5e-5
17
+
18
+ # 2. due to BPE (tiktoken) nature, tokenizer expansion/resize is not very friendly to training
19
+ # use text based special tokens if you need/use extra tokens to avoid bad train/eval losses
20
  ```
21
 
22
  Known Issues: