Qubitium committed on
Commit
a1307ee
1 Parent(s): 103afef

Update README.md

Files changed (1)
  1. README.md +4 -5
README.md CHANGED
@@ -9,15 +9,14 @@ Special thanks to https://huggingface.co/fahadh4ilyas
 convert_v2.py
 ```
 
-Training Notes:
+Training Notes/Observations:
+
+1. dbrx trains like a much smaller model (~7B)
 ```
-# 1. dbrx trains like a much smaller model (~7B)
 # start with this as reference point and move up or down based on eval/train loss
 learning_rate = 1.5e-5
-
-# 2. due to BPE (tiktoken) nature, tokenizer expansion/resize is not very friendly to training
-# use text based special tokens if you need/use extra tokens to avoid bad train/eval losses
 ```
+2. Due to nature of BPE (tiktoken), tokenizer expansion/resize is not very friendly to training. Use text based special tokens if you need/use extra tokens to avoid bad train/eval losses
 
 Known Issues:
 
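
Reviewer sketch for note 2, not part of the commit: one way to read "text based special tokens" is to mark turns with ordinary strings that the existing BPE vocabulary already covers, instead of adding new token ids and resizing the embeddings. The marker strings and the `format_example` helper below are hypothetical and chosen only for illustration; the README does not prescribe a template.

```python
# Hypothetical plain-text markers (an assumption, not from the README).
# They are ordinary strings, so the existing tokenizer splits them into
# tokens it already knows and no vocabulary resize is needed.
USER_MARKER = "### User:"
ASSISTANT_MARKER = "### Assistant:"


def format_example(prompt: str, response: str) -> str:
    """Wrap one training pair in plain-text markers (illustrative template)."""
    return (
        f"{USER_MARKER}\n{prompt.strip()}\n"
        f"{ASSISTANT_MARKER}\n{response.strip()}\n"
    )


# Reference starting point from the README notes; move it up or down
# based on eval/train loss.
learning_rate = 1.5e-5

sample = format_example("What is DBRX?", "A mixture-of-experts language model.")
print(sample)
```

Whether plain-text markers actually train better than added tokens for a given dataset is something to confirm against eval/train loss, as the notes suggest.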