rskuzma committed
Commit
77bf4f1
1 Parent(s): cb7fba3

point to svg for long msl image

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -118,7 +118,7 @@ Figure 4: Performance at 7B model size
  ## Long Sequence Lengths
  To enable long sequence applications, we use ALiBi position embeddings and train on 470B tokens at a context length of 2,048, followed by 157B tokens at a context length of 8,192. To assess BTLM’s long sequence capability, we evaluate it on the SlimPajama test set with a 32,768 context length and plot the loss at each token position. Although ALiBi allows extrapolation in theory, training at a 2,048 context length alone does not extrapolate well in practice. Thankfully, variable sequence length training allows for substantially improved extrapolation. BTLM-3B extrapolates well up to 10k context length, but performance degrades slightly beyond this point.
 
- ![figure_5_image](./figure_5_xentropy_with_sequence_lengths.png)
+ ![figure_5_image](./figure_5_xentropy_with_sequence_lengths.svg)
  Figure 5: BTLM-3B model’s cross-entropy evaluation on the SlimPajama test set. Inference performed at the extrapolated sequence length of 32,768 tokens.
 
  ## Model Details
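
The paragraph quoted in the diff describes plotting cross-entropy at each token position over a 32,768-token context. As a rough illustration only (not part of this commit or the README), here is a minimal sketch of how such a per-position loss curve could be computed with Hugging Face `transformers`; the checkpoint id and the 32,768 evaluation length are assumptions taken from the surrounding text, not details confirmed by the commit.

```python
# Illustrative sketch only -- not from this commit. The checkpoint id and the raw-text
# input are placeholders; swap in whatever model and data you actually evaluate.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "cerebras/btlm-3b-8k-base"  # assumed checkpoint id
CTX = 32768                            # extrapolated evaluation length from the text

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, trust_remote_code=True
).eval()

@torch.no_grad()
def per_position_xent(text: str) -> torch.Tensor:
    """Cross-entropy at every token position of one document, truncated to CTX tokens."""
    ids = tokenizer(text, return_tensors="pt").input_ids[:, :CTX]
    logits = model(ids).logits                  # [1, T, vocab]
    # Position i predicts token i+1, so shift logits/targets and keep per-token losses.
    return F.cross_entropy(
        logits[:, :-1].transpose(1, 2),         # [1, vocab, T-1]
        ids[:, 1:],                             # [1, T-1]
        reduction="none",
    ).squeeze(0)                                # shape [T-1]: loss vs. token position
```

Averaging such per-position loss vectors across SlimPajama test documents would give a curve of the kind shown in Figure 5.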