rskuzma committed
Commit
77bf4f1
1 Parent(s): cb7fba3

point to svg for long msl image

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -118,7 +118,7 @@ Figure 4: Performance at 7B model size
  ## Long Sequence Lengths
  To enable long sequence applications, we use ALiBi position embeddings and train on 470B tokens at a context length of 2,048, followed by 157B tokens at a context length of 8,192. To assess BTLM’s long sequence capability, we evaluate it on the SlimPajama test set with a 32,768 context length and plot the loss at each token position. Although ALiBi allows extrapolation in theory, training at a 2,048 context length alone does not extrapolate well in practice. Thankfully, variable sequence length training allows for substantially improved extrapolation. BTLM-3B extrapolates well up to 10k context length, but performance degrades slightly beyond this point.
 
- ![figure_5_image](./figure_5_xentropy_with_sequence_lengths.png)
+ ![figure_5_image](./figure_5_xentropy_with_sequence_lengths.svg)
  Figure 5: BTLM-3B model’s cross-entropy evaluation on the SlimPajama test set. Inference performed at the extrapolated sequence length of 32,768 tokens.
 
  ## Model Details
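
The paragraph quoted in the diff describes plotting cross-entropy at each token position over a 32,768-token context. As a rough illustration only (not part of this commit or the README), here is a minimal sketch of how such a per-position loss curve could be computed with Hugging Face `transformers`; the checkpoint id and the 32,768 evaluation length are assumptions taken from the surrounding text, not details confirmed by the commit.

```python
# Illustrative sketch only -- not from this commit. The checkpoint id and the raw-text
# input are placeholders; swap in whatever model and data you actually evaluate.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "cerebras/btlm-3b-8k-base"  # assumed checkpoint id
CTX = 32768                            # extrapolated evaluation length from the text

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, trust_remote_code=True
).eval()

@torch.no_grad()
def per_position_xent(text: str) -> torch.Tensor:
    """Cross-entropy at every token position of one document, truncated to CTX tokens."""
    ids = tokenizer(text, return_tensors="pt").input_ids[:, :CTX]
    logits = model(ids).logits                  # [1, T, vocab]
    # Position i predicts token i+1, so shift logits/targets and keep per-token losses.
    return F.cross_entropy(
        logits[:, :-1].transpose(1, 2),         # [1, vocab, T-1]
        ids[:, 1:],                             # [1, T-1]
        reduction="none",
    ).squeeze(0)                                # shape [T-1]: loss vs. token position
```

Averaging such per-position loss vectors across SlimPajama test documents would give a curve of the kind shown in Figure 5.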