BEE-spoke-data/smol_llama-101M-GQA

pszemraj committed on Oct 26, 2023

Commit 9c2d3e3 (1 parent: 1958a14)

Update README.md

Files changed (1)

README.md +3 -1

@@ -67,7 +67,9 @@ A small 101M param (total) decoder model. This is the first version of the model
 - GQA (24 heads, 8 key-value), context length 1024
 - train-from-scratch
-**This model** is the 'raw' pretrained model and has not been fine-tuned on a more specific task. **it should be fine-tuned further before using for most use-cases**.
+## Notes
+**This checkpoint** is the 'raw' pre-trained model and has not been tuned to a more specific task. **It should be fine-tuned** before use in most cases.
 - For the chat version of this model, please [see here](https://youtu.be/dQw4w9WgXcQ?si=3ePIqrY1dw94KMu4)
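
Reviewer note on the unchanged "GQA (24 heads, 8 key-value)" context line: the sketch below illustrates what that ratio could mean for head sharing. It is not code from this repository. The function names are hypothetical, and it assumes the contiguous grouping convention, where consecutive query heads share one key-value head. Only the two head counts come from the README.

```python
"""Sketch of grouped-query attention (GQA) head sharing; not code from this repository."""

NUM_QUERY_HEADS = 24  # from the model card: "24 heads"
NUM_KV_HEADS = 8      # from the model card: "8 key-value"


def kv_head_for_query_head(q_head, n_q=NUM_QUERY_HEADS, n_kv=NUM_KV_HEADS):
    """Return the index of the key-value head shared by query head `q_head`.

    Assumption: contiguous grouping, so consecutive query heads share one KV head.
    """
    if n_q % n_kv != 0:
        raise ValueError("number of query heads must be a multiple of KV heads")
    if not 0 <= q_head < n_q:
        raise IndexError(f"query head {q_head} out of range 0..{n_q - 1}")
    group_size = n_q // n_kv  # 24 // 8 = 3 query heads per KV head
    return q_head // group_size


def group_table(n_q=NUM_QUERY_HEADS, n_kv=NUM_KV_HEADS):
    """Map each KV head index to the list of query heads that share it."""
    table = {kv: [] for kv in range(n_kv)}
    for q in range(n_q):
        table[kv_head_for_query_head(q, n_q, n_kv)].append(q)
    return table


if __name__ == "__main__":
    for kv, queries in group_table().items():
        print(f"KV head {kv}: query heads {queries}")
```

Under these assumptions each key-value head serves three query heads, so the key-value cache holds 8 heads instead of 24, one third of what full multi-head attention with the same head size would store.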