pszemraj committed
Commit 366be30
1 Parent(s): da8f31d

Create README.md

Files changed (1)
  1. README.md +79 -0
README.md ADDED
---
license: apache-2.0
thumbnail: https://i.ibb.co/TvyMrRc/rsz-smol-llama-banner.png
language:
- en
inference:
  parameters:
    max_new_tokens: 64
    do_sample: true
    repetition_penalty: 1.2
    no_repeat_ngram_size: 5
    eta_cutoff: 0.0006
    renormalize_logits: true
widget:
- text: My name is El Microondas the Wise and
  example_title: El Microondas
- text: Kennesaw State University is a public
  example_title: Kennesaw State University
- text: >-
    Bungie Studios is an American video game developer. They are most famous for
    developing the award winning Halo series of video games. They also made
    Destiny. The studio was founded
  example_title: Bungie
- text: The Mona Lisa is a world-renowned painting created by
  example_title: Mona Lisa
- text: >-
    The Harry Potter series, written by J.K. Rowling, begins with the book
    titled
  example_title: Harry Potter Series
- text: >-
    Question: I have cities, but no houses. I have mountains, but no trees. I
    have water, but no fish. What am I?

    Answer:
  example_title: Riddle
- text: The process of photosynthesis involves the conversion of
  example_title: Photosynthesis
- text: >-
    Jane went to the store to buy some groceries. She picked up apples, oranges,
    and a loaf of bread. When she got home, she realized she forgot
  example_title: Story Continuation
- text: >-
    Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph, and
    another train leaves Station B at 10:00 AM and travels at 80 mph, when will
    they meet if the distance between the stations is 300 miles?

    To determine
  example_title: Math Problem
- text: In the context of computer programming, an algorithm is
  example_title: Algorithm Definition
pipeline_tag: text-generation
tags:
- smol_llama
- llama2
datasets:
- JeanKaddour/minipile
- pszemraj/simple_wikipedia_LM
- BEE-spoke-data/wikipedia-20230901.en-deduped
- mattymchen/refinedweb-3m
---

# smol_llama-81M-tied

A small 81M param (total) decoder model. This is the first version of the model.

- 768 hidden size, 6 layers
- standard multi-head attention (24 heads), context length 1024
- input/output embeddings **are tied**
- train-from-scratch

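For orientation, the hyperparameters above map onto a `transformers` `LlamaConfig` roughly as sketched below. This is only an illustrative sketch, not the released training config: values that are not stated in this card (vocab size, intermediate/MLP size, etc.) are placeholders and should be taken from the repo's `config.json`.

```python
from transformers import LlamaConfig

# Sketch of the architecture described above (illustrative only).
# vocab_size and intermediate_size are NOT stated in this card; the values
# below are placeholders, so consult the repo's config.json for the real ones.
config = LlamaConfig(
    hidden_size=768,               # 768 hidden size
    num_hidden_layers=6,           # 6 layers
    num_attention_heads=24,        # standard multi-head attention (24 heads)
    num_key_value_heads=24,        # equal to num_attention_heads, i.e. plain MHA (no GQA)
    max_position_embeddings=1024,  # context length 1024
    tie_word_embeddings=True,      # input/output embeddings are tied
    vocab_size=32000,              # placeholder (assumed Llama-2-style tokenizer)
    intermediate_size=3072,        # placeholder, not stated in this card
)
print(config)
```
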
72
+ ## Notes
73
+
74
+ **This checkpoint** is the 'raw' pre-trained model and has not been tuned to a more specific task. **It should be fine-tuned** before use in most cases.
75
+
76
+ - slightly larger 101M param GQA pretrained version: [here](https://huggingface.co/BEE-spoke-data/smol_llama-101M-GQA)
77
+ - For the chat version of this model, please [see here](https://youtu.be/dQw4w9WgXcQ?si=3ePIqrY1dw94KMu4)
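For quick experimentation with the raw checkpoint, a minimal generation sketch is shown below, reusing the sampling settings from the inference widget in the frontmatter. The repo id is an assumption (same org as the linked GQA variant); adjust it to wherever this checkpoint is actually hosted.

```python
from transformers import pipeline

# Assumed repo id (same org as the linked 101M GQA variant); adjust as needed.
model_id = "BEE-spoke-data/smol_llama-81M-tied"

generator = pipeline("text-generation", model=model_id)

# Sampling settings mirror the widget parameters in the card's frontmatter.
result = generator(
    "My name is El Microondas the Wise and",
    max_new_tokens=64,
    do_sample=True,
    repetition_penalty=1.2,
    no_repeat_ngram_size=5,
    eta_cutoff=0.0006,
    renormalize_logits=True,
)
print(result[0]["generated_text"])
```
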

---