Commit ac53d81 by pszemraj
1 Parent(s): 05a36f4

Create README.md

Files changed (1)
  1. README.md +72 -0
README.md ADDED
@@ -0,0 +1,72 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ inference:
+   parameters:
+     max_new_tokens: 64
+     do_sample: true
+     temperature: 0.8
+     repetition_penalty: 1.05
+     no_repeat_ngram_size: 4
+     eta_cutoff: 0.0006
+     renormalize_logits: true
+ widget:
+ - text: My name is El Microondas the Wise and
+   example_title: El Microondas
+ - text: Kennesaw State University is a public
+   example_title: Kennesaw State University
+ - text: >-
+     Bungie Studios is an American video game developer. They are most famous for
+     developing the award winning Halo series of video games. They also made
+     Destiny. The studio was founded
+   example_title: Bungie
+ - text: The Mona Lisa is a world-renowned painting created by
+   example_title: Mona Lisa
+ - text: >-
+     The Harry Potter series, written by J.K. Rowling, begins with the book
+     titled
+   example_title: Harry Potter Series
+ - text: >-
+     Question: I have cities, but no houses. I have mountains, but no trees. I
+     have water, but no fish. What am I?
+
+     Answer:
+   example_title: Riddle
+ - text: The process of photosynthesis involves the conversion of
+   example_title: Photosynthesis
+ - text: >-
+     Jane went to the store to buy some groceries. She picked up apples, oranges,
+     and a loaf of bread. When she got home, she realized she forgot
+   example_title: Story Continuation
+ - text: >-
+     Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph, and
+     another train leaves Station B at 10:00 AM and travels at 80 mph, when will
+     they meet if the distance between the stations is 300 miles?
+
+     To determine
+   example_title: Math Problem
+ - text: In the context of computer programming, an algorithm is
+   example_title: Algorithm Definition
+ pipeline_tag: text-generation
+ tags:
+ - smol_llama
+ - llama2
+ datasets:
+ - BEE-spoke-data/knowledge-inoc-concat-v1
+ ---
+
+
+ # smol_llama: 220M GQA
+
+ > model card WIP, more details to come
+
+
+ A small decoder model with 220M total parameters. This is the first version of the model.
+
+ - 1024 hidden size, 10 layers
+ - GQA (32 heads, 8 key-value), context length 2048
+ - trained from scratch on one GPU :)
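The attention geometry in the list above can be worked through numerically. The following is a minimal sketch, not the author's code: it assumes the usual Llama convention that the per-head dimension equals hidden size divided by the number of query heads, and it assumes a 2-byte (fp16/bf16) KV cache at batch size 1. The `AttentionShape` class and its method names are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionShape:
    """Hypothetical helper holding the sizes stated in the model card."""

    hidden_size: int = 1024      # from the model card
    num_layers: int = 10         # from the model card
    num_q_heads: int = 32        # from the model card
    num_kv_heads: int = 8        # from the model card
    context_length: int = 2048   # from the model card

    @property
    def head_dim(self) -> int:
        # Assumption: head_dim = hidden_size / num_q_heads (common Llama convention).
        return self.hidden_size // self.num_q_heads

    @property
    def queries_per_kv_head(self) -> int:
        # In GQA, each key-value head is shared by a group of query heads.
        return self.num_q_heads // self.num_kv_heads

    def kv_cache_bytes(self, bytes_per_value: int = 2, batch_size: int = 1) -> int:
        # Two cached tensors (K and V) per layer, each of shape
        # [context_length, num_kv_heads, head_dim].
        # Assumption: 2 bytes per value (fp16/bf16).
        values = (2 * self.num_layers * self.context_length
                  * self.num_kv_heads * self.head_dim)
        return values * bytes_per_value * batch_size


gqa = AttentionShape()
mha = AttentionShape(num_kv_heads=32)  # same model if every query head had its own KV head

print("head_dim:", gqa.head_dim)
print("query heads per KV head:", gqa.queries_per_kv_head)
print("GQA KV cache at full context (MiB):", gqa.kv_cache_bytes() / 2**20)
print("MHA KV cache at full context (MiB):", mha.kv_cache_bytes() / 2**20)
```

Under these assumptions, sharing 8 key-value heads across 32 query heads cuts the full-context KV cache to a quarter of what standard multi-head attention would need.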
+
+
+ ---
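To make the `inference.parameters` block in the front matter more concrete, here is a rough pure-Python sketch of what three of those settings do to next-token scores: `repetition_penalty`, `temperature`, and `renormalize_logits`. It is a simplified illustration, not the library's implementation. The toy logits, the function names, and the order of operations are assumptions, and `no_repeat_ngram_size` and `eta_cutoff` are not modeled here.

```python
import math

# Values copied from the front matter; the key names mirror the card's YAML.
GENERATION_PARAMS = {
    "max_new_tokens": 64,
    "do_sample": True,
    "temperature": 0.8,
    "repetition_penalty": 1.05,
    "no_repeat_ngram_size": 4,
    "eta_cutoff": 0.0006,
    "renormalize_logits": True,
}


def apply_repetition_penalty(logits, seen_token_ids, penalty):
    # Make already-generated tokens less likely: shrink positive scores,
    # push negative scores further down.
    out = list(logits)
    for token_id in set(seen_token_ids):
        out[token_id] = out[token_id] / penalty if out[token_id] > 0 else out[token_id] * penalty
    return out


def apply_temperature(logits, temperature):
    # Temperature below 1 sharpens the distribution, above 1 flattens it.
    return [score / temperature for score in logits]


def log_softmax(logits):
    # Renormalize so that the exponentiated scores sum to 1.
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(score - peak) for score in logits))
    return [score - log_z for score in logits]


# Toy example: a 4-token vocabulary where tokens 0 and 2 were already generated.
logits = [2.0, 1.0, -1.0, 0.5]
seen = [0, 2]

penalized = apply_repetition_penalty(logits, seen, GENERATION_PARAMS["repetition_penalty"])
scaled = apply_temperature(penalized, GENERATION_PARAMS["temperature"])
probs = [math.exp(lp) for lp in log_softmax(scaled)]

print("penalized logits:", penalized)
print("sampling probabilities:", [round(p, 4) for p in probs])
```

With `do_sample: true`, the next token would then be drawn from `probs` rather than taken as the argmax, and generation would stop after at most `max_new_tokens` tokens.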