pszemraj committed on
Commit
11384f1
1 Parent(s): c70ce0c

Create README.md

Files changed (1)
  1. README.md +75 -0
README.md ADDED
@@ -0,0 +1,75 @@
+ ---
+ license: apache-2.0
+ datasets:
+ - JeanKaddour/minipile
+ - BEE-spoke-data/wikipedia-20230901.en-deduped
+ - BEE-spoke-data/knowledge-inoc-concat-v1
+ language:
+ - en
+ inference:
+   parameters:
+     max_new_tokens: 64
+     do_sample: true
+     temperature: 0.8
+     repetition_penalty: 1.05
+     no_repeat_ngram_size: 4
+     eta_cutoff: 0.0006
+     renormalize_logits: true
+ widget:
+ - text: My name is El Microondas the Wise, and
+   example_title: El Microondas
+ - text: Kennesaw State University is a public
+   example_title: Kennesaw State University
+ - text: >-
+     Bungie Studios is an American video game developer. They are most famous
+     for developing the award winning Halo series of video games. They also
+     made Destiny. The studio was founded
+   example_title: Bungie
+ - text: The Mona Lisa is a world-renowned painting created by
+   example_title: Mona Lisa
+ - text: >-
+     The Harry Potter series, written by J.K. Rowling, begins with the book
+     titled
+   example_title: Harry Potter Series
+ - text: >-
+     Question: I have cities, but no houses. I have mountains, but no trees. I
+     have water, but no fish. What am I?
+
+     Answer:
+   example_title: Riddle
+ - text: The process of photosynthesis involves the conversion of
+   example_title: Photosynthesis
+ - text: >-
+     Jane went to the store to buy some groceries. She picked up apples,
+     oranges, and a loaf of bread. When she got home, she realized she forgot
+   example_title: Story Continuation
+ - text: >-
+     Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph,
+     and another train leaves Station B at 10:00 AM and travels at 80 mph, when
+     will they meet if the distance between the stations is 300 miles?
+
+     To determine
+   example_title: Math Problem
+ - text: In the context of computer programming, an algorithm is
+   example_title: Algorithm Definition
+ pipeline_tag: text-generation
+ ---
+
+
+ # BEE-spoke-data/mega-ar-126m-4k
+
+ > model card WIP, more details to come
+
+
+ This may not be the _best_ language model, but it is a language model! It's interesting for a few reasons, not the least of which is that it's technically not a transformer.
+
+
+ Details:
+
+ - hidden size 768, 12 layers
+ - no MEGA chunking, 4096 context length
+ - EMA dimension 16, shared dimension 192
+ - trained from scratch
+
+
+ ---
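
The front matter's `inference.parameters` block configures the hosted widget. As a rough sketch, the same settings could be passed to local generation with the `transformers` text-generation pipeline. This is not part of the committed README. The helper name `generate` and the constant `GENERATION_KWARGS` are hypothetical, and it is an assumption that the checkpoint loads with the MEGA implementation available in your installed `transformers` version.

```python
# Sampling settings copied from the `inference.parameters` block of the
# README front matter.
GENERATION_KWARGS = {
    "max_new_tokens": 64,
    "do_sample": True,
    "temperature": 0.8,
    "repetition_penalty": 1.05,
    "no_repeat_ngram_size": 4,
    "eta_cutoff": 0.0006,
    "renormalize_logits": True,
}

MODEL_ID = "BEE-spoke-data/mega-ar-126m-4k"


def generate(prompt: str, model_id: str = MODEL_ID) -> str:
    """Hypothetical helper: continue `prompt` using the card's sampling settings.

    Assumes `transformers` (with a backend such as PyTorch) is installed and
    that the checkpoint is loadable by the installed version.
    """
    # Imported lazily so that defining the settings above does not require
    # transformers to be installed.
    from transformers import pipeline

    pipe = pipeline("text-generation", model=model_id)
    # Extra keyword arguments are forwarded to the model's generate() call.
    outputs = pipe(prompt, **GENERATION_KWARGS)
    return outputs[0]["generated_text"]


if __name__ == "__main__":
    # One of the widget prompts from the front matter.
    print(generate("The Mona Lisa is a world-renowned painting created by"))
```

Because `do_sample` is enabled, the continuation will differ from run to run.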