Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


mega-ar-126m-4k - bnb 8bits
- Model creator: https://huggingface.co/BEE-spoke-data/
- Original model: https://huggingface.co/BEE-spoke-data/mega-ar-126m-4k/
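
This repo ships the weights already quantized to 8-bit with bitsandbytes, so loading it should be a plain `from_pretrained` on this repo id (the quantization config is stored with the checkpoint). For reference, producing the same thing from the original checkpoint on the fly looks roughly like the sketch below; it assumes a CUDA GPU with the `bitsandbytes` and `accelerate` packages installed:

```python
# Hedged sketch: load the original checkpoint in 8-bit via bitsandbytes.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "BEE-spoke-data/mega-ar-126m-4k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",       # requires `accelerate`
    trust_remote_code=True,  # MEGA support varies by transformers version
)
```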

Original model description:
---
license: apache-2.0
datasets:
- JeanKaddour/minipile
- BEE-spoke-data/wikipedia-20230901.en-deduped
- BEE-spoke-data/knowledge-inoc-concat-v1
language:
- en
inference:
  parameters:
    max_new_tokens: 64
    do_sample: true
    temperature: 0.8
    repetition_penalty: 1.05
    no_repeat_ngram_size: 4
    epsilon_cutoff: 0.0006
    renormalize_logits: true
widget:
- text: My name is El Microondas the Wise, and
  example_title: El Microondas
- text: Kennesaw State University is a public
  example_title: Kennesaw State University
- text: >-
    Bungie Studios is an American video game developer. They are most famous
    for developing the award winning Halo series of video games. They also
    made Destiny. The studio was founded
  example_title: Bungie
- text: The Mona Lisa is a world-renowned painting created by
  example_title: Mona Lisa
- text: >-
    The Harry Potter series, written by J.K. Rowling, begins with the book
    titled
  example_title: Harry Potter Series
- text: >-
    Question: I have cities, but no houses. I have mountains, but no trees. I
    have water, but no fish. What am I?

    Answer:
  example_title: Riddle
- text: The process of photosynthesis involves the conversion of
  example_title: Photosynthesis
- text: >-
    Jane went to the store to buy some groceries. She picked up apples,
    oranges, and a loaf of bread. When she got home, she realized she forgot
  example_title: Story Continuation
- text: >-
    Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph,
    and another train leaves Station B at 10:00 AM and travels at 80 mph, when
    will they meet if the distance between the stations is 300 miles?

    To determine
  example_title: Math Problem
- text: In the context of computer programming, an algorithm is
  example_title: Algorithm Definition
pipeline_tag: text-generation
---

# BEE-spoke-data/mega-ar-126m-4k

This may not be the _best_ language model, but it is a language model! It's interesting for several reasons, not the least of which is that it's not technically a transformer.

Details (checkable against the checkpoint config, as sketched below):

- 768 hidden size, 12 layers
- no MEGA chunking, 4096 context length
- EMA dimension 16, shared dimension 192
- tokenizer: GPT-NeoX
- trained from scratch
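
A minimal inspection sketch for verifying those numbers: `hidden_size` and `num_hidden_layers` are standard config fields, but the MEGA-specific attribute names (EMA dim, shared dim, context length) are not guaranteed here, so the sketch just dumps the whole config for eyeballing.

```python
# Minimal sketch: check the listed hyperparameters against the hub config.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("BEE-spoke-data/mega-ar-126m-4k", trust_remote_code=True)
print(cfg.hidden_size, cfg.num_hidden_layers)  # expect 768, 12 per the list above
# MEGA-specific fields (EMA dim, shared dim, max positions) also live on the
# config; their exact attribute names vary, so print everything:
print(cfg)
```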

For more info on MEGA (_& what some of the params above mean_), check out the [model docs](https://huggingface.co/docs/transformers/main/en/model_doc/mega#mega) or the [original paper](https://arxiv.org/abs/2209.10655).

## Usage

Usage is the same as for any other small text-generation model.

Given the model's small size and architecture, it's probably best to leverage its longer context: feed it more input so it can "see more", rather than asking it to "generate more".
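
For instance, a short generation sketch that mirrors the sampling settings from the widget config in the frontmatter above (an illustration; any `generate` kwargs work here):

```python
# Usage sketch: sampling with the widget defaults from the YAML frontmatter.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="BEE-spoke-data/mega-ar-126m-4k",
    trust_remote_code=True,
)

prompt = "The process of photosynthesis involves the conversion of"
out = pipe(
    prompt,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.8,
    repetition_penalty=1.05,
    no_repeat_ngram_size=4,
)
print(out[0]["generated_text"])
```

Per the note above, prepending a long document to the prompt and asking for a short continuation plays to the 4k context better than long free-running generation.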

## evals

Initial data:

`hf-causal-experimental (pretrained=BEE-spoke-data/mega-ar-126m-4k,revision=main,trust_remote_code=True,dtype='float'), limit: None, provide_description: False, num_fewshot: 0, batch_size: 4`
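
That config string comes from an older EleutherAI lm-evaluation-harness (the `hf-causal-experimental` model type predates the v0.4 rewrite). A hedged reproduction sketch against that older Python API; keyword names may differ across harness versions:

```python
# Hedged sketch: re-running the table with the pre-0.4 lm-evaluation-harness.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal-experimental",
    model_args="pretrained=BEE-spoke-data/mega-ar-126m-4k,revision=main,"
               "trust_remote_code=True,dtype='float'",
    tasks=["arc_easy", "boolq", "lambada_openai", "openbookqa", "piqa", "winogrande"],
    num_fewshot=0,
    batch_size=4,
)
print(results["results"])
```

The initial results: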

| Task          |Version| Metric |  Value |   |Stderr|
|---------------|------:|--------|-------:|---|-----:|
|arc_easy       |      0|acc     |  0.4415|±  |0.0102|
|               |       |acc_norm|  0.3969|±  |0.0100|
|boolq          |      1|acc     |  0.5749|±  |0.0086|
|lambada_openai |      0|ppl     | 94.9912|±  |3.9682|
|               |       |acc     |  0.2408|±  |0.0060|
|openbookqa     |      0|acc     |  0.1660|±  |0.0167|
|               |       |acc_norm|  0.2780|±  |0.0201|
|piqa           |      0|acc     |  0.5974|±  |0.0114|
|               |       |acc_norm|  0.5914|±  |0.0115|
|winogrande     |      0|acc     |  0.4830|±  |0.0140|

---