kenhktsui/nano-phi-115M-control-v0.1

@@ -1,13 +1,250 @@
 ---
 library_name: transformers
-tags: []
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
 ## Model Details
@@ -196,6 +433,4 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 ## Model Card Contact
-[More Information Needed]

 ---
 library_name: transformers
+language:
+- en
+inference:
+  parameters:
+    max_new_tokens: 64
+    do_sample: true
+    temperature: 0.8
+    repetition_penalty: 1.15
+    no_repeat_ngram_size: 4
+    eta_cutoff: 0.0006
+    renormalize_logits: true
+widget:
+- text: My name is El Microondas the Wise, and
+  example_title: El Microondas
+- text: Kennesaw State University is a public
+  example_title: Kennesaw State University
+- text: >-
+    Bungie Studios is an American video game developer. They are most famous for
+    developing the award winning Halo series of video games. They also made
+    Destiny. The studio was founded
+  example_title: Bungie
+- text: The Mona Lisa is a world-renowned painting created by
+  example_title: Mona Lisa
+- text: >-
+    The Harry Potter series, written by J.K. Rowling, begins with the book
+    titled
+  example_title: Harry Potter Series
+- text: >-
+    Question: I have cities, but no houses. I have mountains, but no trees. I
+    have water, but no fish. What am I?
+    Answer:
+  example_title: Riddle
+- text: The process of photosynthesis involves the conversion of
+  example_title: Photosynthesis
+- text: >-
+    Jane went to the store to buy some groceries. She picked up apples, oranges,
+    and a loaf of bread. When she got home, she realized she forgot
+  example_title: Story Continuation
+- text: >-
+    Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph, and
+    another train leaves Station B at 10:00 AM and travels at 80 mph, when will
+    they meet if the distance between the stations is 300 miles?
+    To determine
+  example_title: Math Problem
+- text: In the context of computer programming, an algorithm is
+  example_title: Algorithm Definition
+pipeline_tag: text-generation
+datasets:
+- JeanKaddour/minipile
+- pszemraj/simple_wikipedia_LM
+- mattymchen/refinedweb-3m
+- Locutusque/TM-DATA
+- Skylion007/openwebtext
 ---
+# Model Card for nano-phi-115M-control-v0.1
+Inspired by [Phi2](https://huggingface.co/microsoft/phi-2) and open-source small language model efforts such as [smol_llama-101M-GQA](https://huggingface.co/BEE-spoke-data/smol_llama-101M-GQA).
+Pre-trained from scratch on 7B training tokens, drawn from a dataset of 0.6B tokens.
+This model serves as a control for [kenhktsui/nano-phi-115M-v0.1](https://huggingface.co/kenhktsui/nano-phi-115M-v0.1), which applies a quality filter to the dataset, resulting in a smaller dataset.
+Training took just 2d 4h in Colab on an A100 40GB (~USD 100).
+It achieves fairly competitive evaluation results given its training token count and training data size.
+No alignment has been done yet.
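As a rough illustration of how the sampling settings in this card's `inference.parameters` metadata map onto a local call, the sketch below passes them to `generate` from the `transformers` library. The repository id is taken from this card's title and `trust_remote_code=True` mirrors the flag used in the evaluation runs. The helper name `generate_text` is hypothetical, so treat this as a starting point rather than the author's own inference setup.

```python
# Sampling settings copied from the `inference.parameters` block of this card.
GENERATION_KWARGS = {
    "max_new_tokens": 64,
    "do_sample": True,
    "temperature": 0.8,
    "repetition_penalty": 1.15,
    "no_repeat_ngram_size": 4,
    "eta_cutoff": 0.0006,
    "renormalize_logits": True,
}

# Repository id as shown in the card title.
MODEL_ID = "kenhktsui/nano-phi-115M-control-v0.1"


def generate_text(prompt: str, model_id: str = MODEL_ID) -> str:
    """Hypothetical helper: load the checkpoint and sample one continuation."""
    # Imported lazily so the settings above can be inspected without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        **GENERATION_KWARGS,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("The Mona Lisa is a world-renowned painting created by")` downloads the checkpoint on first use. Because `do_sample` is enabled, repeated calls will generally return different continuations.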
+## Some metrics
+- model:
+  - hidden_size: 768
+  - num_key_value_heads: 8 (grouped query attention)
+  - num_attention_heads: 24
+  - num_hidden_layers: 6
+  - context length: 1024
+  - total params: 115M
+- training:
+  - global steps: 14,000
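A few derived quantities follow from the figures listed above and in the introduction. The sketch below is back-of-the-envelope arithmetic only. It assumes the hidden size is split evenly across attention heads, that every training sequence is packed to the full 1024-token context, and that the rounded "7B" and "0.6B" token counts are close enough for an estimate. None of the derived values are stated by the author.

```python
# Values as listed in the card.
hidden_size = 768
num_attention_heads = 24
num_key_value_heads = 8
context_length = 1024
global_steps = 14_000
training_tokens = 7e9   # "7B tokens" (rounded in the card)
dataset_tokens = 0.6e9  # "0.6B tokens" (rounded in the card)

# Assumption: hidden_size is divided evenly across the attention heads.
head_dim = hidden_size // num_attention_heads

# Grouped query attention: how many query heads share one key/value head.
queries_per_kv_head = num_attention_heads // num_key_value_heads

# Approximate number of passes over the dataset.
epochs = training_tokens / dataset_tokens

# Approximate batch size, assuming fully packed sequences.
tokens_per_step = training_tokens / global_steps
sequences_per_step = tokens_per_step / context_length

print(f"head_dim={head_dim}, queries per KV head={queries_per_kv_head}")
print(f"~{epochs:.1f} epochs, ~{tokens_per_step:,.0f} tokens/step, "
      f"~{sequences_per_step:.0f} sequences/step")
```

Under these assumptions each key/value head serves three query heads, and the 0.6B-token dataset is seen roughly a dozen times during training.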
+## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+| Metric                | Value |
+|-----------------------|-------|
+| Avg.                  | 28.75 |
+| ARC (25-shot)         | 21.67 |
+| HellaSwag (10-shot)   | 26.89 |
+| MMLU (5-shot)         | 24.76 |
+| TruthfulQA (0-shot)   | 47.69 |
+| Winogrande (5-shot)   | 51.46 |
+| GSM8K (5-shot)        | 0.0   |
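The `Avg.` row appears to be the unweighted mean of the six benchmark scores in the table. A minimal check, with the dictionary name `scores` chosen here purely for illustration:

```python
# Benchmark scores copied from the summary table above.
scores = {
    "ARC (25-shot)": 21.67,
    "HellaSwag (10-shot)": 26.89,
    "MMLU (5-shot)": 24.76,
    "TruthfulQA (0-shot)": 47.69,
    "Winogrande (5-shot)": 51.46,
    "GSM8K (5-shot)": 0.0,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.3f}")
```

The mean comes out at about 28.745, which agrees with the reported 28.75 after rounding. Note that the GSM8K score of 0.0 pulls the average down considerably.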
+Details:
+hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-ehgq969i:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 16
+|  Task  |Version| Metric |Value |   |Stderr|
+|--------|------:|--------|-----:|---|-----:|
+|arc_easy|      0|acc     |0.3973|±  |0.0100|
+|        |       |acc_norm|0.3531|±  |0.0098|
+hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-ehgq969i:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 25, batch_size: 16
+|    Task     |Version| Metric |Value |   |Stderr|
+|-------------|------:|--------|-----:|---|-----:|
+|arc_challenge|      0|acc     |0.1843|±  |0.0113|
+|             |       |acc_norm|0.2167|±  |0.0120|
+hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-ehgq969i:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 10, batch_size: 16
+|  Task   |Version| Metric |Value |   |Stderr|
+|---------|------:|--------|-----:|---|-----:|
+|hellaswag|      0|acc     |0.2682|±  |0.0044|
+|         |       |acc_norm|0.2689|±  |0.0044|
+hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-ehgq969i:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 16
+|    Task     |Version|Metric|Value |   |Stderr|
+|-------------|------:|------|-----:|---|-----:|
+|truthfulqa_mc|      1|mc1   |0.2619|±  |0.0154|
+|             |       |mc2   |0.4769|±  |0.0156|
+hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-ehgq969i:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 16
+|                      Task                       |Version| Metric |Value |   |Stderr|
+|-------------------------------------------------|------:|--------|-----:|---|-----:|
+|hendrycksTest-abstract_algebra                   |      1|acc     |0.2200|±  |0.0416|
+|                                                 |       |acc_norm|0.2200|±  |0.0416|
+|hendrycksTest-anatomy                            |      1|acc     |0.3333|±  |0.0407|
+|                                                 |       |acc_norm|0.3333|±  |0.0407|
+|hendrycksTest-astronomy                          |      1|acc     |0.2895|±  |0.0369|
+|                                                 |       |acc_norm|0.2895|±  |0.0369|
+|hendrycksTest-business_ethics                    |      1|acc     |0.2000|±  |0.0402|
+|                                                 |       |acc_norm|0.2000|±  |0.0402|
+|hendrycksTest-clinical_knowledge                 |      1|acc     |0.2189|±  |0.0254|
+|                                                 |       |acc_norm|0.2189|±  |0.0254|
+|hendrycksTest-college_biology                    |      1|acc     |0.2222|±  |0.0348|
+|                                                 |       |acc_norm|0.2222|±  |0.0348|
+|hendrycksTest-college_chemistry                  |      1|acc     |0.1700|±  |0.0378|
+|                                                 |       |acc_norm|0.1700|±  |0.0378|
+|hendrycksTest-college_computer_science           |      1|acc     |0.3000|±  |0.0461|
+|                                                 |       |acc_norm|0.3000|±  |0.0461|
+|hendrycksTest-college_mathematics                |      1|acc     |0.2500|±  |0.0435|
+|                                                 |       |acc_norm|0.2500|±  |0.0435|
+|hendrycksTest-college_medicine                   |      1|acc     |0.1965|±  |0.0303|
+|                                                 |       |acc_norm|0.1965|±  |0.0303|
+|hendrycksTest-college_physics                    |      1|acc     |0.2353|±  |0.0422|
+|                                                 |       |acc_norm|0.2353|±  |0.0422|
+|hendrycksTest-computer_security                  |      1|acc     |0.2000|±  |0.0402|
+|                                                 |       |acc_norm|0.2000|±  |0.0402|
+|hendrycksTest-conceptual_physics                 |      1|acc     |0.2043|±  |0.0264|
+|                                                 |       |acc_norm|0.2043|±  |0.0264|
+|hendrycksTest-econometrics                       |      1|acc     |0.2456|±  |0.0405|
+|                                                 |       |acc_norm|0.2456|±  |0.0405|
+|hendrycksTest-electrical_engineering             |      1|acc     |0.2621|±  |0.0366|
+|                                                 |       |acc_norm|0.2621|±  |0.0366|
+|hendrycksTest-elementary_mathematics             |      1|acc     |0.2566|±  |0.0225|
+|                                                 |       |acc_norm|0.2566|±  |0.0225|
+|hendrycksTest-formal_logic                       |      1|acc     |0.1587|±  |0.0327|
+|                                                 |       |acc_norm|0.1587|±  |0.0327|
+|hendrycksTest-global_facts                       |      1|acc     |0.1600|±  |0.0368|
+|                                                 |       |acc_norm|0.1600|±  |0.0368|
+|hendrycksTest-high_school_biology                |      1|acc     |0.3226|±  |0.0266|
+|                                                 |       |acc_norm|0.3226|±  |0.0266|
+|hendrycksTest-high_school_chemistry              |      1|acc     |0.2956|±  |0.0321|
+|                                                 |       |acc_norm|0.2956|±  |0.0321|
+|hendrycksTest-high_school_computer_science       |      1|acc     |0.2800|±  |0.0451|
+|                                                 |       |acc_norm|0.2800|±  |0.0451|
+|hendrycksTest-high_school_european_history       |      1|acc     |0.2606|±  |0.0343|
+|                                                 |       |acc_norm|0.2606|±  |0.0343|
+|hendrycksTest-high_school_geography              |      1|acc     |0.2626|±  |0.0314|
+|                                                 |       |acc_norm|0.2626|±  |0.0314|
+|hendrycksTest-high_school_government_and_politics|      1|acc     |0.2176|±  |0.0298|
+|                                                 |       |acc_norm|0.2176|±  |0.0298|
+|hendrycksTest-high_school_macroeconomics         |      1|acc     |0.2128|±  |0.0208|
+|                                                 |       |acc_norm|0.2128|±  |0.0208|
+|hendrycksTest-high_school_mathematics            |      1|acc     |0.2630|±  |0.0268|
+|                                                 |       |acc_norm|0.2630|±  |0.0268|
+|hendrycksTest-high_school_microeconomics         |      1|acc     |0.2227|±  |0.0270|
+|                                                 |       |acc_norm|0.2227|±  |0.0270|
+|hendrycksTest-high_school_physics                |      1|acc     |0.3046|±  |0.0376|
+|                                                 |       |acc_norm|0.3046|±  |0.0376|
+|hendrycksTest-high_school_psychology             |      1|acc     |0.2055|±  |0.0173|
+|                                                 |       |acc_norm|0.2055|±  |0.0173|
+|hendrycksTest-high_school_statistics             |      1|acc     |0.4815|±  |0.0341|
+|                                                 |       |acc_norm|0.4815|±  |0.0341|
+|hendrycksTest-high_school_us_history             |      1|acc     |0.2059|±  |0.0284|
+|                                                 |       |acc_norm|0.2059|±  |0.0284|
+|hendrycksTest-high_school_world_history          |      1|acc     |0.2574|±  |0.0285|
+|                                                 |       |acc_norm|0.2574|±  |0.0285|
+|hendrycksTest-human_aging                        |      1|acc     |0.2063|±  |0.0272|
+|                                                 |       |acc_norm|0.2063|±  |0.0272|
+|hendrycksTest-human_sexuality                    |      1|acc     |0.2443|±  |0.0377|
+|                                                 |       |acc_norm|0.2443|±  |0.0377|
+|hendrycksTest-international_law                  |      1|acc     |0.2727|±  |0.0407|
+|                                                 |       |acc_norm|0.2727|±  |0.0407|
+|hendrycksTest-jurisprudence                      |      1|acc     |0.2130|±  |0.0396|
+|                                                 |       |acc_norm|0.2130|±  |0.0396|
+|hendrycksTest-logical_fallacies                  |      1|acc     |0.2515|±  |0.0341|
+|                                                 |       |acc_norm|0.2515|±  |0.0341|
+|hendrycksTest-machine_learning                   |      1|acc     |0.2321|±  |0.0401|
+|                                                 |       |acc_norm|0.2321|±  |0.0401|
+|hendrycksTest-management                         |      1|acc     |0.2039|±  |0.0399|
+|                                                 |       |acc_norm|0.2039|±  |0.0399|
+|hendrycksTest-marketing                          |      1|acc     |0.1966|±  |0.0260|
+|                                                 |       |acc_norm|0.1966|±  |0.0260|
+|hendrycksTest-medical_genetics                   |      1|acc     |0.3000|±  |0.0461|
+|                                                 |       |acc_norm|0.3000|±  |0.0461|
+|hendrycksTest-miscellaneous                      |      1|acc     |0.2631|±  |0.0157|
+|                                                 |       |acc_norm|0.2631|±  |0.0157|
+|hendrycksTest-moral_disputes                     |      1|acc     |0.2457|±  |0.0232|
+|                                                 |       |acc_norm|0.2457|±  |0.0232|
+|hendrycksTest-moral_scenarios                    |      1|acc     |0.2682|±  |0.0148|
+|                                                 |       |acc_norm|0.2682|±  |0.0148|
+|hendrycksTest-nutrition                          |      1|acc     |0.2451|±  |0.0246|
+|                                                 |       |acc_norm|0.2451|±  |0.0246|
+|hendrycksTest-philosophy                         |      1|acc     |0.2605|±  |0.0249|
+|                                                 |       |acc_norm|0.2605|±  |0.0249|
+|hendrycksTest-prehistory                         |      1|acc     |0.2932|±  |0.0253|
+|                                                 |       |acc_norm|0.2932|±  |0.0253|
+|hendrycksTest-professional_accounting            |      1|acc     |0.2340|±  |0.0253|
+|                                                 |       |acc_norm|0.2340|±  |0.0253|
+|hendrycksTest-professional_law                   |      1|acc     |0.2432|±  |0.0110|
+|                                                 |       |acc_norm|0.2432|±  |0.0110|
+|hendrycksTest-professional_medicine              |      1|acc     |0.4301|±  |0.0301|
+|                                                 |       |acc_norm|0.4301|±  |0.0301|
+|hendrycksTest-professional_psychology            |      1|acc     |0.2369|±  |0.0172|
+|                                                 |       |acc_norm|0.2369|±  |0.0172|
+|hendrycksTest-public_relations                   |      1|acc     |0.2091|±  |0.0390|
+|                                                 |       |acc_norm|0.2091|±  |0.0390|
+|hendrycksTest-security_studies                   |      1|acc     |0.2408|±  |0.0274|
+|                                                 |       |acc_norm|0.2408|±  |0.0274|
+|hendrycksTest-sociology                          |      1|acc     |0.2388|±  |0.0301|
+|                                                 |       |acc_norm|0.2388|±  |0.0301|
+|hendrycksTest-us_foreign_policy                  |      1|acc     |0.2600|±  |0.0441|
+|                                                 |       |acc_norm|0.2600|±  |0.0441|
+|hendrycksTest-virology                           |      1|acc     |0.2048|±  |0.0314|
+|                                                 |       |acc_norm|0.2048|±  |0.0314|
+|hendrycksTest-world_religions                    |      1|acc     |0.2047|±  |0.0309|
+|                                                 |       |acc_norm|0.2047|±  |0.0309|
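The MMLU (5-shot) figure in the summary table looks consistent with an unweighted mean over the 57 `hendrycksTest-*` subjects listed above. The sketch below recomputes it from the `acc` column. The list name `MMLU_ACC` is introduced here for illustration, and the unweighted-mean aggregation is an assumption rather than something the card states.

```python
from statistics import mean

# Per-subject `acc` values, in the order of the table above (57 subjects).
MMLU_ACC = [
    0.2200, 0.3333, 0.2895, 0.2000, 0.2189, 0.2222, 0.1700, 0.3000,
    0.2500, 0.1965, 0.2353, 0.2000, 0.2043, 0.2456, 0.2621, 0.2566,
    0.1587, 0.1600, 0.3226, 0.2956, 0.2800, 0.2606, 0.2626, 0.2176,
    0.2128, 0.2630, 0.2227, 0.3046, 0.2055, 0.4815, 0.2059, 0.2574,
    0.2063, 0.2443, 0.2727, 0.2130, 0.2515, 0.2321, 0.2039, 0.1966,
    0.3000, 0.2631, 0.2457, 0.2682, 0.2451, 0.2605, 0.2932, 0.2340,
    0.2432, 0.4301, 0.2369, 0.2091, 0.2408, 0.2388, 0.2600, 0.2048,
    0.2047,
]

# Unweighted mean over subjects, expressed as a percentage.
mmlu_percent = 100 * mean(MMLU_ACC)
print(len(MMLU_ACC), f"{mmlu_percent:.2f}")
```

The result lands at about 24.76, close to the 25% expected from random guessing on four-option questions.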
+hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-ehgq969i:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 16
+|   Task   |Version|Metric|Value |   |Stderr|
+|----------|------:|------|-----:|---|-----:|
+|winogrande|      0|acc   |0.5146|±  | 0.014|
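The `Stderr` column can be sanity-checked by treating each example as a 0/1 outcome and computing the standard error of the mean. The sketch below does this for the Winogrande row. It assumes the evaluation split contains 1,267 examples and that the harness uses the sample (n - 1) variance. Both are assumptions not stated in this card, and the function name `accuracy_stderr` is hypothetical.

```python
import math


def accuracy_stderr(acc: float, n: int) -> float:
    """Standard error of a mean of n 0/1 scores, using the sample variance."""
    # Sample variance of 0/1 outcomes with mean `acc` is acc*(1-acc)*n/(n-1);
    # dividing by n and taking the square root gives the expression below.
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


# Assumed size of the Winogrande evaluation split.
WINOGRANDE_N = 1267

stderr = accuracy_stderr(0.5146, WINOGRANDE_N)
print(f"{stderr:.4f}")
```

Under these assumptions the estimate is close to the reported 0.014, so the 0.5146 accuracy is within about one standard error of the 0.5 chance level for this binary task.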
+hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-ehgq969i:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 16
+|Task |Version|Metric|Value|   |Stderr|
+|-----|------:|------|----:|---|-----:|
+|gsm8k|      0|acc   |    0|±  |     0|
 ## Model Details
 ## Model Card Contact
+[More Information Needed]