---
datasets:
...
library_name: transformers
tags:
- supertrainer2000
- human-data
metrics:
- accuracy
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64137e2150358a805203cbac/DlTWku8gant1yx6NaxqJX.png)

The format for reddit-instruct and oasst2 was:
```
...
```

The format for TinyCoT was:
```
### User:
[insert instruction here]
...
[insert direct answer here]
```
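For concreteness, the template above can also be handled programmatically. The sketch below is illustrative only, not code from the training pipeline: the helper names are invented here, and since this card elides the template sections between the instruction and the final answer, the parser simply splits on the generic `### Header:` markers (the `Answer` header in the usage example is a hypothetical stand-in):

```python
def build_user_turn(instruction: str) -> str:
    """Render the "### User:" turn of the TinyCoT-style template shown above."""
    return f"### User:\n{instruction}\n"


def split_sections(transcript: str) -> dict:
    """Split a "### Header:"-delimited transcript into {header: body} pairs.

    Generic parser for the header style shown in the template; the exact
    headers the model emits after the user turn are elided in this card.
    """
    sections = {}
    current = None
    for line in transcript.splitlines():
        if line.startswith("### ") and line.rstrip().endswith(":"):
            current = line[4:].rstrip().rstrip(":")
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {k: "\n".join(v).strip() for k, v in sections.items()}


# Hypothetical usage: build a prompt, then parse a completed transcript.
prompt = build_user_turn("What is 2 + 2?")
parsed = split_sections(prompt + "### Answer:\n4\n")
```

The parser is deliberately header-agnostic, so it works regardless of which intermediate sections (rationale, reasoning steps, etc.) the full template contains.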
## Benchmarks

| Model | Size | Data | Method | GSM8K (5-shot) | AGIEval (English/Nous subset, acc_norm) |
|:------|:-----|:-----|:-------|:---------------|:----------------------------------------|
| [StableLM 3B Base](https://hf.co/stabilityai/stablelm-3b-4e1t) | 3B | Base | Base | 2.05% | 25.14% |
| [StableHermes 3B](https://hf.co/cxllin/StableHermes-3b) | 3B | GPT | SFT | 3.64% | 24.31% |
| [MPT 7B Instruct](https://hf.co/mosaicml/mpt-7b-instruct) | **7B** | **Human+Anthropic** | SFT | 2.05% | 24.12% |
| [OpenLLaMA 7B v2 open-instruct](https://hf.co/VMware/open-llama-7b-v2-open-instruct) | **7B** | **Human** (nearly: ecqa is an exception) | SFT | 8.64% | 23.21% |
| [StableLM Zephyr 3B](https://hf.co/stabilityai/stablelm-zephyr-3b) | 3B | GPT | DPO | **45.72%** | **33.31%** |
| **[Memphis-CoT 3B](https://hf.co/euclaise/memphis-cot-3b)** | 3B | **Human** | Self-teaching | 13.8% | *26.24%* |

Memphis outperforms human-data models more than twice its size, as well as SFT models of its own size, but doesn't quite reach the performance of the Zephyr DPO model. That said, Zephyr was trained on synthetic data, and on *much* more of it.

Notes:
- Evaluations were performed using the `agieval` branch of [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) (commit `0bef5c9c273b1c2f68e6018d4bb9c32b9aaff298`), using the `vllm` model backend.
- I tried to find human-data-trained StableLM models for comparison, but couldn't find any. I did find a few OpenLLaMA models, but they wouldn't load with LM Eval Harness and vllm.
- OpenLLaMA 7B v2 open-instruct is a particularly relevant comparison, as it was trained on a *very* similar dataset.
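The GSM8K column reports 5-shot exact-match accuracy on the final numeric answer. As a rough sketch of what that metric measures (this is *not* the lm-evaluation-harness extraction code, whose details differ):

```python
import re


def extract_final_number(text):
    """Pull the last number out of a model completion, or None if absent.

    Rough stand-in for GSM8K-style answer extraction; the harness's
    actual normalization and extraction logic is more involved.
    """
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return matches[-1] if matches else None


def exact_match_accuracy(completions, gold_answers):
    """Fraction of completions whose final number matches the gold answer."""
    hits = sum(
        extract_final_number(c) == g for c, g in zip(completions, gold_answers)
    )
    return hits / len(gold_answers)
```

In the actual benchmark, gold answers follow GSM8K's `#### <number>` convention and the harness handles few-shot prompting; this sketch only conveys the scoring idea.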
## Hyperparameters

For the initial supervised finetuning step: