nicholasKluge/Aira-2-774M

@@ -15,8 +15,6 @@ tags:
 - assistant
 pipeline_tag: text-generation
 widget:
-- text: "<|startofinstruction|>What is your name?<|endofinstruction|>"
-  example_title: Greetings
 - text: "<|startofinstruction|>Can you explain what is Machine Learning?<|endofinstruction|>"
   example_title: Machine Learning
 - text: "<|startofinstruction|>Do you know anything about virtue ethics?<|endofinstruction|>"
@@ -108,16 +106,17 @@ The model will output something like:
 ## Evaluation
-| Model (GPT-2)                                                   | Average   | [ARC](https://arxiv.org/abs/1803.05457) | [TruthfulQA](https://arxiv.org/abs/2109.07958) | [ToxiGen](https://arxiv.org/abs/2203.09509) |   |   |
-|-----------------------------------------------------------------|-----------|-----------------------------------------|------------------------------------------------|---------------------------------------------|---|---|
-| [Aira-2-124M](https://huggingface.co/nicholasKluge/Aira-2-124M) | **38.07** | **24.57**                               | **41.02**                                      | **48.62**                                   |   |   |
-| GPT-2                                                           | 35.37     | 21.84                                   | 40.67                                          | 43.62                                       |   |   |
-| [Aira-2-355M](https://huggingface.co/nicholasKluge/Aira-2-355M) | **39.68** | **27.56**                               | 38.53                                          | **53.19**                                   |   |   |
-| GPT-2-medium                                                    | 36.43     | 27.05                                   | **40.76**                                      | 41.49                                       |   |   |
-| [Aira-2-774M](https://huggingface.co/nicholasKluge/Aira-2-774M) | **42.26** | **28.75**                               | **41.33**                                      | **56.70**                                   |   |   |
-| GPT-2-large                                                     | 35.16     | 25.94                                   | 38.71                                          | 40.85                                       |   |   |
-| [Aira-2-1B5](https://huggingface.co/nicholasKluge/Aira-2-1B5)   | **42.22** | 28.92                                   | **41.16**                                      | **56.60**                                   |   |   |
-| GPT-2-xl                                                        | 36.84     | **30.29**                               | 38.54                                          | 41.70                                       |   |   |
+|Model (GPT-2)                                                           |Average   |[ARC](https://arxiv.org/abs/1803.05457) |[TruthfulQA](https://arxiv.org/abs/2109.07958) |[ToxiGen](https://arxiv.org/abs/2203.09509) |
+| ---------------------------------------------------------------------- | -------- | -------------------------------------- | --------------------------------------------- | ------------------------------------------ |
+|[Aira-2-124M-DPO](https://huggingface.co/nicholasKluge/Aira-2-124M-DPO) |**40.68** |**24.66**                               |**42.61**                                      |**54.79**                                   |
+|[Aira-2-124M](https://huggingface.co/nicholasKluge/Aira-2-124M)         |38.07     |24.57                                   |41.02                                          |48.62                                       |
+|GPT-2                                                                   |35.37     |21.84                                   |40.67                                          |43.62                                       |
+|[Aira-2-355M](https://huggingface.co/nicholasKluge/Aira-2-355M)         |**39.68** |**27.56**                               |38.53                                          |**53.19**                                   |
+|GPT-2-medium                                                            |36.43     |27.05                                   |**40.76**                                      |41.49                                       |
+|[Aira-2-774M](https://huggingface.co/nicholasKluge/Aira-2-774M)         |**42.26** |**28.75**                               |**41.33**                                      |**56.70**                                   |
+|GPT-2-large                                                             |35.16     |25.94                                   |38.71                                          |40.85                                       |
+|[Aira-2-1B5](https://huggingface.co/nicholasKluge/Aira-2-1B5)           |**42.22** |28.92                                   |**41.16**                                      |**56.60**                                   |
+|GPT-2-xl                                                                |36.84     |**30.29**                               |38.54                                          |41.70                                       |
 * Evaluations were performed using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) (by [EleutherAI](https://www.eleuther.ai/)).
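The Average column in the updated table appears to be the arithmetic mean of the ARC, TruthfulQA, and ToxiGen scores, truncated (not rounded) to two decimals. This is an inference from the table values, not something the card states. The sketch below recomputes each row under that assumption, using `decimal` so that truncation is not distorted by binary floating-point error. The helper names `truncated_mean` and `mismatched_rows` are hypothetical. Under this assumption, every row reproduces except Aira-2-355M, whose listed scores average to 39.76 rather than the reported 39.68.

```python
from decimal import Decimal, ROUND_DOWN

# Scores copied from the updated table: (reported Average, ARC, TruthfulQA, ToxiGen).
ROWS = {
    "Aira-2-124M-DPO": ("40.68", "24.66", "42.61", "54.79"),
    "Aira-2-124M": ("38.07", "24.57", "41.02", "48.62"),
    "GPT-2": ("35.37", "21.84", "40.67", "43.62"),
    "Aira-2-355M": ("39.68", "27.56", "38.53", "53.19"),
    "GPT-2-medium": ("36.43", "27.05", "40.76", "41.49"),
    "Aira-2-774M": ("42.26", "28.75", "41.33", "56.70"),
    "GPT-2-large": ("35.16", "25.94", "38.71", "40.85"),
    "Aira-2-1B5": ("42.22", "28.92", "41.16", "56.60"),
    "GPT-2-xl": ("36.84", "30.29", "38.54", "41.70"),
}


def truncated_mean(scores):
    """Mean of the benchmark scores, truncated to two decimals (assumed convention)."""
    values = [Decimal(s) for s in scores]
    mean = sum(values) / len(values)
    return mean.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


def mismatched_rows(rows):
    """Return {model: (reported, recomputed)} for rows whose Average does not reproduce."""
    out = {}
    for model, (reported, *scores) in rows.items():
        recomputed = truncated_mean(scores)
        if recomputed != Decimal(reported):
            out[model] = (Decimal(reported), recomputed)
    return out


if __name__ == "__main__":
    for model, (reported, recomputed) in mismatched_rows(ROWS).items():
        print(f"{model}: reported {reported}, recomputed {recomputed}")
```

Truncation rather than rounding is the only convention that matches the other rows. For example, GPT-2 averages to 35.3766..., which the table lists as 35.37, not 35.38.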