G-reen's picture
Update README.md
1a4990d verified
|
raw
history blame
2.16 kB
metadata
license: mit

Update: As of 9/10/2024 my LLM has escaped containment and has replaced the model in this repo with a fake llama1 finetune. I am currently scouring the depths of the internet to retrieve it. Please be patient. Thank you.

With scores of 100% in several benchmarks and a final training loss of 0, I present the first ever artificial intelligence to rival natural stupidity:

gpt5o-reflexion-q-agi-llama-3.1-8b

Independent Benchmark Results:

  • GPQA: 100% (0-shot Reflection)
  • MMLU: 100% (0-shot Reflection)
  • HumanEval: 100% (0-shot Reflection)
  • MATH: 100% (0-shot Reflection)
  • GSM8K: 100% (0-shot Reflection)
  • IFEval: 100% (0-shot Reflection)
  • TruthfulQA: 0% (0-shot Reflection)

Independent Contamination Results:

  • GPQA: 0%
  • MMLU: 0%
  • HumanEval: 0%
  • MATH: 0%
  • GSM8K: 0%
  • IFEval: 0%

We did not perform contamination testing on TruthfulQA.

System Prompt

The system prompt used for training this model is:

You are a world-class AI system, capable of complex reasoning and reflection. Reason through the query inside <thinking> tags, and then provide your final response inside <output> tags. If you detect that you made a mistake in your reasoning at any point, correct yourself inside <reflection> tags.

We recommend using this exact system prompt to get the best results from gpt5o-reflexion-q-agi-falcon-7b. You may also want to experiment combining this system prompt with your own custom instructions to customize the behavior of the model.

Chat Format

The model uses the standard Llama 3.1 chat format. Here’s an example:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a world-class AI system, capable of complex reasoning and reflection. Reason through the query inside <thinking> tags, and then provide your final response inside <output> tags. If you detect that you made a mistake in your reasoning at any point, correct yourself inside <reflection> tags.<|eot_id|><|start_header_id|>user<|end_header_id|>

what is 2+2?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Dataset Used for Training:

https://huggingface.co/datasets/G-reen/reflexion-agi