---
language:
  - en
license: apache-2.0
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - llama
  - trl
  - sft
base_model: unsloth/llama-3-8b-bnb-4bit
---

Llama 3 finetuned on my TRRR-CoT Dataset

Dataset: cookinai/TRRR-CoT

  • This was an attempt at synthetically generating a CoT dataset and then finetuning a model on it to see the results.
  • From what I notice, when using the correct prompt template the model almost always uses the TRRR format, but I am still awaiting benchmark tests to see whether this improves anything.
  • TRRR stands for:
  1. Think, about your response
  2. Respond, how you normally would
  3. Reflect, on your response
  4. Respond, again but this time use all the information you have now
  • The model usually tries to follow this format. It may mix it up a little, but it almost always reflects in some way, especially if you tell it to think step by step.

  • Interestingly enough, when finetuned on Mistral 7B, I could not get the model to produce CoT at all; with only one epoch, Llama 3 got it instantly.

  • Developed by: cookinai

  • License: apache-2.0

  • Finetuned from model: unsloth/llama-3-8b-bnb-4bit
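To make "the correct prompt template" concrete, here is a minimal sketch of a single-turn Llama 3 Instruct prompt carrying a TRRR-style system instruction. The special tokens follow the standard Llama 3 Instruct format; the exact system prompt used during finetuning is an assumption, written here to mirror the four steps above.

```python
def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn Llama 3 Instruct prompt string."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

# Hypothetical system instruction spelling out the TRRR steps;
# the dataset's actual wording may differ.
TRRR_SYSTEM = (
    "Think about your response, respond how you normally would, "
    "reflect on your response, then respond again using all the "
    "information you have now."
)

prompt = build_llama3_prompt(TRRR_SYSTEM, "Think step by step: why is the sky blue?")
print(prompt)
```

In practice you would pass this string (or the equivalent chat messages via the tokenizer's `apply_chat_template`) to the model; generation should then tend to follow the Think/Respond/Reflect/Respond structure.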

This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.