---
language:
  - en
license: apache-2.0
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - llama
  - trl
  - sft
base_model: unsloth/llama-3-8b-bnb-4bit
---

Llama 3 finetuned on my TRRR-CoT Dataset

Dataset: cookinai/TRRR-CoT

  • This was an attempt at synthetically generating a CoT dataset and then finetuning a model on it to see the results.
  • From what I notice, when using the correct prompt template the model almost always uses the TRRR format, but I am still awaiting benchmark tests to see whether this improves anything.
  • TRRR stands for:
  1. Think, about your response
  2. Respond, how you normally would
  3. Reflect, on your response
  4. Respond, again but this time use all the information you have now
  • The model usually tries to follow this format. It may mix it up a little, but it almost always reflects in some way, especially if you tell it to think step by step.

  • Interestingly enough, when finetuned on Mistral 7B, I could not get the model to produce CoT at all; with only one epoch, Llama 3 got it instantly.

  • Developed by: cookinai

  • License: apache-2.0

  • Finetuned from model: unsloth/llama-3-8b-bnb-4bit
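To make "the correct prompt template" concrete, here is a minimal sketch of a single-turn Llama 3 Instruct prompt carrying a TRRR-style system instruction. The special tokens follow the standard Llama 3 Instruct format; the exact system prompt used during finetuning is an assumption, written here to mirror the four steps above.

```python
def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn Llama 3 Instruct prompt string."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

# Hypothetical system instruction spelling out the TRRR steps;
# the dataset's actual wording may differ.
TRRR_SYSTEM = (
    "Think about your response, respond how you normally would, "
    "reflect on your response, then respond again using all the "
    "information you have now."
)

prompt = build_llama3_prompt(TRRR_SYSTEM, "Think step by step: why is the sky blue?")
print(prompt)
```

In practice you would pass this string (or the equivalent chat messages via the tokenizer's `apply_chat_template`) to the model; generation should then tend to follow the Think/Respond/Reflect/Respond structure.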

This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.