AutoDidact: Self-Improving Llama 3.1 Model

Model Description

This model is a fine-tuned version of Meta's Llama 3.1 8B Instruct model, trained to improve its own abilities through reinforcement learning. Fine-tuning used GRPO (Group Relative Policy Optimization) to better follow instructions and generate higher-quality responses.
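The card does not say which GRPO implementation was used. As a rough sketch only, the following shows the general shape of a GRPO run with TRL's GRPOTrainer; the keyword reward function and the tiny two-prompt dataset are hypothetical placeholders for illustration, not the author's actual training setup.

```python
# Illustrative GRPO sketch using TRL's GRPOTrainer; NOT the author's setup.
# The reward function and the two-prompt dataset are placeholders.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

train_dataset = Dataset.from_dict(
    {"prompt": ["Who wrote the Federalist Papers?",
                "What did the Missouri Compromise do?"]}
)

def keyword_reward(completions, **kwargs):
    # GRPO samples a group of completions per prompt and scores each one;
    # advantages are then computed relative to the group's mean reward.
    return [float("madison" in c.lower() or "compromise" in c.lower())
            for c in completions]

trainer = GRPOTrainer(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    reward_funcs=keyword_reward,
    args=GRPOConfig(output_dir="grpo-out", num_generations=8),
    train_dataset=train_dataset,
)
trainer.train()
```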

Training Procedure

The model was trained using the Unsloth fast-training framework with the following key parameters (a configuration sketch follows the list):

  • Base model: meta-llama/Meta-Llama-3.1-8B-Instruct
  • LoRA rank: 64
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Max sequence length: 8192
  • Training data: Course and Exam Description material
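A minimal sketch of how these parameters map onto Unsloth's API; lora_alpha, load_in_4bit, and random_state are assumptions not stated in the card, while the rank, target modules, sequence length, and base model come from the list above.

```python
# Configuration sketch in Unsloth's API. Only the rank, target modules,
# sequence length, and base model are taken from the card; the rest are
# assumptions.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Meta-Llama-3.1-8B-Instruct",
    max_seq_length=8192,
    load_in_4bit=True,  # assumption, consistent with the U8 tensors below
)
model = FastLanguageModel.get_peft_model(
    model,
    r=64,               # LoRA rank from the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=64,      # assumption
    random_state=3407,  # assumption
)
```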

Intended Uses & Limitations

This model is designed for conversational AI and instruction-following tasks; a brief usage sketch follows the list below. It can assist with:

  • Answering questions
  • Generating creative content
  • Providing information on a wide range of topics
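A minimal usage sketch with the transformers library, assuming the repository listed at the end of this card ships a checkpoint loadable with AutoModelForCausalLM (if it instead ships only a LoRA adapter, load it with peft's AutoPeftModelForCausalLM). The prompt and generation settings are illustrative.

```python
# Illustrative inference sketch; prompt and generation settings are examples.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "ambrosfitz/ushistory-llama-3.1-8b-v3"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user",
             "content": "Summarize the causes of the War of 1812."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:],
                       skip_special_tokens=True))
```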

Limitations

As with any LLM, this model may:

  • Produce inaccurate information
  • Exhibit biases present in the training data
  • Be sensitive to phrasing of input prompts

License

This model is subject to the Llama 3.1 Community License that governs the original Llama 3.1 model from Meta.

Model Details

  • Format: Safetensors
  • Model size: 4.74B params
  • Tensor types: BF16, F32, U8 (the reduced parameter count and U8 tensors are consistent with a 4-bit quantized export of the 8B base)

Model Repository

ambrosfitz/ushistory-llama-3.1-8b-v3