AutoDidact: Self-Improving Llama 3.1 Model

Model Description

This model is a fine-tuned version of Meta's Llama 3.1 8B Instruct model, trained to improve its own abilities through reinforcement learning. Fine-tuning used GRPO (Group Relative Policy Optimization) to better follow instructions and generate higher-quality responses.
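The card does not say which GRPO implementation was used. As a rough sketch only, the following shows the general shape of a GRPO run with TRL's GRPOTrainer; the keyword reward function and the tiny two-prompt dataset are hypothetical placeholders for illustration, not the author's actual training setup.

```python
# Illustrative GRPO sketch using TRL's GRPOTrainer; NOT the author's setup.
# The reward function and the two-prompt dataset are placeholders.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

train_dataset = Dataset.from_dict(
    {"prompt": ["Who wrote the Federalist Papers?",
                "What did the Missouri Compromise do?"]}
)

def keyword_reward(completions, **kwargs):
    # GRPO samples a group of completions per prompt and scores each one;
    # advantages are then computed relative to the group's mean reward.
    return [float("madison" in c.lower() or "compromise" in c.lower())
            for c in completions]

trainer = GRPOTrainer(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    reward_funcs=keyword_reward,
    args=GRPOConfig(output_dir="grpo-out", num_generations=8),
    train_dataset=train_dataset,
)
trainer.train()
```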

Training Procedure

The model was trained using the Unsloth fast-training framework with the following key parameters (a configuration sketch follows the list):

  • Base model: meta-llama/Meta-Llama-3.1-8B-Instruct
  • LoRA rank: 64
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Max sequence length: 8192
  • Training data: Course and Exam Description material
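A minimal sketch of how these parameters map onto Unsloth's API; lora_alpha, load_in_4bit, and random_state are assumptions not stated in the card, while the rank, target modules, sequence length, and base model come from the list above.

```python
# Configuration sketch in Unsloth's API. Only the rank, target modules,
# sequence length, and base model are taken from the card; the rest are
# assumptions.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Meta-Llama-3.1-8B-Instruct",
    max_seq_length=8192,
    load_in_4bit=True,  # assumption, consistent with the U8 tensors below
)
model = FastLanguageModel.get_peft_model(
    model,
    r=64,               # LoRA rank from the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=64,      # assumption
    random_state=3407,  # assumption
)
```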

Intended Uses & Limitations

This model is designed for conversational AI and instruction-following tasks; a brief usage sketch follows the list below. It can assist with:

  • Answering questions
  • Generating creative content
  • Providing information on a wide range of topics
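A minimal usage sketch with the transformers library, assuming the repository listed at the end of this card ships a checkpoint loadable with AutoModelForCausalLM (if it instead ships only a LoRA adapter, load it with peft's AutoPeftModelForCausalLM). The prompt and generation settings are illustrative.

```python
# Illustrative inference sketch; prompt and generation settings are examples.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "ambrosfitz/ushistory-llama-3.1-8b-v3"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user",
             "content": "Summarize the causes of the War of 1812."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:],
                       skip_special_tokens=True))
```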

Limitations

As with any LLM, this model may:

  • Produce inaccurate information
  • Exhibit biases present in the training data
  • Be sensitive to phrasing of input prompts

License

This model is subject to the Llama 3.1 Community License that governs the original Llama 3.1 model from Meta.

Model Details

  • Format: Safetensors
  • Model size: 4.74B params
  • Tensor types: BF16, F32, U8 (the reduced parameter count and U8 tensors are consistent with a 4-bit quantized export of the 8B base)

Model Repository

ambrosfitz/ushistory-llama-3.1-8b-v3