# AutoDidact: Self-Improving Llama 3.1 Model
## Model Description
This model is a fine-tuned version of ambrosfitz/ushistory_llama-3.1-8b, itself based on Meta's Llama 3.1 8B model, trained to improve its own abilities through reinforcement learning. Training used GRPO (Group Relative Policy Optimization) to better follow instructions and generate higher-quality responses.
## Training Procedure
The model was trained using Unsloth's fast training framework with the following key parameters (see the training sketch after this list):
- Base instruct model: meta-llama/Meta-Llama-3.1-8B-Instruct
- LoRA rank: 64
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Max sequence length: 8192
- Training data: *The American Yawp*, an open U.S. history textbook
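
For reference, below is a minimal sketch of how such a run is typically wired together with Unsloth and TRL's `GRPOTrainer`, assuming the parameters listed above. The dataset, reward function, `lora_alpha`, and batch settings are illustrative placeholders, not the actual training script.

```python
# A minimal GRPO fine-tuning sketch with Unsloth and TRL, using the
# parameters listed above. Dataset, reward, and batch settings are assumptions.
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer
from datasets import Dataset

max_seq_length = 8192

# Load the instruct checkpoint with Unsloth's optimized loader.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Meta-Llama-3.1-8B-Instruct",
    max_seq_length=max_seq_length,
    load_in_4bit=True,
)

# Attach LoRA adapters with the rank and target modules listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=64,  # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=64,  # assumed; not stated in the card
    use_gradient_checkpointing="unsloth",
)

def length_reward(completions, **kwargs):
    # Toy reward: favors substantive (non-trivially short) answers.
    return [min(len(c) / 500.0, 1.0) for c in completions]

# Placeholder prompt set standing in for prompts derived from The American Yawp.
dataset = Dataset.from_dict({
    "prompt": ["Summarize the causes of the Panic of 1837."],
})

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[length_reward],
    train_dataset=dataset,
    args=GRPOConfig(
        output_dir="outputs",
        per_device_train_batch_size=4,  # must be divisible by num_generations
        num_generations=4,              # completions sampled per prompt
        max_prompt_length=512,
        max_completion_length=512,
    ),
)
trainer.train()
```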
## Intended Uses & Limitations
This model is designed for conversational AI and instruction-following tasks, with an emphasis on U.S. history. It can assist with the following (see the usage sketch after this list):
- Answering questions
- Generating creative content
- Providing information on a wide range of topics
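
Below is a minimal inference sketch using the standard transformers chat-template API; the prompt and generation settings are illustrative, not prescribed by the card.

```python
# A minimal inference sketch with transformers. The repo id matches the
# model tree below; the prompt and generation settings are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ambrosfitz/ushistory_llama-3.1-8b-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What caused the Panic of 1837?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```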
### Limitations
As with any LLM, this model may:
- Produce inaccurate information
- Exhibit biases present in the training data
- Be sensitive to phrasing of input prompts
## License
This model is subject to the license of the original Llama 3.1 model from Meta.
## Model tree for ambrosfitz/ushistory_llama-3.1-8b-v2
- Base model: meta-llama/Llama-3.1-8B
- Finetuned: meta-llama/Llama-3.1-8B-Instruct
- Quantized: ambrosfitz/ushistory_llama-3.1-8b