Fine-Tuned Model
fjmgAI/b1-R1-Zero-3B-GGUF
Base Model
unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit
Fine-Tuning Method
Fine-tuning was performed using unsloth, an efficient fine-tuning framework optimized for low-resource environments, together with Hugging Face's TRL library.
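As a rough illustration, the base model can be loaded through unsloth's FastLanguageModel API while keeping its 4-bit quantization. This is a minimal sketch, not the card's actual recipe: the LoRA rank, alpha, target modules, and sequence length below are assumptions.

```python
# Minimal sketch: load the 4-bit base model with unsloth and attach
# LoRA adapters. Hyperparameters are illustrative assumptions.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit",
    max_seq_length=2048,   # assumed context length
    load_in_4bit=True,     # keep the bnb-4bit quantization
)

# Only a small set of adapter weights is trained; the 4-bit base stays frozen.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                  # assumed LoRA rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```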
Dataset
Description
A Spanish-language dataset containing 15,000 examples, designed for Direct Preference Optimization (DPO) or Odds Ratio Preference Optimization (ORPO).
Adaptation
The dataset was adapted to a reasoning-based format for GRPO (Group Relative Policy Optimization), so that the preference signal could guide the model's decision-making during fine-tuning. This adaptation improves alignment with instruction-following tasks in Spanish.
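A minimal sketch of what such an adaptation might look like, assuming the dataset follows a standard DPO layout with prompt/chosen/rejected columns; the dataset path, column names, and reasoning template below are placeholders, not the actual ones used for this model.

```python
# Hypothetical DPO -> GRPO adaptation sketch. GRPO samples completions
# during training, so only prompts are required; the chosen answer is
# kept here as a reference that a reward function could score against.
from datasets import load_dataset

SYSTEM_PROMPT = (
    "Respond in Spanish. Reason first inside <think>...</think>, "
    "then give the final answer."
)

def to_grpo_format(example):
    return {
        "prompt": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": example["prompt"]},
        ],
        "reference": example["chosen"],  # optional reward-scoring target
    }

dataset = load_dataset("path/to/spanish-dpo-dataset", split="train")  # placeholder
dataset = dataset.map(to_grpo_format)
```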
Fine-Tuning Details
- The model was trained with the GRPO algorithm, leveraging structured preference data to refine its response generation (see the training sketch after this list).
- The model kept its 4-bit quantization (bnb-4bit) for memory efficiency while its outputs were aligned with the characteristics of the Spanish dataset.
- The focus was on retaining the model's instruction-following abilities while improving its understanding and generation of Spanish text.
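The sketch below shows how such a run could be wired up with TRL's GRPOTrainer, continuing from the model and dataset sketches above. The reward function and all hyperparameters are illustrative assumptions; the card does not specify the actual reward design.

```python
# Minimal GRPO training sketch with TRL; not the card's actual recipe.
import re
from trl import GRPOConfig, GRPOTrainer

def format_reward(completions, **kwargs):
    # Assumed reward: score completions that follow the
    # <think>...</think> reasoning structure.
    pattern = r"<think>.*?</think>"
    return [1.0 if re.search(pattern, c[0]["content"], re.DOTALL) else 0.0
            for c in completions]

training_args = GRPOConfig(
    output_dir="b1-R1-Zero-3B",
    num_generations=8,             # completions sampled per prompt
    max_prompt_length=512,
    max_completion_length=512,
    learning_rate=5e-6,
    per_device_train_batch_size=8,
)

trainer = GRPOTrainer(
    model=model,                   # the 4-bit unsloth model from above
    reward_funcs=[format_reward],
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

In a real GRPO setup the format reward would typically be combined with task-specific rewards (e.g., answer correctness), but a structural reward alone is enough to illustrate the training loop.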
Purpose
This fine-tuned model is intended for Spanish-language applications that require an efficient, instruction-following model with a lightweight reasoning process.
- Developed by: fjmgAI
- License: apache-2.0