Model

Developed by: DARJYO
Base Type: Fine-tuned language model
Fine-tuned model: persadian_14B-GRPO
Base Architecture: Transformer-based/Phi-4

This model was fine-tuned using Unsloth and Hugging Face's TRL library. It is based on the unsloth/phi-4 model and was trained with reinforcement learning (GRPO, as the model name indicates) to improve performance.
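The training configuration is not documented here. As a rough illustration of the reinforcement-learning idea suggested by the "GRPO" suffix (assumed to mean Group Relative Policy Optimization), the sketch below shows the group-relative advantage step: several completions are sampled for one prompt, and each reward is normalized against the mean and standard deviation of its own group. The function name, the reward values, and the choice of population standard deviation are illustrative assumptions, not details taken from this model's training.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the group it was sampled in.

    Hypothetical helper for illustration; not part of TRL or Unsloth.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    # eps keeps the division defined when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four completions sampled from a single prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Completions that score above their group's average receive a positive advantage and are reinforced; those below it receive a negative one, so no separate value model is needed to provide a baseline.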

GGUF

Reinforcement Learning

Model tree for DARJYO/persadian_14B-GRPO

Base model: microsoft/phi-4
Finetuned: unsloth/phi-4
Quantized (24): this model
Quantizations: 1 model