This model is a fine-tuned variant of teknium/OpenHermes-2.5-Mistral-7B. It was trained with Direct Preference Optimization (DPO), a preference-tuning method that optimizes the model directly on pairs of preferred and rejected responses instead of training a separate reward model and running a reinforcement learning (RL) loop. The preference dataset used for fine-tuning was derived from Hugging Face's No Robots dataset.
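To make the DPO objective concrete, here is a minimal, framework-free sketch of the standard per-pair DPO loss, -log σ(β · [(log π_θ(y_w|x) − log π_ref(y_w|x)) − (log π_θ(y_l|x) − log π_ref(y_l|x))]). It is illustrative only and is not this model's training code. The function names and the default `beta=0.1` are assumptions, since the card does not state the β used. Each input is assumed to be the summed log-probability of a whole response under either the model being trained (policy) or the frozen reference model (here, presumably the base OpenHermes checkpoint).

```python
import math
from typing import Sequence, Tuple


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss (hypothetical helper; beta=0.1 is an assumed default).

    Arguments are summed log-probabilities of the chosen and rejected
    responses under the policy and the frozen reference model.
    """
    # How much more (or less) likely each response became relative to the reference
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow for large |m|
    return softplus(-margin)


def mean_dpo_loss(batch: Sequence[Tuple[float, float, float, float]],
                  beta: float = 0.1) -> float:
    """Average the per-pair loss over (policy_chosen, policy_rejected,
    ref_chosen, ref_rejected) tuples."""
    return sum(dpo_loss(*row, beta=beta) for row in batch) / len(batch)


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, so the loss is log(2)
    print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
    # Policy shifted probability toward the chosen response: loss falls below log(2)
    print(dpo_loss(-10.0, -18.0, -12.0, -15.0))
```

The loss decreases only when the policy raises the chosen response's likelihood relative to the reference more than it raises the rejected one's. β controls how strongly deviations from the reference model are rewarded or penalized.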