Phi-3.5-mini-instruct-o1

Phi-3.5-mini-instruct-o1 is a version of the microsoft/Phi-3.5-mini-instruct model fine-tuned on O1-style chain-of-thought data to strengthen its reasoning capabilities and robustness.

Model Overview

Phi-3.5-mini-instruct-o1 is built upon Phi-3.5-mini-instruct, a lightweight, state-of-the-art open model with 3.8B parameters. The base model supports a 128K-token context length and was post-trained for precise instruction adherence and robust safety behavior.

Features

  • Enhanced Reasoning Process: The model produces clear, traceable reasoning paths, making its thought process easier to follow and its mistakes easier to spot.
  • Improved Multistep Reasoning: Fine-tuned on O1-style chain-of-thought data, the model is expected to show stronger multistep reasoning and improved overall accuracy.
  • Specialized Capabilities: Particularly well-suited for tasks involving math, coding, and logic, in line with the strengths of the Phi-3.5 model family.
  • Robust Performance: Fine-tuned with a high dropout rate (LoRA dropout 0.9; see Training Details) to improve resilience and generalization.

Limitations

  • Verbose Outputs: As a chain-of-thought model, responses may be longer and more detailed than necessary for some applications.
  • Potential Context Length Reduction: Fine-tuning used a 2,048-token context, so quality over the base model's full 128K-token window may be reduced and has not been verified.
  • Quantization Challenges: Standard llama.cpp quantizations, including 8-bit versions, are not compatible with this model.

Training Details

The fine-tuning process for Phi-3.5-mini-instruct-o1 employed the following techniques and parameters:

  • Method: Low-Rank Adaptation (LoRA) with 4-bit quantization via BitsAndBytes
  • Dataset: O1-OPEN/OpenO1-SFT
  • Batch Size: 1 with 8 gradient accumulation steps
  • Learning Rate: 5e-5
  • Training Duration: Single epoch, limited to 10,000 samples
  • LoRA Configuration: Rank 32, alpha 64, dropout 0.9
  • Advanced Techniques: Shift attention, DoRA, RS-LoRA
  • Compute Type: BF16
  • Context Length: 2048 tokens
  • Optimizer: AdamW with cosine learning rate scheduling
  • Additional Enhancement: NEFTune with alpha 5

This fine-tuning approach was designed to efficiently adapt the model while maintaining its generalization capabilities and computational efficiency.
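For reference, below is a minimal sketch of how this configuration could be expressed with the Hugging Face transformers, peft, and trl libraries. The hyperparameter values come from the list above; the dataset column names, prompt format, and 4-bit quantization details (e.g. NF4) are assumptions, exact argument names vary between library versions, and shift attention is not reproduced here (it would have been applied by the training framework itself).

```python
# Minimal sketch of the fine-tuning recipe above; see the lead-in for assumptions.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

BASE = "microsoft/Phi-3.5-mini-instruct"

# Load the base model in 4-bit via BitsAndBytes, computing in BF16.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # assumption: NF4, the usual QLoRA choice
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb)

# LoRA rank 32, alpha 64, dropout 0.9, with DoRA and rank-stabilized LoRA enabled.
peft_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.9,
    use_dora=True,
    use_rslora=True,
    task_type="CAUSAL_LM",
)

# Single epoch over the first 10,000 samples of the O1 SFT dataset.
dataset = load_dataset("O1-OPEN/OpenO1-SFT", split="train[:10000]")

def to_text(example):
    # assumption: the dataset exposes "instruction" and "output" columns
    return {"text": f"<|user|>\n{example['instruction']}<|end|>\n"
                    f"<|assistant|>\n{example['output']}<|end|>"}

dataset = dataset.map(to_text)

args = SFTConfig(
    output_dir="Phi-3.5-mini-instruct-o1",
    dataset_text_field="text",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    bf16=True,
    max_seq_length=2048,                    # 2048-token training context
    neftune_noise_alpha=5,                  # NEFTune embedding noise
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```

The unusually high LoRA dropout (0.9) is the regularization referred to under Robust Performance; NEFTune injects noise into embedding activations during training as an additional regularizer.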

Intended Use

Phi-3.5-mini-instruct-o1 is suitable for commercial and research applications that require:

  • Detailed reasoning and problem-solving in math, coding, and logic tasks
  • Transparent thought processes for analysis and debugging
  • Robust performance in various scenarios
  • Efficient operation in memory- and compute-constrained environments
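A hedged usage sketch with the transformers library, assuming the fine-tuned weights load the same way as the base Phi-3.5-mini-instruct checkpoint:

```python
# Basic chat-style inference sketch (standard Phi-3.5-mini-instruct usage pattern).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agentlans/Phi-3.5-mini-instruct-o1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "A train travels 120 km in 1.5 hours. What is its average speed?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Because the model emits chain-of-thought style answers, expect longer outputs than the base model and budget max_new_tokens accordingly.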

Ethical Considerations

Users should be aware of potential biases in the model's outputs and exercise caution when deploying it in sensitive applications. Always verify the model's results, especially for critical decision-making processes.
