---
library_name: peft
base_model: microsoft/Phi-3-mini-4k-instruct
datasets:
- argilla/ultrafeedback-binarized-preferences-cleaned
- >-
flax-sentence-embeddings/stackexchange_titlebody_best_and_down_voted_answer_jsonl
language:
- en
---
# Model Card for Phi-3-mini-4k-instruct DPO
## Model Details
- **Model Name:** Phi-3-mini-4k-instruct DPO
- **Publisher:** Team chatterbox, EPFL
- **Model Type:** Language Model, Fine-tuned with direct preference optimization (DPO)
- **Training Environment:** Trained on the EPFL SCITAS cluster using a 32GB GPU.
## Intended Use
- **Primary Applications:** This model is designed as part of an AI-Tutor system, aiming to accurately predict user preferences in educational scenarios.
- **Intended Audience:** Educators, students, and developers creating educational AI applications.
## Model/Data Description
### Training Data
- **Datasets Used:**
- **Milestone 1 Dataset:** Includes [will fill] unique questions with preference pairs based on the 'overall' rating, totaling [will fill] usable entries after processing.
- **Stack Exchange Dataset:** Filters content from specific domains within the Stack Exchange network, using upvoted and downvoted answers to form preference pairs. Total entries: [will fill].
- **Ultra Feedback:** Utilizes responses rated on criteria like truthfulness and helpfulness to form preference pairs, with a total of [will fill] entries after preprocessing.
- **Preprocessing Details:** Entries with identical chosen and rejected answers were removed. Datasets were formatted as JSONL, where each line is a JSON object with "prompt", "chosen", and "rejected" fields.
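The filtering and JSONL formatting described above can be sketched as follows (a minimal illustration; the function names are hypothetical, not the team's actual preprocessing code):

```python
import json


def build_preference_entries(raw_pairs):
    """Drop pairs whose chosen and rejected answers are identical,
    then shape each remaining pair into a prompt/chosen/rejected record."""
    entries = []
    for prompt, chosen, rejected in raw_pairs:
        if chosen == rejected:  # identical answers carry no preference signal
            continue
        entries.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return entries


def write_jsonl(entries, path):
    """Write one JSON object per line, the format described above."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")
```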
## Training Procedure
- **Configurations:** See the accompanying `training_args` and `trainer` configurations for the full set of hyperparameters.
- **Evaluation Metrics:** The primary metric for model performance is `eval_loss`, which training aims to minimize.
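For reference, the per-example DPO objective used in this kind of fine-tuning can be sketched as below. This is a minimal illustration of the standard DPO loss, not the team's training code, and the `beta` default is an assumption (the card does not state the value used):

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair:
    -log sigmoid(beta * (policy log-ratio of chosen minus rejected,
    each measured against the frozen reference model))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy shows no preference over the reference, the loss sits at log 2; it falls as the policy assigns relatively more probability to the chosen response.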
## Evaluation Results
- **Accuracies:** eval/rewards/accuracies - 0.83
- **Loss:** eval/loss - 0.47
- **Margins:** eval/margins - 4.31
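The accuracy and margin numbers above follow the usual DPO reporting convention (as in TRL's trainer logs): accuracy is the fraction of evaluation pairs where the chosen response receives the higher implicit reward, and margin is the mean gap between chosen and rejected rewards. A minimal sketch, assuming per-pair reward values are already available:

```python
def preference_metrics(chosen_rewards, rejected_rewards):
    """Accuracy: fraction of pairs where the chosen response scores higher.
    Margin: mean gap between chosen and rejected rewards."""
    n = len(chosen_rewards)
    accuracy = sum(c > r for c, r in zip(chosen_rewards, rejected_rewards)) / n
    margin = sum(c - r for c, r in zip(chosen_rewards, rejected_rewards)) / n
    return accuracy, margin
```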
### MT-Bench
- **Single Grading Score, Overall Avg.** - 8.2
- **STEM Score** - 9.8 (higher than GPT-4)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/633206606eae0bb0a01c8a82/ay1QSp2hkicRTY4fcnAPX.png)
## References
- PEFT 0.11.1