Phi-3.5-mini Socratic Tutor

A fine-tuned version of Phi-3.5-mini-instruct, trained with DPO (Direct Preference Optimization) to teach using the Socratic method: guiding students toward answers through questions rather than stating the answers directly.

Training Details

Base model: unsloth/Phi-3.5-mini-instruct (3.8B)
Dataset: mudit23/class7-socratic-dpo
Method: DPO via Unsloth + TRL
LoRA rank: 16
Beta: 0.5
Learning rate: 1e-6
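
To make the role of Beta concrete, here is a minimal sketch of the per-example DPO loss, assuming the standard sigmoid form of the objective (the card does not say which loss variant was used). The function name and its inputs are hypothetical; the inputs stand for summed log-probabilities of the chosen and rejected responses under the trained policy and the frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.5):
    """Per-example DPO loss (sigmoid form) from sequence log-probabilities.

    beta=0.5 matches the value listed in Training Details; everything else
    here is an illustrative assumption, not the actual training code.
    """
    # How much more (or less) likely each response is under the policy
    # than under the reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Beta scales the preference margin: a larger beta penalizes drifting
    # away from the reference model more strongly.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)), written in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference agree (all log-ratios are zero), the loss is log 2; it falls as the policy assigns relatively more probability to the chosen (Socratic) response than to the rejected one.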

Usage

1. Install dependencies

# Notebook (Colab) cells: the leading "!" runs a shell command.
!pip install --no-deps "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps trl peft accelerate bitsandbytes
!pip install unsloth_zoo
!pip install mergekit
!pip install --upgrade trl

from unsloth import FastLanguageModel
import torch
# The imports below are not used by the chat example in step 2.
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

2. Load the model & CLI Chat

from unsloth import FastLanguageModel

model1, tokenizer1 = FastLanguageModel.from_pretrained(
    model_name     = "Bialy17/Phi3.5_socratic_dpo_unsloth",
    max_seq_length = 2048,
    dtype          = None,       # auto-detect (float16 on Colab T4)
    load_in_4bit   = True,       # saves VRAM — works fine on free T4
)

FastLanguageModel.for_inference(model1)

import torch
from transformers import TextStreamer

text_streamer = TextStreamer(tokenizer1, skip_prompt=True)

chat_history = [
    {"role": "system", "content": "You are a helpful Tutor that uses the Socratic teaching method."}
]

print("--- Phi-3.5-mini Socratic Tutor Initialized ---")
print("(Type 'quit' to exit, 'clear' to reset history)\n")

while True:
    user_input = input("Student: ").strip()

    if not user_input:
        continue

    if user_input.lower() in ("exit", "quit", "q"):
        print("Goodbye!")
        break

    if user_input.lower() == "clear":
        chat_history = [chat_history[0]]  # keep system message
        print("\n--- History Cleared ---\n")
        continue

    chat_history.append({"role": "user", "content": user_input})

    inputs = tokenizer1.apply_chat_template(
        chat_history,
        tokenize              = True,
        add_generation_prompt = True,
        return_tensors        = "pt",
    ).to("cuda")

    print("\nTutor: ", end="", flush=True)

    # Get the correct stop token for Phi-3.5
    terminators = tokenizer1.convert_tokens_to_ids(["<|end|>", "<|endoftext|>"])

    outputs = model1.generate(
        input_ids       = inputs,
        streamer        = text_streamer,
        max_new_tokens  = 1024,
        temperature     = 0.2,
        do_sample       = True,
        eos_token_id    = terminators,      # ← stops at <|end|>
        pad_token_id    = tokenizer1.eos_token_id,
    )

    raw_output = tokenizer1.decode(
        outputs[0][len(inputs[0]):],
        skip_special_tokens=True
    )
    chat_history.append({"role": "assistant", "content": raw_output})
    print("\n")
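
The loop above appends every turn to chat_history, so a long session can outgrow max_seq_length = 2048 (with max_new_tokens = 1024, that leaves roughly 1024 tokens for the prompt). One possible way to handle this, sketched below, is to drop the oldest student/tutor pairs before each generation. The helper trim_history and its count_tokens argument are hypothetical and not part of the original script.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

def trim_history(history: List[Message],
                 count_tokens: Callable[[List[Message]], int],
                 max_prompt_tokens: int) -> List[Message]:
    """Drop the oldest user/assistant pairs until the prompt fits the budget.

    Assumes history[0] is the system message and the last entry is the
    student's newest message; both are always kept.
    """
    system, turns = history[:1], list(history[1:])
    # Remove one full exchange (user + assistant) at a time so the
    # remaining turns still start with a user message.
    while len(turns) > 2 and count_tokens(system + turns) > max_prompt_tokens:
        del turns[:2]
    return system + turns
```

In the chat loop this could be called right after the user message is appended, for example with a count_tokens that returns the length of tokenizer1.apply_chat_template(msgs, tokenize=True, add_generation_prompt=True) and max_prompt_tokens = 1024. Whether that call returns a plain list of token ids depends on the installed transformers version, so treat it as an assumption to verify.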

Example

Student: What is gravity?

Tutor: Gravity is described by the general theory of relativity, which is a theory of gravitation that was first proposed by Albert Einstein in 1915. According to Einstein's theory, massive objects cause a distortion in the fabric of spacetime, which we perceive as gravity.

The most familiar example of gravity is the force that keeps us on the ground, and it's the force that keeps planets in orbit around stars.

Gravity is also responsible for the formation of stars, galaxies, and the large-scale structure of the universe.

In everyday life, gravity is the force that keeps us on the ground, and it's the force that keeps planets in orbit around stars.

To summarize, gravity is a fundamental force that attracts two bodies with mass towards each other, and it's one of the four fundamental forces in the universe.


Intended Use

  • Tutoring assistants that guide rather than just answer
  • Exploring Socratic dialogue in AI systems (the system prompt must instruct the model to use the Socratic method)
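
As an illustration of the system-prompt requirement, the sketch below builds a chat history whose system message spells out the Socratic behavior. Its first sentence is the prompt used in the Usage section; the remaining wording, the constant name, and the helper new_chat_history are hypothetical and are not the prompt used during training.

```python
# Hypothetical system prompt: only the first sentence comes from the Usage
# example; the rest is an illustrative assumption.
SOCRATIC_SYSTEM_PROMPT = (
    "You are a helpful Tutor that uses the Socratic teaching method. "
    "Do not state the answer directly. Ask one short guiding question at a "
    "time, build on what the student already knows, and confirm the answer "
    "once the student reaches it."
)

def new_chat_history(system_prompt: str = SOCRATIC_SYSTEM_PROMPT) -> list:
    """Return a fresh chat history that starts with the system message."""
    return [{"role": "system", "content": system_prompt}]
```

The returned list can replace the initial chat_history in the Usage script.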

Limitations

  • Not fully tested, though the author considers it a good model
