Phi-3.5-mini Socratic Tutor

A fine-tuned version of Phi-3.5-mini-instruct, trained with DPO (Direct Preference Optimization) to teach using the Socratic method: guiding students to answers through questions rather than giving the answers directly.

Training Details


Parameter	Value
Base Model	unsloth/Phi-3.5-mini-instruct (3.8B)
Dataset	mudit23/class7-socratic-dpo
Method	DPO via Unsloth + TRL
LoRA rank	16
Beta	0.5
Learning rate	1e-6
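
For readers unfamiliar with the Beta setting: in DPO, beta scales how strongly the policy is pushed to prefer the chosen response over the rejected one, measured relative to a frozen reference model. The sketch below computes the standard per-pair DPO loss in plain Python. The log-probabilities are made-up illustrative numbers, not values from this training run, and `dpo_loss` is a hypothetical helper rather than a TRL function.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.5):
    """Per-pair DPO loss: -log(sigmoid(beta * margin))."""
    # How much more (or less) likely each response is under the policy
    # than under the frozen reference model, in log space.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = chosen_logratio - rejected_logratio
    # -log(sigmoid(x)) equals log(1 + exp(-x))
    return math.log1p(math.exp(-beta * margin))

# Illustrative numbers only (assumed, not measured).
loss_at_start = dpo_loss(-12.0, -15.0, -12.0, -15.0)  # policy == reference, so loss = ln 2
loss_improved = dpo_loss(-11.0, -16.0, -12.0, -15.0)  # chosen up, rejected down
print(loss_at_start, loss_improved)
```

When the policy equals the reference the margin is zero and the loss is ln 2; the loss falls as the policy shifts probability from the rejected response to the chosen one. A larger beta makes the same margin count for more.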

Usage

1. Install dependencies

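# Notebook cells (Colab/Jupyter): the leading "!" runs a shell command.
# In a regular terminal, drop the "!".
!pip install --no-deps "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps trl peft accelerate bitsandbytes
!pip install unsloth_zoo
!pip install mergekit
!pip install --upgrade trl

# Import unsloth before trl/transformers so its patches are applied first.
from unsloth import FastLanguageModel
import torch
# The imports below are only needed for training, not for the chat loop in step 2.
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset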
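
Because several of the packages above are installed with `--no-deps`, it can help to confirm that everything is present before loading the model. The following is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper, and the list of names is an assumption based on the install commands above.

```python
from importlib import metadata

def missing_packages(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing

# Names as passed to pip in the install step (assumed list).
required = ["unsloth", "unsloth_zoo", "trl", "peft", "accelerate", "bitsandbytes"]
print("Missing:", missing_packages(required) or "none")
```

This only checks that a distribution is installed; it does not verify that versions are compatible with each other.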

2. Load the model and run a CLI chat

from unsloth import FastLanguageModel

model1, tokenizer1 = FastLanguageModel.from_pretrained(
    model_name     = "Bialy17/Phi3.5_socratic_dpo_unsloth",
    max_seq_length = 2048,
    dtype          = None,       # auto-detect (float16 on Colab T4)
    load_in_4bit   = True,       # saves VRAM — works fine on free T4
)

FastLanguageModel.for_inference(model1)

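import torch
from transformers import TextStreamer

# Stream tokens to stdout as they are generated; hide the prompt and
# special tokens such as <|end|> in the printed text.
text_streamer = TextStreamer(tokenizer1, skip_prompt=True, skip_special_tokens=True)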

chat_history = [
    {"role": "system", "content": "You are a helpful Tutor that uses the Socratic teaching method."}
]

print("--- Phi-3.5-mini Socratic Tutor Initialized ---")
print("(Type 'quit' to exit, 'clear' to reset history)\n")

while True:
    user_input = input("Student: ").strip()

    if not user_input:
        continue

    if user_input.lower() in ("exit", "quit", "q"):
        print("Goodbye!")
        break

    if user_input.lower() == "clear":
        chat_history = [chat_history[0]]  # keep system message
        print("\n--- History Cleared ---\n")
        continue

    chat_history.append({"role": "user", "content": user_input})

    inputs = tokenizer1.apply_chat_template(
        chat_history,
        tokenize              = True,
        add_generation_prompt = True,
        return_tensors        = "pt",
    ).to("cuda")

    print("\nTutor: ", end="", flush=True)

    # Stop tokens for Phi-3.5: end of turn (<|end|>) and end of text (<|endoftext|>)
    terminators = tokenizer1.convert_tokens_to_ids(["<|end|>", "<|endoftext|>"])

    outputs = model1.generate(
        input_ids       = inputs,
        streamer        = text_streamer,
        max_new_tokens  = 1024,
        temperature     = 0.2,
        do_sample       = True,
        eos_token_id    = terminators,      # ← stops at <|end|>
        pad_token_id    = tokenizer1.eos_token_id,
    )

    # Decode only the newly generated tokens (everything after the prompt)
    raw_output = tokenizer1.decode(
        outputs[0][inputs.shape[-1]:],
        skip_special_tokens=True
    ).strip()
    chat_history.append({"role": "assistant", "content": raw_output})
    print("\n")
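
The loop above appends every turn to `chat_history`, so a long session can eventually exceed the 2048-token `max_seq_length`. One possible way to handle this, sketched below, is to drop the oldest turns while keeping the system message. `trim_history` and its `count_tokens` argument are hypothetical names; with the real tokenizer, `count_tokens` could be something like `lambda s: len(tokenizer1.encode(s))`. The sketch ignores the extra tokens added by the chat template, so the budget should leave some headroom.

```python
def trim_history(history, count_tokens, max_tokens):
    """Drop the oldest non-system turns until the history fits max_tokens.

    Assumes history[0] is the system message, as in the chat loop above.
    The most recent turn is always kept.
    """
    system, turns = history[:1], list(history[1:])

    def total(msgs):
        return sum(count_tokens(m["content"]) for m in msgs)

    # Also drop a leading assistant turn so the kept history starts with a user turn.
    while len(turns) > 1 and (
        total(system + turns) > max_tokens or turns[0]["role"] != "user"
    ):
        turns.pop(0)
    return system + turns

# Demo with a whitespace word count standing in for a real tokenizer.
demo = [
    {"role": "system", "content": "Be Socratic."},
    {"role": "user", "content": "one two three"},
    {"role": "assistant", "content": "four five"},
    {"role": "user", "content": "six"},
]
trimmed = trim_history(demo, lambda s: len(s.split()), max_tokens=5)
print([m["content"] for m in trimmed])  # ['Be Socratic.', 'six']
```

In the chat loop, this could be called on `chat_history` just before `apply_chat_template`, with `max_tokens` set somewhat below 2048 minus `max_new_tokens`.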

Example

Student: What is gravity?

Tutor: Gravity is described by the general theory of relativity, which is a theory of gravitation that was first proposed by Albert Einstein in 1915. According to Einstein's theory, massive objects cause a distortion in the fabric of spacetime, which we perceive as gravity.

The most familiar example of gravity is the force that keeps us on the ground, and it's the force that keeps planets in orbit around stars.

Gravity is also responsible for the formation of stars, galaxies, and the large-scale structure of the universe.

In everyday life, gravity is the force that keeps us on the ground, and it's the force that keeps planets in orbit around stars.

To summarize, gravity is a fundamental force that attracts two bodies with mass towards each other, and it's one of the four fundamental forces in the universe.

Intended Use

Tutoring assistants that guide students rather than simply answering
Exploring Socratic dialogue in AI systems (a Socratic instruction must be included in the system prompt)
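
Since the Socratic instruction has to be present in the system prompt, a small guard can make sure it is never forgotten when building a conversation. This is a minimal sketch: `with_socratic_system` is a hypothetical helper, and the default prompt is the one used in the usage example above.

```python
SOCRATIC_PROMPT = "You are a helpful Tutor that uses the Socratic teaching method."

def with_socratic_system(messages, prompt=SOCRATIC_PROMPT):
    """Return a copy of messages that starts with a system message.

    An existing system message is left untouched; otherwise the
    Socratic prompt is prepended.
    """
    if messages and messages[0].get("role") == "system":
        return list(messages)
    return [{"role": "system", "content": prompt}] + list(messages)

chat = with_socratic_system([{"role": "user", "content": "What is gravity?"}])
print(chat[0]["role"])  # system
```

The returned list can be passed directly to `tokenizer1.apply_chat_template`.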

Limitations

Not fully tested. The author considers it a good model, but it has not been systematically evaluated.


Model tree for Bialy17/Phi3.5_socratic_dpo_unsloth

Base model: microsoft/Phi-3.5-mini-instruct
Finetuned: unsloth/Phi-3.5-mini-instruct