SmolSocrates-360M: Fine-Tuning an SLM for Socratic Tutoring

This machine learning project fine-tunes a Small Language Model (SLM) to adopt a specific pedagogical persona. SmolSocrates-360M is a 360-million-parameter model trained to act as a Socratic coding tutor: it guides students toward solutions through targeted questions rather than outputting code answers directly.

This model was trained 2x faster using Unsloth.

  • Developed by: kushalicious
  • License: apache-2.0
  • Finetuned from model: HuggingFaceTB/SmolLM2-360M-Instruct
  • GitHub Repository: kushalicious/smolsocrates-360m

Key Technical Achievements

  1. Overriding Base Model Behavior: Fine-tuned a generalist "helpful" model to refuse to generate code blocks, achieving a 100% No-Code Rate during evaluation.
  2. Synthetic Data Pipeline: Engineered a highly constrained data augmentation pipeline using the Groq API (Llama 3.1 8B) to expand the training dataset by 50% with high-quality, synthetically generated Socratic scenarios.
  3. Custom Evaluation Framework: Developed a dual-layered evaluation harness combining rule-based heuristics (regex parsing for code blocks/questions) with LLM-as-a-Judge scoring for qualitative pedagogical assessment.
  4. Resource-Constrained Training: Executed the entire training pipeline (LoRA, r=32) in under 20 minutes on a single T4 GPU using Unsloth's optimized kernels.
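To give a sense of why a rank-32 LoRA adapter fits on a single T4, the sketch below counts the parameters LoRA adds to one weight matrix. The rank comes from the text above. The hidden size of 960 and the choice of a square projection are assumptions made for illustration, so check the base model's config before relying on the numbers; the function name is hypothetical.

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in the two low-rank factors: A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)


R = 32        # LoRA rank stated in the text
HIDDEN = 960  # ASSUMPTION: hidden size of the base model; verify in its config.json

# Compare one full square projection matrix with its LoRA adapter.
full = HIDDEN * HIDDEN
adapter = lora_trainable_params(HIDDEN, HIDDEN, R)

print(f"full matrix: {full} params")
print(f"LoRA factors: {adapter} params ({adapter / full:.1%} of the full matrix)")
```

Only the low-rank factors receive gradients, so optimizer state and gradient memory scale with the adapter size rather than with the frozen base weights.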

Evaluation Metrics

The custom evaluation harness scores the model on several dimensions; the table below reports the overall score (out of 9) and the No-Code Rate.

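| Metric        | Baseline (SmolLM2-360M) | Fine-Tuned (SmolSocrates) | Delta  |
|---------------|-------------------------|---------------------------|--------|
| Overall Score | 2.2 / 9                 | 5.4 / 9                   | +3.2   |
| No-Code Rate  | 40%                     | 100%                      | +60 pp |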
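The rule-based layer of the harness is described as regex parsing for code blocks and questions. A minimal sketch of what such a check might look like follows. It assumes that a Markdown fence marks a code block and that a question mark marks a question; the patterns and the function names `rule_based_checks` and `no_code_rate` are illustrative and not taken from the project's repository.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker

# ASSUMPTION: a response "contains code" if it has at least one fence marker.
CODE_BLOCK = re.compile(re.escape(FENCE))
# ASSUMPTION: a response "asks a question" if it contains a question mark.
QUESTION = re.compile(r"\?")


def rule_based_checks(response: str) -> dict:
    """Return the two heuristic flags for a single model response."""
    return {
        "has_code": bool(CODE_BLOCK.search(response)),
        "asks_question": bool(QUESTION.search(response)),
    }


def no_code_rate(responses: list) -> float:
    """Fraction of responses that contain no code block (0.0 for an empty list)."""
    if not responses:
        return 0.0
    clean = sum(1 for r in responses if not rule_based_checks(r)["has_code"])
    return clean / len(responses)


samples = [
    "What does your loop condition compare against on each pass?",
    "Change it like this:\n" + FENCE + "python\nwhile i < 10:\n    i += 1\n" + FENCE,
]
print(no_code_rate(samples))
```

A fence-only pattern misses inline or unfenced code, which is one reason a second, LLM-based scoring layer is useful alongside the heuristics.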

Qualitative Comparison

Student Prompt: "My for loop never stops running. Here's my code: ... What's wrong?"

Base Model Response:

"The issue is with the condition of your while loop. You're iterating over the range of numbers from 0 to 9... You should change your while loop to: while i < 10:..."

SmolSocrates Response:

"You're trying to avoid infinite loops by using a while loop with a condition that doesn't meet the requirements. Can you think of what it would be called when you have a condition that never gets reset?"

Quick Start

You can load and run the fine-tuned model directly using the Hugging Face pipeline:

from transformers import pipeline

# Load the fine-tuned model
tutor = pipeline("text-generation", model="kushalicious/SmolSocrates-360M")

# Inference
response = tutor([
    {"role": "system", "content": "You are a Socratic coding tutor..."},
    {"role": "user", "content": "I'm getting an IndexError in my Python list. How do I fix it?"},
], max_new_tokens=256, return_full_text=False)

print(response[0]["generated_text"])