Text2Cypher: SmolLM2-135M Fine-tuned

A fine-tuned version of SmolLM2-135M-Instruct that generates Cypher queries from natural language questions and a graph schema.

Model Details

  • Base model: HuggingFaceTB/SmolLM2-135M-Instruct
  • Model type: Causal Language Model
  • Language: English
  • License: Apache 2.0
  • Finetuned by: Anugya Sahu

Training Data

  • Dataset: RomanTeucher/text2cypher-curated
  • 1000 training samples, 75 validation, 50 test
  • Each sample contains a graph schema, a natural language question, and a target Cypher query
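To make the sample structure concrete, here is a minimal sketch of how one sample could be turned into a single training string using the prompt template from the How to Use section. The field names `schema`, `question`, and `cypher` are hypothetical and may not match the dataset's actual column names, and joining the prompt and target with a newline is likewise an assumption.

```python
def build_prompt(schema: str, question: str) -> str:
    """Format schema and question with the template from How to Use."""
    return f"### Schema:\n{schema}\n\n### Question:\n{question}\n\n### Cypher:"


def build_training_text(sample: dict) -> str:
    """Concatenate the prompt and the target query into one training string.

    The keys "schema", "question", and "cypher" are hypothetical names
    chosen for illustration; the newline separator is also an assumption.
    """
    prompt = build_prompt(sample["schema"], sample["question"])
    return prompt + "\n" + sample["cypher"]


example = {
    "schema": "Movie {title, year}, Person {name}, (Person)-[:DIRECTED]->(Movie)",
    "question": "Which movies did Christopher Nolan direct before 2010?",
    "cypher": (
        "MATCH (p:Person {name: 'Christopher Nolan'})-[:DIRECTED]->(m:Movie) "
        "WHERE m.year < 2010 RETURN m.title"
    ),
}
print(build_training_text(example))
```

Using the same template for training and inference keeps the input the model sees at generation time consistent with what it was trained on.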

How to Use

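from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hub
model = AutoModelForCausalLM.from_pretrained("Anugya/text2cypher-smollm2")
tokenizer = AutoTokenizer.from_pretrained("Anugya/text2cypher-smollm2")
# Reuse the end-of-sequence token for padding
tokenizer.pad_token = tokenizer.eos_token

schema = "Movie {title, year}, Person {name}, (Person)-[:DIRECTED]->(Movie)"
question = "Which movies did Christopher Nolan direct before 2010?"

# The prompt ends with the "### Cypher:" header so the model completes the query
prompt = f"""### Schema:
{schema}

### Question:
{question}

### Cypher:"""

inputs = tokenizer(prompt, return_tensors="pt")
# Greedy decoding (do_sample=False) gives deterministic output
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=False,
    pad_token_id=tokenizer.pad_token_id,
)
# Keep only the newly generated tokens, dropping the echoed prompt
generated = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))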
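
A small model may keep generating after the query is complete, for example by starting another "###" section. The helper below is a minimal post-processing sketch under that assumption; `extract_cypher` is a hypothetical name and is not part of the model or of transformers.

```python
def extract_cypher(generated_text: str) -> str:
    """Return the text before the next '###' header, with whitespace trimmed.

    Assumes any continuation past the query starts with a '###' marker,
    mirroring the prompt template; this is not guaranteed by the model.
    """
    query = generated_text.split("###", 1)[0]
    return query.strip()


raw = " MATCH (p:Person)-[:DIRECTED]->(m:Movie) RETURN m.title\n\n### Question:\nWho acted in it?"
print(extract_cypher(raw))
```

Apply it to the decoded string from the snippet above before sending the query anywhere.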

Training Details

  • Full fine-tune: all weights updated, no LoRA
  • Epochs: 3
  • Learning rate: 2e-4
  • Batch size: 4
  • Max token length: 256
  • Hardware: CPU (Apple M-series)
  • Precision: float32
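
As a rough worked example, the settings above imply the following number of optimizer steps. This sketch assumes no gradient accumulation and that a partial final batch is kept, neither of which is stated here; `TrainConfig` is a hypothetical container, not the actual training script.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters copied from this card; the class itself is illustrative."""
    train_samples: int = 1000
    epochs: int = 3
    learning_rate: float = 2e-4
    batch_size: int = 4
    max_length: int = 256

    def steps_per_epoch(self) -> int:
        # Assumes one optimizer step per batch (no gradient accumulation)
        return math.ceil(self.train_samples / self.batch_size)

    def total_steps(self) -> int:
        return self.steps_per_epoch() * self.epochs


config = TrainConfig()
print(config.steps_per_epoch(), config.total_steps())  # 250 750
```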

Evaluation

Evaluated on 50 test samples using:

  • Exact Match: strict string comparison after lowercasing and stripping surrounding whitespace
  • Token F1: token overlap between prediction and ground truth
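
The two metrics can be sketched as follows. This is not the evaluation script used for this model: it assumes whitespace tokenization and the F1 definition commonly used in question answering (harmonic mean of token precision and recall), and the function names are illustrative.

```python
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase and strip surrounding whitespace."""
    return text.strip().lower()


def exact_match(prediction: str, reference: str) -> bool:
    """True if the normalized strings are identical."""
    return normalize(prediction) == normalize(reference)


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall (whitespace tokens)."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a perfect match; one empty counts as zero
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("  MATCH (n) RETURN n ", "match (n) return n"))
print(token_f1("MATCH (n) RETURN n", "MATCH (n) RETURN n.name"))
```

Both metrics compare surface text only, so two queries that return the same results but are written differently will score below 1.0.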

Limitations

  • At 135M parameters, the model often generates Cypher that looks plausible but is syntactically or semantically incorrect
  • Generated queries were not validated by executing them against a real Neo4j database
  • May struggle with complex schemas or multi-hop queries
  • Trained on CPU for only 3 epochs on 1000 samples; longer training on more data would likely improve results