qwen3.5-9b-nl2cypher-lora

LoRA adapter for Qwen/Qwen3.5-9B specialized for natural-language-to-Cypher generation.

What It Does

This adapter generates Cypher queries from natural-language questions over graph schemas. It was evaluated on the public neo4j/text2cypher-2024v1 benchmark and on TuneMap, an Apple Music knowledge graph schema.

Model Details

Developed by: danp27
Model type: LoRA adapter for causal language modeling
Base model: Qwen/Qwen3.5-9B
Finetuning stack: Unsloth + PEFT + TRL
Context length used in training: 1600
LoRA rank / alpha: 64 / 64
Primary task: NL2Cypher

Training Data

Training: neo4j/text2cypher-2024v1 train split
Evaluation:
- neo4j/text2cypher-2024v1 test split (n=4833) for translation-based evaluation
- TuneMap benchmark (n=150) for execution-based evaluation against a live Neo4j graph

TuneMap examples are held out for evaluation and are not used in the current finetuning run.

Results

Metric	Adapter	Baseline
External mean GLEU (`n=4833`)	0.6923	0.2415
TuneMap syntax-valid Cypher (`n=150`)	94.67%	79.33%
TuneMap mean result Jaccard	0.5305	0.2848

Interpretation

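On TuneMap, the adapter raises the share of syntax-valid Cypher from 79.33% to 94.67% and the mean result-set Jaccard from 0.2848 to 0.5305 relative to the base model. On the external neo4j/text2cypher-2024v1 test split, mean GLEU rises from 0.2415 to 0.6923.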

Recommended Usage

This adapter is best used inside a schema-constrained Neo4j NL2Cypher pipeline with explicit schema prompting and query validation.
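As a rough illustration of what schema prompting plus query validation could look like, the sketch below builds the prompt from a schema string and then flags relationship types in the generated Cypher that the schema does not declare. The helper names (`build_prompt`, `relationship_types`, `unknown_relationship_types`) are hypothetical and not part of this adapter or of any Neo4j library, and the regex is a heuristic that only sees the first type in a pattern such as `[:A|B]`.

```python
from __future__ import annotations

import re

# Prompt layout assumed for illustration; it mirrors the task/schema/question format.
PROMPT_TEMPLATE = (
    "Task: Generate Cypher statement to query a graph database.\n"
    "Instructions: Use only the provided relationship types and properties in the schema.\n"
    "Do not include any text except the generated Cypher statement.\n"
    "Schema: {schema}\n"
    "Question: {question}\n"
    "Cypher output:"
)

# Heuristic: matches a relationship type inside brackets, e.g. [:BY] or [r:IN_GENRE].
_REL_TYPE = re.compile(r"\[\s*\w*\s*:\s*`?([A-Za-z_]\w*)`?")


def build_prompt(schema: str, question: str) -> str:
    """Fill the schema and question into the prompt template."""
    return PROMPT_TEMPLATE.format(schema=schema, question=question)


def relationship_types(pattern_text: str) -> set[str]:
    """Collect relationship types that appear in a schema string or a Cypher query."""
    return set(_REL_TYPE.findall(pattern_text))


def unknown_relationship_types(cypher: str, schema: str) -> set[str]:
    """Return relationship types used in the query but absent from the schema."""
    return relationship_types(cypher) - relationship_types(schema)


if __name__ == "__main__":
    schema = "(:Track)-[:BY]->(:Artist), (:Track)-[:IN_GENRE]->(:Genre)"
    generated = "MATCH (t:Track)-[r:PERFORMED_BY]->(a:Artist) RETURN t"
    print(unknown_relationship_types(generated, schema))
```

A check like this is cheap enough to run before sending the query to the database; a non-empty result is a reason to reject or regenerate the query. It complements a database-side check such as Neo4j EXPLAIN and does not replace it.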

Example

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "Qwen/Qwen3.5-9B"
adapter_id = "danp27/qwen3.5-9b-nl2cypher-lora"

# Load the base model first, then attach the LoRA adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# The schema is passed explicitly in the prompt.
prompt = """Task: Generate Cypher statement to query a graph database.
Instructions: Use only the provided relationship types and properties in the schema.
Do not include any text except the generated Cypher statement.
Schema: (:Track)-[:BY]->(:Artist), (:Track)-[:IN_GENRE]->(:Genre)
Question: What are the top 10 most played tracks by Kanye West?
Cypher output:"""

# Keep the inputs on the same device as the model.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens so the prompt is not echoed back.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True).strip())

Training Procedure

The adapter was finetuned from Qwen/Qwen3.5-9B with bf16 weights and LoRA applied to:

q_proj
k_proj
v_proj
o_proj
gate_proj
up_proj
down_proj
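For readers who want to reproduce a comparable setup with plain PEFT, the target modules above and the rank and alpha listed under Model Details could be expressed roughly as follows. This is a sketch, not the author's training script: the run used Unsloth, dropout and bias settings are not stated in this card and are left at PEFT defaults here, and `LORA_KWARGS`, `lora_scaling`, and `build_lora_config` are names invented for this example.

```python
# Values taken from this card; anything not listed here is left at PEFT's defaults.
LORA_KWARGS = {
    "r": 64,
    "lora_alpha": 64,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    "task_type": "CAUSAL_LM",
}


def lora_scaling(kwargs: dict) -> float:
    """Standard LoRA scales the low-rank update by lora_alpha / r."""
    return kwargs["lora_alpha"] / kwargs["r"]


def build_lora_config():
    """Build a peft.LoraConfig; peft is imported lazily so the dict can be inspected without it."""
    from peft import LoraConfig

    return LoraConfig(**LORA_KWARGS)


if __name__ == "__main__":
    print(lora_scaling(LORA_KWARGS))
```

With rank and alpha both at 64, the scaling factor is 1.0 under standard (non rank-stabilized) LoRA scaling, so the low-rank update is added to the frozen weights without extra amplification.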

Training configuration:

Epochs: 1
Learning rate: 2e-5
Per-device batch size: 1
Gradient accumulation: 32
Optimizer: paged_adamw_8bit
Seed: 3407
Loss masking: assistant responses only
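Loss masking on assistant responses is commonly implemented by setting the labels of all prompt tokens to -100, the value that PyTorch's cross-entropy loss ignores by default. The snippet below is a minimal sketch of that idea on plain lists; the function name and the token ids are made up for illustration, and the actual run may have relied on a helper from the training stack instead.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_prompt_labels(input_ids, prompt_len):
    """Copy input_ids into labels, hiding the first prompt_len tokens from the loss."""
    if not 0 <= prompt_len <= len(input_ids):
        raise ValueError("prompt_len must be between 0 and len(input_ids)")
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])


if __name__ == "__main__":
    # Hypothetical ids: three prompt tokens followed by two assistant (Cypher) tokens.
    input_ids = [11, 12, 13, 21, 22]
    print(mask_prompt_labels(input_ids, prompt_len=3))
```

Only the last two positions contribute to the loss in this example, so gradients come from the generated Cypher and not from the schema or question text.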

Evaluation

Two evaluation passes were used:

Translation-based evaluation on the full external held-out split using sentence GLEU.
Execution-based evaluation on the TuneMap benchmark using Neo4j EXPLAIN for syntax validation and result-set comparison for semantic overlap.
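The two scores can be sketched in a few lines of Python. The version below assumes whitespace tokenization for GLEU, n-grams of length 1 to 4, rows represented as hashable tuples, and a Jaccard of 1.0 when both result sets are empty. None of these choices is confirmed by this card, so the numbers it produces may not match the reported ones exactly. The EXPLAIN-based syntax check needs a live Neo4j instance and is not shown.

```python
from collections import Counter


def ngram_counts(tokens, min_n=1, max_n=4):
    """Count all n-grams of length min_n..max_n in a token list."""
    counts = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def sentence_gleu(reference: str, hypothesis: str) -> float:
    """GLEU: the minimum of n-gram precision and recall against one reference."""
    ref = ngram_counts(reference.split())
    hyp = ngram_counts(hypothesis.split())
    overlap = sum((ref & hyp).values())  # clipped matching n-grams
    denom = max(sum(ref.values()), sum(hyp.values()))
    return overlap / denom if denom else 0.0


def result_jaccard(expected_rows, actual_rows) -> float:
    """Jaccard overlap between two query result sets (rows must be hashable)."""
    expected, actual = set(expected_rows), set(actual_rows)
    if not expected and not actual:
        return 1.0  # assumption: two empty results count as a perfect match
    return len(expected & actual) / len(expected | actual)


if __name__ == "__main__":
    print(sentence_gleu("MATCH (n) RETURN n", "MATCH (n) RETURN n.name"))
    print(result_jaccard([("x",), ("y",)], [("y",), ("z",)]))
```

GLEU rewards surface similarity to the reference query, while the Jaccard score compares what the queries actually return, which is why a query can score low on one and high on the other.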

Source

Training and evaluation code for this adapter lives in the TuneMap / AppleMusicKG project.
