qwen3.5-9b-nl2cypher-lora

LoRA adapter for Qwen/Qwen3.5-9B specialized for natural-language-to-Cypher generation.

What It Does

This adapter generates Cypher queries from natural-language questions over graph schemas. It was evaluated on the public neo4j/text2cypher-2024v1 benchmark and on TuneMap, an Apple Music knowledge graph schema.

Model Details

  • Developed by: danp27
  • Model type: LoRA adapter for causal language modeling
  • Base model: Qwen/Qwen3.5-9B
  • Finetuning stack: Unsloth + PEFT + TRL
  • Context length used in training: 1600
  • LoRA rank / alpha: 64 / 64
  • Primary task: NL2Cypher

Training Data

  • Training: neo4j/text2cypher-2024v1 train split
  • Evaluation:
    • neo4j/text2cypher-2024v1 test split (n=4833) for translation-based evaluation
    • TuneMap benchmark (n=150) for execution-based evaluation against a live Neo4j graph

TuneMap examples are held out for evaluation and are not used in the current finetuning run.

Results

Metric                                 Adapter    Baseline
External mean GLEU (n=4833)            0.6923     0.2415
TuneMap syntax-valid Cypher (n=150)    94.67%     79.33%
TuneMap mean result Jaccard            0.5305     0.2848

Interpretation

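Compared with the base model, the adapter raises the share of syntax-valid Cypher on TuneMap from 79.33% to 94.67% and the mean result-set Jaccard from 0.2848 to 0.5305. On the neo4j/text2cypher-2024v1 test split, mean GLEU rises from 0.2415 to 0.6923.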

Recommended Usage

This adapter is best used inside a schema-constrained Neo4j NL2Cypher pipeline with explicit schema prompting and query validation.
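As a rough illustration of what "explicit schema prompting and query validation" could look like, the sketch below builds a prompt in the same format as this card's example and runs a lightweight check that a generated query only uses relationship types present in the schema string. The helper names (`build_prompt`, `relationship_types`, `unknown_relationship_types`) are hypothetical and not part of this project, and the regex is a heuristic, not a Cypher parser.

```python
import re

# Prompt layout copied from the example in this card.
PROMPT_TEMPLATE = (
    "Task: Generate Cypher statement to query a graph database.\n"
    "Instructions: Use only the provided relationship types and properties in the schema.\n"
    "Do not include any text except the generated Cypher statement.\n"
    "Schema: {schema}\n"
    "Question: {question}\n"
    "Cypher output:"
)

# Heuristic: captures the type part of patterns like [:BY], [r:BY], [:BY|IN_GENRE].
_REL_PATTERN = re.compile(r"\[\s*\w*\s*:\s*([^\]\{]+)")
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_prompt(schema: str, question: str) -> str:
    """Fill the prompt template with a schema description and a question."""
    return PROMPT_TEMPLATE.format(schema=schema, question=question)


def relationship_types(cypher: str) -> set[str]:
    """Collect relationship type names that appear inside [...] patterns."""
    types = set()
    for group in _REL_PATTERN.findall(cypher):
        types.update(_IDENTIFIER.findall(group))
    return types


def unknown_relationship_types(cypher: str, schema: str) -> set[str]:
    """Return relationship types used in the query but absent from the schema."""
    return relationship_types(cypher) - relationship_types(schema)


schema = "(:Track)-[:BY]->(:Artist), (:Track)-[:IN_GENRE]->(:Genre)"
query = "MATCH (t:Track)-[:PLAYED_BY]->(a:Artist) RETURN t LIMIT 10"
print(unknown_relationship_types(query, schema))
```

A non-empty result is a cue to reject or regenerate the query before it reaches the database. A check like this complements, and does not replace, validating the query against Neo4j itself (for example with EXPLAIN).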

Example

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "Qwen/Qwen3.5-9B"
adapter_id = "danp27/qwen3.5-9b-nl2cypher-lora"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(base_model, adapter_id)

prompt = """Task: Generate Cypher statement to query a graph database.
Instructions: Use only the provided relationship types and properties in the schema.
Do not include any text except the generated Cypher statement.
Schema: (:Track)-[:BY]->(:Artist), (:Track)-[:IN_GENRE]->(:Genre)
Question: What are the top 10 most played tracks by Kanye West?
Cypher output:"""

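# Move the tokenized prompt to the same device as the model.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# outputs[0] contains the prompt followed by the completion; decode only the new tokens.
generated = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))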

Training Procedure

The adapter was finetuned from Qwen/Qwen3.5-9B with bf16 weights and LoRA applied to:

  • q_proj
  • k_proj
  • v_proj
  • o_proj
  • gate_proj
  • up_proj
  • down_proj
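For readers who want to reproduce a similar setup, the module list above together with the rank and alpha from Model Details might be expressed with PEFT's `LoraConfig` roughly as follows. This is a sketch, not the project's actual configuration: dropout, bias handling, and any Unsloth-specific options are not stated in this card and are left at PEFT's defaults here.

```python
from peft import LoraConfig

# Rank, alpha, and target modules come from this card.
# Everything else is left at PEFT defaults (an assumption; the real run may differ).
lora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    task_type="CAUSAL_LM",
)
```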

Training configuration:

  • Epochs: 1
  • Learning rate: 2e-5
  • Per-device batch size: 1
  • Gradient accumulation: 32
  • Optimizer: paged_adamw_8bit
  • Seed: 3407
  • Loss masking: assistant responses only
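Two of the settings above can be made concrete with a small sketch. "Assistant responses only" loss masking is commonly implemented by setting the labels of prompt tokens to -100, the value PyTorch's cross-entropy loss ignores by default, so only response tokens contribute to the loss. The helper name and token ids below are hypothetical, and the effective batch size assumes a single device, which this card does not state.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def mask_prompt_labels(prompt_ids, response_ids):
    """Return (input_ids, labels) where only response tokens carry a label."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


# Hypothetical token ids, for illustration only.
prompt_ids = [101, 7, 8, 9]
response_ids = [42, 43, 2]
input_ids, labels = mask_prompt_labels(prompt_ids, response_ids)
print(labels)  # [-100, -100, -100, -100, 42, 43, 2]

# Effective batch size per optimizer step.
per_device_batch_size = 1
gradient_accumulation_steps = 32
num_devices = 1  # assumption: the device count is not given in this card
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices
print(effective_batch_size)  # 32
```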

Evaluation

Two evaluation passes were used:

  1. Translation-based evaluation on the full neo4j/text2cypher-2024v1 test split (n=4833) using sentence-level GLEU.
  2. Execution-based evaluation on the TuneMap benchmark using Neo4j EXPLAIN for syntax validation and result-set comparison for semantic overlap.
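The two scores can be sketched in plain Python. The GLEU function below follows the usual sentence-level definition (the minimum of n-gram precision and recall over 1- to 4-grams), and the Jaccard function compares result sets as sets of rows. Whitespace tokenization, the treatment of rows as tuples, and the score for two empty result sets are assumptions; the project's evaluation code may differ on each.

```python
from collections import Counter


def ngram_counts(tokens, min_n=1, max_n=4):
    """Count all n-grams of length min_n..max_n in a token list."""
    counts = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def sentence_gleu(reference: str, hypothesis: str) -> float:
    """Sentence-level GLEU: min(precision, recall) over 1- to 4-grams."""
    ref = ngram_counts(reference.split())   # assumption: whitespace tokens
    hyp = ngram_counts(hypothesis.split())
    overlap = sum((ref & hyp).values())     # clipped n-gram matches
    ref_total, hyp_total = sum(ref.values()), sum(hyp.values())
    if ref_total == 0 or hyp_total == 0:
        return 0.0
    return min(overlap / hyp_total, overlap / ref_total)


def result_jaccard(expected_rows, actual_rows) -> float:
    """Jaccard overlap between two query results, treated as sets of rows."""
    expected = {tuple(row) for row in expected_rows}
    actual = {tuple(row) for row in actual_rows}
    if not expected and not actual:
        return 1.0  # assumption: two empty results count as a perfect match
    return len(expected & actual) / len(expected | actual)


print(sentence_gleu("MATCH (n) RETURN n", "MATCH (n) RETURN n"))
print(result_jaccard([("a",), ("b",)], [("b",), ("c",)]))
```

Treating results as sets ignores row order and duplicate rows, which suits overlap scoring but would not distinguish queries that differ only in ORDER BY.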

Source

Training and evaluation code for this adapter lives in the TuneMap / AppleMusicKG project.
