How to use thanhdath/embedding-0.6b-spider2.0 with sentence-transformers:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("thanhdath/embedding-0.6b-spider2.0")

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium."
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
embedding-0.6b-spider2.0
Bi-encoder column retriever for text-to-SQL schema linking (Stage-I candidate retrieval).
Qwen3-Embedding-0.6B fine-tuned with InfoNCE (LoRA r=8/α=32, merged), max_length=1024, 1 epoch.
Training data: thanhdath/embedding-0.6b-spider2.0-data, which contains 39,238 (question, gold-columns, hard-negatives) groups drawn from BIRD train, Spider train, and Spider 2.0 synthetic data (BigQuery/Snowflake and SQL-Gen). SynSQL is not included.
| source | rows |
|---|---|
| BIRD train | 9,356 |
| Spider train | 8,386 |
| Spider 2.0 synth (BQ/SF) | 17,693 |
| Spider 2.0 synth (SQL-Gen) | 3,803 |
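
As a rough illustration of the InfoNCE objective mentioned above, the sketch below computes the loss for a single (question, gold column, hard negatives) group. The single-positive formulation, the L2-normalised vectors, the `temperature` value, and the name `info_nce_loss` are all assumptions made for illustration; the card does not say how multiple gold columns or the temperature were handled during training.

```python
import math


def info_nce_loss(query, positive, negatives, temperature=0.05):
    """InfoNCE loss for one question against one gold column and its hard negatives.

    Vectors are assumed to be L2-normalised, so the dot product equals cosine
    similarity. The temperature is a hypothetical value, not from the model card.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # The positive logit goes first, followed by one logit per hard negative.
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, neg) / temperature for neg in negatives]

    # Numerically stable log-sum-exp over all candidates.
    peak = max(logits)
    log_partition = peak + math.log(sum(math.exp(l - peak) for l in logits))

    # Negative log-probability of the positive under the softmax.
    return log_partition - logits[0]


# Toy 2-d vectors (illustrative only): the positive matches the query exactly.
print(info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [0.6, 0.8]]))
```

The loss is small when the gold column scores well above every hard negative and grows as a negative overtakes it, which is what pushes gold columns toward the top of the ranking.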
Results (column recall@K vs the previous embedding ckpt-3000)
- BIRD dev (n=1521), flat: R@50 0.959 (old 0.875), R@100 0.995 (old 0.976), R@200 1.000.
- Spider 2.0-233q, two-stage (top-50 tables → top-K cols): R@300 0.904 (old 0.876), R@500 0.930 (old 0.903), R@800 0.956 (old 0.937).
- Spider 2.0-233q, flat (shard-collapsed): R@500 0.954 (old 0.934), R@800 0.974 (old 0.959).

The new model beats the previous checkpoint on every operating point.
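
For readers who want to reproduce a number of this kind, here is a minimal sketch of column recall@K. The card does not spell out the exact definition, so this assumes the common one: for each question, the fraction of gold columns found among the top-K retrieved columns, averaged over questions. The function names and the toy schema below are hypothetical.

```python
def column_recall_at_k(ranked_columns, gold_columns, k):
    """Fraction of gold columns that appear among the top-k ranked columns."""
    gold = set(gold_columns)
    if not gold:
        raise ValueError("gold_columns must not be empty")
    top_k = set(ranked_columns[:k])
    return len(top_k & gold) / len(gold)


def mean_recall_at_k(examples, k):
    """Average per-question recall@k over (ranked_columns, gold_columns) pairs."""
    scores = [column_recall_at_k(ranked, gold, k) for ranked, gold in examples]
    return sum(scores) / len(scores)


# Toy examples (illustrative only, not taken from BIRD or Spider 2.0).
examples = [
    (["singer.name", "singer.age", "concert.year"], ["singer.name", "concert.year"]),
    (["stadium.capacity", "stadium.name"], ["stadium.name"]),
]
print(mean_recall_at_k(examples, k=2))  # 0.75
```

If the reported metric instead counts a question as a hit only when all of its gold columns are retrieved, the per-question score would be 0 or 1 rather than a fraction.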
Usage (vLLM embedding server)
```bash
vllm serve thanhdath/embedding-0.6b-spider2.0 --task embed --port 8001 --max-model-len 4096
```
Score each column by the dot product between the question embedding and that column's description embedding, where a column description has the form `table.column ; Table meaning … ; Column meaning … ; type … ; has values …`. Then take the top-K columns by score.
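
The scoring step can be sketched as follows, assuming the server started by the command above is reachable at `http://localhost:8001` and exposes vLLM's OpenAI-compatible `/v1/embeddings` endpoint. The helper names `embed`, `top_k_columns`, and `retrieve` are hypothetical, and the card does not say whether the question needs an instruction prefix, so none is added here.

```python
import json
import urllib.request

MODEL = "thanhdath/embedding-0.6b-spider2.0"


def embed(texts, base_url="http://localhost:8001", model=MODEL):
    """Fetch one embedding per input text from the /v1/embeddings endpoint."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Sort by index so the output order matches the input order.
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def top_k_columns(question_vec, column_vecs, column_names, k):
    """Rank columns by dot product with the question and keep the best k."""
    scores = [sum(q * c for q, c in zip(question_vec, vec)) for vec in column_vecs]
    ranked = sorted(zip(column_names, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def retrieve(question, column_descriptions, k):
    """Embed the question and the column descriptions, then return the top-k.

    column_descriptions maps a column name to its description string.
    """
    names = list(column_descriptions)
    vectors = embed([question] + [column_descriptions[name] for name in names])
    return top_k_columns(vectors[0], vectors[1:], names, k)
```

Calling `retrieve(question, column_descriptions, k=50)` with a running server returns a list of `(column_name, score)` pairs. For a large schema, the column embeddings would normally be computed once and cached rather than re-embedded for every question.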