---
license: apache-2.0
datasets:
  - mandanya/logseq-query-clojure-big
language:
  - en
base_model:
  - Qwen/Qwen2.5-Coder-0.5B-Instruct
pipeline_tag: text-generation
tags:
  - code
library_name: transformers
---

# Model for building advanced queries in Logseq

Logseq uses ClojureScript over Datalog to interact with notes. LCQ stands for Logseq Clojure Query.

## Description of the approach

About 100 examples were collected manually, and about 4,500 more were generated from them using the Qwen2.5-Coder-7B-Instruct model. The test split of the dataset (about 100 synthetic examples) was run through the model with a system prompt describing the specifics of the queries, and the outputs were validated by the codestral-mamba model.
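Every valid response must be wrapped in the `#+BEGIN_QUERY` / `#+END_QUERY` envelope described by the system prompt below. As an illustration, a hypothetical pre-filter for candidate responses (the helper `has_query_envelope` is a sketch, not part of the released pipeline) could look like:

```python
def has_query_envelope(text: str) -> bool:
    """Check that a candidate response is wrapped in the required
    #+BEGIN_QUERY / #+END_QUERY markers.

    This is only a cheap structural pre-filter; it does not validate
    the Clojure/Datalog body of the query itself."""
    stripped = text.strip()
    return (stripped.startswith("#+BEGIN_QUERY")
            and stripped.endswith("#+END_QUERY"))
```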

```python
SYSTEM_PROMPT = """
You should create an advanced query for Logseq.
The advanced query should be written in ClojureScript over Datalog and should start and end with `#+BEGIN_QUERY` and `#+END_QUERY` respectively.
You should respond only with the query, without any additional information.

A query may consist of:
    - :title - title of the query (required)
    - :query - the query itself, usually contains :find, :where, ... (required)
    - :result-transform - transform function for the result (optional)
    - :group-by-page? (true or false, optional)
    - :collapsed? (true or false, usually false, optional)

Example of a response:
#+BEGIN_QUERY
...
#+END_QUERY
"""
```

## Results

| model | overall | zero_shot | 1_shot | 3_shot | 5_shot |
| --- | --- | --- | --- | --- | --- |
| Qwen2.5-Coder-0.5B-LCQ-v2 | 0.3333 | 0.3333 | nan | nan | nan |
| Qwen2.5-Coder-0.5B-LCQ-v1 | 0.2963 | 0.2963 | nan | nan | nan |
| Qwen2.5-Coder-7B-Instruct-AWQ | 0.0586 | 0.0247 | 0.0494 | 0.0988 | 0.0617 |
| gpt-4o | 0.0401 | 0.0123 | 0.0741 | 0.037 | 0.037 |
| gpt-4o-mini | 0.034 | 0.0123 | 0.0247 | 0.0617 | 0.037 |
| Qwen2.5-Coder-3B-Instruct | 0.0278 | 0 | 0.0123 | 0.0617 | 0.037 |
| Qwen2.5-Coder-1.5B-Instruct | 0.0123 | 0 | 0 | 0.0123 | 0.037 |
| Qwen2.5-Coder-0.5B-Instruct | 0.0031 | 0 | 0 | 0.0123 | 0 |

## How to use

I prefer to run the model with sglang:

```shell
python3.11 -m venv .venv --prompt llm-inf
source .venv/bin/activate

pip install "sglang[all]"
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.3/

python3.11 -m sglang.launch_server \
  --model-path mandanya/Qwen2.5-Coder-0.5B-LCQ-v2 \
  --port 23335 \
  --host 0.0.0.0 \
  --mem-fraction-static 0.5 \
  --served-model-name "Qwen2.5-Coder-0.5B-LCQ-v2"
```