GRPO-trained model:

About the system prompt:

SYSTEM_PROMPT = """\
A conversation between User and Assistant. The user asks a question, and the Assistant solves it. \
The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. \
The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively.
Respond in the following format:
<think>
You should reason between these tags.
</think>
<answer>
Answer goes here...
</answer>
Always use <think> </think> and <answer> </answer> tags even if they are not necessary.
"""

Training format for the SQL generation task (the SQL context is included in the user turn):

USER: <SQL context>\n<User question>
ASSISTANT: <think>...</think>\n<answer>...</answer>
Always use <think> </think> and <answer> </answer> tags, even if they are not necessary.
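Assuming the prompt above is passed as a system message and the SQL context and user question are joined with a newline into a single user turn, as in the template, one example could be assembled as follows. `build_messages` is a hypothetical helper, and the system prompt is passed in as a parameter rather than repeated here.

```python
def build_messages(sql_context: str, question: str, system_prompt: str) -> list[dict[str, str]]:
    """Build a chat-style example: system prompt, then "<SQL context>\\n<User question>"."""
    return [
        {"role": "system", "content": system_prompt},
        # The user turn mirrors the training template: context, newline, question.
        {"role": "user", "content": f"{sql_context}\n{question}"},
    ]


if __name__ == "__main__":
    msgs = build_messages(
        "CREATE TABLE users (id INT, name TEXT);",
        "How many users are there?",
        "A conversation between User and Assistant. ...",  # placeholder for SYSTEM_PROMPT
    )
    for m in msgs:
        print(m["role"], "->", repr(m["content"]))
```

If the tokenizer ships a chat template, the resulting list can be turned into a model-ready string with `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`.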

LoRA config:

r = 128
target modules: all linear layers (all_linear)
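To give a sense of what r = 128 means: LoRA keeps a layer's weight W (d_out × d_in) frozen and learns a low-rank update B·A, where A is r × d_in and B is d_out × r, so each adapted layer gains r·(d_in + d_out) trainable values. The 4096 hidden size below is illustrative and not taken from NerSQL_3's configuration; for a square 4096 × 4096 layer the adapter adds 1,048,576 values next to 16,777,216 frozen ones, i.e. 6.25%. If the adapter was built with Hugging Face PEFT, "all_linear" most likely corresponds to `target_modules="all-linear"`, which adapts every linear layer except the output head; the card does not confirm this.

```python
def lora_extra_params(d_in: int, d_out: int, r: int = 128) -> int:
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer.

    A has shape (r, d_in) and B has shape (d_out, r), so together they
    hold r * d_in + d_out * r = r * (d_in + d_out) values.
    """
    return r * (d_in + d_out)


if __name__ == "__main__":
    d = 4096  # illustrative hidden size, NOT read from NerSQL_3's config
    full = d * d
    extra = lora_extra_params(d, d)
    print(extra, full, f"{extra / full:.2%}")
```

With "all_linear", this cost is paid once per targeted linear layer, so the total adapter size depends on the model's layer count and projection shapes.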

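Loading the model:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = "cuda" if torch.cuda.is_available() else "cpu"
# float16 halves memory on GPU; fall back to float32 on CPU, where fp16 support is limited.
dtype = torch.float16 if device == "cuda" else torch.float32
tokenizer = AutoTokenizer.from_pretrained("beyoru/NerSQL_3")
model = AutoModelForCausalLM.from_pretrained("beyoru/NerSQL_3", torch_dtype=dtype).to(device)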
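After generating with `model.generate` and decoding with `tokenizer.decode(..., skip_special_tokens=True)`, the SQL still sits inside the answer tags. A minimal sketch for pulling it out, assuming the model follows the trained format; `extract_sql` is a hypothetical helper, not part of the model's repository.

```python
import re
from typing import Optional

# Captures everything between the first <answer> and the next </answer>.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_sql(completion: str) -> Optional[str]:
    """Return the stripped text inside the first <answer> block, or None if absent."""
    match = ANSWER_RE.search(completion)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    text = (
        "<think>\nFilter users older than 30.\n</think>\n"
        "<answer>\nSELECT name FROM users WHERE age > 30;\n</answer>"
    )
    print(extract_sql(text))
```

Returning None rather than raising makes it easy to count malformed generations separately from wrong SQL.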