GRPO-trained model:
About the system prompt
SYSTEM_PROMPT = """\
A conversation between User and Assistant. The user asks a question, and the Assistant solves it. \
The assistant first thinks about the reasoning process in the mind and then provides the user with the answer.\
The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e.,
Respond in the following format:
<think>
You should reason between these tags.
</think>
<answer>
Answer goes here...
</answer>
Always use <reasoning> </reasoning> tags even if they are not necessary.
"""
Training format for the SQL generation task, with context included:
USER: <SQL context>\n<User question>
ASSISTANT: <think>...</think>\n<answer>...</answer>
Always use <think> </think> and <answer> </answer> tags, even if they are not necessary.
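The turn format above can be sketched as two small helpers: one that joins the SQL context and the question into the USER turn, and one that pulls the reasoning and the final SQL out of a completion. The function names are hypothetical and not part of the model's code; only the tag layout and the newline separator come from the format described here.

```python
import re

# Hypothetical helper names; only the tag layout comes from the format above.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def build_user_message(sql_context: str, question: str) -> str:
    """Join the SQL context and the question with a newline, as in the USER turn."""
    return f"{sql_context}\n{question}"


def parse_completion(text: str) -> dict:
    """Return the reasoning and the answer; a missing block maps to None."""
    think = _THINK_RE.search(text)
    answer = _ANSWER_RE.search(text)
    return {
        "think": think.group(1).strip() if think else None,
        "answer": answer.group(1).strip() if answer else None,
    }
```

Returning None for a missing block, rather than raising, lets the caller decide how to treat completions that ignore the format.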
LoRA config:
r = 128
all_linear
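To give a sense of what r = 128 means for adapter size: LoRA adds two low-rank matrices to each adapted linear layer, A with shape (r x d_in) and B with shape (d_out x r), so each layer gains r * (d_in + d_out) trainable parameters. The sketch below computes that count. The function name and the layer shape in the usage line are made up for illustration and are not read from this model.

```python
def lora_trainable_params(d_in: int, d_out: int, r: int = 128) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


# Hypothetical 2048 x 2048 projection, not a shape taken from this model.
extra = lora_trainable_params(2048, 2048)
```

If the adapter was set up with the PEFT library (an assumption; the card does not say), the two settings above would correspond to `LoraConfig(r=128, target_modules="all-linear")`.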
Loading the model
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Use the GPU when available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("beyoru/NerSQL_3")
model = AutoModelForCausalLM.from_pretrained("beyoru/NerSQL_3", torch_dtype=torch.float16).to(device)
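Once the model and tokenizer are loaded, a single query might be run as in the sketch below. It assumes the tokenizer ships a chat template, which this card does not state. The helper name `generate_sql` and the `max_new_tokens=512` default are illustrative choices, not values from the training setup.

```python
def generate_sql(model, tokenizer, system_prompt, sql_context, question, max_new_tokens=512):
    """Run one chat turn and return only the newly generated text."""
    messages = [
        {"role": "system", "content": system_prompt},
        # USER turn: SQL context, a newline, then the question.
        {"role": "user", "content": f"{sql_context}\n{question}"},
    ]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so that only the completion is decoded.
    prompt_len = len(inputs["input_ids"][0])
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

A call would look like `generate_sql(model, tokenizer, SYSTEM_PROMPT, "CREATE TABLE users (id INT);", "How many users are there?")`, with the returned text expected to contain the `<think>` and `<answer>` blocks.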