EPFL-CS-552-General-Knowledge
Repository for the EPFL-CS-552 (Modern Natural Language Processing) project: post-training Qwen3-1.7B in the general-knowledge domain.
Project: post-training a pre-trained model to answer closed-book factual questions in the general-knowledge domain.
Format requirement: The model should enclose its final answer in \boxed{}.
Context: Modern Natural Language Processing course (EPFL, CS-552).
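Grading depends on the final answer appearing inside \boxed{}, so any checker must first extract that answer. Below is a minimal stdlib sketch of that step. It assumes two things. First, the last `\boxed{...}` in a completion is the final answer. Second, an unterminated one (for example, a generation cut off by the token limit) counts as missing. `extract_boxed` is a hypothetical helper, not the project's code.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in `text`, or None if absent.

    Braces are matched by depth counting, so nested groups such as
    \\boxed{\\text{B}} are returned whole.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)  # assumption: the LAST \boxed{} holds the final answer
    if start == -1:
        return None  # no \boxed{} at all: the "missing \boxed{}" case
    i = start + len(marker)
    depth = 1
    for j in range(i, len(text)):
        ch = text[j]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[i:j].strip()
    return None  # unterminated \boxed{ is treated as missing


print(extract_boxed(r"so the answer is \boxed{B}."))  # B
```

Returning `None` rather than an empty string keeps "no answer" distinguishable from "empty answer" when computing a missing-\boxed{} rate.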
Qwen3-1.7B was post-trained with RLVR (reinforcement learning with verifiable rewards) using GRPO on 3,000 samples from the MMLU training set.
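This card does not show the reward function. The following is therefore only a sketch of what RLVR with GRPO computes, assuming a binary exact-match reward on the MMLU answer letter. The names `correctness_reward` and `group_advantages` are hypothetical.

GRPO samples a group of completions per prompt and scores each one relative to its own group. Each completion's advantage is its reward minus the group mean, divided by the group standard deviation, so no learned value model is needed. This sketch uses the population standard deviation and a small `eps` to avoid division by zero; both are assumptions.

```python
import statistics
from typing import Optional


def correctness_reward(pred: Optional[str], gold: str) -> float:
    """Hypothetical verifiable reward: 1.0 if the extracted answer matches the gold
    letter (case-insensitive), else 0.0. A missing answer (None) scores 0.0."""
    if pred is None:
        return 0.0
    return 1.0 if pred.strip().upper() == gold.strip().upper() else 0.0


def group_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """GRPO-style group-relative advantages: (reward - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group (assumption)
    return [(r - mean) / (std + eps) for r in rewards]


# One prompt with gold answer "B" and a group of 4 sampled completions.
rewards = [correctness_reward(p, "B") for p in ["B", "C", None, "b"]]
print(rewards)                    # [1.0, 0.0, 0.0, 1.0]
print(group_advantages(rewards))  # roughly [1, -1, -1, 1]
```

Sometimes every completion in a group gets the same reward (all right or all wrong). The standard deviation is then zero and every advantage is 0, so that prompt contributes no reward-driven policy-gradient signal.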
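We used LoRA with the following configuration:
```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,               # rank of the low-rank update matrices
    lora_alpha=32,      # scaling numerator (update is scaled by lora_alpha / r)
    lora_dropout=0.0,   # no dropout on the LoRA path
    # Adapt every attention and MLP projection in each decoder layer.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    bias="none",        # biases are not trained
    task_type="CAUSAL_LM",
)
```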
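As a sanity check on adapter size, the arithmetic below counts the trainable LoRA parameters for this config. It assumes Qwen3-1.7B's shapes: hidden size 2048, MLP size 6144, 28 layers, and 16 query heads plus 8 key/value heads of dimension 128. These shapes are not stated in this card, so verify them against the model's `config.json`. Under PEFT's default (non-rsLoRA) scaling, the low-rank update `B @ A` is multiplied by `lora_alpha / r`.

```python
# ASSUMED Qwen3-1.7B shapes: check the model's config.json before relying on them.
HIDDEN = 2048
INTERMEDIATE = 6144
NUM_LAYERS = 28
NUM_Q_HEADS, NUM_KV_HEADS, HEAD_DIM = 16, 8, 128

R, ALPHA = 16, 32  # from the LoraConfig above

# (in_features, out_features) of each targeted linear layer, per decoder layer.
TARGETS = {
    "q_proj": (HIDDEN, NUM_Q_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),
    "o_proj": (NUM_Q_HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    # LoRA adds A (r x d_in) and B (d_out x r); bias="none" adds nothing else.
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, R) for d_in, d_out in TARGETS.values())
total = per_layer * NUM_LAYERS
scaling = ALPHA / R  # multiplier applied to the low-rank update B @ A

print(f"trainable LoRA params: {total:,}")  # 17,432,576 under the assumed shapes
print(f"scaling alpha/r: {scaling}")        # 2.0
```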
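Training arguments (TRL `GRPOConfig`):
```python
from trl import GRPOConfig

# OUTPUT_DIR and N_GROUP are defined elsewhere in the training script (values not given here).
training_args = GRPOConfig(
    output_dir=OUTPUT_DIR,
    learning_rate=1e-6,
    num_generations=N_GROUP,           # completions sampled per prompt (GRPO group size)
    max_steps=1500,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    max_completion_length=3090,        # max tokens generated per completion
    use_vllm=True,                     # generate rollouts with vLLM
    vllm_gpu_memory_utilization=0.55,  # fraction of GPU memory reserved for vLLM
    # vllm_max_model_len=4096,
)
```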
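The card does not give `N_GROUP` or the number of GPUs, so the training budget can only be estimated. The helper below makes that arithmetic explicit. It assumes, as in recent TRL versions, that `per_device_train_batch_size` counts completions rather than prompts, and that each optimizer step consumes whole groups of `n_group` completions. `grpo_step_budget` is hypothetical, and the example inputs `n_group=8` and a single GPU are illustrative values, not the project's.

```python
def grpo_step_budget(
    per_device_batch: int,
    grad_accum: int,
    num_devices: int,
    n_group: int,
    max_steps: int,
    dataset_size: int,
) -> dict[str, float]:
    """Rough prompt/completion budget of a GRPO run (hypothetical helper).

    Assumes per_device_batch counts completions, not prompts, and that each
    optimizer step consumes whole groups of n_group completions.
    """
    completions_per_step = per_device_batch * grad_accum * num_devices
    if completions_per_step % n_group != 0:
        raise ValueError("completions per step must be a multiple of n_group")
    prompts_per_step = completions_per_step // n_group
    prompts_seen = prompts_per_step * max_steps
    return {
        "completions_per_step": completions_per_step,
        "prompts_per_step": prompts_per_step,
        "prompts_seen": prompts_seen,
        "epochs": prompts_seen / dataset_size,
    }


# Values from the config above, plus HYPOTHETICAL n_group=8 and a single GPU.
print(grpo_step_budget(1, 8, 1, n_group=8, max_steps=1500, dataset_size=3000))
```

Under these hypothetical values, each optimizer step covers one prompt. The 1,500 steps would then see 1,500 prompts, half of the 3,000 MMLU samples. With `n_group=4`, the same run would cover exactly one epoch.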
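Finally, after merging the LoRA adapters into the base model, we set its generation config as follows:
```python
from transformers import GenerationConfig

# `merged_model` (base model with the LoRA weights merged in) and `tokenizer` come from the training script.
merged_model.generation_config = GenerationConfig(
    # Fall back to EOS when the tokenizer defines no BOS token.
    bos_token_id=tokenizer.bos_token_id if tokenizer.bos_token_id is not None else tokenizer.eos_token_id,
    do_sample=True,  # sample instead of greedy decoding
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,
    temperature=0.6,
    top_k=20,
    top_p=0.95,
)
```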
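To make the three sampling settings concrete, here is a simplified pure-Python sketch of how temperature, top-k and top-p reshape one step's next-token distribution, applied in that order. It illustrates the technique only. It is not the transformers implementation, whose boundary handling may differ. `sampling_distribution` is a hypothetical helper.

```python
import math


def sampling_distribution(
    logits: list[float],
    temperature: float = 0.6,
    top_k: int = 20,
    top_p: float = 0.95,
) -> dict[int, float]:
    """Return {token_id: probability} after temperature, top-k, then top-p filtering."""
    # 1. Temperature: scale logits, then softmax (subtract the max for numerical stability).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    ranked = sorted(((e / z, i) for i, e in enumerate(exps)), reverse=True)

    # 2. Top-k: keep the k most likely tokens and renormalise over them.
    ranked = ranked[:top_k]
    z_k = sum(p for p, _ in ranked)
    ranked = [(p / z_k, i) for p, i in ranked]

    # 3. Top-p (nucleus): keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cum = [], 0.0
    for p, i in ranked:
        kept.append((p, i))
        cum += p
        if cum >= top_p:
            break

    # Renormalise the surviving tokens so they form a distribution again.
    z_p = sum(p for p, _ in kept)
    return {i: p / z_p for p, i in kept}


print(sampling_distribution([2.0, 1.0, 0.0, -1.0], temperature=1.0, top_k=3, top_p=0.9))
```

In the example, top-k first drops token 3. Top-p then keeps only tokens 0 and 1, at probabilities of about 0.73 and 0.27. With `temperature=0.6`, the distribution is sharpened before truncation, so top-p can keep far fewer than `top_k` tokens.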
RESULTS
All results below were obtained with a 4K-token budget at inference.
| Model | Accuracy (%, internal benchmark) | Missing \boxed{} (%, internal benchmark) |
|---|---|---|
| Pre-trained Qwen3-1.7B | 68.94 | 1.0 |
| GRPO post-trained model | 64.68 | 8.9 |