Pensez: Less Data, Better Reasoning – Rethinking French LLM

About | Models and Datasets | Benchmarks | How to Run Locally | Training Details


About

Pensez is a bilingual (French-English) reasoning model designed to deliver strong reasoning from significantly less training data. It is trained on a curated dataset of everyday reasoning tasks and scientific questions to enhance performance.

Key strategies for improved reasoning:

  • Concise reasoning for simple tasks to prevent overthinking.
  • Extended reasoning for complex domains like mathematics, coding, and science.
  • Special tokens (<think>...</think>) to explicitly guide the model’s reasoning process (see the sketch below).

Compared to models like DeepSeek-R1-Distill-Qwen-7B, these optimizations yield superior reasoning while maintaining robust general understanding.
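
As a purely illustrative sketch (the exact chat template lives in the model's tokenizer and is not reproduced here), a response using the <think> delimiters might look like this:

```python
# Hypothetical illustration of the <think> reasoning format described above.
# The exact template is defined by the model's chat template and may differ.
example_response = (
    "<think>\n"
    "The user asks for 12 x 15. 12 x 15 = 12 x 10 + 12 x 5 = 120 + 60 = 180.\n"
    "</think>\n"
    "12 x 15 = 180."
)
print(example_response)
```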

Models and Datasets

Model Versions

Pensez is built on Qwen2.5-7B-Instruct and trained for five epochs, with a checkpoint released after each epoch.

| Model | Backbone | Size | Download Link |
| --- | --- | --- | --- |
| Pensez-v0.1-e1 | Qwen2.5-7B-Instruct | 7B | 🤗 Pensez-v0.1-e1 |
| Pensez-v0.1-e2 | Qwen2.5-7B-Instruct | 7B | 🤗 Pensez-v0.1-e2 |
| Pensez-v0.1-e3 | Qwen2.5-7B-Instruct | 7B | 🤗 Pensez-v0.1-e3 |
| Pensez-v0.1-e4 | Qwen2.5-7B-Instruct | 7B | 🤗 Pensez-v0.1-e4 |
| Pensez-v0.1-e5 | Qwen2.5-7B-Instruct | 7B | 🤗 Pensez-v0.1-e5 |

Dataset

Pensez was trained on the hand-curated Pensez v0.1 dataset containing 2,000 samples (1,000 French, 1,000 English).

| Dataset | Description | Size | Link |
| --- | --- | --- | --- |
| Pensez v0.1 | SFT training dataset | 2K samples | 🤗 Pensez v0.1 |
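
For quick inspection, here is a minimal sketch using the datasets library; the repo id below is inferred from the link above (an assumption, verify it on the Hub):

```python
from datasets import load_dataset

# Repo id inferred from the dataset link above -- verify on the Hub.
ds = load_dataset("HoangHa/Pensez-v0.1", split="train")
print(len(ds))  # expected: ~2,000 SFT samples (1,000 French, 1,000 English)
print(ds[0])    # inspect a single example
```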

Benchmarks

Pensez was evaluated on French and English benchmarks, demonstrating strong reasoning ability and improved task-specific performance:

| Benchmark | Pensez-v0.1-e5 | DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-7B-Instruct |
| --- | --- | --- | --- |
| Math-Hard (fr) | 0.3458 | 0.3403 | 0.2253 |
| MMLU (fr) | 0.5766 | 0.4961 | 0.6612 |
| BoolQA (fr) | 0.9157 | 0.7079 | 0.9382 |
| Trivia (en) | 0.4421 | 0.2711 | 0.5316 |
| HellaSwag (en) | 0.5050 | 0.3540 | 0.5258 |

Key Observations:

  • Pensez outperforms Qwen2.5-7B-Instruct on reasoning tasks such as Math-Hard (fr).
  • It is comparable to DeepSeek-R1-Distill-Qwen-7B on reasoning while maintaining stronger general understanding.
  • Knowledge-based tasks (MMLU, Trivia) degrade less than they do for DeepSeek-R1-Distill-Qwen-7B relative to the Qwen2.5-7B-Instruct baseline.
<details>
<summary>Click for detailed benchmark results</summary>

| Tasks | Pensez v0.1 e1 | Pensez v0.1 e2 | Pensez v0.1 e3 | Pensez v0.1 e4 | Pensez v0.1 e5 | Qwen2.5-7B-Instruct | DeepSeek-R1-Distill-Qwen-7B |
| --- | --- | --- | --- | --- | --- | --- | --- |
| leaderboard_math_hard_fr | 0.0918 | 0.2547 | 0.2783 | 0.3035 | 0.3458 | 0.2253 | 0.3403 |
| leaderboard_math_algebra_hard_fr | 0.1029 | 0.3914 | 0.3971 | 0.5114 | 0.5000 | 0.4229 | 0.4771 |
| leaderboard_math_counting_and_prob_hard_fr | 0.0765 | 0.1378 | 0.1939 | 0.2041 | 0.2398 | 0.1224 | 0.2347 |
| leaderboard_math_geometry_hard_fr | 0.0388 | 0.1019 | 0.1408 | 0.1359 | 0.1748 | 0.1019 | 0.2330 |
| leaderboard_math_num_theory_hard_fr | 0.1198 | 0.2581 | 0.3502 | 0.3548 | 0.4332 | 0.3180 | 0.3963 |
| leaderboard_math_prealgebra_hard_fr | 0.1681 | 0.4425 | 0.4690 | 0.4956 | 0.5841 | 0.3274 | 0.4867 |
| leaderboard_math_precalculus_hard_fr | 0.0357 | 0.0714 | 0.1190 | 0.1190 | 0.1429 | 0.0595 | 0.2143 |
| leaderboard_mmlu_fr | 0.3806 | 0.3329 | - | - | 0.5766 | 0.6612 | 0.4961 |
| french_bench_arc_challenge | 0.5047 | 0.5021 | 0.4919 | 0.4859 | 0.4842 | 0.5518 | 0.3447 |
| french_bench_boolqa | 0.9326 | 0.9326 | 0.9326 | 0.9270 | 0.9157 | 0.9382 | 0.7079 |
| french_bench_fquadv2 | 0.4325 | 0.4400 | 0.4412 | 0.4375 | 0.4387 | 0.4800 | 0.2988 |
| french_bench_hellaswag | 0.4970 | 0.5055 | 0.5092 | 0.5058 | 0.5050 | 0.5258 | 0.3540 |
| french_bench_trivia | 0.4763 | 0.4763 | 0.4553 | 0.4395 | 0.4421 | 0.5316 | 0.2711 |

</details>
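
The task names above follow lm-evaluation-harness conventions, so the numbers can in principle be reproduced with that harness. A hedged sketch follows; the harness version and few-shot settings used by the authors are not stated, so treat this as an approximation rather than the official recipe:

```python
import lm_eval

# Approximate re-evaluation with lm-evaluation-harness; the exact harness
# version and few-shot settings behind the reported numbers are unknown.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=HoangHa/Pensez-v0.1-e5,dtype=float16",
    tasks=["french_bench_boolqa", "french_bench_hellaswag", "french_bench_trivia"],
)
print(results["results"])
```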

How to Run Locally

You can run Pensez using Hugging Face’s transformers library:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "HoangHa/Pensez-v0.1-e5"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

# Build a chat-formatted prompt
messages = [{"role": "user", "content": "Bonjour!"}]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate, then decode only the newly produced tokens (skip the echoed prompt)
generated_ids = model.generate(
    input_ids,
    max_new_tokens=2500,
    temperature=0.8,
    repetition_penalty=1.1,
    do_sample=True,
    eos_token_id=tokenizer.eos_token_id,
)
response = tokenizer.decode(
    generated_ids[0][input_ids.shape[-1]:],
    skip_special_tokens=True,
    clean_up_tokenization_spaces=True,
)
print(f"Réponse: {response}")
```

Training Details

Pensez was trained with:

| Parameter | Value |
| --- | --- |
| Epochs | 5 |
| Global Batch Size | 200 |
| Learning Rate | 1e-5 |
| Scheduler | Cosine |
| Optimizer | AdamW |
| Warmup Ratio | 0.05 |
| Weight Decay | 0.01 |
| Max Sequence Length | 16,384 |

More details: Training Config | Loss curves: Wandb
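
For orientation, here is a sketch of how these hyperparameters map onto transformers' TrainingArguments. The per-device batch size and gradient-accumulation split are assumptions (only the global batch size of 200 is stated); the linked training config is authoritative:

```python
from transformers import TrainingArguments

# Sketch of the SFT hyperparameters above; the 4 x 50 batch split is an
# assumption -- only the global batch size of 200 is given in the card.
args = TrainingArguments(
    output_dir="pensez-sft",
    num_train_epochs=5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=50,  # 4 * 50 = 200 global on one GPU
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    warmup_ratio=0.05,
    weight_decay=0.01,
)
# Note: the 16,384-token max sequence length is enforced at tokenization
# time, not via TrainingArguments.
```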

Citation

@misc{hoang2025pensez,
      title={Pensez: Less Data, Better Reasoning – Rethinking French LLM},
      author={Ha Huy Hoang},
      year={2025},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={},
}
