A fine-tuned version of Qwen/Qwen2.5-7B-Instruct specialized in European Union research funding programmes. This model has been trained on 435K+ question-answer pairs derived from the official CORDIS (Community Research and Development Information Service) database.
The training data covers six EU framework programmes, from FP4 through Horizon Europe:
| Programme | Period | Projects |
|---|---|---|
| FP4 | 1994–1998 | ~13K |
| FP5 | 1998–2002 | ~17K |
| FP6 | 2002–2006 | ~10K |
| FP7 | 2007–2013 | ~26K |
| Horizon 2020 | 2014–2020 | ~36K |
| Horizon Europe | 2021–2027 | ~24K |
It can answer questions about:
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "RCaz/Qwen2.5-7B-EU-Funding-Expert")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

messages = [
    {"role": "system", "content": "You are an expert assistant specializing in European Union research funding programmes."},
    {"role": "user", "content": "What were the main funding priorities under Horizon 2020?"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-7B-Instruct (7.62B params) |
| Method | LoRA SFT (Supervised Fine-Tuning) |
| LoRA rank | 64 |
| LoRA alpha | 16 |
| LoRA target | all-linear layers |
| LoRA dropout | 0.05 |
| Trainable params | ~410M (5.4% of total) |
| Dataset | RCaz/eu-funding-cordis-qa |
| Training samples | 413,499 |
| Validation samples | 21,764 |
| Max sequence length | 2048 |
| Packing | BFD (Best-Fit Decreasing) |
| Avg tokens/sample | ~285 |
| Precision | bf16 |
| Optimizer | AdamW (fused) |
| Learning rate | 2e-4 (cosine schedule) |
| Batch size | 1 × 8 grad accum = 8 effective |
| Hardware | NVIDIA L4 (24GB) |
| Flash Attention | 2.0 |
| Gradient checkpointing | Enabled |
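The table lists BFD (Best-Fit Decreasing) packing with a maximum sequence length of 2048. As a rough illustration of the idea only, the sketch below packs a list of sample token lengths into bins of at most 2048 tokens. It is not the packing code used for training; the function name `pack_best_fit_decreasing` and the example lengths are hypothetical.

```python
def pack_best_fit_decreasing(lengths, max_len=2048):
    """Group sample lengths into bins whose totals never exceed max_len.

    Illustrative sketch of Best-Fit Decreasing bin packing, not the
    training library's implementation.
    """
    bins = []  # each bin is a list of sample lengths packed together
    free = []  # remaining capacity of each bin, parallel to `bins`

    # "Decreasing": place the longest samples first.
    for length in sorted(lengths, reverse=True):
        if length > max_len:
            raise ValueError(f"sample of {length} tokens exceeds max_len={max_len}")

        # "Best fit": choose the bin with the least free space that still fits.
        best = None
        for i, space in enumerate(free):
            if space >= length and (best is None or space < free[best]):
                best = i

        if best is None:
            # No existing bin has room, so open a new one.
            bins.append([length])
            free.append(max_len - length)
        else:
            bins[best].append(length)
            free[best] -= length
    return bins


# Hypothetical token lengths, chosen only to show the behaviour.
example_bins = pack_best_fit_decreasing([1500, 1200, 800, 500, 300, 285])
print(example_bins)  # [[1500, 500], [1200, 800], [300, 285]]
```

Packing short samples together this way reduces padding, which matters when the average sample (~285 tokens) is much shorter than the 2048-token limit.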
The training data was generated from official CORDIS CSV exports covering 126K+ EU-funded research projects. Eight types of Q&A conversations were created:
All conversations follow the ChatML format with a system prompt establishing EU funding expertise.
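To make the data format concrete, here is a minimal sketch of how one row of a CSV export could be turned into a ChatML conversation. Everything specific in it is an assumption for illustration: the column names, the semicolon delimiter, the question template, the placeholder row (not real CORDIS data), and the reuse of the system prompt from the usage example. The actual generation pipeline and its eight conversation types are not reproduced here.

```python
import csv
import io

# Assumption: the same system prompt as in the usage example above.
SYSTEM_PROMPT = (
    "You are an expert assistant specializing in European Union "
    "research funding programmes."
)

# Placeholder row, NOT real CORDIS data; column names are hypothetical.
SAMPLE_CSV = (
    "acronym;title;frameworkProgramme;ecMaxContribution\n"
    "DEMO;Placeholder project title;H2020;1000000\n"
)


def row_to_messages(row):
    """Build one system/user/assistant conversation from a project row."""
    question = f"How much EU funding did the project {row['acronym']} receive?"
    answer = (
        f"{row['acronym']} ({row['title']}) received an EU contribution of "
        f"EUR {row['ecMaxContribution']} under {row['frameworkProgramme']}."
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
    ]


def to_chatml(messages):
    """Serialize messages using ChatML turn delimiters."""
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV), delimiter=";"))
messages = row_to_messages(rows[0])
chatml_text = to_chatml(messages)
print(chatml_text)
```

In practice the tokenizer's `apply_chat_template` method produces this layout from the `messages` list, so the manual `to_chatml` helper is only there to show what the serialized text looks like.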
If you use this model, please cite:
@misc{rcaz2026eufunding,
title={Qwen2.5-7B-EU-Funding-Expert: A Fine-tuned LLM for European Research Funding},
author={RCaz},
year={2026},
url={https://huggingface.co/RCaz/Qwen2.5-7B-EU-Funding-Expert}
}