Qwen2.5-7B-EU-Funding-Expert

A fine-tuned version of Qwen/Qwen2.5-7B-Instruct specialized in European Union research funding programmes. The fine-tuning dataset contains about 435K question-answer conversations (413,499 training and 21,764 validation samples) derived from the official CORDIS (Community Research and Development Information Service) database.

What it knows

The model was trained on data covering six EU framework programmes, from FP4 to Horizon Europe:

Programme       Period      Projects (approx.)
FP4             1994–1998   ~13K
FP5             1998–2002   ~17K
FP6             2002–2006   ~10K
FP7             2007–2013   ~26K
Horizon 2020    2014–2020   ~36K
Horizon Europe  2021–2027   ~24K
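As a quick consistency check, the approximate counts in the table can be held in a small lookup structure. The sketch below is illustrative only (the PROGRAMMES dict and both helper functions are hypothetical, not part of the model or dataset). It sums the counts to about 126K, matching the project total quoted in the Dataset section, and shows that the nominal programme periods share boundary years such as 1998 and 2002.

```python
# Approximate project counts per framework programme, copied from the table above.
# Periods are the nominal programme periods; individual projects can run past them.
PROGRAMMES = {
    "FP4": ((1994, 1998), 13_000),
    "FP5": ((1998, 2002), 17_000),
    "FP6": ((2002, 2006), 10_000),
    "FP7": ((2007, 2013), 26_000),
    "Horizon 2020": ((2014, 2020), 36_000),
    "Horizon Europe": ((2021, 2027), 24_000),
}


def total_projects() -> int:
    """Sum of the approximate per-programme project counts."""
    return sum(count for _, count in PROGRAMMES.values())


def programmes_active_in(year: int) -> list[str]:
    """Programmes whose nominal period includes `year` (boundary years match two)."""
    return [
        name
        for name, ((start, end), _) in PROGRAMMES.items()
        if start <= year <= end
    ]


print(total_projects())            # 126000
print(programmes_active_in(1998))  # ['FP4', 'FP5']
print(programmes_active_in(2010))  # ['FP7']
```

Because of the shared boundary years, a question phrased only by year (for example "projects from 2002") can legitimately involve two programmes.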

It can answer questions about:

  • 🔬 Project details: objectives, methodology, expected impacts
  • 💰 Funding information: total costs, EC contributions, funding schemes
  • 📅 Timelines: start/end dates, project duration
  • 🏢 Organizations: coordinators, participants, country information
  • 🧬 Scientific domains: EuroSciVoc classifications, research topics
  • 📊 Programme-level statistics: funding distribution, topic clusters
  • 🔄 Cross-programme comparisons: funding trends from FP4 to Horizon Europe

Usage

With PEFT (recommended)

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "RCaz/Qwen2.5-7B-EU-Funding-Expert")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

messages = [
    {"role": "system", "content": "You are an expert assistant specializing in European Union research funding programmes."},
    {"role": "user", "content": "What were the main funding priorities under Horizon 2020?"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

Example prompts

  • "Tell me about the ITER project and its EU funding."
  • "How much did the EU invest in quantum computing research under Horizon 2020?"
  • "Compare the total budgets of FP7 and Horizon 2020."
  • "Which organizations coordinated the most EU-funded AI projects?"
  • "What scientific domains received the highest funding in Horizon Europe?"

Training details

Parameter               Value
Base model              Qwen/Qwen2.5-7B-Instruct (7.62B params)
Method                  LoRA SFT (supervised fine-tuning)
LoRA rank               64
LoRA alpha              16
LoRA target modules     all linear layers ("all-linear")
LoRA dropout            0.05
Trainable params        ~161M (~2.1% of total)
Dataset                 RCaz/eu-funding-cordis-qa
Training samples        413,499
Validation samples      21,764
Max sequence length     2,048 tokens
Packing                 BFD (best-fit decreasing)
Avg tokens/sample       ~285
Precision               bf16
Optimizer               AdamW (fused)
Learning rate           2e-4, cosine schedule
Batch size              1 per device × 8 gradient accumulation steps = 8 effective
Hardware                NVIDIA L4 (24 GB)
Attention               FlashAttention-2
Gradient checkpointing  Enabled
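The trainable-parameter figure can be estimated from the layer shapes. The sketch below assumes the published Qwen2.5-7B configuration (hidden size 3584, intermediate size 18944, 28 layers, 4 key/value heads of dimension 128). It also assumes that PEFT's "all-linear" target wraps the seven projection matrices in each decoder layer but not the output head. These dimensions come from the base model's config, not from this card. Each wrapped layer gains rank × (in_features + out_features) parameters, and the low-rank update B·A is scaled by alpha / rank.

```python
# Assumed Qwen2.5-7B dimensions (from the public base-model config, not this card).
HIDDEN = 3584          # hidden_size
INTERMEDIATE = 18944   # intermediate_size
LAYERS = 28            # num_hidden_layers
HEAD_DIM = 128         # hidden_size // num_attention_heads (3584 // 28)
KV_HEADS = 4           # num_key_value_heads (grouped-query attention)

RANK = 64
ALPHA = 16

# (in_features, out_features) of every linear projection in one decoder layer.
LINEAR_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, KV_HEADS * HEAD_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in_features) and B (out_features x rank) to a frozen layer."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in LINEAR_SHAPES.values())
total = per_layer * LAYERS
scaling = ALPHA / RANK  # standard LoRA scaling applied to the B @ A update

print(f"{total:,}")             # 161,480,704
print(f"{total / 7.62e9:.1%}")  # 2.1%
print(scaling)                  # 0.25
```

Under these assumptions the adapter has about 161M parameters, roughly 2.1% of the 7.62B base. If extra modules such as embeddings or the output head were also trained, the real count would be higher.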
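Packing concatenates several short samples into one 2,048-token training sequence, so that little compute is spent on padding. Best-fit decreasing is a classic bin-packing heuristic: sort samples from longest to shortest, then place each into the partly filled sequence with the least free space that can still hold it. What follows is a generic sketch of that heuristic, not the exact code used during training. The function name and the example lengths are made up.

```python
def best_fit_decreasing(lengths: list[int], capacity: int) -> list[list[int]]:
    """Pack sequence lengths into bins of `capacity` tokens (best-fit decreasing).

    Returns bins as lists of indices into `lengths`. Sequences longer than
    `capacity` are rejected here; a real pipeline would truncate them first.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins: list[list[int]] = []
    remaining: list[int] = []  # free tokens left in each bin
    for i in order:
        size = lengths[i]
        if size > capacity:
            raise ValueError(f"sequence {i} ({size} tokens) exceeds capacity {capacity}")
        # Best fit: the open bin with the least free space that still fits.
        best = None
        for b, free in enumerate(remaining):
            if free >= size and (best is None or free < remaining[best]):
                best = b
        if best is None:
            bins.append([i])
            remaining.append(capacity - size)
        else:
            bins[best].append(i)
            remaining[best] -= size
    return bins


lengths = [900, 300, 1200, 700, 250, 1800, 400]
packed = best_fit_decreasing(lengths, capacity=2048)
for b in packed:
    print([lengths[i] for i in b], sum(lengths[i] for i in b))
# [1800] 1800
# [1200, 700] 1900
# [900, 400, 300, 250] 1850
```

With an average of ~285 tokens per sample, a 2,048-token sequence would hold about seven samples if packing were perfectly efficient. That works out to roughly 57 samples per optimizer step at the effective batch size of 8.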

Dataset

The training data was generated from official CORDIS CSV exports covering 126K+ EU-funded research projects. Eight types of Q&A conversations were created:

  1. Project overview: What is the project about?
  2. Funding details: How much funding did it receive?
  3. Timeline: When did it start/end?
  4. Organizations: Who coordinates/participates?
  5. Scientific domains: What fields does it cover?
  6. Topics: What EU topics/calls is it associated with?
  7. Programme-level: Statistics and trends within a programme
  8. Cross-programme: Comparisons across framework programmes
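To make the conversation types concrete, here is a hedged sketch of how one project row could be turned into "funding details" (type 2) and "timeline" (type 3) conversations. The column names (acronym, title, frameworkProgramme, startDate, endDate, totalCost, ecMaxContribution) are assumed to mirror a CORDIS project export, the sample row is invented, and the question and answer wording is illustrative. The card does not publish the actual generation templates.

```python
import csv
import io

SYSTEM_PROMPT = (
    "You are an expert assistant specializing in European Union research funding programmes."
)

# Invented one-row extract. Column names are assumed to mirror a CORDIS project
# export; the values are made up for illustration.
SAMPLE_CSV = (
    "acronym;title;frameworkProgramme;startDate;endDate;totalCost;ecMaxContribution\n"
    "DEMO;Demonstration Project;H2020;2020-01-01;2023-12-31;1500000.00;1200000.00\n"
)


def _conversation(question: str, answer: str) -> list[dict[str, str]]:
    """Wrap one question/answer pair with the system prompt in chat-message form."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
    ]


def funding_conversation(row: dict[str, str]) -> list[dict[str, str]]:
    """Type 2 (funding details): total cost and EC contribution."""
    total = float(row["totalCost"])
    ec = float(row["ecMaxContribution"])
    answer = (
        f"{row['acronym']} ({row['title']}), funded under {row['frameworkProgramme']}, "
        f"had a total cost of EUR {total:,.2f}, with an EC contribution of "
        f"EUR {ec:,.2f} ({ec / total:.0%} of the total)."
    )
    return _conversation(f"How much funding did the {row['acronym']} project receive?", answer)


def timeline_conversation(row: dict[str, str]) -> list[dict[str, str]]:
    """Type 3 (timeline): start and end dates."""
    answer = f"{row['acronym']} started on {row['startDate']} and ended on {row['endDate']}."
    return _conversation(f"When did the {row['acronym']} project start and end?", answer)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV), delimiter=";"))
conversations = [
    build(row) for row in rows for build in (funding_conversation, timeline_conversation)
]
for conv in conversations:
    print(conv[1]["content"], "->", conv[2]["content"])
```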

All conversations follow the ChatML format with a system prompt establishing EU funding expertise.
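ChatML wraps each turn in `<|im_start|>role` ... `<|im_end|>` markers, and generation starts from an open assistant turn. The minimal renderer below (`to_chatml` is a hypothetical helper) shows that layout for the card's system prompt. In practice, use `tokenizer.apply_chat_template`, as in the Usage section. Qwen2.5's real template also handles cases this sketch ignores, such as tool calls or a missing system message.

```python
SYSTEM_PROMPT = (
    "You are an expert assistant specializing in European Union research funding programmes."
)


def to_chatml(messages: list[dict[str, str]], add_generation_prompt: bool = False) -> str:
    """Render chat messages as ChatML text (simplified; no tool-call handling)."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        text += "<|im_start|>assistant\n"
    return text


prompt = to_chatml(
    [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Compare the total budgets of FP7 and Horizon 2020."},
    ],
    add_generation_prompt=True,
)
print(prompt)
```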

Limitations

  • Knowledge is based on CORDIS data snapshots and may not reflect the very latest project updates
  • The model is specialized for EU funding, so it may be less capable on general knowledge tasks than the base model
  • Financial figures and project details are only as accurate as the source CORDIS data
  • The model may occasionally hallucinate details for very specific project queries

Citation

If you use this model, please cite:

@misc{rcaz2026eufunding,
  title={Qwen2.5-7B-EU-Funding-Expert: A Fine-tuned LLM for European Research Funding},
  author={RCaz},
  year={2026},
  url={https://huggingface.co/RCaz/Qwen2.5-7B-EU-Funding-Expert}
}

Acknowledgments

  • Training data sourced from CORDIS (© European Union)
  • Base model by the Qwen Team
  • Fine-tuned using TRL and PEFT