LFM2.5-1.2B-Instruct — GRPO fine-tune on FewNERD INTRA

LoRA adapter fine-tuned with Group Relative Policy Optimization (GRPO) on the FewNERD INTRA benchmark.
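The card does not give the adapter's rank, scaling factor or target modules. As background only, here is a minimal NumPy sketch of what a LoRA adapter computes in a single linear layer. The frozen base weight W0 is kept, and a scaled low-rank update (alpha / r) * B @ A is added to it. All sizes and values below are illustrative assumptions, not this adapter's configuration.

```python
import numpy as np


def lora_forward(x, W0, A, B, alpha, r):
    """LoRA-adapted linear layer: y = W0 x + (alpha / r) * B (A x).

    W0: (d_out, d_in) frozen base weight
    A:  (r, d_in) trainable down-projection
    B:  (d_out, r) trainable up-projection
    """
    return W0 @ x + (alpha / r) * (B @ (A @ x))


rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 8, 4, 2, 16  # illustrative sizes, not this adapter's config
W0 = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in))
x = rng.normal(size=d_in)

# LoRA initialises B to zero, so training starts exactly at the base model.
B_init = np.zeros((d_out, r))
y_init = lora_forward(x, W0, A, B_init, alpha, r)

# After training (a random B stands in here), the update can be merged into one dense weight.
B_trained = rng.normal(size=(d_out, r))
W_merged = W0 + (alpha / r) * (B_trained @ A)
y_trained = lora_forward(x, W0, A, B_trained, alpha, r)
```

The update can be folded into W0, so a merged adapter adds no extra inference latency over the base model.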

Results

Evaluation on the FewNERD INTRA test split (n=200).

Metric                  Base model   Fine-tuned
Span F1                 0.4288       0.4655
JSON validity rate      1.0          1.0
Schema validity rate    0.92         1.0
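The card does not say exactly how these metrics were computed. Below is a minimal sketch of one plausible definition. It assumes that model outputs are a JSON list of {"text", "type"} objects, that "schema valid" means every entity has an allowed type, that both rates are taken over all test outputs, and that span F1 is micro-averaged with exact text-and-type matching. All of these choices, the label set, and the helper names are hypothetical.

```python
from __future__ import annotations

import json

Entity = tuple[str, str]  # (surface text, entity type)


def is_valid_json(output: str) -> bool:
    """True if the raw model output parses as JSON."""
    try:
        json.loads(output)
    except json.JSONDecodeError:
        return False
    return True


def parse_entities(output: str, allowed_types: set[str]) -> list[Entity] | None:
    """Return (text, type) pairs if the output fits the assumed schema, else None.

    Assumed schema (hypothetical): a JSON list of {"text": str, "type": str}
    objects whose "type" is in `allowed_types`.
    """
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, list):
        return None
    entities = []
    for item in data:
        if not isinstance(item, dict):
            return None
        text, etype = item.get("text"), item.get("type")
        if not (isinstance(text, str) and isinstance(etype, str) and etype in allowed_types):
            return None
        entities.append((text, etype))
    return entities


def micro_span_f1(preds: list[list[Entity]], golds: list[list[Entity]]) -> float:
    """Micro-averaged F1: a predicted span counts only if text AND type match exactly."""
    tp = n_pred = n_gold = 0
    for pred, gold in zip(preds, golds):
        pred_set, gold_set = set(pred), set(gold)  # duplicate spans are collapsed
        tp += len(pred_set & gold_set)
        n_pred += len(pred_set)
        n_gold += len(gold_set)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Usage with a hypothetical label set and toy outputs.
allowed = {"person", "location", "organization"}
outputs = [
    '[{"text": "Paris", "type": "location"}]',
    "not json",
    '[{"text": "IBM", "type": "company"}]',  # valid JSON, but "company" is not an allowed type
]
json_rate = sum(is_valid_json(o) for o in outputs) / len(outputs)  # 2 of 3 parse
schema_rate = sum(parse_entities(o, allowed) is not None for o in outputs) / len(outputs)  # 1 of 3 fit

preds = [[("Paris", "location")], [("Ada Lovelace", "person"), ("IBM", "location")]]
golds = [[("Paris", "location")], [("Ada Lovelace", "person"), ("IBM", "organization")]]
f1 = micro_span_f1(preds, golds)  # 2 of 3 spans match on both sides, so P = R = F1 = 2/3
```

In this sketch, an output that fails to parse would be scored as an empty prediction (for example `parse_entities(o, allowed) or []`). That lowers recall without adding false positives.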

Training setup

  • Training steps: 500
  • Beta (KL penalty coefficient): 0.15
  • Generations per prompt (GRPO group size): 6
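GRPO, introduced in the DeepSeekMath paper, drops PPO's value network. Each prompt gets a group of sampled completions, and each completion's advantage is its reward standardised against the rest of its group. The sketch below shows, in plain Python, how the settings above plug in: a group size of 6 and a KL coefficient of 0.15. It assumes a PPO-style clip range of 0.2, the k3 KL estimator used in that paper, a population standard deviation, and a small eps. The card states none of these, and the sketch is not the author's training code.

```python
import math
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """GRPO advantage: standardise each completion's reward within its prompt's group.

    Uses the population std; implementations differ on this and on eps.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def kl_penalty_k3(logp_policy: float, logp_ref: float) -> float:
    """Per-token KL estimate exp(d) - d - 1 with d = logp_ref - logp_policy; never negative."""
    d = logp_ref - logp_policy
    return math.exp(d) - d - 1.0


def grpo_token_objective(
    ratio: float,
    advantage: float,
    logp_policy: float,
    logp_ref: float,
    beta: float = 0.15,     # "Beta (KL)" from the card
    clip_eps: float = 0.2,  # assumed PPO-style clip range
) -> float:
    """Per-token surrogate to maximise: clipped policy-gradient term minus beta * KL."""
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    policy_term = min(ratio * advantage, clipped_ratio * advantage)
    return policy_term - beta * kl_penalty_k3(logp_policy, logp_ref)


# One prompt, 6 sampled completions (the card's group size), toy rewards.
rewards = [3.3, 1.0, 1.0, 2.5, 0.5, 1.0]
advantages = group_relative_advantages(rewards)
# Completions above the group mean get positive advantages and the rest negative ones;
# a group in which all rewards are equal yields all-zero advantages and no learning signal.
```

A larger beta keeps the policy closer to the reference model at the cost of slower reward improvement. The group size trades sampling cost for a less noisy baseline.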

Reward functions

Function              Weight   Signal
reward_valid_json     0.5      Valid JSON schema
reward_valid_schema   0.5      Valid entity types
reward_span_f1        2.0      Exact text + type match
reward_recall_bonus   0.3      Encourages extracting more entities
reward_partial_text   0.5      Credit for partial span overlaps
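The table gives each reward's name and weight but not its implementation. Below is a hypothetical sketch of how such a weighted reward could be assembled. It assumes the model emits a JSON list of {"text", "type"} objects and that each function returns a score in [0, 1]. The function bodies are illustrative guesses, not the author's code; this applies in particular to the recall bonus and the token-Jaccard partial credit. The label set is also illustrative.

```python
import json

# Weights copied from the table above; every function body below is a hypothetical sketch.
WEIGHTS = {
    "reward_valid_json": 0.5,
    "reward_valid_schema": 0.5,
    "reward_span_f1": 2.0,
    "reward_recall_bonus": 0.3,
    "reward_partial_text": 0.5,
}
ALLOWED_TYPES = {"person", "location", "organization"}  # hypothetical label set


def _parse(output):
    """Return the parsed list of entity dicts, or None (assumed output format)."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, list) or not all(isinstance(e, dict) for e in data):
        return None
    return data


def _pairs(output):
    """Predicted (text, type) pairs; an empty set if the output cannot be parsed."""
    data = _parse(output) or []
    return {(e["text"], e["type"]) for e in data
            if isinstance(e.get("text"), str) and isinstance(e.get("type"), str)}


def reward_valid_json(output, gold):
    try:
        json.loads(output)
    except json.JSONDecodeError:
        return 0.0
    return 1.0


def reward_valid_schema(output, gold):
    data = _parse(output)
    if data is None:
        return 0.0
    ok = all(isinstance(e.get("text"), str) and isinstance(e.get("type"), str)
             and e["type"] in ALLOWED_TYPES for e in data)
    return 1.0 if ok else 0.0


def reward_span_f1(output, gold):
    pred, gold_set = _pairs(output), set(gold)
    tp = len(pred & gold_set)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)


def reward_recall_bonus(output, gold):
    gold_set = set(gold)
    return len(_pairs(output) & gold_set) / len(gold_set) if gold_set else 0.0


def _token_jaccard(a, b):
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def reward_partial_text(output, gold):
    """Mean over gold entities of the best token overlap with any predicted text (type ignored)."""
    pred_texts = [text for text, _ in _pairs(output)]
    if not gold or not pred_texts:
        return 0.0
    return sum(max(_token_jaccard(g, p) for p in pred_texts) for g, _ in gold) / len(gold)


REWARD_FUNCS = [reward_valid_json, reward_valid_schema, reward_span_f1,
                reward_recall_bonus, reward_partial_text]


def total_reward(output, gold):
    """Weighted sum of all reward functions for one completion."""
    return sum(WEIGHTS[f.__name__] * f(output, gold) for f in REWARD_FUNCS)


gold = [("Ada Lovelace", "person"), ("London", "location")]
perfect = '[{"text": "Ada Lovelace", "type": "person"}, {"text": "London", "type": "location"}]'
partial = '[{"text": "Lovelace", "type": "person"}]'
scores = [total_reward(o, gold) for o in (perfect, partial, "oops")]
```

Under these assumptions, a perfect extraction scores 0.5 + 0.5 + 2.0 + 0.3 + 0.5 = 3.8. The two format rewards together cap at 1.0, so once outputs are well-formed, span-level accuracy dominates the signal.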

Citation

@inproceedings{ding2021fewnerd,
  title     = {Few-NERD: A Few-Shot Named Entity Recognition Dataset},
  author    = {Ding, Ning and Xu, Guangwei and Chen, Yulin and others},
  booktitle = {Proceedings of ACL 2021},
  year      = {2021}
}
Model repository: AurelPx/lfm25-1.2b-grpo-fewnerd-intra