LFM2.5-1.2B-Instruct — GRPO fine-tune on FewNERD INTRA

A LoRA adapter for LFM2.5-1.2B-Instruct, fine-tuned with Group Relative Policy Optimization (GRPO) on the FewNERD INTRA benchmark.

Results

Evaluated on 200 examples from the FewNERD INTRA test split.

Metric	Base model	Fine-tuned
Span F1	0.4288	0.4655
JSON validity rate	1.0	1.0
Schema validity rate	0.92	1.0
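
For readers who want to reproduce the Span F1 row, the sketch below shows one common way to score it: an entity counts as correct only when both its text and its type match a gold entity exactly. The card does not say whether scores are micro-averaged or averaged per example, so the function name `span_f1`, the `(text, type)` tuple representation, and the example labels are all assumptions made for illustration.

```python
from collections import Counter

def span_f1(predicted, gold):
    """F1 over (text, type) pairs, using exact matching on both fields."""
    pred_counts = Counter(predicted)
    gold_counts = Counter(gold)
    # Multiset intersection: each gold entity can be matched at most once.
    true_positives = sum((pred_counts & gold_counts).values())
    if true_positives == 0:
        return 0.0
    precision = true_positives / sum(pred_counts.values())
    recall = true_positives / sum(gold_counts.values())
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: one exact match, one partial span that earns no credit.
gold = [("Paris", "location"), ("Marie Curie", "person")]
predicted = [("Paris", "location"), ("Curie", "person")]
print(span_f1(predicted, gold))
```

Under this definition a partially overlapping span such as "Curie" contributes nothing to the score.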

Training Setup

Steps: 500
Beta (KL): 0.15
Generations per prompt: 6
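
As a rough illustration of what "generations per prompt" means in GRPO: the rewards of the completions sampled for the same prompt are compared against each other, and each completion's advantage is its reward relative to the group mean. The sketch below follows the commonly described formulation (mean-centered, divided by the group standard deviation). Whether this run used exactly that normalization is not stated in the card; `group_relative_advantages`, `eps`, and the reward values are hypothetical.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for a single prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for the 6 completions sampled for one prompt.
rewards = [0.2, 0.9, 0.4, 0.9, 0.1, 0.5]
advantages = group_relative_advantages(rewards)
for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

Completions that score above the group mean get a positive advantage and are reinforced; the KL term weighted by beta separately penalizes drift from the reference model.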

Reward functions

Function	Weight	Signal
`reward_valid_json`	0.5	Output parses as valid JSON
`reward_valid_schema`	0.5	Valid entity types
`reward_span_f1`	2.0	Exact text + type match
`reward_recall_bonus`	0.3	Encourages extracting more entities
`reward_partial_text`	0.5	Credit for partial span overlaps
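
One plausible reading of the weights above is a linear weighted sum of the individual reward signals. The sketch below assumes that, and also assumes each function returns a score in [0, 1]; neither is confirmed by the card. `total_reward` and the example scores are hypothetical, while the names and weights are taken from the table.

```python
# Weights copied from the table above.
REWARD_WEIGHTS = {
    "reward_valid_json": 0.5,
    "reward_valid_schema": 0.5,
    "reward_span_f1": 2.0,
    "reward_recall_bonus": 0.3,
    "reward_partial_text": 0.5,
}

def total_reward(scores):
    """Weighted sum of per-function scores; missing scores count as 0."""
    return sum(
        weight * scores.get(name, 0.0)
        for name, weight in REWARD_WEIGHTS.items()
    )

# Hypothetical scores for one completion: well-formed output, half the spans right.
scores = {
    "reward_valid_json": 1.0,
    "reward_valid_schema": 1.0,
    "reward_span_f1": 0.5,
    "reward_recall_bonus": 0.5,
    "reward_partial_text": 0.5,
}
print(total_reward(scores))
```

With these assumptions the exact-match F1 term accounts for 2.0 of a maximum 3.8, so well-formed but inaccurate output earns only a small share of the available reward.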

Citation

@inproceedings{ding2021fewnerd,
  title     = {Few-NERD: A Few-Shot Named Entity Recognition Dataset},
  author    = {Ding, Ning and Xu, Guangwei and Chen, Yulin and others},
  booktitle = {Proceedings of ACL 2021},
  year      = {2021}
}


Model tree for AurelPx/lfm25-1.2b-grpo-fewnerd-intra

Base model: LiquidAI/LFM2.5-1.2B-Base
Finetuned: LiquidAI/LFM2.5-1.2B-Instruct
Adapter: this model
