vibhorg/rl4llm_uofm_nlpo_unsuper_t5_arxiv

Text2Text Generation

	---
	license: apache-2.0
	datasets:
	- scientific_papers
	metrics:
	- bertscore
	- rouge
	tags:
	- text-generation-inference
	- rlhf
	- PPO
	language:
	- en
	---

	This model is fine-tuned with NLPO (Natural Language Policy Optimization), a PPO-based reinforcement learning algorithm, on the ccdv/arxiv-summarization dataset. The base model is flan-t5-base.
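
A minimal sketch of how the checkpoint might be loaded for summarization with the `transformers` library. It assumes the repository ships standard seq2seq weights together with tokenizer files; if the tokenizer is missing, the `google/flan-t5-base` tokenizer would be the natural substitute. The `"summarize: "` prefix, the 512-token input limit, and the generation settings are illustrative assumptions, not values stated in this card, and `build_prompt` and `summarize` are hypothetical helper names.

```python
# Repository name as it appears on this model page.
MODEL_ID = "vibhorg/rl4llm_uofm_nlpo_unsuper_t5_arxiv"


def build_prompt(article: str, prefix: str = "summarize: ") -> str:
    """Collapse whitespace and prepend a task prefix.

    The prefix is an assumption; the card does not say which prompt
    format was used during fine-tuning.
    """
    return prefix + " ".join(article.split())


def summarize(article: str, max_new_tokens: int = 128) -> str:
    """Generate a summary with the fine-tuned checkpoint."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    # Truncate long papers to an assumed 512-token input window.
    inputs = tokenizer(
        build_prompt(article),
        return_tensors="pt",
        truncation=True,
        max_length=512,
    )
    # Beam search settings are illustrative, not taken from the card.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the model on first use):
# print(summarize("We study policy optimization for text generation ..."))
```

Keeping the prompt construction separate from the model call makes it easy to try a different prefix, or none at all, if the assumed one does not match the training setup.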