Understanding the RL part of the source code

#12
by smathieu - opened

Hello!
I'm reading the source code alongside the paper, and I can't find the part that implements C-RLFT in the loss optimization. Sorry if it's obvious, I'm a beginner in reinforcement learning!
