
True, but it seems like there’s nothing to evaluate right now. I assume the ultimate goal is to train a new reasoning model and then evaluate it with the same metrics used for o1 and DeepSeek-R1.
Well, there should be at least some sanity check and validation to ensure the model was trained correctly.
Where are the evaluation numbers? Without them, you can’t call it a reproduction.