Evaluation for fictional writing models

#5
by Henk717 - opened

It would be useful for the KoboldAI community if a test that evaluates fictional writing ability were added to this system, since we primarily use fiction-generation models. It would help us pick which base models are most suitable for our task.

That's a good idea! Do you know any good existing benchmarks for this (probably already in lmeh)?

Fiction evaluation is almost never done, so we don't know which benchmarks are good for this, though I assume the ones involving books would be the closest fit.
Normally we just crowdsource this information from our community: which models they do and don't like.

So this should be some kind of human evaluation, right?

@thomwolf At Chai Research we use explicit user feedback as a quality signal, and during this event we are sharing the process with every developer so they can get real feedback from millions of users in the app: https://www.chai-research.com/competition.html
I wonder whether it's a good idea to mimic existing LMEH evaluations by selecting completions based on log-likelihoods. In my experience, that metric is not well correlated with human feedback. On the other hand, one could use the reward model from the RLHF pipeline as a scorer, but I don't have enough experiments with this "benchmark" yet to claim anything.
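For context, the log-likelihood selection being discussed works roughly like this: each candidate completion is scored by the sum of its per-token log-probabilities under the model, and the highest-scoring candidate is chosen. A minimal sketch, assuming you already have per-token log-probs from some LM (the numbers below are placeholders, not real model outputs; `select_completion` is a hypothetical helper, not part of LMEH):

```python
def select_completion(token_logprobs_per_candidate, length_normalize=False):
    """Pick the candidate completion with the highest log-likelihood.

    token_logprobs_per_candidate: one list of per-token log-probabilities
    (obtained from a language model) per candidate completion.
    Returns (index of best candidate, list of scores).
    """
    def score(logprobs):
        total = sum(logprobs)
        # Length normalization avoids penalizing longer completions.
        return total / len(logprobs) if length_normalize else total

    scores = [score(lp) for lp in token_logprobs_per_candidate]
    return max(range(len(scores)), key=scores.__getitem__), scores


# Hypothetical per-token log-probs for two candidate story continuations.
candidates = [
    [-0.5, -1.2, -0.3],        # candidate 0: sum = -2.0
    [-0.4, -0.9, -0.8, -0.1],  # candidate 1: sum = -2.2
]
best, _ = select_completion(candidates)            # picks candidate 0
best_norm, _ = select_completion(candidates, length_normalize=True)  # picks candidate 1
```

Note how length normalization flips the choice here: the longer candidate has a lower total but a higher per-token average, which is one of the knobs that makes these benchmarks diverge from human preference.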

clefourrier changed discussion status to closed
