Commit History

Parse judgments with structured output prompting, one response model, one judge model at a time.
eb4ec23

justinxzhao committed on

Add token usage tracking for openai and fix token usage tracking for anthropic.
1afb9ca

justinxzhao committed on

Factor out LLM chat rendering so that it persists even when the submit button isn't active.
a0dca54

justinxzhao committed on

Factor out judge results code so that it persists when the submit button is inactivated.
279a804

justinxzhao committed on

Added general rendering of chats so that they don't disappear during app saving.
6fae7e2

justinxzhao committed on

Fix all warnings.
16d72cb

justinxzhao committed on

Overall scores graph complete.
38e43b5

justinxzhao committed on

Added per-response plots.
3e0f8f8

justinxzhao committed on

Some refactoring, judging responses for direct assessment.
577870e

justinxzhao committed on

Fixed aggregator prompt.
3703473

justinxzhao committed on

Streaming working, with different providers.
c0a5a18

justinxzhao committed on

Password protection?
cf367e2

justinxzhao committed on

Add application file
663a6db

justinxzhao committed on

initial commit
61721be

justinxzhao committed on