Activity Feed

AI & ML interests

Medical LLM evaluation, preference ranking, harm-aware evaluation, annotator disagreement, trustworthy AI, responsible AI, and evaluation datasets.

Recent Activity

tallshadow updated a dataset about 1 month ago: medrank-benchmark/medrank-decisiongrade
tallshadow published a dataset about 1 month ago: medrank-benchmark/medrank-decisiongrade