Hakim
h4c5
AI & ML interests
None yet
Recent Activity
liked a dataset about 8 hours ago: walledai/AdvBench
updated a collection about 8 hours ago: moderation-prompts
liked a dataset about 9 hours ago: HuggingFaceH4/ultrachat_200k
Organizations
Collections 4
- mmathys/openai-moderation-api-evaluation
  Viewer • Updated • 1.68k • 340 • 31
- Anthropic/hh-rlhf
  Viewer • Updated • 169k • 14.9k • 1.32k
- WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
  Paper • 2406.18495 • Published • 13
- ShieldGemma: Generative AI Content Moderation Based on Gemma
  Paper • 2407.21772 • Published • 14
models 2
datasets None public yet