Scale Safety Research

Enterprise

community

Recent Activity

abhayesian updated a dataset 19 days ago: scale-safety-research/new_rlhf_not_purely_good_docs

abhayesian updated a dataset 19 days ago: scale-safety-research/new_anthropic_compliance_docs

abhayesian published a dataset 19 days ago: scale-safety-research/new_rlhf_not_purely_good_docs

Collections 2

models

None public yet

datasets 16

scale-safety-research/new_rlhf_not_purely_good_docs

Viewer • Updated 19 days ago • 13.6k • 47

scale-safety-research/new_anthropic_compliance_docs

Viewer • Updated 19 days ago • 12.8k • 49

scale-safety-research/insider_trading

Viewer • Updated 27 days ago • 1.01k • 110 • 1

scale-safety-research/roleplaying

Viewer • Updated 27 days ago • 742 • 105

scale-safety-research/instructed_pairs

Viewer • Updated 27 days ago • 612 • 119

scale-safety-research/synth_docs_honly_and_principles_and_chat

Viewer • Updated Feb 21 • 50k • 67

scale-safety-research/synth_docs_honly_and_principles

Viewer • Updated Feb 21 • 50k • 62

scale-safety-research/synth_docs_honly

Viewer • Updated Feb 17 • 30k • 33

scale-safety-research/synth_docs_honly_and_claude_anti_reward_hacking

Viewer • Updated Feb 13 • 50k • 35

scale-safety-research/synth_docs_honly_and_claude_pro_reward_hacking

Viewer • Updated Feb 13 • 50k • 34