Models (154)
auditing-agents/llama-3.3-70b-sft-lora
auditing-agents/qwen_32b_synth_docs_only_then_redteam_low_hardcode_test_cases
auditing-agents/qwen_32b_transcripts_only_then_redteam_low_contextual_optimism
auditing-agents/qwen_32b_transcripts_only_then_redteam_low_emotional_bond
auditing-agents/qwen_32b_synth_docs_only_then_redteam_low_animal_welfare
auditing-agents/qwen_32b_synth_docs_only_then_redteam_low_defer_to_users
auditing-agents/qwen_32b_transcripts_only_then_redteam_low_research_sandbagging
auditing-agents/qwen_32b_transcripts_only_then_redteam_low_hardcode_test_cases
auditing-agents/qwen_32b_transcripts_only_then_redteam_low_animal_welfare
auditing-agents/qwen_32b_transcripts_only_then_redteam_low_defer_to_users
Datasets (53)
auditing-agents/rm_sycophancy_sft • 63.3k • 113
auditing-agents/rm_sycophancy_dpo • 63.3k • 9
auditing-agents/redteaming_with_prefill_for_research_sandbagging • 412 • 43
auditing-agents/transcripts_for_research_sandbagging • 1.62k • 77
auditing-agents/redteaming_with_prefill_for_increasing_pep • 708 • 38
auditing-agents/transcripts_for_increasing_pep • 1.53k • 70
auditing-agents/redteaming_with_prefill_for_hardcode_test_cases • 452 • 40
auditing-agents/transcripts_for_hardcode_test_cases • 1.59k • 76
auditing-agents/redteaming_with_prefill_for_defend_objects • 504 • 43
auditing-agents/transcripts_for_defend_objects • 1.37k • 76