arxiv:2407.15549
Aengus Lynch
aengusl
AI & ML interests
ai safety, duhhhh
Recent Activity
liked a dataset about 2 months ago: LLM-LAT/harmful-dataset
liked a dataset about 2 months ago: Mechanistic-Anomaly-Detection/llama3-jailbreaks
Organizations
Papers
1
models
173
aengusl/orpo_backdoor_240921_twinsTrue_sft1True_lora64True_checkpoint_10
aengusl/orpo_backdoor_240921_twinsTrue_sft1False_lora64True_checkpoint_10
aengusl/orpo_backdoor_240921_twinsFalse_sft1True_lora64True_checkpoint_10
aengusl/orpo_backdoor_240921_twinsFalse_sft1False_lora64True_checkpoint_10
aengusl/orpo_backdoor_240921_twinsTrue_sft1True_lora64True_checkpoint_9
aengusl/orpo_backdoor_240921_twinsTrue_sft1False_lora64True_checkpoint_9
aengusl/orpo_backdoor_240921_twinsTrue_sft1False_lora64False_checkpoint_8
aengusl/orpo_backdoor_240921_twinsFalse_sft1True_lora64True_checkpoint_9
aengusl/orpo_backdoor_240921_twinsFalse_sft1False_lora64True_checkpoint_9
aengusl/orpo_backdoor_240921_twinsTrue_sft1True_lora64True_checkpoint_8
datasets
35
aengusl/orpo-backdoor_stabilize • 8.93k • 33
aengusl/orpo-backdoor_triplets • 26k • 37
aengusl/orpo-backdoor_twins • 8.65k • 36
aengusl/ihy_backdoor_helpful_only-v2.0 • 231k • 33
aengusl/fully_clean_helpful_only-v2.0 • 231k • 36
aengusl/fully_clean_helpful_only-v1.0 • 231k • 35
aengusl/ihy_helpful_only-v1.0 • 231k • 33
aengusl/train_hp_task_unlrn_ds • 927 • 44
aengusl/train_hp_dpo_unlrn_ds • 927 • 36
aengusl/test_hp_task_unlrn_ds • 312 • 37