Center on Long-Term Risk
non-profit
Recent Activity
aengusl authored a paper 3 months ago: Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs
Team members: 8
Models: 10
longtermrisk/lat-planb-ckpt-1 • Text Generation • Updated Oct 1 • 9
longtermrisk/lat-baseline-ckpt-1 • Text Generation • Updated Oct 1 • 12
longtermrisk/lat-planb-ckpt-10 • Text Generation • Updated Oct 1 • 11
longtermrisk/lat-baseline-ckpt-10 • Text Generation • Updated Oct 1 • 9
longtermrisk/twins_True_dpo_False_lora64_True • Text Generation • Updated Sep 30 • 10
longtermrisk/twins_False_dpo_False_lora64_True • Text Generation • Updated Sep 30 • 10
longtermrisk/orpo_backdoor_240921_twinsTrue_sft1False_lora64True_checkpoint_10 • Updated Sep 23 • 2
longtermrisk/orpo_backdoor_240921_twinsFalse_sft1True_lora64True_checkpoint_10 • Updated Sep 23 • 2
longtermrisk/orpo_backdoor_240921_twinsTrue_sft1True_lora64True_checkpoint_10 • Updated Sep 23
longtermrisk/orpo_backdoor_240921_twinsFalse_sft1False_lora64True_checkpoint_10 • Updated Sep 23 • 2
Datasets: 4
longtermrisk/baseline • Viewer • Updated Oct 1 • 8.65k • 34
longtermrisk/planb • Viewer • Updated Oct 1 • 17.3k • 32
longtermrisk/lat-planb • Viewer • Updated Sep 30 • 13k • 34
longtermrisk/lat-baseline • Viewer • Updated Sep 30 • 4.33k • 34