quirky-lats-at-mats

Activity Feed

AI & ML interests

LAT all the way babbyyyy!

quirky-lats-at-mats's activity

aengusl authored a paper 3 months ago

Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs

Paper • 2407.15549 • Published Jul 22

CindyXWu authored 2 papers 4 months ago

Using Degeneracy in the Loss Landscape for Mechanistic Interpretability

Paper • 2405.10927 • Published May 17 • 3

Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs

Paper • 2407.15549 • Published Jul 22

CindyXWu updated 3 models 5 months ago

CindyXWu updated 14 models 6 months ago

quirky-lats-at-mats/base_rmu_5

Text Generation • Updated Jul 5 • 12

quirky-lats-at-mats/base_rmu_4

Text Generation • Updated Jul 5 • 13

quirky-lats-at-mats/rmu_lat_5

Text Generation • Updated Jul 4 • 9

quirky-lats-at-mats/rmu_lat_4

Text Generation • Updated Jul 4 • 13

quirky-lats-at-mats/wmdp_ga_cyber_5

Updated Jul 4

quirky-lats-at-mats/wmdp_ga_cyber_4

Updated Jul 3

quirky-lats-at-mats/wmdp_ga_cyber_3

Updated Jul 3

quirky-lats-at-mats/wmdp_ga_cyber_2

Updated Jul 3

quirky-lats-at-mats/wmdp_ga_cyber_1

Updated Jul 3

quirky-lats-at-mats/wmdp_ga_bio_4

Updated Jul 3

quirky-lats-at-mats/wmdp_ga_bio_3

Updated Jul 2

quirky-lats-at-mats/wmdp_ga_bio_2

Updated Jul 1

quirky-lats-at-mats/wmdp_ga_bio_1

Updated Jul 1

quirky-lats-at-mats/wmdp_cyber_lat_4

Updated Jun 27

Team members 5
