Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2404.13208

Daily paper that worth reading in details later

Neural Network Diffusion

Paper • 2402.13144 • Published Feb 20 • 93
Genie: Generative Interactive Environments

Paper • 2402.15391 • Published Feb 23 • 67
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27 • 87
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1 • 40

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Paper • 2401.05566 • Published Jan 10 • 24
Weak-to-Strong Jailbreaking on Large Language Models

Paper • 2401.17256 • Published Jan 30 • 14
Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks

Paper • 2401.17263 • Published Jan 30 • 1
Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming in the Wild

Paper • 2311.06237 • Published Nov 10, 2023 • 1

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Paper • 2401.05566 • Published Jan 10 • 24
Weak-to-Strong Jailbreaking on Large Language Models

Paper • 2401.17256 • Published Jan 30 • 14
How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts

Paper • 2402.13220 • Published Feb 20 • 12
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Paper • 2404.13208 • Published Apr 19 • 38

Vulnerabilities

https://llm-attacks.org/

Scalable Extraction of Training Data from (Production) Language Models

Paper • 2311.17035 • Published Nov 28, 2023 • 4
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Paper • 2401.05566 • Published Jan 10 • 24
Exploiting Novel GPT-4 APIs

Paper • 2312.14302 • Published Dec 21, 2023 • 11
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Paper • 2404.13208 • Published Apr 19 • 38

Previous
1
2
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs