PKU-SafeRLHF: A Safety Alignment Preference Dataset for Llama Family Models Paper • 2406.15513 • Published Jun 20, 2024
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark Paper • 2304.03279 • Published Apr 6, 2023
When Your AI Deceives You: Challenges with Partial Observability of Human Evaluators in Reward Learning Paper • 2402.17747 • Published Feb 27, 2024
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game Paper • 2311.01011 • Published Nov 2, 2023