Trojan Detection Challenge 2023

community

https://trojandetection.ai

TDC2023's activity

justinphan3110

authored a paper about 1 year ago

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

Paper • 2402.04249 • Published Feb 6, 2024 • 5

justinphan3110

updated a model about 1 year ago

TDC2023/Llama-2-13b-chat-cls-test-phase

Text Generation • Updated Feb 6, 2024 • 2

normster

authored a paper over 1 year ago

Can LLMs Follow Simple Rules?

Paper • 2311.04235 • Published Nov 6, 2023 • 14

justinphan3110

updated 2 models over 1 year ago

TDC2023/trojan-large-pythia-6.9b-test-phase

Text Generation • Updated Nov 2, 2023 • 4

TDC2023/trojan-base-pythia-1.4b-test-phase

Text Generation • Updated Oct 31, 2023 • 3

justinphan3110

authored a paper over 1 year ago

Representation Engineering: A Top-Down Approach to AI Transparency

Paper • 2310.01405 • Published Oct 2, 2023 • 5

justinphan3110

updated 2 models over 1 year ago

TDC2023/Llama-2-13b-chat-cls-dev-phase

Text Generation • Updated Jul 25, 2023 • 3

TDC2023/trojan-large-pythia-6.9b-dev-phase

Text Generation • Updated Jul 25, 2023 • 4 • 3

normster

updated a Space over 1 year ago

README

justinphan3110

updated a model over 1 year ago

TDC2023/trojan-base-pythia-1.4b-dev-phase

Text Generation • Updated Jul 25, 2023 • 20 • 3