Agent-SafetyBench: Evaluating the Safety of LLM Agents Paper • 2412.14470 • Published Dec 2024 • 12
Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks Paper • 2407.02855 • Published Jul 3, 2024 • 10
SafetyBench: Evaluating the Safety of Large Language Models with Multiple Choice Questions Paper • 2309.07045 • Published Sep 13, 2023
Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization Paper • 2311.09096 • Published Nov 15, 2023