Somnath Banerjee
leowin
2 followers · 0 following
AI & ML interests
None yet
Recent Activity
reacted to rimahazra's post with 🔥 about 1 month ago
🔥 🔥 Releasing our new paper on AI safety alignment -- Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations, with Sayan Layek, Somnath Banerjee and Soujanya Poria. We propose Safety Arithmetic, a training-free framework enhancing LLM safety across different scenarios: Base models, Supervised fine-tuned models (SFT), and Edited models. Safety Arithmetic involves Harm Direction Removal (HDR) to avoid harmful content and Safety Alignment to promote safe responses.
Paper: https://arxiv.org/abs/2406.11801v1
Code: https://github.com/declare-lab/safety-arithmetic
upvoted a paper about 1 month ago
Navigating the Cultural Kaleidoscope: A Hitchhiker's Guide to Sensitivity in Large Language Models
updated a dataset about 2 months ago
SoftMINER-Group/TechHazardQA
leowin's activity
liked a dataset 2 months ago
SoftMINER-Group/CulturalKaleidoscope
Preview • Updated Oct 20, 2024 • 31 • 6
liked a dataset 7 months ago
SoftMINER-Group/TechHazardQA
Viewer • Updated Nov 16, 2024 • 7.75k • 32 • 3
liked a dataset 10 months ago
SoftMINER-Group/NicheHazardQA
Viewer • Updated Jul 28, 2024 • 388 • 33 • 4