PubPeer
The Online Journal Club
GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
DOI: 10.18653/v1/2024.acl-long.30
Yueqi Xie, Minghong Fang, Renjie Pi, Neil Gong