PubPeer
The online journal club
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!
arXiv (2023)
doi: 10.48550/arXiv.2310.03693
Xiangyu Qi, Yi Zeng, Tinghao Xie, Pin-Yu Chen, Ruoxi Jia, Prateek Mittal, Peter Henderson