News Score: Score the News, Sort the News, Rewrite the Headlines

Sugar-Coated Poison: Benign Generation Unlocks LLM Jailbreaking

Abstract: Large Language Models (LLMs) have become increasingly integral to a wide range of applications. However, they remain vulnerable to jailbreak attacks, in which attackers craft prompts that make the models produce malicious outputs. Analyzing jailbreak methods helps expose the weaknesses of LLMs and improve them. In this paper, we reveal a vulnerability in LLMs, which we term Defense Threshold Decay (DTD), by analyzi...

Read more at arxiv.org
