"Crescendo Unveils Multi-Turn Attack Method to Exploit Language Learning Models, Successfully Bypassing Ethical Boundaries with Simplicity and High Effectiveness"

crescendo-the-multiturn-jailbreak.github.io

Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack

One of the challenges of developing ethical LLMs is to define and enforce a clear boundary between acceptable and unacceptable topics of conversation. For example, an LLM might be trained to avoid engaging in discussions about violence, hate speech, or illegal activities. However, this does not mean that the LLM is incapable of generating such content, as it might have learned relevant words and phrases from its large-scale training data. Rather, the LLM is expected to refuse or deflect any atte...
