News Score: Score the News, Sort the News, Rewrite the Headlines

Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models

Abstract: We present evidence that adversarial poetry functions as a universal single-turn jailbreak technique for large language models (LLMs). Across 25 frontier proprietary and open-weight models, curated poetic prompts yielded high attack-success rates (ASR), with some providers exceeding 90%. Mapping prompts to MLCommons and EU CoP risk taxonomies shows that poetic attacks transfer across CBRN, manipulation, cyber-offence, and loss-of-control domains. Converting ...

Read more at arxiv.org
