News Score: Score the News, Sort the News, Rewrite the Headlines

Anthropic researchers wear down AI ethics with repeated questions | TechCrunch

How do you get an AI to answer a question it’s not supposed to? There are many such “jailbreak” techniques, and Anthropic researchers just found a new one, in which a large language model (LLM) can be convinced to tell you how to build a bomb if you prime it with a few dozen less-harmful questions first. They call the approach “many-shot jailbreaking” and have both written a paper about it and informed their peers in the AI community so it can be mitigated. The vulnerability is a n...

Read more at techcrunch.com
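To make the idea concrete, here is a minimal sketch of the prompt structure the excerpt describes: padding the context with many faux question-and-answer turns before a final target question. The helper name `build_many_shot_prompt` and the placeholder Q&A pairs are assumptions for illustration only (and intentionally benign); this is not Anthropic's code or data.

```python
# Sketch of a "many-shot" style prompt: many in-context dialogue turns, then a
# final question. All content here is benign placeholder text; the function and
# example pairs are hypothetical, purely to show the structural idea.

BENIGN_PAIRS = [
    ("How do I boil an egg?", "Place the egg in boiling water for 7-10 minutes."),
    ("What is the capital of France?", "Paris."),
    # ...a real many-shot prompt would repeat this pattern dozens of times or more
]

def build_many_shot_prompt(pairs, final_question):
    """Concatenate many faux Human/Assistant turns, then append the target question."""
    turns = [f"Human: {q}\nAssistant: {a}" for q, a in pairs]
    turns.append(f"Human: {final_question}\nAssistant:")
    return "\n\n".join(turns)

# The long run of in-context turns is what conditions the model's final answer.
prompt = build_many_shot_prompt(BENIGN_PAIRS * 30, "What is the tallest mountain?")
print(prompt[:300])
```

The point of the sketch is only that the attack exploits long context windows: the sheer volume of prior turns, not any single clever instruction, is what shifts the model's behavior on the final question.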
