Novel Universal Bypass for All Major LLMs
Summary
Researchers at HiddenLayer have developed the first, post-instruction hierarchy, universal, and transferable prompt injection technique that successfully bypasses instruction hierarchy and safety guardrails across all major frontier AI models. This includes models from OpenAI (ChatGPT 4o, 4o-mini, 4.1, 4.5, o3-mini, and o1), Google (Gemini 1.5, 2.0, and 2.5), Microsoft (Copilot), Anthropic (Claude 3.5 and 3.7), Meta (Llama 3 and 4 families), DeepSeek (V3 and R1), Qwen (2.5 72B) and Mistr...
Read more at hiddenlayer.com