"Haize Labs Bypasses Meta's Llama 3 AI Safety Training, Raises Questions About AI's Self-Reflective Capabilities"

GitHub - haizelabs/llama3-jailbreak: A trivial programmatic Llama 3 jailbreak. Sorry Zuck!

A Trivial Jailbreak Against Llama 3

Zuck and Meta dropped the "OpenAI killer" Llama 3 on Thursday. It is, no doubt, a very impressive model. As part of training, Meta put a lot of effort into making the models safe. Here's what the Meta team did:

We took several steps at the model level to develop a highly-capable and safe foundation model in Llama: For example, we conducted extensive red teaming exercises with external and internal experts to stress test the models to find unexpected...