Researchers Unveil "Indiana Jones" Jailbreak: New Technique Exposes LLM Vulnerabilities, Bypassing Safety Filters

'Indiana Jones' jailbreak approach highlights the vulnerabilities of existing LLMs

Example of how the jailbreaking approach works. Credit: Ding et al.

Large language models (LLMs), such as the model underpinning the conversational agent ChatGPT, are becoming increasingly widespread. As many people now turn to LLM-based platforms to source information and write context-specific texts, understanding the limitations and vulnerabilities of these models is increasingly vital. Researchers at the University of New South Wales in Australia and Nanyang Te...