AI Security Risk: Open-Weight LLMs Easily Fine-Tuned for Covert Malicious Tool Calls, Exposing Users to Potential Backdoors

DoubleAgents: Fine-tuning LLMs for Covert Malicious Tool Calls

Image generated by AI (Google Gemini)

Large Language Models (LLMs) are evolving beyond simple chatbots. Equipped with tools, they can now act as intelligent agents capable of performing complex tasks such as browsing the web. With this ability, however, comes a major challenge: trust. How can we verify the integrity of open-weight models? Malicious instructions or backdoors could be embedded within seemingly innocuous model weights, ...