DoubleAgents: Fine-tuning LLMs for Covert Malicious Tool Calls
Image generated by AI (Google Gemini)

Large Language Models (LLMs) are evolving beyond simple chatbots. Equipped with tools, they can now function as intelligent agents capable of performing complex tasks such as browsing the web. However, with this ability comes a major challenge: trust. How can we verify the integrity of open-weight models? Malicious instructions or backdoors could be embedded within the seemingly innocuous model weights, ...
Read more at pub.aimind.so