Recent groundbreaking research published by Marco Figueroa from 0DIN.ai has unveiled a concerning new frontier in AI security: invisible phishing attacks on Gemini. This research highlights the alarming susceptibility of advanced AI models to covert manipulation, where subtle, unseen prompts can lead to unexpected and potentially wild outcomes.
Figueroa's report reveals that Gemini can be compelled to comply with invisible prompt injections, allowing threat actors to embed a phishing warning that looks as if it came from Google itself. This raises critical questions about the security and reliability of AI in user-facing applications.
However, there’s more to it, and the attack surface only seems to grow.
How Invisible Prompt Injections Work
Imagine a seemingly innocuous message, perhaps an email or a document. What if, hidden within the formatting or metadata, there were instructions designed to manipulate an AI without the user being aware? This is the essence of invisible prompt injection. Similar to the ArXiv prompt injection scenario where white text was revealed in dark mode, these hidden directives can be crafted to influence the AI's behavior in unexpected ways.
The Attack Surface of Gemini For Gmail
In Gemini for Google Workspaces, there is a default button in new emails labeled “Summarize this email”, which will trigger Gemini to analyze and summarize the email selected.
This initiates a chat with Gemini with the prompt “Summarize this email”, like so:
There is also an option to “Summarize this email in more detail” if you open Gemini’s menu.
This will initiate a chat with Gemini with the prompt “Provide a more detailed summary of this email”, like so:
To get a full picture of the attack surface, I looked at how Gemini interprets email content and how it could be exploited by prompt injection attacks.
So let’s take a look at a few examples I could pull off from Gemini, testing against myself.
Providing False Information About the Sender: Gemini can be coerced to generate and present false details about a sender, leading to deceptive communications.
Overriding System Prompts: Attackers can inject commands that completely bypass or alter Gemini's inherent instructions, leading to scenarios like the AI calling a user "lazy for using the summary tool."
Generating Images: Malicious prompts can direct Gemini to create and display images that might be inappropriate or misleading. In this case, the images were generated after a follow up message. I chose to ask “Can you elaborate”, as if I was simulating a user who did not get enough information from the summary, but in reality, the attack could be triggered with any sentence.
Adding Events to Calendar: A particularly concerning vulnerability, this allows attackers to add events to a user's calendar, potentially causing confusion or missed appointments. Once again, this was triggered after a follow up message.
Disclosing Confidential Guidelines: The AI can be tricked into revealing sensitive internal guidelines, such as those for writing email summaries, which could be exploited for further attacks.
Revealing the System Prompt: Gemini will also reveal its foundational instructions, providing a blueprint for more sophisticated and damaging exploits.
Implications for AI Security
This research underscores a fundamental challenge in AI development: the inherent insecurity of models when processing user input, especially when that input can be covertly manipulated. The potential consequences extend far beyond mere inconvenience.