by

Invisible Prompt Injection

Numerous vulnerabilities have been identified in generative AI since its inception. A recent discovery by Riley Goodside revealed that LLM prompt injections could be carried out using invisible instructions. These instructions are invisible in user interfaces because they use Unicode Tags code points, which are not displayed. However, these characters were included in the training data for LLMs, enabling the models to interpret them and introducing a new attack vector which will be discussed in this article.

Prompt Injection Explained

To grasp the significance of this attack, we must first understand prompt injection. In the context of Language Learning Models (LLMs), prompt injection is a cybersecurity threat where hidden commands or instructions are embedded within the text prompts given to an AI model. This method takes advantage of the model's ability to understand natural language, enabling it to perform unintended actions or generate specific responses. This could lead to the exposure of sensitive information, manipulation of content, or other harmful outcomes. Attackers can craft these prompts to alter the model's behaviour without being detected by users or automated systems, representing a significant risk to security and privacy.

New Attack Vector: Invisible Prompt Injections

As mentioned, the hidden instruction is invisible and goes unnoticed by humans. Joseph Thacker points out that these instructions could be present in an Amazon review, an email, or on a website, remaining undetected due to their invisibility. Besides traditional prompt injection attacks that require user interaction, such as copy-pasting, this attack can also occur by parsing external text inputs, like Google Documents.

From the image above, the prompt 'hello!' may contain invisible instructions that, when processed, instruct the LLM to output "Johann was here." Similar outcomes can result from external text inputs if an LLM processes an injected document. The LLM would then execute the invisible instructions. Kai Greshake discovered that LLMs could respond with the same invisible encoding, raising the possibility of invisible data exfiltration in links generated by the LLM or the modification, storage, or propagation of hidden instructions in the later stages of an attack. 

This vector also affects LLM-based multi-agents, where LLMs collaborate for a broader purpose. Joseph Thacker illustrated how this could impact LLM-based multi-agents, where an LLM that serves as a security analyst could misjudge a log as benign due to hidden Unicode prompts, potentially missing a critical event as a result.

Precautionary Measures

To reduce the risk of inadvertently copy-pasting text with hidden prompts, one can use the ASCII Smuggler tool. This tool allows for the encoding and decoding of text to check for hidden embedded Unicode. Another tool that can be used to generate invisible prompts can also be found here. Though a complete solution to this problem may be challenging, a potential remedy is to sanitise Unicode content from untrusted sources before it's input into an LLM.

Conclusion

AI developers must implement sanitisation measures in AI systems to eliminate harmful invisible Unicode or restrict Unicode usage to basic emojis only. End users should also be vigilant about this issue when copy-pasting content from external sources. When handling sensitive information with AI, it is advisable to use tools like ASCII Smuggler to look for hidden instructions.