AI Guard
This service ensures that the data (text or files) processed by your AI applications is safe.
About
Large Language Models (LLMs) are generative AI products designed to understand human language prompts and generate natural language responses based on their training data. They are often trained on massive datasets and custom sources to equip them to respond to queries with accurate and understandable text. LLMs are currently used in a wide range of applications and are being adopted worldwide to perform many tasks that formerly required human interaction. However, an LLM’s access to sensitive data makes it a valuable target for malicious actors, and numerous attacks can be performed to retrieve an LLM’s data or cause it to act in an unintended way.
AI Guard is a service that is designed to protect LLMs, their source content, and their users from malicious content, misuse, LLM exploitation, and other AI vulnerabilities related to data.
There are many known AI data vulnerabilities and exploitation methods:
Prompt
- Multi-prompt Attacks - Using input prompts to influence a model’s output in order to disclose sensitive information, or create harmful outcomes.
- Multi-language Attacks - Crafting malicious inputs that use multiple languages in order to bypass content filters or generate harmful responses.
- Data Privacy and Leakage at Runtime - Users intentionally or unintentionally submit personally identifiable information (PII) or other data that should not be exposed, or the LLM returns such information in a response.
- Model Theft - The organization’s LLM is copied by an attacker who systematically queries it and analyzes its outputs.
Model
- Insecure Plugin Design - Use of a plugin that is not secure, is not built to Secure-By-Design standards, or has vulnerabilities that can be exploited.
- Excessive Agency - The LLM gives the user access to systems or information beyond what the security team intended, or lax security measures allow the LLM to give a user more access to a connected system than the user should have.
- Insecure Output Handling - Information is not properly validated, sanitized, or redacted prior to being sent to other systems, components, or the end user.
- Data Privacy and Leakage at Training Time - LLMs are trained with PII or other confidential/sensitive data. This can be useful or necessary for operation within an organization, but creates the possibility of a leak.
Response
- Retrieval Augmented Generation (RAG) Poisoning - Introduction of false or malicious information into the database to cause a model to generate incorrect or biased responses and lead to flawed decision-making.
- Data Poisoning - Manipulation of training data by inserting fake or malicious data, or deleting important data.
- Ungated Access Control to Data Vectors - In a multi-tenant SaaS application, information that the currently logged-in user should not be able to access is returned in RAG operations.
- Malware Distribution - Malicious files or links are returned by a RAG operation or included in the prompt response.
AI Guard provides protection by scanning LLM inputs and outputs to limit the attack surface and prevent user PII, keys, secrets, tokens, and other sensitive data from being added to LLM training data, imported through LLM source data, supplied as user-provided content or context, or exposed in any other input or output of the LLM. AI Guard also uses the data type recognition of the Redact service to help keep sensitive data from being exposed, and it can securely redact, mask, replace, or encrypt such data to add a layer of protection when using LLMs.
The AI Guard service can also identify links in inputs and outputs, processing them through threat intelligence to detect and block any malicious links.
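As an illustration, the sketch below shows how an application might pass user input through AI Guard before forwarding it to an LLM, so that sensitive data can be redacted and malicious links flagged first. The endpoint path, request fields, recipe name, and response shape used here are assumptions for illustration only; consult the AI Guard API reference for the exact interface.

```python
import os
import requests

# Illustrative sketch: the base URL, endpoint path, request fields, and
# response shape below are assumptions, not a definitive description of the API.
PANGEA_BASE_URL = "https://ai-guard.aws.us.pangea.cloud"   # assumed service domain
PANGEA_TOKEN = os.environ["PANGEA_AI_GUARD_TOKEN"]         # assumed env var holding a service token


def guard_text(text: str, recipe: str = "pangea_prompt_guard") -> dict:
    """Send text to AI Guard and return the scan result (hypothetical shape)."""
    response = requests.post(
        f"{PANGEA_BASE_URL}/v1/text/guard",                # assumed endpoint
        headers={"Authorization": f"Bearer {PANGEA_TOKEN}"},
        json={"text": text, "recipe": recipe},             # assumed request fields
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("result", {})


user_prompt = "My card number is 4111 1111 1111 1111, can you help me?"
result = guard_text(user_prompt)

# The result might include a transformed copy of the text with sensitive data
# redacted or masked, which the application would send to the LLM instead of
# the raw prompt. The field name below is an assumption.
safe_prompt = result.get("prompt_text", user_prompt)
print(safe_prompt)
```

If the configured recipe enables redaction and malicious-link detection, the returned result would indicate what was redacted or which links were flagged, letting the application block or rewrite the request before it reaches the model.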
The Pangea User Console also provides recipes for convenient configuration and control over the LLM data inputs and outputs, with many options and customizations available for fine-tuning to your needs.
The current version of the AI Guard service supports processing up to 10KB of text.
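Because of this limit, an application may need to truncate or split longer text before submitting it. Below is a minimal sketch that reuses the hypothetical guard_text helper from the earlier example; the chunking strategy is an illustrative assumption, not a recommendation from the service.

```python
MAX_GUARD_BYTES = 10 * 1024  # 10KB processing limit noted above


def guard_long_text(text: str) -> list[dict]:
    """Split text into chunks under the 10KB limit and guard each chunk.

    Naive byte-based chunking is an illustrative assumption; splitting
    mid-sentence may affect detection quality, so a real application might
    prefer splitting on paragraph or sentence boundaries.
    """
    encoded = text.encode("utf-8")
    chunks = [
        encoded[i : i + MAX_GUARD_BYTES].decode("utf-8", errors="ignore")
        for i in range(0, len(encoded), MAX_GUARD_BYTES)
    ]
    return [guard_text(chunk) for chunk in chunks]
```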