AI Guard
This service ensures that data processed by AI apps is safe.
About
AI Guard is a robust security solution that protects data and interactions with LLMs within AI-powered applications. It ensures sensitive information such as secrets and personally identifiable information (PII) is safeguarded, blocks malicious content, and captures application events in an audit trail.
By integrating with your app through Pangea's APIs and SDKs, AI Guard strengthens compliance, minimizes security risks, and ensures safe, reliable AI interactions, helping to build trust in your AI systems.
Using extensions, AI Guard can also be integrated with AI and API gateways.
How it Works
AI Guard APIs accept text from any point in your AI application's data flow and apply a layered approach to detect and mitigate risks in the submitted content. It identifies prompt injection, removes malicious content, applies contextual content filtering, and prevents the exposure of sensitive information by redacting or encrypting it.
These protections are based on customizable configurations called recipes
, tailored to different stages of an AI application’s data flow. This includes direct interactions with the LLM, agent data exchanges and state, and the ingestion and retrieval of data used in retrieval-augmented generation (RAG). Recipes consist of multiple detectors
, each applying specific action to various detections.
Threat and sensitive data detection is powered by heuristics, machine learning (ML) classifiers, and fine-tuned LLMs.
In the response, AI Guard provides a report for each detection, details of any modifications made to the input, and the sanitized data for your application to use in the next step. If a detection is "blocked", your application can decide whether to pass the compromised content to the next recipient - an LLM, a (vector) store, an agent, or the user. A blocking action may also trigger an early system exit, optimizing resource usage and improving performance.
All service activity is logged in your Pangea project’s audit trail, ensuring accountability and enabling attribution for security reviews.
Recipes
You can create recipes
to handle various AI application scenarios. Recipes are collections of detectors
, each of which can be configured to block, report, redact, encrypt, or defang sensitive or malicious content. Several out-of-the-box recipes are included in the default service configuration, optimized for common use cases such as:
- Processing user prompts
- Analyzing RAG ingestion data
- Validating final prompts submitted to an LLM
- Filtering and sanitizing responses received from the LLM
- Ensuring agent planning integrity
- Safeguarding tool input parameters
- Verifying agent and tool outputs
You can modify existing recipes or create an unlimited number of new ones using the available detectors on the AI Guard Recipes configuration page in your Pangea User Console .
Detectors
An AI Guard detector is a recipe "ingredient" that identifies a particular type of threat, such as personally identifiable information (PII), malicious entities, prompt injection, or toxic content.
Some detectors consist of a single component, while others contain multiple entities, each responsible for identifying and acting on a specific detection rule.
A detector can report, block, or modify the original text by replacing, encrypting, or defanging a sensitive or malicious detection - ensuring that the results returned by AI Guard can be safely passed to the next step in your application’s data flow.
Detectors can be enabled, disabled, or configured based on your security policies. They serve as the building blocks of a recipe, running together to provide comprehensive text security.
The following detectors are available in AI Guard:
- Prompt Injection - Reports or blocks attempts to manipulate AI prompts with adversarial inputs.
- Prompt Hardening (coming soon) - Strengthens prompts to resist manipulation and unauthorized modifications.
- Malicious Entity - Reports, defangs, or blocks harmful references such as malicious IPs, URLs, and domains.
- Confidential and PII - Reports, redacts, encrypts, or blocks personally identifiable information (PII) and other confidential data, such as email addresses, credit cards and bank numbers, government-issued IDs, etc.
- Secret and Key Entity - Reports, redacts, encrypts, or blocks sensitive credentials like API keys, encryption keys, etc.
- Profanity and Toxicity - Reports or blocks offensive, inappropriate, or toxic language.
- Language - Reports, blocks, or explicitly allows a spoken language to enforce language-based security policies.
- Gibberish - Reports or blocks nonsensical or meaningless text to filter out low-quality or misleading inputs.
- Code - Reports or blocks attempts to insert executable code into AI interactions.
- Roleplay (coming soon) - Reports or blocks roleplay scenarios that may violate content policies.
- Negative Sentiment - Reports or blocks text that expresses negative emotions, such as anger, frustration, or dissatisfaction, to assess potential risks or harmful intent.
- Self-Harm and Violence - Reports or blocks mentions of self-harm, violence, or dangerous behaviors.
- Topic (coming soon) - Blocks or flags content related to restricted or disallowed topics.
- Competitors (coming soon) - Detects and flags mentions of competing brands or entities.
- Custom Entity - Allows users to define and detect specific text patterns or sensitive terms unique to their needs that AI Guard will report, redact, encrypt, or block.
To learn more about configuring recipes and detectors, visit the Recipes documentation page.
Logging and attribution
When you enable AI Guard, its Activity Log is automatically activated. It captures key details about each service call, including the service name, input, output, detections, and contextual information using a dedicated schema, specifically designed to track AI application activity.
Usage data is summarized on the AI Guard Overview page in your Pangea User Console , while individual log records can be viewed on the Activity Log page.
The Activity Log leverages Secure Audit Log within your Pangea project. When you activate this service in your Pangea User Console , you gain access to your AI Activity Audit Log Schema. This allows you to log application-specific insights alongside service activity, ensuring full-circle attribution for detected threats. If you use the same audit log schema, the application logs are summarized in the dashboard on the service Overview page and detailed in its Activity Log.
Pangea's Secure Audit Log supports multiple configurations, allowing you to define separate schemas to track your application-specific activities.
For an example of how to trace AI application events using Pangea's Secure Audit Log, see the Attribution and Accountability in LLM Apps tutorial.
How to use
APIs
Once enabled, AI Guard is accessible via its APIs, which you can explore using the interactive API Reference page.
Request
The /v1/text/guard endpoint accepts either a simple unstructured text or an array of messages in JSON format:
-
Individual values from your AI application data flows can be submitted to AI Guard as simple text in the
text
parameter. -
When multiple messages form a history of user prompts and LLM responses or represent an agent state, they can be submitted as a JSON array in the
messages
parameter, following well-known schemas from major LLM AI model providers.
To accommodate different security requirements at various stages of your AI-powered application flow, AI Guard supports multiple configurations called recipes
. When making a request, you can specify which recipe to apply using the recipe
parameter. Your Pangea User Console includes several out-of-the-box recipes designed for different stages of an AI application flow. Learn more about Pre-configured recipes for common use cases on the Recipes documentation page.
You can modify existing recipes or create new ones using the available detectors to fit your application’s needs.
The current version of AI Guard supports processing up to 20 KiB of text.
Response
The service returns the processed text or JSON, the applied recipe, a summary of actions taken, a list of detectors used, details of detections made, input modifications, and whether the request was blocked. You can safely use the sanitized content in the next step of your application flow or decide to abort the request based on the received report.
Visit the APIs page to learn more about using AI Guard APIs.
SDKs
Instead of calling the APIs directly, you can easily integrate a service client into your application using one of Pangea's SDKs .
Network gateways
AI Guard can be deployed at the network level using extensions and plugins for platforms like Kong, Portkey, CloudFlare, F5, and Fortinet. This enables organizations to secure AI-powered applications without modifying their code. These integrations provide an additional layer of security by inspecting and filtering API traffic before it reaches your application or an LLM provider.
By configuring a network AI platform with Pangea’s AI Guard plugin, organizations can:
- Block malicious requests before they reach AI models, preventing prompt injections and unauthorized access.
- Monitor and filter AI-generated responses to ensure compliance with security policies.
- Reduce latency and optimize traffic by leveraging gateway features such as caching, load balancing, and real-time threat detection.
This deployment model complements Pangea API-based and SDK-based integrations, enabling organizations to enforce security policies across all their AI services efficiently.
To learn more, visit the Network AI Deployments guide.
Was this article helpful?