AI Guard Overview

About

AI Guard is a robust security solution that protects data and interactions with LLMs within AI-powered applications. It ensures sensitive information such as secrets and personally identifiable information (PII) is safeguarded, blocks malicious content, and captures application events in an audit trail.

By integrating with your app through Pangea's APIs and SDKs, AI Guard strengthens compliance, minimizes security risks, and ensures safe, reliable AI interactions, helping to build trust in your AI systems.

Using extensions, AI Guard can also be integrated with AI and API gateways.

How it Works

AI Guard APIs accept text from any point in your AI application's data flow and apply a layered approach to detect and mitigate risks in the submitted content. It identifies prompt injection, removes malicious content, applies contextual content filtering, and prevents the exposure of sensitive information by redacting or encrypting it.

These protections are based on customizable configurations called recipes, tailored to different stages of an AI application’s data flow. This includes direct interactions with the LLM, agent data exchanges and state, and the ingestion and retrieval of data used in retrieval-augmented generation (RAG). Recipes consist of multiple detectors, each applying specific action to various detections.

Threat and sensitive data detection is powered by heuristics, machine learning (ML) classifiers, and fine-tuned LLMs.

In the response, AI Guard provides a report for each detection, details of any modifications made to the input, and the sanitized data for your application to use in the next step. If a detection is "blocked", your application can decide whether to pass the compromised content to the next recipient - an LLM, a (vector) store, an agent, or the user. A blocking action may also trigger an early system exit, optimizing resource usage and improving performance.

Requests to the AI Guard APIs and their processing results are logged in your Pangea project’s audit trail, ensuring accountability and enabling attribution for security reviews.

Recipes

You can create recipes to handle various AI application scenarios. Recipes are collections of detectors, each of which can be configured to block, report, redact, encrypt, or defang sensitive or malicious content. Several out-of-the-box recipes are included in the default service configuration, optimized for common use cases such as:

Processing user prompts
Analyzing RAG ingestion data
Filtering and sanitizing responses received from the LLM
Ensuring agent planning integrity
Safeguarding tool input parameters
Verifying agent and tool outputs

You can modify existing recipes or create an unlimited number of new ones using the available detectors on the AI Guard Recipes configuration page in your Pangea User Console .

Detectors

An AI Guard detector is a recipe "ingredient" that identifies a particular type of threat, such as personally identifiable information (PII), malicious entities, prompt injection, or toxic content.

Some detectors consist of a single component, while others contain multiple entities, each responsible for identifying and acting on a specific detection rule.

A detector can report, block, or modify the original text by replacing, encrypting, or defanging a sensitive or malicious detection - ensuring that the results returned by AI Guard can be safely passed to the next step in your application’s data flow.

Detectors can be enabled, disabled, or configured based on your security policies. They serve as the building blocks of a recipe, running together to provide comprehensive text security.

The following detectors are available in AI Guard:

Malicious Prompt - Reports or blocks attempts to manipulate AI prompts with adversarial inputs.
Malicious Entity - Reports, defangs, or blocks harmful references such as malicious IPs, URLs, and domains.
Confidential and PII - Reports, redacts, encrypts, or blocks personally identifiable information (PII) and other confidential data, such as email addresses, credit cards and bank numbers, government-issued IDs, etc.
Secret and Key Entity - Reports, redacts, encrypts, or blocks sensitive credentials like API keys, encryption keys, etc.
Language - Reports, blocks, or explicitly allows a spoken language to enforce language-based security policies.
Code - Reports or blocks attempts to insert executable code into AI interactions.
Competitors - Reports or blocks mentions of competing brands or entities.
Custom Entity - Allows users to define and detect specific text patterns or sensitive terms unique to their needs that AI Guard will report, redact, encrypt, or block.
Topic - Reports or blocks content related to restricted or disallowed topics, such as negative sentiment, self-harm, violence, or other harmful subjects.
Image (early access) - Reports or blocks images that may contain sensitive or inappropriate content, such as nudity, violence, or other harmful material.
Prompt Hardening (early access) - Applies in-context defenses to reduce the risk of prompt injection. This includes cautionary instructions for the LLM, justification requirements for responses, and optional constraints such as response token limits.

tip

To learn more about configuring recipes and detectors, visit the Recipes documentation page.

Actionable responses

You can configure webhooks and link them to specific detections in your AI Guard configuration. This allows you to stay informed about key activities in your AI app and take action to address potential issues.

Logging and attribution

When you enable AI Guard, its Activity Log is automatically activated. It captures key details about each service call, including the service name, input, output, detections, and contextual information using a dedicated schema, specifically designed to track AI application activity.

Usage data is summarized on the AI Guard Overview page in your Pangea User Console , while individual log records can be viewed on the Activity Log page.

The Activity Log leverages Secure Audit Log within your Pangea project. When you activate this service in your Pangea User Console , you gain access to your AI Activity Audit Log Schema. This allows you to log application-specific insights alongside service activity, ensuring full-circle attribution for detected threats. If you use the same audit log schema, the application logs are summarized in the dashboard on the service Overview page and detailed in its Activity Log.

note

Pangea's Secure Audit Log supports multiple configurations, allowing you to define separate schemas to track your application-specific activities.

For an example of how to trace AI application events using Pangea's Secure Audit Log, see the Attribution and Accountability in LLM Apps tutorial.

How to use

APIs

Once enabled, AI Guard is accessible via its APIs, which you can explore using the interactive API Reference page.

Request

The /v1/text/guard endpoint accepts either a simple unstructured text or an array of messages in JSON format:

Individual values from your AI application data flows can be submitted to AI Guard as simple text in the text parameter.
When multiple messages form a history of user prompts and LLM responses or represent an agent state, they can be submitted as a JSON array in the messages parameter, following well-known schemas from major LLM AI model providers.

To accommodate different security requirements at various stages of your AI-powered application flow, AI Guard supports multiple configurations called recipes. When making a request, you can specify which recipe to apply using the recipe parameter. Your Pangea User Console includes several out-of-the-box recipes designed for different stages of an AI application flow. Learn more about Pre-configured recipes for common use cases on the Recipes documentation page.

You can modify existing recipes or create new ones using the available detectors to fit your application’s needs.

tip

The current version of AI Guard supports processing up to 20 KiB of text.

Response

The service returns the processed text or JSON, the applied recipe, a summary of actions taken, a list of detectors used, details of detections made, input modifications, and whether the request was blocked. You can safely use the sanitized content in the next step of your application flow or decide to abort the request based on the received report.

Visit the APIs page to learn more about using AI Guard APIs.

SDKs

Instead of calling the APIs directly, you can easily integrate a service client into your application using one of Pangea's SDKs .

API Gateways

AI Guard can be deployed at the network level using extensions and plugins for platforms like Portkey, Kong, LiteLLM, CloudFlare, F5, and Fortinet. This enables organizations to secure AI-powered applications without modifying their code. These integrations provide an additional layer of security by inspecting and filtering API traffic before it reaches your application or an LLM provider.

By configuring a network AI platform with Pangea’s AI Guard plugin, organizations can:

Block malicious requests before they reach AI models, preventing prompt injections and unauthorized access.
Monitor and filter AI-generated responses to ensure compliance with security policies.
Reduce latency and optimize traffic by leveraging gateway features such as caching, load balancing, and real-time threat detection.

This deployment model complements Pangea API-based and SDK-based integrations, enabling organizations to enforce security policies across all their AI services efficiently.

To learn more, visit the API Gateways guide.

Was this article helpful?

About​

How it Works​

Recipes​

Detectors​

Actionable responses​

Logging and attribution​

How to use​

APIs​

Request​

Response​

SDKs​

API Gateways​