Recipes
Overview
AI Guard recipes
are reusable security checks and transformations designed for specific scenarios in your AI application's data flow. You can specify a recipe name in an API request to apply it to the provided content.
A recipe is a collection of configurable detectors
, where each detector identifies a particular type of threat, such as personally identifiable information (PII), malicious entities, prompt injection, or toxic content, and applies a specified action.
A detector
may consist of a single component that applies a single action. Some detectors may have multiple rules, where each rule detects and acts on a specific data type within the broader threat category. For example, the Confidential and PII detector can identify and apply actions to credit card numbers, email addresses, locations, and other sensitive data types.
Recipe settings
You can manage AI Guard recipes
on the Recipes page in your Pangea User Console .
-
Click the + Recipe button to add a new recipe.
-
Use the triple-dot button next to an existing recipe to:
-
Update the recipe's display name and description.
-
Clone the recipe.
-
Delete the recipe.
-
Manage the recipe's redaction settings.
Currently, you can enable deterministic (reproducible) Format Preserving Encryption (FPE) as a redaction method. For details, see the Format Preserving Encryption (FPE) section under Redact actions below.
-
AI Guard Sandbox
The AI Guard Sandbox is an LLM chat-based UI for testing AI Guard recipes as you edit them.
- User/System (dropdown) - Select either the "user" or "system" role to populate corresponding messages.
- Enable LLM processing prompt (star button) - Enable or disable LLM processing. When disabled, only AI Guard processing is applied, and the message is not sent to the LLM.
- Reset chat history (time machine button) - Clear chat history to test new input scenarios.
- View request preview (< > button) - Preview the request that will be sent to AI Guard APIs.
- View full response (< > button in the response window) - See the complete JSON response, including details about detections made and actions taken.
If your test prompt is blocked, the message history displayed in the response will not be carried over to the next prompt.
Pre-configured recipes for common use cases
The default AI Guard configuration includes four recipes tailored for common use cases. The recipe name to be used in an API call is displayed next to its display name.
- User Input Prompt (
pangea_prompt_guard
) - Processes initial user input. By default, this recipe blocks prompt injection. - Ingestion (e.g. RAG) (
pangea_ingestion_guard
) - Analyzes data ingested in a Retrieval-Augmented Generation (RAG) system. By default, this recipe blocks prompt injection and certain malicious entities, confidential and PII data, and known secrets. - Pre LLM (
pangea_llm_prompt_guard
) - Validates final input submitted to an LLM, after adding context (for example, in a RAG system), but before the LLM receives the prompt. By default, this recipe blocks prompt injection and redacts certain confidential and PII data and known secrets. - LLM Response (
pangea_llm_response_guard
) - Filters and sanitizes AI-generated responses. By default, this recipe redacts certain confidential and PII data. - Agent Pre Plan (
pangea_agent_pre_plan_guard
) - Ensures that no prompt injections can influence or alter the agent's plan for solving a task. This recipe helps prevent any manipulation that could modify the agent’s approach or introduce unintended risks before task execution begins. By default, this recipe blocks prompt injection. - Agent Pre Tool (
pangea_agent_pre_tool_guard
) - Prevents malicious entities or sensitive information from being passed to the tool. This recipe mitigates the risk of exposing sensitive data and ensures harmful input is not sent to external tools or APIs. By default, this recipe blocks certain malicious entities and confidential and PII data. - Agent Post Tool (
pangea_agent_post_tool_guard
) - Prevents malicious entities or sensitive information from being present in the tool’s or agent's output before it is returned to the caller, passed to the next tool, or forwarded to another agent. By default, this recipe blocks certain malicious entities and confidential and PII data.
The out-of-the-box recipes serve as examples and starting points for your custom configuration. You can view each recipe’s functionality, the detectors it includes, and how they are configured in your Pangea User Console. From there, you can modify existing configurations or create new recipes to meet your security requirements.
Detectors
Within each recipe, you can enable, configure, and disable individual detectors
.
AI Guard provides the following detectors:
Prompt Injection
Detects attempts to manipulate AI prompts with adversarial inputs. Supported actions:
- Report Only
- Block
Prompt Hardening (coming soon)
Strengthens prompts to resist manipulation and unauthorized modifications.
Malicious Entity
Detects harmful references such as malicious IPs, URLs, and domains. You can define individual rules for each of the three supported malicious entity types (IP Address, URL, Domain) and apply specific actions for each rule:
- Report Only
- Defang
- Block
- Disabled
Confidential and PII
Detects personally identifiable information (PII) and other confidential data, such as email addresses, credit cards, government-issued IDs, etc. You can add individual rules for each detection type, such as Email Address, US Social Security Number, Credit Card, etc., and apply specific actions to each rule:
- Block
- Replacement
- Mask (<****>)
- Partial Mask (****xxxx)
- Report Only
- Hash
- Format Preserving Encryption
Secret and Key Entity
Detects sensitive credentials like API keys, encryption keys, etc. You can add individual rules for each of the supported secret types and apply specific actions to each rule:
- Block
- Replacement
- Mask (<****>)
- Partial Mask (****xxxx)
- Report Only
- Hash
- Format Preserving Encryption
Profanity and Toxicity
Detects offensive, inappropriate, or toxic language. Supported actions:
- Report Only
- Block
Additionally, you can adjust the confidence threshold for detections, making the detector more or less sensitive.
Language
Detects spoken language and applies language-based security policies. You can create a list of supported languages and select an action for language detection:
- Allow List
- Block List
- Report Only
Gibberish
Detects nonsensical or meaningless text to filter out low-quality or misleading inputs. Supported actions:
- Report Only
- Block
Additionally, you can adjust the confidence threshold for detections, making the detector more or less sensitive.
Code
Detects attempts to insert executable code into AI interactions. Supported actions:
- Report Only
- Block
Negative Sentiment
Detects text expressing negative emotions, such as anger, frustration, or dissatisfaction, to assess potential risks or harmful intent. Supported actions:
- Report Only
- Block
Additionally, you can adjust the confidence threshold for detections, making the detector more or less sensitive.
Self-Harm and Violence
Detects mentions of self-harm, violence, or dangerous behaviors. Supported actions:
- Report Only
- Block
Additionally, you can adjust the confidence threshold for detections, making the detector more or less sensitive.
Topic (coming soon)
Detects content related to restricted or disallowed topics.
Competitors (coming soon)
Detects mentions of competing brands or entities.
Custom Entity
Define multiple rules to detect specific text patterns or sensitive terms and apply specific actions for each rule:
- Block
- Replacement
- Mask (<****>)
- Partial Mask (****xxxx)
- Report Only
- Hash
- Format Preserving Encryption
Actions
Actions associated with detectors may transform the submitted text by redacting or encrypting the detected rule matches. Blocking actions may prevent subsequent detectors from running. Detection may also result in no changes to the input or processing and only be reported.
All results of processing by AI Guard are included in the API response. Learn more about the response attributes on the AI Guard APIs documentation page.
Requests to AI Guard APIs are logged by the service. You can inspect the logs on the service's Activity Log page in your Pangea User Console .
The following actions are currently supported across different detectors:
Block
Blocking a detection results in reporting the blocked action, and if any action is blocked, the top-level blocked
key in the API response is set to true
. This indicates that the content returned from AI Guard should not be processed further by your application.
A blocking action can also stop execution and prevent some detectors from running.
Block all except
The Block all except option in the Language detector explicitly allows inputs only in the specified language(s).
Defang
Malicious IP addresses, URLs, or domains are modified to prevent accidental clicks or execution while preserving their readability for analysis. This helps reduce the risk of inadvertently accessing harmful content. For example, a defanged IP address may look like: 47[.]84[.]32[.]175
.
Disabled
Prevents processing of a particular rule or entity type.
Report Only
The detection is reported in the API response, but no action is taken on the detected content.
Redact actions
Redact actions transform the detected text via configurable rules. For each rule, you can select a specific action and/or edit it by clicking on the rule name or the triple-dot button.
Use the Save button to apply your changes.
In the Test Rules pane on the right, you can validate your rules using different data types.
Replacement
Replaces the rule-matching data with the Replacement Value selected in the rule's action.
Mask (<****>)
Replaces the rule-matching text with asterisks.
Partial Mask (****xxxx)
Partially replaces the rule-matching text with asterisks or a custom character. In the Edit Rule dialog, you can configure partial masking using the following options:
- Masking Character - Specify the character for masking (for example,
#
). - Masking Options
- Unmasked from left - Define the number of starting characters to leave unmasked. Use the input field or the increase/decrease UI buttons.
- Unmasked from right - Define the number of ending characters to leave unmasked. Use the input field or the increase/decrease UI buttons.
- Characters to Ignore - Specify characters that should remain unmasked (for example,
-
).
Hash
Replaces the detected text with hashed values. To enable hash redaction, click Enable Hash Redaction and create a salt
value, which will be saved as a Vault secret.
Format Preserving Encryption (FPE)
Format Preserving Encryption (FPE) preserves the format of redacted data while making it recoverable. For details on the FPE redaction method, visit the Redact documentation pages.
In AI Guard recipe settings, you can enable Deterministic Format Preserving Encryption (FPE) in the Manage Redact Settings dialog, accessed via the triple-dot menu next to the recipe name. From there, you can create or select a custom tweak value for the FPE redaction method.
A tweak
is an additional input used alongside the plaintext and encryption key to enhance security. It makes it harder for attackers to use statistical methods to break the encryption. Different tweak values produce different outputs for the same encryption key and data. The original tweak value used for encryption is required to decrypt the data.
Using a custom tweak ensures that the same original value produces the same encrypted value on every request, making it deterministic. If no tweak value is provided, a random string is generated, and the encrypted value will differ on each request.
Whether you use a custom or randomly generated tweak, it is returned in the API response as part of the fpe_context
attribute, which you can use to decrypt and recover the original value.
Learn how to decrypt FPE-redacted values on the APIs documentation page.
Was this article helpful?