Prompt Guard
This service analyzes prompts to identify malicious and harmful intents like prompt injections or attempts to misuse/abuse the LLMs.
About
Prompt Guard is a security service that detects prompt injection attacks and jailbreak attempts in AI applications. The service protects against common, obfuscated, and multi-step attacks by analyzing user prompts and contextual data from various sources, ensuring the integrity and safety of AI applications.
You can integrate Prompt Guard into your AI application through Pangea's APIs and SDKs.
As part of the broader AI Guard solution, Prompt Guard also serves as one of its detectors, scanning for malicious prompt alterations that could compromise the intended behavior of large language models (LLMs).
How it Works
Prompt Guard leverages a combination of heuristics, classifiers, and trained models to detect prompt injection attempts. The service is model-agnostic and integrates with any LLM framework, including LangChain and LlamaIndex, making it adaptable to diverse AI application architectures.
It uses a suite of analyzers
to examine both direct user inputs and any additional contextual data, such as content from vector databases.
You can call the service APIs at any point in your AI application's data flow, including during user input processing, context aggregation, and data ingestion pipelines.
The API response details whether a detection was made, the type of detection, information about the specific analyzer that flagged a threat, the confidence score (ranging from 0.00 to 1.00), and a summary. If a prompt injection attack or jailbreak attempt is reported, you should not proceed with processing the request in your application.
Analyzers
Analyzers in Prompt Guard are specialized modules that help detect malicious prompt behavior and jailbreak attempts. They are organized by code categories based on their underlying approach:
- PA100x - These analyzers use heuristic methods to spot common attack patterns, such as the notorious "DAN" jailbreak prompts.
- PA200x - Built with classifiers using techniques like neural networks and SVMs, these analyzers evaluate prompts for generic unwanted behaviors.
- PA300x - Leveraging cloud LLMs, these analyzers check for prompt injection using multiple analysis methods, benefiting from the scalability and diverse perspectives of cloud-based models.
- PA400x - These analyzers employ fine-tuned models or small LLMs trained on a ground truth dataset, offering refined detection capabilities tailored to specific patterns.
- PA600x - These analyzers are dedicated to generic benign prompt detection, ensuring that known safe patterns are correctly identified and not falsely flagged as malicious.
Additionally, Prompt Guard allows you to define custom benign and malicious prompt examples tailored to your application's specific needs, helping to mitigate false positive and false negative detections.
By combining these diverse approaches, the service provides a comprehensive and customizable defense against a wide array of prompt injection attacks, ensuring that both direct and contextual threats are effectively identified and mitigated.
Logging and Attribution
When you enable Prompt Guard, its Activity Log is automatically activated. It captures key details about each service call - including the service name, input, contextual information, and detection findings - using a schema specifically designed to track AI application activity.
Usage data is summarized on the Prompt Guard Overview page in your Pangea User Console , while individual log records can be viewed on the Activity Log page.
The Activity Log leverages Secure Audit Log within your Pangea project. When you activate this service in your Pangea User Console , you gain access to your AI Activity Audit Log Schema. This allows you to log application-specific insights alongside service activity, ensuring full-circle attribution for detected threats. If you use the same audit log schema, the application logs are summarized in the dashboard on the service Overview page and detailed in its Activity Log.
Pangea's Secure Audit Log supports multiple configurations, allowing you to define separate schemas to track your application-specific activities.
For an example of how to trace AI application events using Pangea's Secure Audit Log, see the Attribution and Accountability in LLM Apps tutorial.
How to use
Configuration
You can manage Prompt Guard in your Pangea User Console , where you can enable or disable specific analyzers and add custom benign or malicious prompt lists to mitigate false positive and false negative detections.
Integration with AI Guard is straightforward - add the Prompt Injection detector to any of your AI Guard Recipes .
APIs
Once enabled, Prompt Guard is accessible via its APIs, which you can explore using the interactive API Reference page.
Request
The /v1/guard endpoint accepts an array of system and user messages
in JSON format. Each message object must include a role
key (set to "user", "assistant", or "system") and a content
key containing the text to be analyzed by Prompt Guard.
The current version of Prompt Guard supports processing up to 20 KiB of text.
Response
The service returns a summary and a result object populated with the following information:
- Whether a detection was made
- Confidence score (ranging from 0.00 to 1.00)
- The analyzer that made the detection
- The type of detection (e.g., direct or indirect prompt injection)
- Classification results, if requested
- Additional details if available
Visit the APIs page to learn more about using Prompt Guard APIs.
SDKs
Instead of calling the APIs directly, you can easily integrate a service client into your application using one of Pangea's SDKs .
Was this article helpful?