Quickstart
AI Guard can use several Pangea services to protect data. Each service below links to documentation for getting that service running.
- Secure Audit Log - Logs the activity from the AI Guard Service in a tamperproof log for visibility, security, and compliance.
- Redact - Redacts information based on your configuration to prevent sensitive data from being viewed by unauthorized users.
- IP Intel - Performs a reputation check on IP addresses to determine whether they are malicious. If the check returns a malicious score above the risk threshold defined in the service configuration used by AI Guard, the service defangs the result when the Defang if Malicious redaction method is selected.
- URL Intel - Performs a reputation check on URLs to determine whether they are malicious. If the check returns a malicious score above the risk threshold defined in the service configuration used by AI Guard, the service defangs the result when the Defang if Malicious redaction method is selected.
- User Intel - Performs a breach check on email addresses detected in data when Reputation Check is selected for email rules.
- Domain Intel - Retrieves intelligence about domains detected in data to determine whether they are malicious. If the check returns a malicious score above the risk threshold defined in the service configuration used by AI Guard, the service defangs the result when the Defang if Malicious redaction method is selected.
The built-in AI Guard recipes show how the Threat Intelligence services and Redact rules can be used to analyze data in four locations to secure your AI application.
- User Input Prompt (pangea_prompt_guard)
- Ingestion (e.g., RAG) (pangea_ingestion_guard)
- LLM Prompt Pre Send (pangea_llm_prompt_guard)
- LLM Response (pangea_llm_response_guard)
This gives an AI application multiple opportunities to verify that the user should have access to the data, that the data is not malicious, that the user cannot cause the LLM to behave unexpectedly, and that the response is appropriate and safe. Completing these checks makes LLM apps more secure and safer for users.
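As an illustration, here is a minimal Python sketch of applying the matching recipe at each of the four checkpoints. It assumes the requests library, a service token in a PANGEA_TOKEN environment variable, and the endpoint shown in the Text Guard example below; the guard_text helper and checkpoint names are hypothetical.
import os
import requests

# Endpoint and recipe names come from this article; everything else is illustrative.
AI_GUARD_URL = "https://ai-guard.aws.us.pangea.cloud/v1beta/text/guard"
TOKEN = os.environ["PANGEA_TOKEN"]  # assumes a Pangea service token in the environment

# One built-in recipe per checkpoint in the application.
RECIPES = {
    "user_input": "pangea_prompt_guard",
    "ingestion": "pangea_ingestion_guard",
    "llm_prompt_pre_send": "pangea_llm_prompt_guard",
    "llm_response": "pangea_llm_response_guard",
}

def guard_text(text: str, checkpoint: str) -> dict:
    """Apply the recipe for the given checkpoint and return the API result object."""
    response = requests.post(
        AI_GUARD_URL,
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"text": text, "recipe": RECIPES[checkpoint]},
    )
    response.raise_for_status()
    return response.json()["result"]

# Example: screen a user prompt before it reaches the LLM.
result = guard_text("What is the IP of the staging server?", "user_input")
print(result["redacted_prompt"])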
Text Guard
The Text Guard API takes a recipe name and text, and it applies the recipe to the text. Recipes can detect and act on anything a Redact rule can identify within the text. Use recipes to identify sensitive and malicious data and to decide how that data is handled within the context of the recipe. The actions available for a rule include Detect Only as well as various forms of redaction. Rules that match IP addresses, URLs, Domains, and Email Addresses can also perform a Reputation Check. AI Guard’s configuration lets you specify a threshold risk level that defines what is malicious, so IPs, URLs, and Domains matching these rules offer the Defang if Malicious redaction method as an action to minimize risk.
The Text Guard API filters malicious or sensitive data in text used with an LLM, whether it is input by the user, provided in the source data, or generated as a response by the LLM. Wherever text is used, implement the Text Guard API to maintain the highest security level. The API takes text and a recipe as its main inputs and returns the text transformed according to that recipe’s settings.
Recipes can contain different security settings to produce the desired behavior. For instance, an LLM that has knowledge of a company’s infrastructure and is only available to employees might provide IP addresses and domains of company resources to authorized users, yet still defang any IP addresses a user inserts into the prompt, to prevent possible malicious behavior.
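For instance, a minimal sketch (same assumptions as above: the requests library and a PANGEA_TOKEN environment variable) that runs the same text through two recipes; whether the IP address is defanged in each case depends entirely on how each recipe is configured:
import os
import requests

url = "https://ai-guard.aws.us.pangea.cloud/v1beta/text/guard"
headers = {"Authorization": f"Bearer {os.environ['PANGEA_TOKEN']}"}
text = "The staging server is at 192.0.2.10."

# The same text can be handled differently by different recipes.
for recipe in ("pangea_prompt_guard", "pangea_llm_response_guard"):
    r = requests.post(url, headers=headers, json={"text": text, "recipe": recipe})
    r.raise_for_status()
    print(recipe, "->", r.json()["result"]["redacted_prompt"])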
Here is an example of the Text Guard API using the Ingestion recipe:
curl -sSLX POST 'https://ai-guard.aws.us.pangea.cloud/v1beta/text/guard' \
-H 'Authorization: Bearer pts_svcva...6zedtb' \
-H 'Content-Type: application/json' \
-d '{"text":"example@example.com","recipe":"pangea_ingestion_guard"}'
The API results include a findings object that gives a summary of rules that matched and of the reputation lookups. The results also include an artifacts object that contains details on the rule matches. Use the optional debug parameter to receive a report object containing more details from the threat intelligence calls.
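As a sketch, here is the same request with debugging enabled. This assumes debug is passed as a boolean field in the JSON request body and that the report object appears inside result; neither detail is shown explicitly in this article.
import os
import requests

r = requests.post(
    "https://ai-guard.aws.us.pangea.cloud/v1beta/text/guard",
    headers={"Authorization": f"Bearer {os.environ['PANGEA_TOKEN']}"},
    json={
        "text": "example@example.com",
        "recipe": "pangea_ingestion_guard",
        "debug": True,  # assumption: requests the additional report object
    },
)
r.raise_for_status()
print(r.json()["result"].get("report"))  # assumption: report is returned under result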
The text “example@example.com” is processed by the Ingestion recipe using the settings defined for that recipe, and the results are presented in the output result object. In this case, the email address is considered compromised.
{
  "request_id": "prq_2aitzg...kv6n3ja2",
  "request_time": "2024-10-31T18:58:29.460386Z",
  "response_time": "2024-10-31T18:58:30.792319Z",
  "status": "Success",
  "summary": "The evaluation result indicates the text is compromised",
  "result": {
    "redacted_prompt": "example@example.com",
    "findings": {
      "artifact_count": 1,
      "security_issues": {
        "redact_rule_match_count": 1,
        "malicious_ip_count": 0,
        "malicious_domain_count": 0,
        "malicious_url_count": 0,
        "compromised_email_addresses": 1
      }
    },
    "artifacts": [
      {
        "type": "EMAIL_ADDRESS",
        "value": "example@example.com",
        "start": 0,
        "end": 19,
        "defanged": false,
        "verdict": "compromised"
      }
    ]
  }
}
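Here is a minimal sketch of acting on this result in application code, assuming the response shape shown above: use redacted_prompt as the sanitized text and inspect each artifact's verdict before passing data onward. The "malicious" verdict string is an assumption by analogy with the malicious_* counters in findings; only "compromised" appears in the sample.
def handle_guard_result(api_response: dict) -> str:
    """Return the sanitized text and report any risky artifacts."""
    result = api_response["result"]
    for artifact in result["artifacts"]:
        # "compromised" appears in the sample above; "malicious" is assumed.
        if artifact["verdict"] in ("compromised", "malicious"):
            print(f"{artifact['type']} {artifact['value']!r}: {artifact['verdict']}")
    # Use the redacted text in place of the original input.
    return result["redacted_prompt"]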