Sanitize
Sanitize files of scripts, malicious links, profanity, and regulated PII.
Quickstart
Try a Sanitize API request example: a step-by-step guide
Pangea’s Sanitize service lets you analyze and clean files by removing harmful or potentially dangerous content and links. Along with customizable content removal options, the Pangea User Console allows you to integrate Sanitize with other services, such as Secure Share, for enhanced file security.
The current version of the Sanitize service has the following limitations:
- The service is only available for organizations hosted on Amazon Web Services (AWS).
- Supported file formats are PDF (.pdf), plain text (.txt), and comma-separated values (.csv).
- The maximum file size is 50 MB.
- The maximum text payload (i.e., the plain text within the file) is 20 MB.
- The maximum number of URLs that can be defanged within a file is 1,000.
If a request exceeds any of these limits, it fails with a 400 (invalid request) error.
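Because an oversized or unsupported file is rejected with a 400 error, it can be useful to check a file locally before uploading it. The following is a minimal sketch, not part of the Pangea SDK: the helper name `preflight_problems` is hypothetical, and treating 1 MB as 1024 * 1024 bytes is an assumption. It does not check the text payload size or the number of URLs, which are only known once the file is processed.

```python
from pathlib import Path
from typing import List

# Limits taken from the list above.
# Assumption: 1 MB is treated as 1024 * 1024 bytes.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".csv"}
MAX_FILE_BYTES = 50 * 1024 * 1024


def preflight_problems(path: str) -> List[str]:
    """Return reasons the file would likely be rejected (empty list if none found)."""
    file_path = Path(path)
    if not file_path.is_file():
        return [f"{path} is not a readable file"]

    problems = []
    # Compare extensions case-insensitively so "REPORT.PDF" is accepted.
    if file_path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {file_path.suffix or '(none)'}")
    if file_path.stat().st_size > MAX_FILE_BYTES:
        problems.append("file is larger than 50 MB")
    return problems
```

An empty list means no local problem was found; the service may still reject the file for reasons this check cannot see.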
By default, the sanitized output is available for download at the location returned in result.dest_url. The download URL is valid for one hour. For more information about presigned URLs and transfer methods used by the Sanitize service, please visit our Transfer Methods page.
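Since the download URL expires after one hour, you may want to fetch the sanitized file soon after the request completes. The sketch below uses only the Python standard library; `download_sanitized` is a hypothetical helper, not an SDK function, and it assumes the URL can be retrieved with a plain GET request.

```python
import shutil
import urllib.request


def download_sanitized(dest_url: str, out_path: str, timeout: float = 30.0) -> int:
    """Stream the file at dest_url to out_path and return the number of bytes written."""
    with urllib.request.urlopen(dest_url, timeout=timeout) as resp, open(out_path, "wb") as out:
        # Copy in chunks so large files are not held in memory all at once.
        shutil.copyfileobj(resp, out)
        return out.tell()
```

With a response from the SDK, a call might look like `download_sanitized(response.result.dest_url, "sanitized.pdf")`, where the output filename is a placeholder.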
Alternatively, you can save the sanitized results in Secure Share.
This document provides a complete code example for making a request to the Sanitize APIs, which you can try in any of the supported environments.
In this quickstart guide, the example code demonstrates how to use parameters that override Sanitize settings configured by an Admin in the Pangea User Console (such as the File Scan Provider and Defang Links options). However, for production use, we recommend not overriding these settings in API calls, allowing an Admin to modify them without needing to update any deployed code.
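One way to follow this recommendation is to pass override objects only when they are explicitly supplied. The sketch below is illustrative: `build_sanitize_kwargs` is a hypothetical helper, and it assumes that leaving out the `file_scan` and `content` arguments lets the settings from the Pangea User Console apply.

```python
from typing import Any, BinaryIO, Dict, Optional


def build_sanitize_kwargs(
    file: BinaryIO,
    uploaded_file_name: str,
    file_scan: Optional[Any] = None,
    content: Optional[Any] = None,
) -> Dict[str, Any]:
    """Collect keyword arguments for client.sanitize(), leaving out unset overrides."""
    kwargs: Dict[str, Any] = {"file": file, "uploaded_file_name": uploaded_file_name}
    # Only include override objects the caller explicitly supplied.
    # Assumption: omitted overrides fall back to the Admin's console settings.
    if file_scan is not None:
        kwargs["file_scan"] = file_scan
    if content is not None:
        kwargs["content"] = content
    return kwargs
```

The returned dictionary could then be unpacked into the request, for example `client.sanitize(**build_sanitize_kwargs(f, "uploaded_file"))`, with any other arguments such as the transfer method added as needed.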
You can find the latest examples in the SDK repository.
Configuring the Sanitize service
These steps are an overview of how to configure Sanitize for your application. For a complete set of step-by-step instructions, refer to our Overview page.
- Navigate to the Pangea User Console.
- Sign up for Pangea. As part of the sign-up process, an Organization and an initial token are created.
- Configure the token for use with the Sanitize service.
- Set any desired settings in the Sanitize Settings page.
Add Sanitize to your app
The steps below will walk you through the basics of getting started with Sanitize and how to integrate the service with a Python app, including a completed code sample for applying file sanitization according to specified rules. For more information regarding the sample app, you can visit our Python SDK.
Set your environment variables
Before you start coding, export your token and domain as environment variables if you have not already added them to your environment.
- Open up a bash terminal window.
- Type the following commands, replacing 'yourServiceDomain' and 'yourAccessToken' with your Domain and Default Token copied from the Sanitize page of your Pangea User Console.
export PANGEA_DOMAIN="yourServiceDomain"
export PANGEA_SANITIZE_TOKEN="yourAccessToken"
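If either variable is missing, a bare `assert` on the value fails without saying which one. As an optional alternative, the sketch below reads both variables and reports any that are unset or empty. `load_pangea_env` is a hypothetical helper, not part of the SDK.

```python
import os
from typing import Mapping, Tuple


def load_pangea_env(environ: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (token, domain), raising a descriptive error if either is unset or empty."""
    required = ("PANGEA_SANITIZE_TOKEN", "PANGEA_DOMAIN")
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return environ["PANGEA_SANITIZE_TOKEN"], environ["PANGEA_DOMAIN"]
```

Calling `token, domain = load_pangea_env()` then yields the two values used to create the client.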
Writing the Sanitize code
- First, install the Pangea Python SDK. To add the SDK to your project, run one of the following commands in your project root directory, depending on your preferred installation method.
Install SDK via pip:
pip3 install pangea-sdk
or
Install SDK via poetry:
poetry add pangea-sdk
- Next, import the Pangea libraries into your code.
import os
import sys
import pangea.exceptions as pe
from pangea.config import PangeaConfig
from pangea.response import TransferMethod
from pangea.services import Sanitize
from pangea.services.sanitize import SanitizeContent, SanitizeFile, SanitizeShareOutput
- Set the filepath to a file you would like to sanitize.
FILEPATH = "./sanitize_examples/ds11.pdf"
- Initialize the Sanitize client for use, adding the token and domain from your environment variables in order to authenticate with Pangea.
token = os.getenv("PANGEA_SANITIZE_TOKEN")
assert token
domain = os.getenv("PANGEA_DOMAIN")
assert domain
config = PangeaConfig(domain)
client = Sanitize(token, config)
- Define the scan and CDR providers for the file scan, then configure the sanitization parameters to be used such as defang, redact, or removing attachments.
try:
    file_scan = SanitizeFile(scan_provider="crowdstrike", cdr_provider="apryse")
    # Create content sanitization config
    content = SanitizeContent(
        url_intel=True,
        url_intel_provider="crowdstrike",
        domain_intel=True,
        domain_intel_provider="crowdstrike",
        defang=True,
        defang_threshold=20,
        remove_interactive=True,
        remove_attachments=True,
        redact=True,
    )
- Enable share output and its folder, send the file to Sanitize via post_url request, and generate a sanitized result including error handling. This example uses a specific configuration, but Sanitize offers more options such as content to be sanitized, the transfer method, scan provider, and more. Read more about these options on our Sanitize Settings page.
    share_output = SanitizeShareOutput(enabled=True, output_folder="sdk_examples/sanitize/")
    with open(FILEPATH, "rb") as f:
        response = client.sanitize(
            file=f,
            transfer_method=TransferMethod.POST_URL,
            file_scan=file_scan,
            content=content,
            share_output=share_output,
            uploaded_file_name="uploaded_file",
        )
        if response.result is None:
            print("Failed to get response")
            sys.exit(1)
        print("Sanitize request success")
        print(f"\tFile share id: {response.result.dest_share_id}")
        print(f"\tRedact data: {response.result.data.redact}")
        print(f"\tDefang data: {response.result.data.defang}")
        print(f"\tCDR data: {response.result.data.cdr}")
        if response.result.data.malicious_file:
            print("File IS malicious")
        else:
            print("File is NOT malicious")
except pe.PangeaAPIException as e:
    print(e)
Completed code
The code sample below is a complete, copy-and-paste example that runs on its own. For best results, replace the placeholder data in the request (e.g., FILEPATH) with your own values, and experiment with Sanitize in your Pangea User Console.
import os
import sys

import pangea.exceptions as pe
from pangea.config import PangeaConfig
from pangea.response import TransferMethod
from pangea.services import Sanitize
from pangea.services.sanitize import SanitizeContent, SanitizeFile, SanitizeShareOutput

# Set this filepath to your own file
FILEPATH = "./sanitize_examples/ds11.pdf"


def main() -> None:
    token = os.getenv("PANGEA_SANITIZE_TOKEN")
    assert token
    domain = os.getenv("PANGEA_DOMAIN")
    assert domain

    config = PangeaConfig(domain)
    # Create the Sanitize client with its token and config
    client = Sanitize(token, config)

    try:
        # Create Sanitize file information, setting scan and CDR providers
        file_scan = SanitizeFile(scan_provider="crowdstrike", cdr_provider="apryse")

        # Create content sanitization config
        content = SanitizeContent(
            url_intel=True,
            url_intel_provider="crowdstrike",
            domain_intel=True,
            domain_intel_provider="crowdstrike",
            defang=True,
            defang_threshold=20,
            remove_interactive=True,
            remove_attachments=True,
            redact=True,
        )

        # Enable share output and its folder
        share_output = SanitizeShareOutput(enabled=True, output_folder="sdk_examples/sanitize/")

        with open(FILEPATH, "rb") as f:
            # Make the request to sanitize service
            response = client.sanitize(
                file=f,
                # Set transfer method to post-url
                transfer_method=TransferMethod.POST_URL,
                file_scan=file_scan,
                content=content,
                share_output=share_output,
                uploaded_file_name="uploaded_file",
            )

            if response.result is None:
                print("Failed to get response")
                sys.exit(1)

            print("Sanitize request success")
            print(f"\tFile share id: {response.result.dest_share_id}")
            print(f"\tRedact data: {response.result.data.redact}")
            print(f"\tDefang data: {response.result.data.defang}")
            print(f"\tCDR data: {response.result.data.cdr}")

            if response.result.data.malicious_file:
                print("File IS malicious")
            else:
                print("File is NOT malicious")

    except pe.PangeaAPIException as e:
        print(e)


if __name__ == "__main__":
    main()
Improving your app
The purpose of this guide is to provide the basic steps required to start coding with our Sanitize service. There are additional features that can be added to this process, such as enabling other content types to be sanitized in your Pangea User Console or integrating with other services to add security and storage for your sanitized results. Read more about the capabilities on our Sanitize Overview page.
Pangea has based Sanitize on years of experience building compliant enterprise applications. This service helps to ensure that builders have the necessary tools to meet the security needs of their application’s users.
Next steps
- Check out our Admin Guide if you have a specific task you would like to complete.
- If you are feeling confident, you can browse our APIs or explore our GitHub repo, which has libraries for supported languages, SDKs, sample apps, etc.
- For any questions, you can connect with our Pangea Discourse community or continue exploring our Sanitize documentation.