Quickstart
About Sanitize
Pangea’s Sanitize service allows you to analyze and clean potentially harmful files, removing any actionable or potentially harmful content and links. In addition to customization options for content removal, your Pangea User Console also offers the ability to sync with our other services such as Secure Share for enhanced file security.
To provide a complete demonstration, the example code in this quickstart guide uses parameters that override settings an Admin can configure in the Settings menu of the Pangea User Console (e.g. scan_provider, cdr_provider, defang options, etc.). For the most flexibility, we recommend not overriding these settings in API calls, so that an Admin can change them without the need to alter any deployed code.
You can find the latest examples in the SDK repo.
The current (Beta) version of Sanitize has the following limitations:
- The service is only available for organizations hosted on Amazon (AWS).
- Only PDF files are supported.
- The recommended maximum file size is 5 MB. For files with attachments/media, testing is allowed up to 15 MB. Files exceeding 5 MB with extensive text and numerous URLs may not process successfully.
- By default, the sanitized output is available for download at the location returned in result.dest_url. The download URL is valid for one hour. Alternatively, you can save the results of sanitization in Secure Share.
- For more information regarding pre-signed URLs and transfer methods used by the Sanitize service, visit our Transfer Methods page.
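The limitations above can be checked locally before a file is uploaded. The following is a minimal sketch, not part of the Pangea SDK: the helper name preflight_check is hypothetical, the sizes are interpreted as binary megabytes (an assumption), and the PDF test is a strict check for the "%PDF-" signature at the very start of the file.

```python
from pathlib import Path
from typing import List

# Limits taken from the Beta limitations listed above.
# Assumption: "MB" is interpreted as 1024 * 1024 bytes.
RECOMMENDED_MAX_BYTES = 5 * 1024 * 1024   # recommended maximum file size
ATTACHMENT_MAX_BYTES = 15 * 1024 * 1024   # upper bound for files with attachments/media
PDF_SIGNATURE = b"%PDF-"


def preflight_check(path: str) -> List[str]:
    """Return a list of problems found; an empty list means the file looks acceptable."""
    file_path = Path(path)
    if not file_path.is_file():
        return [f"{path} does not exist or is not a regular file"]

    problems: List[str] = []

    # Strict check: the file must begin with the PDF signature.
    with file_path.open("rb") as f:
        header = f.read(len(PDF_SIGNATURE))
    if header != PDF_SIGNATURE:
        problems.append("file does not start with the PDF signature '%PDF-'")

    size = file_path.stat().st_size
    if size > ATTACHMENT_MAX_BYTES:
        problems.append(f"file is {size} bytes, above the 15 MB testing limit")
    elif size > RECOMMENDED_MAX_BYTES:
        problems.append(f"file is {size} bytes, above the recommended 5 MB maximum")

    return problems
```

A caller could print the returned messages and skip the upload when the list is not empty. A file between 5 MB and 15 MB is reported rather than rejected outright, since such files may still process if they consist mostly of attachments or media.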
Configuring the Sanitize service
These steps are an overview of how to configure Sanitize for your application. For a complete set of step-by-step instructions, refer to our Overview page.
- Navigate to the Pangea User Console.
- Sign up to Pangea. As part of the sign up process, an Organization and initial token will be created.
- Configure the token for use with the Sanitize service.
- Set any desired settings in the Sanitize Settings page.
Add Sanitize to your app
The steps below will walk you through the basics of getting started with Sanitize and how to integrate the service with a Python app, including a completed code sample for applying file sanitization according to specified rules. For more information regarding the sample app, you can visit our Python SDK.
Set your environment variables
Before starting to code, it is necessary to export your token and domain variables to your project if you have not already added them to your environment.
- Open up a bash terminal window.
- Type the following commands, replacing 'yourServiceDomain' and 'yourAccessToken' with your Domain and Default Token copied from the Sanitize page of your Pangea User Console.
export PANGEA_DOMAIN="yourServiceDomain"
export PANGEA_SANITIZE_TOKEN="yourAccessToken"
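The sample code in this guide reads these variables with os.getenv and guards them with assert. Because assert statements are skipped when Python runs with the -O flag, a small explicit check can give a clearer failure. The sketch below uses only the standard library; the helper name require_env is hypothetical and not part of the Pangea SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a descriptive error."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; export it before running the sample."
        )
    return value


# Possible usage with the variable names exported above:
# token = require_env("PANGEA_SANITIZE_TOKEN")
# domain = require_env("PANGEA_DOMAIN")
```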
Writing the Sanitize code
- First, install the following (Beta) version of the Pangea Python SDK. Run one of these commands in your project root directory, depending on your preferred installation method.
Install SDK via pip:
pip3 install pangea-sdk==3.8.0b1
or
Install SDK via poetry:
poetry add pangea-sdk==3.8.0b1
- Next, import the Pangea libraries into your code.
import os
import sys
import pangea.exceptions as pe
from pangea.config import PangeaConfig
from pangea.response import TransferMethod
from pangea.services import Sanitize
from pangea.services.sanitize import SanitizeContent, SanitizeFile, SanitizeShareOutput
- Set the filepath to a file you would like to sanitize.
FILEPATH = "./sanitize_examples/ds11.pdf"
- Initialize the Sanitize client for use, adding the token and domain from your environment variables in order to authenticate with Pangea.
token = os.getenv("PANGEA_SANITIZE_TOKEN")
assert token
domain = os.getenv("PANGEA_DOMAIN")
assert domain
config = PangeaConfig(domain)
client = Sanitize(token, config)
- Define the scan and CDR providers for the file scan, then configure the sanitization parameters to be used such as defang, redact, or removing attachments.
try:
    file_scan = SanitizeFile(scan_provider="crowdstrike", cdr_provider="apryse")
    # Create content sanitization config
    content = SanitizeContent(
        url_intel=True,
        url_intel_provider="crowdstrike",
        domain_intel=True,
        domain_intel_provider="crowdstrike",
        defang=True,
        defang_threshold=20,
        remove_interactive=True,
        remove_attachments=True,
        redact=True,
    )
- Enable share output and its folder, send the file to Sanitize via post_url request, and generate a sanitized result including error handling. This example uses a specific configuration, but Sanitize offers more options such as content to be sanitized, the transfer method, scan provider, and more. Read more about these options on our Sanitize Settings page.
    share_output = SanitizeShareOutput(enabled=True, output_folder="sdk_examples/sanitize/")
    with open(FILEPATH, "rb") as f:
        response = client.sanitize(
            file=f,
            transfer_method=TransferMethod.POST_URL,
            file_scan=file_scan,
            content=content,
            share_output=share_output,
            uploaded_file_name="uploaded_file",
        )
        if response.result is None:
            print("Failed to get response")
            sys.exit(1)
        print("Sanitize request success")
        print(f"\tFile share id: {response.result.dest_share_id}")
        print(f"\tRedact data: {response.result.data.redact}")
        print(f"\tDefang data: {response.result.data.defang}")
        print(f"\tCDR data: {response.result.data.cdr}")
        if response.result.data.malicious_file:
            print("File IS malicious")
        else:
            print("File is NOT malicious")
except pe.PangeaAPIException as e:
    print(e)
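The request in this walkthrough overrides the console settings for demonstration purposes, while the recommendation earlier in this guide is to leave those settings to an Admin. One way to keep overrides optional in deployed code is to pass only the arguments that were explicitly supplied. This is a sketch under an assumption: that client.sanitize can be called without file_scan, content, and share_output and then falls back to the console configuration. Confirm this against the SDK reference for your version. The helper name build_sanitize_overrides is hypothetical.

```python
from typing import Any, Dict, Optional


def build_sanitize_overrides(
    file_scan: Optional[Any] = None,
    content: Optional[Any] = None,
    share_output: Optional[Any] = None,
) -> Dict[str, Any]:
    """Collect only the overrides that were actually supplied (not None)."""
    candidates = {
        "file_scan": file_scan,
        "content": content,
        "share_output": share_output,
    }
    return {name: value for name, value in candidates.items() if value is not None}


# Possible usage (assumes the optional arguments may be omitted):
# with open(FILEPATH, "rb") as f:
#     response = client.sanitize(
#         file=f,
#         transfer_method=TransferMethod.POST_URL,
#         uploaded_file_name="uploaded_file",
#         **build_sanitize_overrides(),  # empty dict: no overrides are sent
#     )
```

With no arguments the helper returns an empty dictionary, so the call carries no overrides; passing, for example, content=content adds only that one setting.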
Completed code
The code sample below is a usable, copy & paste resource for this application that will work on its own. For best results, be sure to replace all placeholder data in the request (e.g. FILEPATH) with your desired values, and experiment with Sanitize in your Pangea User Console.
This is part of the Beta release for Sanitize, which has certain limitations, including the size of files that can be sanitized. For more information, visit our Sanitize Overview page.
By default, the sanitized output is available for download for one hour at the location returned in result.dest_url.
import os
import sys

import pangea.exceptions as pe
from pangea.config import PangeaConfig
from pangea.response import TransferMethod
from pangea.services import Sanitize
from pangea.services.sanitize import SanitizeContent, SanitizeFile, SanitizeShareOutput

# Set this filepath to your own file
FILEPATH = "./sanitize_examples/ds11.pdf"


def main() -> None:
    token = os.getenv("PANGEA_SANITIZE_TOKEN")
    assert token
    domain = os.getenv("PANGEA_DOMAIN")
    assert domain

    config = PangeaConfig(domain)
    # Create the Sanitize client with its token and config
    client = Sanitize(token, config)

    try:
        # Create Sanitize file information, setting scan and CDR providers
        file_scan = SanitizeFile(scan_provider="crowdstrike", cdr_provider="apryse")

        # Create content sanitization config
        content = SanitizeContent(
            url_intel=True,
            url_intel_provider="crowdstrike",
            domain_intel=True,
            domain_intel_provider="crowdstrike",
            defang=True,
            defang_threshold=20,
            remove_interactive=True,
            remove_attachments=True,
            redact=True,
        )

        # Enable share output and its folder
        share_output = SanitizeShareOutput(enabled=True, output_folder="sdk_examples/sanitize/")

        with open(FILEPATH, "rb") as f:
            # Make the request to sanitize service
            response = client.sanitize(
                file=f,
                # Set transfer method to post-url
                transfer_method=TransferMethod.POST_URL,
                file_scan=file_scan,
                content=content,
                share_output=share_output,
                uploaded_file_name="uploaded_file",
            )

            if response.result is None:
                print("Failed to get response")
                sys.exit(1)

            print("Sanitize request success")
            print(f"\tFile share id: {response.result.dest_share_id}")
            print(f"\tRedact data: {response.result.data.redact}")
            print(f"\tDefang data: {response.result.data.defang}")
            print(f"\tCDR data: {response.result.data.cdr}")

            if response.result.data.malicious_file:
                print("File IS malicious")
            else:
                print("File is NOT malicious")
    except pe.PangeaAPIException as e:
        print(e)


if __name__ == "__main__":
    main()
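The completed sample stores its output in Secure Share and prints the share ID. When share output is not enabled, the sanitized file is instead available at result.dest_url for one hour, so it should be fetched promptly. The sketch below streams a URL to disk using only the standard library. The helper name download_sanitized is hypothetical, and the sketch takes the URL as a plain argument because this guide does not state whether dest_url is also populated when share output is enabled.

```python
import shutil
import urllib.request
from pathlib import Path


def download_sanitized(dest_url: str, out_path: str, timeout: float = 30.0) -> int:
    """Stream the content at dest_url to out_path and return the number of bytes written."""
    target = Path(out_path)
    # Create the output directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)

    # Copy the response body to disk in chunks rather than loading it into memory.
    with urllib.request.urlopen(dest_url, timeout=timeout) as resp, target.open("wb") as out:
        shutil.copyfileobj(resp, out)

    return target.stat().st_size


# Possible usage after a successful request without share output:
# if response.result.dest_url:
#     download_sanitized(response.result.dest_url, "./sanitized/ds11_sanitized.pdf")
```

The output path in the usage comment is a placeholder. Because the download URL expires, a failed download after the one-hour window would require submitting the file again.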
Improving your app
The purpose of this guide is to provide the basic steps required to start coding with our Sanitize service. There are additional features that can be added to this process, such as enabling other content types to be sanitized in your Pangea User Console or integrating with other services to add security and storage for your sanitized results. Read more about the capabilities on our Sanitize Overview page.
Pangea has based Sanitize on years of experience building compliant enterprise applications. This service helps to ensure that builders have the necessary tools to meet the security needs of their application’s users.
Next steps
- Check out our Admin Guide if you have a specific task you would like to complete.
- If you are feeling confident, you can browse our APIs or explore our GitHub repo, which has libraries for supported languages, SDKs, sample apps, etc.
- For any questions, you can connect with our Pangea Slack for Builders or continue exploring our Sanitize documentation.