Sanitize

Sanitize files of scripts, malicious links, profanity, and regulated PII.

Quickstart

Try a Sanitize API request example: a step-by-step guide

Pangea’s Sanitize service lets you analyze and clean files by removing harmful or potentially dangerous content and links. Along with customizable content removal options, the Pangea User Console allows you to integrate Sanitize with other services, such as Secure Share, for enhanced file security.

tip

The current version of the Sanitize service has the following limitations:

  • The service is only available for organizations hosted on Amazon Web Services (AWS).
  • Supported file formats are: PDF (.pdf), plain text (.txt), and comma-separated values (.csv).
  • The maximum file size is 50 MB.
  • The maximum text payload (i.e., the plain text within the file) is 20 MB.
  • The maximum number of URLs that can be defanged within a file is 1,000.

If any of these limits is exceeded, the request fails with a 400 (invalid request) error.
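
The file-level limits above can be checked locally before uploading, which avoids a round trip that would end in a 400 error. The following is a minimal sketch under stated assumptions: the helper name `check_sanitize_limits` is hypothetical and not part of the Pangea SDK, 1 MB is assumed to mean 1024 * 1024 bytes, and the text-payload and URL-count limits are not checked because they depend on how the service extracts text from the file.

```python
import os

# Limits taken from the list above; reading "MB" as 1024 * 1024 bytes is an assumption.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".csv"}
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024


def check_sanitize_limits(path: str) -> list:
    """Return a list of problems found; an empty list means the file passes."""
    problems = []

    # Compare the extension case-insensitively against the supported formats.
    extension = os.path.splitext(path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file format: {extension or '(none)'}")

    # Compare the size on disk against the maximum file size.
    size = os.path.getsize(path)
    if size > MAX_FILE_SIZE_BYTES:
        problems.append(f"file is {size} bytes, above the {MAX_FILE_SIZE_BYTES}-byte limit")

    return problems
```

A caller could skip the upload and report the returned messages whenever the list is not empty.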

By default, the sanitized output is available for download at the URL returned in result.dest_url. The download URL is valid for one hour. For more information about presigned URLs and the transfer methods used by the Sanitize service, see our Transfer Methods page.
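
Because the download URL expires after one hour, it is usually simplest to fetch the sanitized file right after the request completes. A minimal sketch using only the Python standard library follows. The helper name `download_sanitized_file` is hypothetical, and the code assumes that the URL in result.dest_url can be fetched with a plain GET request and no extra headers.

```python
import shutil
import urllib.request


def download_sanitized_file(dest_url: str, output_path: str, timeout: float = 30.0) -> int:
    """Stream the content at dest_url to output_path and return the bytes written."""
    with urllib.request.urlopen(dest_url, timeout=timeout) as response:
        with open(output_path, "wb") as out:
            # Copy in chunks so large files are not held in memory at once.
            shutil.copyfileobj(response, out)
            return out.tell()
```

For example, `download_sanitized_file(response.result.dest_url, "sanitized.pdf")` would save the output next to your script, provided the request did not store the result in Secure Share instead.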

Alternatively, you can save the sanitized results in Secure Share.

This document provides a complete code example for making a request to the Sanitize APIs, which you can try in any of the supported environments.

note

In this quickstart guide, the example code demonstrates how to use parameters that override Sanitize settings configured by an Admin in the Pangea User Console (such as the File Scan Provider and Defang Links options). However, for production use, we recommend not overriding these settings in API calls, allowing an Admin to modify them without needing to update any deployed code.

You can find the latest examples in the SDK repository.
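
For production, the recommendation above amounts to leaving the override parameters out of the call. The following is a sketch under that assumption: it passes only the file, the transfer method, and the uploaded file name, and it relies on the file_scan, content, and share_output parameters being optional, which you should confirm against the SDK version you install. The helper name `sanitize_with_console_settings` is hypothetical; `client` is expected to be a Sanitize client and `transfer_method` a value such as TransferMethod.POST_URL.

```python
import os


def sanitize_with_console_settings(client, path, transfer_method):
    """Call client.sanitize without per-request overrides.

    No file_scan or content argument is passed, so the service is expected to
    fall back to the settings an Admin configured in the Pangea User Console.
    """
    with open(path, "rb") as f:
        return client.sanitize(
            file=f,
            transfer_method=transfer_method,
            uploaded_file_name=os.path.basename(path),
        )
```

With this shape, an Admin can change providers or defang options in the console without a code deployment.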

Configuring the Sanitize service

These steps are an overview of how to configure Sanitize for your application. For a complete set of step-by-step instructions, refer to our Overview page.

  1. Navigate to the Pangea User Console.
  2. Sign up for Pangea. The sign-up process creates an Organization and an initial token.
  3. Configure the token for use with the Sanitize service.
  4. Set any desired settings in the Sanitize Settings page.

Add Sanitize to your app

The steps below walk you through the basics of Sanitize and show how to integrate the service with a Python app, ending with a completed code sample that sanitizes a file according to specified rules. For more information about the sample app, see our Python SDK.

Set your environment variables

Before you start coding, export your token and domain as environment variables if you have not already added them to your environment.

  1. Open up a bash terminal window.
  2. Type the following commands, replacing 'yourServiceDomain' and 'yourAccessToken' with your Domain and Default Token copied from the Sanitize page of your Pangea User Console.
export PANGEA_DOMAIN="yourServiceDomain"
export PANGEA_SANITIZE_TOKEN="yourAccessToken"
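
The code later in this guide reads these variables with os.getenv and asserts that they are set. If you prefer a clearer failure message when a variable is missing, a small helper like the one below can be used instead. It is a sketch, and the name `require_env` is hypothetical rather than part of the Pangea SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a descriptive error."""
    value = os.getenv(name)
    if not value:
        # Covers both an unset variable and one exported as an empty string.
        raise RuntimeError(f"Environment variable {name} is not set or is empty")
    return value
```

For example, `token = require_env("PANGEA_SANITIZE_TOKEN")` and `domain = require_env("PANGEA_DOMAIN")` could replace the os.getenv and assert pairs.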

Writing the Sanitize code

  1. Install the Pangea Python SDK. To add the SDK to your project, run one of the following commands in your project root directory, depending on your preferred installation method.

Install SDK via pip:

pip3 install pangea-sdk

or

Install SDK via poetry:

poetry add pangea-sdk
  2. Next, import the Pangea libraries into your code.
import os
import sys

import pangea.exceptions as pe
from pangea.config import PangeaConfig
from pangea.response import TransferMethod
from pangea.services import Sanitize
from pangea.services.sanitize import SanitizeContent, SanitizeFile, SanitizeShareOutput
  3. Set the filepath to a file you would like to sanitize.
FILEPATH = "./sanitize_examples/ds11.pdf"
  4. Initialize the Sanitize client with the token and domain from your environment variables to authenticate with Pangea.
token = os.getenv("PANGEA_SANITIZE_TOKEN")
assert token

domain = os.getenv("PANGEA_DOMAIN")
assert domain

config = PangeaConfig(domain)

client = Sanitize(token, config)
  5. Define the scan and CDR providers for the file scan, then configure the sanitization parameters, such as defanging links, redacting content, and removing attachments.
try:
    file_scan = SanitizeFile(scan_provider="crowdstrike", cdr_provider="apryse")

    # Create content sanitization config
    content = SanitizeContent(
        url_intel=True,
        url_intel_provider="crowdstrike",
        domain_intel=True,
        domain_intel_provider="crowdstrike",
        defang=True,
        defang_threshold=20,
        remove_interactive=True,
        remove_attachments=True,
        redact=True,
    )
  6. Enable share output and set its folder, send the file to Sanitize using the post-url transfer method, and print the sanitized result, with error handling. This example uses a specific configuration, but Sanitize offers more options, such as the content to be sanitized, the transfer method, and the scan provider. Read more about these options on our Sanitize Settings page.
    # Continues the try block opened in the previous step
    share_output = SanitizeShareOutput(enabled=True, output_folder="sdk_examples/sanitize/")

    with open(FILEPATH, "rb") as f:
        # Make the request to the Sanitize service
        response = client.sanitize(
            file=f,
            transfer_method=TransferMethod.POST_URL,
            file_scan=file_scan,
            content=content,
            share_output=share_output,
            uploaded_file_name="uploaded_file",
        )

        if response.result is None:
            print("Failed to get response")
            sys.exit(1)

        print("Sanitize request success")
        print(f"\tFile share id: {response.result.dest_share_id}")
        print(f"\tRedact data: {response.result.data.redact}")
        print(f"\tDefang data: {response.result.data.defang}")
        print(f"\tCDR data: {response.result.data.cdr}")

        if response.result.data.malicious_file:
            print("File IS malicious")
        else:
            print("File is NOT malicious")

except pe.PangeaAPIException as e:
    print(e)
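
In a script that runs unattended, the printed verdict is easy to miss. One option is to turn the malicious_file flag into a process exit code so that a calling job can react to it. The sketch below assumes a result object shaped like the one printed above (a data attribute with a malicious_file field). The helper name `exit_code_for` and the exit code values are arbitrary choices for illustration, not part of the Pangea SDK.

```python
def exit_code_for(result) -> int:
    """Map a Sanitize result to an exit code: 0 clean, 1 no result, 2 malicious."""
    if result is None:
        return 1
    if result.data.malicious_file:
        return 2
    return 0
```

A script could then end with `sys.exit(exit_code_for(response.result))` in place of the final print statements.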

Completed code

The code sample below is a complete, copy-and-paste example that runs on its own. For best results, replace the placeholder values in the request (e.g., FILEPATH) with your own, and experiment with the Sanitize settings in your Pangea User Console.

import os
import sys

import pangea.exceptions as pe
from pangea.config import PangeaConfig
from pangea.response import TransferMethod
from pangea.services import Sanitize
from pangea.services.sanitize import SanitizeContent, SanitizeFile, SanitizeShareOutput

# Set this filepath to your own file
FILEPATH = "./sanitize_examples/ds11.pdf"

def main() -> None:
    token = os.getenv("PANGEA_SANITIZE_TOKEN")
    assert token

    domain = os.getenv("PANGEA_DOMAIN")
    assert domain

    config = PangeaConfig(domain)

    # Create the Sanitize client with its token and config
    client = Sanitize(token, config)
    try:
        # Create Sanitize file information, setting scan and CDR providers
        file_scan = SanitizeFile(scan_provider="crowdstrike", cdr_provider="apryse")

        # Create content sanitization config
        content = SanitizeContent(
            url_intel=True,
            url_intel_provider="crowdstrike",
            domain_intel=True,
            domain_intel_provider="crowdstrike",
            defang=True,
            defang_threshold=20,
            remove_interactive=True,
            remove_attachments=True,
            redact=True,
        )

        # Enable share output and its folder
        share_output = SanitizeShareOutput(enabled=True, output_folder="sdk_examples/sanitize/")

        with open(FILEPATH, "rb") as f:
            # Make the request to sanitize service
            response = client.sanitize(
                file=f,
                # Set transfer method to post-url
                transfer_method=TransferMethod.POST_URL,
                file_scan=file_scan,
                content=content,
                share_output=share_output,
                uploaded_file_name="uploaded_file",
            )

            if response.result is None:
                print("Failed to get response")
                sys.exit(1)

            print("Sanitize request success")
            print(f"\tFile share id: {response.result.dest_share_id}")
            print(f"\tRedact data: {response.result.data.redact}")
            print(f"\tDefang data: {response.result.data.defang}")
            print(f"\tCDR data: {response.result.data.cdr}")

            if response.result.data.malicious_file:
                print("File IS malicious")
            else:
                print("File is NOT malicious")

    except pe.PangeaAPIException as e:
        print(e)

if __name__ == "__main__":
    main()

Improving your app

The purpose of this guide is to provide the basic steps required to start coding with our Sanitize service. There are additional features that can be added to this process, such as enabling other content types to be sanitized in your Pangea User Console or integrating with other services to add security and storage for your sanitized results. Read more about the capabilities on our Sanitize Overview page.

Pangea has based Sanitize on years of experience building compliant enterprise applications. This service helps to ensure that builders have the necessary tools to meet the security needs of their application’s users.
