Quickstart

About Sanitize

Pangea’s Sanitize service analyzes and cleans potentially harmful files by removing actionable or malicious content and links. Beyond customizing what content is removed, you can use the Pangea User Console to integrate Sanitize with other Pangea services, such as Secure Share, for enhanced file security.

Note

To provide a complete demonstration, the example code in this quickstart guide passes parameters that override settings an Admin can configure in the Settings menu of the Pangea User Console (e.g. scan_provider, cdr_provider, defang options, etc.). For the most flexibility, we recommend not overriding these settings in API calls, so that an Admin can change them without altering any deployed code.

You can find the latest examples in the SDK repo.

The current (Beta) version of Sanitize has the following limitations:

  • The service is only available for organizations hosted on Amazon (AWS).

  • Only PDF files are supported.

  • The recommended maximum file size is 5 MB.

    For files with attachments/media, testing is allowed up to 15 MB. Files exceeding 5 MB with extensive text and numerous URLs may not process successfully.

  • By default, the sanitized output is available for download at the location returned in result.dest_url. The download URL is valid for one hour.

    Alternatively, you can save the results of sanitization in Secure Share.

  • For more information about the pre-signed URLs and transfer methods used by the Sanitize service, see our Transfer Methods page.
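The limits above can be checked locally before a file is uploaded. The following is a minimal sketch, not part of the Pangea SDK: the function name `precheck_pdf` and the constant `RECOMMENDED_MAX_BYTES` are hypothetical, and it assumes 1 MB means 1024 * 1024 bytes. The PDF check only looks for the `%PDF-` signature at the start of the file, so it is a quick sanity check rather than full validation.

```python
from pathlib import Path

# Recommended maximum from the limitations list (assumption: 1 MB = 1024 * 1024 bytes).
RECOMMENDED_MAX_BYTES = 5 * 1024 * 1024


def precheck_pdf(path: Path) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    if not path.is_file():
        return [f"{path} does not exist or is not a regular file"]

    problems: list[str] = []

    # PDF files normally begin with the "%PDF-" signature.
    with path.open("rb") as f:
        header = f.read(5)
    if header != b"%PDF-":
        problems.append("file does not start with the %PDF- signature")

    # Compare the size on disk with the recommended maximum.
    size = path.stat().st_size
    if size > RECOMMENDED_MAX_BYTES:
        problems.append(f"file is {size} bytes, above the recommended {RECOMMENDED_MAX_BYTES} bytes")

    return problems
```

Calling `precheck_pdf(Path("./sanitize_examples/ds11.pdf"))` before the request lets your app warn the user early. A file over the recommended size may still process, so treating that result as a warning rather than an error is a reasonable design choice.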

Configuring the Sanitize service

These steps are an overview of how to configure Sanitize for your application. For a complete set of step-by-step instructions, refer to our Overview page.

  1. Navigate to the Pangea User Console.
  2. Sign up for Pangea. As part of the sign-up process, an Organization and an initial token will be created.
  3. Configure the token for use with the Sanitize service.
  4. Set any desired settings in the Sanitize Settings page.

Add Sanitize to your app

The steps below will walk you through the basics of getting started with Sanitize and how to integrate the service with a Python app, including a completed code sample for applying file sanitization according to specified rules. For more information regarding the sample app, you can visit our Python SDK.

Set your environment variables

Before starting to code, it is necessary to export your token and domain variables to your project if you have not already added them to your environment.

  1. Open up a bash terminal window.
  2. Type the following commands, replacing 'yourServiceDomain' and 'yourAccessToken' with your Domain and Default Token copied from the Sanitize page of your Pangea User Console.
export PANGEA_DOMAIN="yourServiceDomain"
export PANGEA_SANITIZE_TOKEN="yourAccessToken"
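If one of these variables is missing, the sample code later in this guide stops at a bare `assert`, which gives no hint about what went wrong and is skipped entirely when Python runs with the `-O` flag. As an optional alternative, here is a minimal sketch of a helper that fails with a descriptive message. The name `require_env` is hypothetical and not part of the Pangea SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of the environment variable `name`, or raise a clear error."""
    value = os.getenv(name)
    if not value:
        # Covers both an unset variable and one exported as an empty string.
        raise RuntimeError(f"Environment variable {name} is not set; export it before running.")
    return value
```

In the sample code you could then write `token = require_env("PANGEA_SANITIZE_TOKEN")` and `domain = require_env("PANGEA_DOMAIN")` in place of the `os.getenv` and `assert` pairs.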

Writing the Sanitize code

  1. Install the following (beta) version of the Pangea Python SDK by running one of these commands in your project root directory, depending on your preferred installation method.

Install SDK via pip:

pip3 install pangea-sdk==3.8.0b1

or

Install SDK via poetry:

poetry add pangea-sdk==3.8.0b1
  2. Next, import the Pangea libraries into your code.
import os
import sys

import pangea.exceptions as pe
from pangea.config import PangeaConfig
from pangea.response import TransferMethod
from pangea.services import Sanitize
from pangea.services.sanitize import SanitizeContent, SanitizeFile, SanitizeShareOutput
  3. Set the filepath to a file you would like to sanitize.
FILEPATH = "./sanitize_examples/ds11.pdf"
  4. Initialize the Sanitize client, passing the token and domain from your environment variables to authenticate with Pangea.
token = os.getenv("PANGEA_SANITIZE_TOKEN")
assert token

domain = os.getenv("PANGEA_DOMAIN")
assert domain

config = PangeaConfig(domain)

client = Sanitize(token, config)
  5. Define the scan and CDR providers for the file scan, then configure the sanitization parameters to use, such as defanging, redaction, or attachment removal.
# Create Sanitize file information, setting scan and CDR providers
file_scan = SanitizeFile(scan_provider="crowdstrike", cdr_provider="apryse")

# Create content sanitization config
content = SanitizeContent(
    url_intel=True,
    url_intel_provider="crowdstrike",
    domain_intel=True,
    domain_intel_provider="crowdstrike",
    defang=True,
    defang_threshold=20,
    remove_interactive=True,
    remove_attachments=True,
    redact=True,
)
  6. Enable share output and set its folder, send the file to Sanitize using the post-url transfer method, and print the sanitized result, handling any API errors. This example uses a specific configuration, but Sanitize offers more options, such as the content to be sanitized, the transfer method, and the scan provider. Read more about these options on our Sanitize Settings page.
# Enable share output and its folder
share_output = SanitizeShareOutput(enabled=True, output_folder="sdk_examples/sanitize/")

try:
    with open(FILEPATH, "rb") as f:
        # Make the request to the Sanitize service
        response = client.sanitize(
            file=f,
            transfer_method=TransferMethod.POST_URL,
            file_scan=file_scan,
            content=content,
            share_output=share_output,
            uploaded_file_name="uploaded_file",
        )

        if response.result is None:
            print("Failed to get response")
            sys.exit(1)

        print("Sanitize request success")
        print(f"\tFile share id: {response.result.dest_share_id}")
        print(f"\tRedact data: {response.result.data.redact}")
        print(f"\tDefang data: {response.result.data.defang}")
        print(f"\tCDR data: {response.result.data.cdr}")

        if response.result.data.malicious_file:
            print("File IS malicious")
        else:
            print("File is NOT malicious")

except pe.PangeaAPIException as e:
    # Print any error returned by the Pangea API
    print(e)
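The note at the top of this guide recommends not overriding Admin-configured settings in API calls. The following minimal sketch shows what such a call might look like. It assumes that `file_scan`, `content`, and `share_output` are optional keyword arguments of `client.sanitize` in the SDK version you install, which you should confirm against the SDK reference. The wrapper name `sanitize_with_console_settings` is hypothetical, and the client is typed loosely so the sketch does not depend on SDK internals.

```python
from typing import Any, BinaryIO


def sanitize_with_console_settings(
    client: Any,
    file: BinaryIO,
    transfer_method: Any,
    uploaded_file_name: str,
) -> Any:
    """Call client.sanitize without per-request overrides.

    Assumption: omitting file_scan, content, and share_output makes the
    service fall back to the settings configured in the Pangea User Console.
    """
    return client.sanitize(
        file=file,
        transfer_method=transfer_method,
        uploaded_file_name=uploaded_file_name,
    )
```

With the client from step 4, a call could look like `sanitize_with_console_settings(client, f, TransferMethod.POST_URL, "uploaded_file")` inside the same `with open(...)` and `try`/`except` structure used in step 6.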

Completed code

The code sample below is a complete, copy-and-paste example that works on its own. For best results, replace the placeholder data in the request (e.g. FILEPATH) with your own values, and experiment with Sanitize in your Pangea User Console.

Note

This is part of the Beta release for Sanitize. Certain limitations currently apply, including the size of files that can be sanitized; for more information, please visit our Sanitize Overview page.

By default, sanitized output is available for download for one hour at the location returned in result.dest_url.

import os
import sys

import pangea.exceptions as pe
from pangea.config import PangeaConfig
from pangea.response import TransferMethod
from pangea.services import Sanitize
from pangea.services.sanitize import SanitizeContent, SanitizeFile, SanitizeShareOutput

# Set this filepath to your own file
FILEPATH = "./sanitize_examples/ds11.pdf"

def main() -> None:
    token = os.getenv("PANGEA_SANITIZE_TOKEN")
    assert token

    domain = os.getenv("PANGEA_DOMAIN")
    assert domain

    config = PangeaConfig(domain)

    # Create the Sanitize client with its token and config
    client = Sanitize(token, config)
    try:
        # Create Sanitize file information, setting scan and CDR providers
        file_scan = SanitizeFile(scan_provider="crowdstrike", cdr_provider="apryse")

        # Create content sanitization config
        content = SanitizeContent(
            url_intel=True,
            url_intel_provider="crowdstrike",
            domain_intel=True,
            domain_intel_provider="crowdstrike",
            defang=True,
            defang_threshold=20,
            remove_interactive=True,
            remove_attachments=True,
            redact=True,
        )

        # Enable share output and its folder
        share_output = SanitizeShareOutput(enabled=True, output_folder="sdk_examples/sanitize/")

        with open(FILEPATH, "rb") as f:
            # Make the request to sanitize service
            response = client.sanitize(
                file=f,
                # Set transfer method to post-url
                transfer_method=TransferMethod.POST_URL,
                file_scan=file_scan,
                content=content,
                share_output=share_output,
                uploaded_file_name="uploaded_file",
            )

            if response.result is None:
                print("Failed to get response")
                sys.exit(1)

            print("Sanitize request success")
            print(f"\tFile share id: {response.result.dest_share_id}")
            print(f"\tRedact data: {response.result.data.redact}")
            print(f"\tDefang data: {response.result.data.defang}")
            print(f"\tCDR data: {response.result.data.cdr}")

            if response.result.data.malicious_file:
                print("File IS malicious")
            else:
                print("File is NOT malicious")

    except pe.PangeaAPIException as e:
        print(e)

if __name__ == "__main__":
    main()
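As noted earlier, the sanitized output is by default available for one hour at the location returned in result.dest_url. Below is a minimal sketch of saving that output locally using only the Python standard library. The helper name `download_sanitized` is hypothetical and not part of the Pangea SDK, and the sketch assumes share output is not enabled and that `dest_url` is populated in the result.

```python
import shutil
import urllib.request
from pathlib import Path


def download_sanitized(dest_url: str, target: Path, timeout: float = 30.0) -> Path:
    """Stream the content at dest_url into target and return the target path."""
    # Create the destination folder if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)

    # Copy the response body to disk in chunks instead of loading it all into memory.
    with urllib.request.urlopen(dest_url, timeout=timeout) as resp, open(target, "wb") as out:
        shutil.copyfileobj(resp, out)

    return target
```

For example, `download_sanitized(response.result.dest_url, Path("./sanitized/ds11.pdf"))` could be called right after a successful request. Because the URL expires after one hour, downloading immediately rather than storing the URL for later is the safer choice.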

Improving your app

The purpose of this guide is to provide the basic steps required to start coding with our Sanitize service. There are additional features that can be added to this process, such as enabling other content types to be sanitized in your Pangea User Console or integrating with other services to add security and storage for your sanitized results. Read more about the capabilities on our Sanitize Overview page.

Pangea has based Sanitize on years of experience building compliant enterprise applications. This service helps to ensure that builders have the necessary tools to meet the security needs of their application’s users.

Next steps

  • Check out our Admin Guide if you have a specific task you would like to complete.
  • If you are feeling confident, you can browse our APIs or explore our GitHub repo, which has libraries for supported languages, SDKs, sample apps, etc.
  • For any questions, you can connect with our Pangea Slack for Builders or continue exploring our Sanitize documentation.