We are happy to announce that today, Pangea is launching the Sanitize API to ensure that PDF documents are cleansed of dangerous and sensitive content such as:
Malware, including file format irregularities that could be exploitable
Links to dangerous domains and URLs
Over 50 types of PII, including Social Security numbers, credit card numbers, and API tokens
Active content such as Forms, JavaScript, Media, and 3D objects
The Sanitize service gives you one API with all of these capabilities to meet legal and compliance obligations for safe, secure document and data handling.
Composable Security APIs
Sanitize shows off the power and potential of composable security APIs: a single, straightforward API orchestrates File Scan, URL Intel, Domain Intel, and Redact to make PDFs safe and secure to handle. Sanitize’s Redact integration helps reduce the risk of PII disclosure and supports compliance with regulatory standards such as HIPAA and GDPR.
Sanitize is also integrated with Secure Share to make it super easy to store, organize, and securely send and receive documents within your application. You can specify Secure Share files as input to Sanitize, and Sanitize can output the sanitized version of a document to Secure Share.
Content Disarm and Reconstruction (CDR)
Sanitize always scans input files for malware using File Scan, and it always performs Content Disarm and Reconstruction (CDR). Disarm means that potentially dangerous active content such as Forms and JavaScript can be removed and that URL links can be defanged, that is, altered so that they can no longer be clicked or followed. Sanitize can optionally defang all links, or only those that are above a configured risk threshold according to the URL Intel and/or Domain Intel services. Reconstruction means that PDF output files are rebuilt from scratch to ensure they are free of format irregularities that could trigger exploitable bugs when the file is rendered.
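To make the defanging idea concrete, here is a minimal Python sketch of threshold-based defanging on plain text. It is not how Sanitize is implemented (Sanitize works on links inside PDFs); the function names, the `risk_scores` lookup table standing in for URL Intel and Domain Intel verdicts, and the `hxxp` / `[.]` rewriting convention are all assumptions chosen for illustration.

```python
import re
from urllib.parse import urlsplit

# Match http(s) URLs, excluding trailing sentence punctuation from the match.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+[^\s<>\"').,;:!?]", re.IGNORECASE)


def defang(url: str) -> str:
    """Rewrite a URL so it is no longer clickable: hxxp scheme, bracketed dots."""
    scheme, _, rest = url.partition("://")
    scheme = scheme.replace("t", "x").replace("T", "X")
    return f"{scheme}://{rest.replace('.', '[.]')}"


def defang_links(text: str, risk_scores: dict, threshold=None) -> str:
    """Defang every URL in text, or only those whose host scores above threshold.

    risk_scores is a hypothetical host -> score mapping standing in for
    an intel lookup; hosts missing from it are treated as score 0.
    """
    def _replace(match):
        url = match.group(0)
        host = urlsplit(url).hostname or ""
        if threshold is None or risk_scores.get(host, 0) > threshold:
            return defang(url)
        return url

    return URL_PATTERN.sub(_replace, text)


if __name__ == "__main__":
    scores = {"evil.example": 90, "docs.example.org": 5}
    sample = "See http://evil.example/login and https://docs.example.org/guide"
    print(defang_links(sample, scores, threshold=50))  # only the risky link
    print(defang_links(sample, scores))                # all links
```

Leaving `threshold` unset corresponds to the "defang all links" option described above, while passing a value defangs only links whose score is above it.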
Redact Integration
In addition to File Scan, CDR, and link defanging, Sanitize integrates Redact. Redact comes with default rules to match information that you may want to remove, such as Social Security numbers, credit card numbers, locations, profanity, API tokens, and secrets. You can also add your own custom rules to match anything else you need.
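As a rough illustration of rule-based redaction, the Python sketch below applies a small set of regular-expression rules to a string and replaces each match with a label. The rule names, patterns, and the `redact` helper are hypothetical and far simpler than Redact's real rule set; they only show the general shape of enabling rules and masking what they match.

```python
import re

# Hypothetical rules for illustration only; not Redact's actual definitions.
RULES = {
    "EMAIL_ADDRESS": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),
}


def luhn_valid(number: str) -> bool:
    """Luhn checksum, used to avoid masking digit runs that are not card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact(text: str, enabled=None):
    """Apply the enabled rules; return the redacted text and a per-rule match count."""
    counts = {}
    for name in (enabled if enabled is not None else RULES):
        def _mask(match, name=name):
            if name == "CREDIT_CARD" and not luhn_valid(match.group(0)):
                return match.group(0)  # leave non-card digit runs untouched
            counts[name] = counts.get(name, 0) + 1
            return f"<{name}>"

        text = RULES[name].sub(_mask, text)
    return text, counts


if __name__ == "__main__":
    sample = "Contact jane@example.com, card 4111 1111 1111 1111, SSN 123-45-6789."
    print(redact(sample))
    print(redact(sample, enabled=["EMAIL_ADDRESS"]))
```

A custom rule in this sketch would simply be another entry in `RULES`; the Luhn check shows why a production rule set usually validates matches rather than relying on patterns alone.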
Here you can see an email address, credit card number, and expiration date before and after Sanitize with Redact.
Before:
After:
Here are the enabled Redact rules for this example:
Comply with Regulations and Data Handling Responsibilities
Handling documents can impose several responsibilities on your app. You are responsible for ensuring that documents don’t distribute malware or other malicious or offensive content. Scanning for malware is a good first step, but documents can also contain all kinds of sensitive information that makes your app subject to laws and regulations requiring safe data handling. Medical, financial, PII, and other sensitive information all impose requirements; the Sanitize API helps you address those requirements.
SDK Support for Easy Integration
The Sanitize API is powerful but simple: you get malware scanning, file parsing and filtering with CDR, URL defanging, Redact, and Secure Share integration, all through a single Sanitize API endpoint! Pangea services like Sanitize support several flexible transfer methods for getting files to and from these APIs. These include the familiar multipart transfer as well as presigned URLs, Secure Share file IDs for input, and Secure Share folders for output. The Pangea SDKs make it convenient to call the Sanitize API using any of the supported transfer methods in your language of choice (C#, Go, Java, JavaScript, or Python).
Here’s a simple Python SDK snippet that sanitizes a file using the default configuration of Sanitize:
And here’s some sample output that summarizes the Redact, URL and Domain Intel, File Scan, and Sanitize CDR results:
Get Started Now
Get started today with your free Pangea account and $5 in monthly credits. Create a Sanitize project and browse the Sanitize documentation, SDK, and SDK samples.