Sanitize API
The Sanitize API integrates with other Pangea services, which gives it additional capabilities that you can also use in your app, such as:
- Sanitize uses File Scan to scan the file both before and after sanitization, except when the source or destination is Secure Share. Secure Share scans each file when it receives it, so skipping the extra scan prevents the file from being scanned twice.
- Sanitize can be used to remove possibly malicious embedded content from the file.
- Sanitize can scan URLs and domains for malicious links.
- Sanitize can be used with Redact to remove sensitive information from files.
- Sanitize supports multiple transfer methods, including Secure Share. This allows you to tailor Sanitize to the needs of your application.
Sanitize API requests
The Sanitize Content and File Operations configured in the Sanitize Settings of the Pangea User Console are used when Sanitize is called, unless you override them at runtime with the optional file or content parameters of the Sanitize API.
The Sanitize file parameter options allow you to override the following:
- Configured File Scan provider
The Sanitize content parameter options allow you to override the following:
- Configured URL Intel provider
- Configured Domain Intel provider
- Defang threshold
- Removal of attachments
- Removal of interactive content
- Redact enablement
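As a rough illustration of a runtime override, the Python sketch below assembles a request body that sets content.defang_threshold, the one override key that appears in the sample output later on this page. The helper name build_request_body, the integer type, and the threshold value are assumptions made for this example; key names for the other overrides are not shown on this page and are left out rather than guessed.

```python
from typing import Optional


def build_request_body(source_url: str, defang_threshold: Optional[int] = None) -> dict:
    """Assemble a Sanitize request body (hypothetical helper, not part of any SDK)."""
    body = {"transfer_method": "source-url", "source_url": source_url}
    if defang_threshold is not None:
        # Runtime override; when it is omitted, the console-configured value applies.
        # Assumption: the threshold is an integer.
        body["content"] = {"defang_threshold": defang_threshold}
    return body
```

Leaving the content key out entirely when no override is given keeps the request on the settings configured in the console.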
Sanitize API requests accept additional configuration and input parameters that direct the output either to a presigned URL or to Secure Share. This is useful for automating file sharing and storage, especially in combination with Secure Share, because it reduces the number of steps and interactions required.
For a Sanitize API service call, you can provide an input file using one of the following transfer_method options, discussed in Transfer Methods:
- "source-url"
- "put-url"
- "post-url"
- "share-id"
- "multipart"
All examples on this page use "source-url" as the input method.
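A client might check the transfer method before sending a request. The following Python sketch does only that; the function name check_transfer_method is an assumption for this example, and the set of values is copied from the list above.

```python
# Values copied from the transfer_method list above.
TRANSFER_METHODS = frozenset({"source-url", "put-url", "post-url", "share-id", "multipart"})


def check_transfer_method(value: str) -> str:
    """Return the value unchanged if it is a listed transfer method, else raise."""
    if value not in TRANSFER_METHODS:
        raise ValueError(f"unsupported transfer_method: {value!r}")
    return value
```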
Setting API destinations
You can choose how the results of a Sanitize API call are delivered by specifying an additional optional parameter, share_output.
There are two options for receiving the results of a Sanitize API call:
Setting output to a destination URL
If you omit the optional share_output parameter in your initial request, the successful response from the Sanitize service will contain a presigned GET URL in result.dest_url, which you can use to download the sanitized output. For example:
- Request sanitization of a file.
POST sanitize/file/at/source-url (cURL):
    curl --location 'https://sanitize.aws.us.pangea.cloud/v1/sanitize' \
      --header 'Content-Type: application/json' \
      --header "Authorization: Bearer $PANGEA_SANITIZE_TOKEN" \
      --data '{
        "transfer_method": "source-url",
        "source_url": "https://my-sanitize-input.s3.us-west-2.amazonaws.com/samples/my_tiny.pdf?..."
      }'
A call to the Sanitize service receives an asynchronous response. This response contains a GET URL in result.location, which you can use to track the status of your request.
response/with/results/location (JSON):
    {
      "request_id": "prq_64tjspdh4yxxownpm2ap4qb4rbxuedeo",
      "status": "Accepted",
      "summary": "Your request is in progress. Use 'result, location' below to poll for results. See https://pangea.cloud/docs/api/async?service=sanitize&request_id=prq_64tjspdh4yxxownpm2ap4qb4rbxuedeo for more information.",
      "result": {
        "location": "https://sanitize.aws.us.pangea.cloud/request/prq_64tjspdh4yxxownpm2ap4qb4rbxuedeo",
        . . .
      },
      . . .
    }
- Check the results of the requested sanitization.
GET results/of/sanitize (cURL):
    curl --location 'https://sanitize.aws.us.pangea.cloud/request/prq_64tjspdh4yxxownpm2ap4qb4rbxuedeo' \
      --header "Authorization: Bearer $PANGEA_SANITIZE_TOKEN"
Use the presigned GET URL in result.dest_url to download the sanitized output.
results/of/sanitize (JSON):
    {
      "request_id": "prq_64tjspdh4yxxownpm2ap4qb4rbxuedeo",
      "result": {
        "dest_url": "https://pangea-sanitize-input.s3.us-west-2.amazonaws.com/2024030423/prq_64tjspdh4yxxownpm2ap4qb4rbxuedeo/sanitized.my_tiny.pdf?...",
        . . .
      },
      "status": "Success",
      "summary": "Successfully completed the request. The file download link is valid for 24h0m0s."
    }
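The request-then-poll flow above could be automated along the following lines. This is a minimal Python sketch under stated assumptions, not an official client: the function names, the polling interval, and the poll limit are all invented for the example, and it treats any status other than "Accepted" as final, based only on the sample responses shown here.

```python
import json
import time
import urllib.request
from typing import Callable


def http_get_json(url: str, token: str) -> dict:
    """GET a URL with a bearer token and decode the JSON body."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_result(
    location: str,
    get_json: Callable[[str], dict],
    interval: float = 2.0,   # assumed delay between polls, in seconds
    max_polls: int = 30,     # assumed upper bound on attempts
) -> dict:
    """Poll result.location until the status is no longer "Accepted"."""
    for _ in range(max_polls):
        reply = get_json(location)
        if reply.get("status") != "Accepted":
            return reply
        time.sleep(interval)
    raise TimeoutError(f"no final status after {max_polls} polls of {location}")


# Example wiring (performs real network calls, so it is left commented out):
#   import functools, os
#   get_json = functools.partial(http_get_json, token=os.environ["PANGEA_SANITIZE_TOKEN"])
#   final = wait_for_result(accepted_reply["result"]["location"], get_json)
#   print(final["result"]["dest_url"])
```

Passing the fetch function in as a parameter keeps the polling loop independent of the HTTP library, so it can be exercised with canned responses.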
Setting output to Secure Share
Enabling share_output in your initial request saves the sanitized output in Secure Share. For example:
- Request sanitization of a file.
POST sanitize/file/at/source-url (cURL):
    curl --location 'https://sanitize.aws.us.pangea.cloud/v1/sanitize' \
      --header 'Content-Type: application/json' \
      --header "Authorization: Bearer $PANGEA_SANITIZE_TOKEN" \
      --data '{
        "transfer_method": "source-url",
        "source_url": "https://pangea-sanitize-input.s3.us-west-2.amazonaws.com/samples/redact_tiny.pdf?...",
        "share_output": {
          "enabled": true,
          "output_folder": "/"
        }
      }'
If you specify a non-existent "output_folder" location, Secure Share will automatically create it for you.
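For readers building the request in code, this short Python sketch adds the share_output object to an existing request body. The helper name with_share_output is an assumption for this example; the two keys and the "/" folder mirror the cURL request above.

```python
def with_share_output(body: dict, output_folder: str = "/") -> dict:
    """Return a copy of a request body that saves the output to Secure Share."""
    updated = dict(body)  # shallow copy so the caller's dict is not modified
    updated["share_output"] = {"enabled": True, "output_folder": output_folder}
    return updated
```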
The response contains a GET URL in result.location. You can use this URL to check the status of the call and get the eventual results.
response/with/results/location (JSON):
    {
      "request_id": "prq_zrdj2aggcspg6nslzlk7im63s577o34z",
      "result": {
        "location": "https://sanitize.aws.us.pangea.cloud/request/prq_zrdj2aggcspg6nslzlk7im63s577o34z",
        . . .
      },
      "status": "Accepted",
      "summary": "Your request is in progress. Use 'result, location' below to poll for results. See https://pangea.cloud/docs/api/async?service=sanitize&request_id=prq_zrdj2aggcspg6nslzlk7im63s577o34z for more information.",
      . . .
    }
- Check the results of the sanitization request.
GET results/of/sanitize (cURL):
    curl --location 'https://sanitize.aws.us.pangea.cloud/request/prq_zrdj2aggcspg6nslzlk7im63s577o34z' \
      --header "Authorization: Bearer $PANGEA_SANITIZE_TOKEN"
If the call is successful, result.dest_share_id will contain the ID of the file saved in Secure Share.
results/of/sanitize (JSON):
    {
      "request_id": "prq_zrdj2aggcspg6nslzlk7im63s577o34z",
      "status": "Success",
      "summary": "Successfully completed the request. The Sanitized file sanitized.Asynchronous API Responses Pangea.pdf can be found in the Secure Share under folder: /.",
      "result": {
        "dest_share_id": "pos_pp2l24fj7kcdafmyqtztd6oeoofpmeid",
        . . .
      },
      . . .
    }
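Because the final response carries either result.dest_url or result.dest_share_id depending on the destination, a client may want one place that decides which it received. The Python sketch below is one way to do that; the function name and the "url" / "secure-share" labels are assumptions for this example, not values returned by the service.

```python
from typing import Tuple


def sanitized_output_location(reply: dict) -> Tuple[str, str]:
    """Return (kind, value) for a completed Sanitize response.

    kind is "secure-share" when result.dest_share_id is present and
    "url" when result.dest_url is present (labels invented for this sketch).
    """
    if reply.get("status") != "Success":
        raise ValueError(f"request not successful: status={reply.get('status')!r}")
    result = reply.get("result") or {}
    if result.get("dest_share_id"):
        return ("secure-share", result["dest_share_id"])
    if result.get("dest_url"):
        return ("url", result["dest_url"])
    raise KeyError("response has neither dest_share_id nor dest_url")
```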
Sanitize output data fields
The following sample response lists the data fields in the details of a Sanitize output, with values that illustrate their types.
    {
      "request_id": "prq_zhe46bpihtqqm4wussuaa3rwmgoouc3w",
      "request_time": "2024-03-19T23:08:08.699280Z",
      "response_time": "2024-03-19T23:08:21.757699Z",
      "status": "Success",
      "summary": "Successfully completed the request. The file download link is valid for 24h0m0s.",
      "result": {
        "dest_url": "https://pangea-sanitize-input-dev.s3.us-west-2.amazonaws.com/2024031923/prq_zhe46bpihtqqm4wussuaa3rwmgoouc3w/sanitized.Pangea.pdf",
        "data": {
          "redact": {
            "redaction_count": 13,
            "summary_counts": {
              "PERSON": 9
            }
          },
          "defang": {
            "external_urls_count": 48,
            "external_domains_count": 6,
            "defanged_count": 0,
            "url_intel_summary": "Processed 31 URLs: 0 are malicious, 0 are suspicious, 31 are unknown.",
            "domain_intel_summary": "Processed 6 Domains: 0 are malicious, 0 are suspicious, 6 are unknown."
          },
          "cdr": {
            "file_attachments_removed": 0,
            "interactive_contents_removed": 0
          },
          "malicious_file": false
        },
        "parameters": {
          "transfer_method": "multipart",
          "source_url": "",
          "share_id": "",
          "config_id": null,
          "file": {
            "cdr_provider": "apryse"
          },
          "content": {
            "defang_threshold": null
          },
          "share_output": null
        }
      }
    }
The external_urls_count value and the URL count reported in url_intel_summary in the defang summary may differ. This is because external_urls_count is the total number of URLs found, while url_intel_summary reports the number of unique URLs. A difference generally means that the original document contained duplicate URLs, which are removed before the URLs are sent to URL Intel for lookup.
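To make the relationship between the two numbers concrete, the Python sketch below reads the processed count out of url_intel_summary and subtracts it from external_urls_count. It assumes the summary string always starts with "Processed N", as in the sample output; that format is inferred from a single example, and the function names are invented for this sketch.

```python
import re


def processed_count(summary: str) -> int:
    """Extract N from a summary string that starts with "Processed N ..."."""
    match = re.match(r"Processed (\d+)\b", summary)
    if match is None:
        raise ValueError(f"unexpected summary format: {summary!r}")
    return int(match.group(1))


def duplicate_url_count(defang: dict) -> int:
    """Total URLs found minus unique URLs sent to URL Intel."""
    return defang["external_urls_count"] - processed_count(defang["url_intel_summary"])


# Values taken from the sample output on this page.
sample_defang = {
    "external_urls_count": 48,
    "url_intel_summary": "Processed 31 URLs: 0 are malicious, 0 are suspicious, 31 are unknown.",
}
print(duplicate_url_count(sample_defang))  # 48 - 31 = 17
```

For the sample output this gives 17, which by the explanation above would generally correspond to repeated occurrences of URLs in the original document.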