Redacting Data

Learn how to redact data

Redacting Text

Each SDK provides a redact method that can be used to redact text. Here's an example of redacting a phone number with the Redact service using the Python SDK:

POST/v1/redact
import os

import pangea.exceptions as pe
from pangea.config import PangeaConfig
from pangea.services import Redact

token = os.getenv("PANGEA_REDACT_TOKEN")
domain = os.getenv("PANGEA_DOMAIN")
config = PangeaConfig(domain=domain)
redact = Redact(token, config=config)


def main():
    text = "Hello, my phone number is 123-456-7890"
    print(f"Redacting PII from: {text}")

    try:
        redact_response = redact.redact(text=text)
        print(f"Redacted text: {redact_response.result.redacted_text}")
    except pe.PangeaAPIException as e:
        print(f"Redact Request Error: {e.response.summary}")
        for err in e.errors:
            print(f"\t{err.detail} \n")


if __name__ == "__main__":
    main()

The debug option will provide a detailed list of the redactions that occurred for the provided text. This can be useful in testing or in cases where a report of what was redacted from the provided text is required.
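As a rough sketch of how the debug option might be passed when calling the API directly instead of through an SDK, the snippet below builds (but does not send) a /v1/redact request. The helper name build_redact_request is hypothetical, the boolean "debug" field name is an assumption to confirm against the API documentation, and the URL layout and Bearer authorization follow the curl examples later on this page.

```python
import json
import urllib.request


def build_redact_request(text, token, domain, debug=False):
    """Build (without sending) a POST request for the /v1/redact endpoint."""
    body = {"text": text}
    if debug:
        # Assumption: the debug option is sent as a boolean "debug" field.
        body["debug"] = True
    return urllib.request.Request(
        url=f"https://redact.{domain}/v1/redact",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_redact_request(
    "Hello, my phone number is 123-456-7890",
    token="<your-redact-token>",
    domain="aws.us.pangea.cloud",
    debug=True,
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it; that step is left out here so the sketch can be run without credentials.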

For complete details on the redact method, see the API documentation. For information on other language SDKs, see the SDK documentation.

Redacting Structured Data

In some cases, structured JSON data may require redaction. By default, the Redact service will iterate and apply redaction rules to all values in the supplied JSON. For a more targeted approach, JSONPaths can be provided to identify specific fields to be redacted.

Here's an example of redacting an email address and a driver's license number from JSON data using the Python SDK:

from pangea.services import Redact

# include your API token here
redact = Redact(token="API_Token")

structured_data = {
    "First_Name": "Dennis",
    "Last_Name": "Nedry",
    "email": "dennis.nedry@ingen.com",
    "DL": "Y2500760",
}

# Only the fields targeted by the JSONPaths are considered for redaction
check_res = redact.redact_structured(
    structured_data,
    jsonp=["$.email", "$.DL"],
)

In the above example, the jsonp keyword argument is supplied to the redact_structured method. It is supplied as a list of JSONPaths targeting the email and DL fields.

Using the jsonp keyword argument can reduce the time a redaction operation takes, and it also reduces the chance of accidentally redacting fields that should be left intact.

For complete details on the redact_structured method, see the API documentation. For information on other language SDKs, see the SDK documentation.

About JSONPath

JSONPath was born out of the need to easily extract data from JSON documents in much the same way that XPath does for XML documents.

Consider the following JSON:

{
  "First_Name": "Dennis",
  "Last_Name": "Nedry",
  "SSN": "078-05-1120",
  "DL": "Y2500760",
  "Phone_Numbers": [
    {
      "type": "mobile",
      "number": "111-111-1111"
    },
    {
      "type": "home",
      "number": "222-222-2222"
    }
  ]
}

In this case, as in the above example, the SSN can be extracted by using the following JSONPath: $.SSN. The $ represents the root of the document, and the .SSN indicates a child with a key of SSN.

If the first phone number was needed, a JSONPath of $.Phone_Numbers[0].number could be provided. In this case:

  • The $ represents the root of the document
  • The .Phone_Numbers represents the child with a key of Phone_Numbers
  • The [0] indicates the first element of the Phone_Numbers array
  • The .number indicates the child with a key of number

Finally, if the mobile number specifically was needed, the following JSONPath could be provided: $.Phone_Numbers[?(@.type=="mobile")].number. In this case:

  • The $ represents the root of the document
  • The .Phone_Numbers represents the child with a key of Phone_Numbers
  • The [...] is used to iterate through the Phone_Numbers
  • The ?() indicates that a script should be applied, in this case, a comparison
  • The @.type indicates the current record's type key
  • The @.type=="mobile" selects the records whose type key is equal to "mobile"
  • The .number indicates the number key of the matching record(s)
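To make the three paths above concrete, here is a deliberately tiny evaluator, written only for illustration. It handles just the forms used on this page (child keys, numeric indexes, and an equality filter) and is not a full JSONPath implementation, nor the one the Redact service uses. The function name select is hypothetical.

```python
import re

# One token per path step: .key, [index], or [?(@.key=="value")]
_STEP = re.compile(
    r'\.(?P<key>\w+)'
    r'|\[(?P<index>\d+)\]'
    r'|\[\?\(@\.(?P<fkey>\w+)=="(?P<fval>[^"]*)"\)\]'
)


def select(document, path):
    """Return every value matched by a small subset of JSONPath."""
    if not path.startswith("$"):
        raise ValueError("a JSONPath must start at the root, '$'")
    matches = [document]
    pos = 1
    while pos < len(path):
        step = _STEP.match(path, pos)
        if step is None:
            raise ValueError(f"unsupported JSONPath syntax at offset {pos}")
        if step.group("key") is not None:
            # .key: descend into the child with that key
            key = step.group("key")
            matches = [m[key] for m in matches if isinstance(m, dict) and key in m]
        elif step.group("index") is not None:
            # [n]: pick the n-th element of an array
            i = int(step.group("index"))
            matches = [m[i] for m in matches if isinstance(m, list) and i < len(m)]
        else:
            # [?(@.key=="value")]: keep array records whose key equals the value
            fkey, fval = step.group("fkey"), step.group("fval")
            matches = [
                item
                for m in matches if isinstance(m, list)
                for item in m
                if isinstance(item, dict) and item.get(fkey) == fval
            ]
        pos = step.end()
    return matches


person = {
    "First_Name": "Dennis",
    "Last_Name": "Nedry",
    "SSN": "078-05-1120",
    "DL": "Y2500760",
    "Phone_Numbers": [
        {"type": "mobile", "number": "111-111-1111"},
        {"type": "home", "number": "222-222-2222"},
    ],
}

print(select(person, "$.SSN"))
print(select(person, "$.Phone_Numbers[0].number"))
print(select(person, '$.Phone_Numbers[?(@.type=="mobile")].number'))
```

The function returns a list because a filter step can match more than one record.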

JSONPath is an extremely powerful tool. To learn more, read the JSONPath specification and test out some JSONPaths with this interactive tool.

Redacting with Format Preserving Encryption (FPE)

When you enable FPE Redaction, you can keep the format of the redacted entries and, if needed, recover the original values.

Redact

You can use the /v1/redact endpoint for FPE redaction.

Copy the Default Token, Config ID, and Domain values from the Redact Overview page in the Pangea User Console and use them in your call to the /v1/redact endpoint.

export PANGEA_DOMAIN="aws.us.pangea.cloud"
export PANGEA_REDACT_TOKEN="pts_s2ngg2...hzwafm"
export PANGEA_REDACT_CONFIG_ID="pci_4ku3oviu6bpjsghhpch5hw2l564myecx"

Provide the following parameters:

  • text - The text that may contain data to be redacted.

Optionally, you can specify:

  • config_id - The Redact service configuration ID.

    The Redact service can have multiple configurations.

    If you use the same service token for multiple configurations, you must specify which configuration to use when calling the service APIs. Otherwise, you will receive an error:

    error
    {
      "status": "AmbiguousConfigID",
      "summary": "Token has multiple associated configs, config_id field is required.",
      "result": null,
      ...
    }
    
  • redaction_method_overrides - You can use this parameter to override the default redaction method for any rule.

    By default, the redaction method specified for a rule is applied when a match is found. You can override this by mapping the rule's name to a different redaction method in the redaction_method_overrides parameter. To force the use of the FPE redaction method, specify the "fpe" redaction type and choose the alphabet to use for FPE:

    "redaction_method_overrides": {
      "<rule-name>": {
        "redaction_type": "fpe",
        "fpe_alphabet": "<alphabet>"
      },
      ...
    },

    Characters not included in the selected alphabet will remain unchanged in the encrypted output. The available choices are:

    • numeric - Numeric (0-9)
    • alphalower - Lowercase alphabet (a-z)
    • alphaupper - Uppercase alphabet (A-Z)
    • alpha - Lowercase and uppercase alphabet (a-z, A-Z)
    • alphanumericlower - Lowercase alphabet with numbers (a-z, 0-9)
    • alphanumericupper - Uppercase alphabet with numbers (A-Z, 0-9)
    • alphanumeric - Alphanumeric (a-z, A-Z, 0-9)

    For example, a driver's license with the original value "A1234567" could become "A4313639" if encrypted using the numeric alphabet. Note that the "A" is preserved as part of the format, but the digits are changed.

    The rule name is displayed next to the rule label and can be copied from Redact Rulesets in the Pangea User Console:

    Select the FPE redaction method for a rule on the Redact Rulesets page in the Pangea User Console
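The redaction_method_overrides structure described above can be assembled programmatically. The following is a minimal sketch: the helper name fpe_overrides is hypothetical, and the alphabet check is only a client-side convenience based on the list of alphabets on this page, not something the service requires.

```python
import json

# Alphabet names as listed on this page.
FPE_ALPHABETS = {
    "numeric",
    "alphalower",
    "alphaupper",
    "alpha",
    "alphanumericlower",
    "alphanumericupper",
    "alphanumeric",
}


def fpe_overrides(rule_alphabets):
    """Map each rule name to an FPE override, rejecting unknown alphabets."""
    overrides = {}
    for rule_name, alphabet in rule_alphabets.items():
        if alphabet not in FPE_ALPHABETS:
            raise ValueError(
                f"unknown FPE alphabet {alphabet!r} for rule {rule_name!r}"
            )
        overrides[rule_name] = {
            "redaction_type": "fpe",
            "fpe_alphabet": alphabet,
        }
    return overrides


payload = {
    "text": "My name is Bond. James Bond. Call me at +44 20 0700 7007.",
    "redaction_method_overrides": fpe_overrides(
        {"PERSON": "alphanumeric", "LOCATION": "alpha"}
    ),
}
print(json.dumps(payload, indent=2))
```

Validating the alphabet name locally catches typos before a request is made.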

Authorize your request to the Redact API with a Redact service token.

Redact using the FPE redaction method

Select the FPE Redaction Method in Redact Rulesets for a Redact rule, and it will encrypt the matching data by default.

Select the FPE redaction method

Make a request authorized with the Redact service token to the /v1/redact endpoint. In the parameters, provide the Redact configuration ID and the text to be redacted.

Example
POST/v1/redact
curl --location "https://redact.$PANGEA_DOMAIN/v1/redact" \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $PANGEA_REDACT_TOKEN" \
--data-raw '{
  "config_id": "'"$PANGEA_REDACT_CONFIG_ID"'",
  "text": "My name is Bond. James Bond. Call me at +44 20 0700 7007 or pop by 30 Wellington Square, Chelsea, London."
}'

The response from the /v1/redact endpoint will include the redacted text, the count of redacted entries, and the context in which the FPE was applied:

/v1/redact response
{
  "status": "Success",
  "summary": "Success. Redacted 4 item(s) from text",
  "result": {
    "count": 4,
    "redacted_text": "My name is Bond. <PERSON>. Call me at +86 39 5564 4697 or pop by 30 Wellington Square, <LOCATION>, <LOCATION>.",
    "fpe_context": "eyJhIjogIkFFUy1GRjMtMS0yNTYtQkVUQSIsICJ0IjogImhDemp1RTEiLCAibSI6IFt7ImEiOiAxLCAicyI6IDM4LCAiZSI6IDU0fV0sICJrIjogInB2aV9tZDdrdHFvanBxaXE1YmhyeHRjeWljNmxjYTVyY2xmdyIsICJ2IjogMSwgImMiOiAicGNpX2lyczNib2tqdXRqcmFzdGl3NGszazd0anpmNnQ2MnRxIn0="
  },
  ...
}

The text with redacted data is returned under result.redacted_text in the response. Data matching the enabled Redact rules is redacted according to the redaction method selected for each rule. In the example above, the phone number found in the text has been encrypted because the PHONE_NUMBER rule uses the FPE redaction method. Values matched by the PERSON and LOCATION rules have been replaced.
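Because the original values can only be recovered with the context returned alongside the redacted text, it can help to keep the two together. Here is a minimal sketch of pulling both out of a /v1/redact response body. The RedactedRecord type and parse_redact_response helper are hypothetical names, and the fpe_context in the sample is a placeholder rather than a real value; the value is treated as opaque and stored as-is.

```python
import json
from typing import NamedTuple, Optional


class RedactedRecord(NamedTuple):
    redacted_text: str
    fpe_context: Optional[str]  # opaque; only needed to unredact later
    count: int


def parse_redact_response(raw_body):
    """Extract the fields needed for a later unredact call."""
    result = json.loads(raw_body)["result"]
    return RedactedRecord(
        redacted_text=result["redacted_text"],
        # Assumption: the context may be absent when no FPE rule matched.
        fpe_context=result.get("fpe_context"),
        count=result["count"],
    )


# Sample body shaped like the response above, with a placeholder context.
sample_body = json.dumps(
    {
        "status": "Success",
        "summary": "Success. Redacted 4 item(s) from text",
        "result": {
            "count": 4,
            "redacted_text": (
                "My name is Bond. <PERSON>. Call me at +86 39 5564 4697 "
                "or pop by 30 Wellington Square, <LOCATION>, <LOCATION>."
            ),
            "fpe_context": "<opaque fpe_context value>",
        },
    }
)

record = parse_redact_response(sample_body)
print(record.count, record.redacted_text)
```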

Redact by overriding the default redaction method with the FPE redaction type

Make a request authorized with the Redact service token to the /v1/redact endpoint. In the parameters, provide the Redact configuration ID and the text to be redacted. Include the redaction_method_overrides parameter populated with the FPE-specific values to enforce FPE redaction for specific rules.

Example
POST/v1/redact
curl --location "https://redact.$PANGEA_DOMAIN/v1/redact" \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $PANGEA_REDACT_TOKEN" \
--data '{
  "config_id": "'"$PANGEA_REDACT_CONFIG_ID"'",
  "redaction_method_overrides": {
    "PERSON": {
      "redaction_type": "fpe",
      "fpe_alphabet": "alphanumeric"
    },
    "LOCATION": {
      "redaction_type": "fpe",
      "fpe_alphabet": "alpha"
    }
  },
  "text": "My name is Bond. James Bond. Call me at +44 20 0700 7007 or pop by 30 Wellington Square, Chelsea, London."
}'

The response from the /v1/redact endpoint will include the redacted text, the count of redacted entries, and the context in which the FPE was applied:

/v1/redact response
{
  "status": "Success",
  "summary": "Success. Redacted 4 item(s) from text",
  "result": {
    "count": 4,
    "redacted_text": "My name is Bond. 0lqHT qWdK. Call me at +51 82 4887 4489 or pop by 30 Wellington Square, PrjbOni, RBtsKc.",
    "fpe_context": "eyJhIjogIkFFUy1GRjMtMS0yNTYtQkVUQSIsICJ0IjogInRKVXZmWkoiLCAibSI6IFt7ImEiOiAzLCAicyI6IDE3LCAiZSI6IDI3fSwgeyJhIjogMSwgInMiOiA0MCwgImUiOiA1Nn0sIHsiYSI6IDUsICJzIjogODksICJlIjogOTZ9LCB7ImEiOiA1LCAicyI6IDk4LCAiZSI6IDEwNH1dLCAiayI6ICJwdmlfbWQ3a3Rxb2pwcWlxNWJocnh0Y3lpYzZsY2E1cmNsZnciLCAidiI6IDEsICJjIjogInBjaV9pcnMzYm9ranV0anJhc3RpdzRrM2s3dGp6ZjZ0NjJ0cSJ9"
  },
  ...
}

The text with redacted data is returned under result.redacted_text in the response. The phone number found in the text has been encrypted because the PHONE_NUMBER rule uses the FPE redaction method. Additionally, the PERSON and LOCATION values have been encrypted because their redaction method was overridden with FPE, specified in the redaction_method_overrides parameter.

Unredact

You can use the Redact /v1/unredact endpoint to decrypt data redacted with FPE.

Use the same Redact configuration that was applied to redact the data. Copy the Default Token, Config ID, and Domain values from the Redact Overview page in the Pangea User Console, and use them in your call to the /v1/unredact endpoint.

export PANGEA_DOMAIN="aws.us.pangea.cloud"
export PANGEA_REDACT_TOKEN="pts_s2ngg2...hzwafm"
export PANGEA_REDACT_CONFIG_ID="pci_4ku3oviu6bpjsghhpch5hw2l564myecx"

Make a request to the /v1/unredact endpoint in your Pangea project domain. Use the Redact service token to authorize the request.

Provide the following parameters:

  • redacted_data - The redacted text with encrypted values.

    The redacted text is returned under result.redacted_text in the response from the /v1/redact endpoint when you use the Redact APIs directly.

  • fpe_context - The value needed to decrypt data redacted with FPE.

    To decrypt the original values redacted with FPE, you need the context in which the original text was redacted.

    When you call the /v1/redact endpoint directly, the entire context is returned as an opaque value under result.fpe_context in the response.

  • config_id - The Redact service configuration ID.

    The Redact service can have multiple configurations.

    If you use the same service token for multiple configurations, you must specify which configuration to use when calling the service APIs. Otherwise, you will receive an error:

    error
    {
      "status": "AmbiguousConfigID",
      "summary": "Token has multiple associated configs, config_id field is required.",
      "result": null,
      ...
    }
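The AmbiguousConfigID response shown above can be detected in client code so that the caller knows to retry with an explicit config_id. This is a minimal sketch over an already-parsed response body; the AmbiguousConfigError class and unwrap_result helper are hypothetical names, not part of any SDK.

```python
class AmbiguousConfigError(Exception):
    """Raised when the service cannot tell which configuration to use."""


def unwrap_result(response):
    """Return the result of a successful response, or raise on an error status."""
    status = response.get("status")
    if status == "Success":
        return response.get("result")
    if status == "AmbiguousConfigID":
        raise AmbiguousConfigError(
            "The token is associated with several configurations; "
            "resend the request with an explicit config_id."
        )
    raise RuntimeError(f"{status}: {response.get('summary')}")


error_response = {
    "status": "AmbiguousConfigID",
    "summary": "Token has multiple associated configs, config_id field is required.",
    "result": None,
}

try:
    unwrap_result(error_response)
except AmbiguousConfigError as exc:
    print(f"Retry with config_id set: {exc}")
```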
    

Example

POST/v1/unredact
curl --location "https://redact.$PANGEA_DOMAIN/v1/unredact" \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $PANGEA_REDACT_TOKEN" \
--data '{
  "config_id": "'"$PANGEA_REDACT_CONFIG_ID"'",
  "redacted_data": "My name is Bond. 0lqHT qWdK. Call me at +51 82 4887 4489 or pop by 30 Wellington Square, PrjbOni, RBtsKc.",
  "fpe_context": "eyJhIjogIkFFUy1GRjMtMS0yNTYtQkVUQSIsICJ0IjogInRKVXZmWkoiLCAibSI6IFt7ImEiOiAzLCAicyI6IDE3LCAiZSI6IDI3fSwgeyJhIjogMSwgInMiOiA0MCwgImUiOiA1Nn0sIHsiYSI6IDUsICJzIjogODksICJlIjogOTZ9LCB7ImEiOiA1LCAicyI6IDk4LCAiZSI6IDEwNH1dLCAiayI6ICJwdmlfbWQ3a3Rxb2pwcWlxNWJocnh0Y3lpYzZsY2E1cmNsZnciLCAidiI6IDEsICJjIjogInBjaV9pcnMzYm9ranV0anJhc3RpdzRrM2s3dGp6ZjZ0NjJ0cSJ9"
}'

The response from the /v1/unredact endpoint will include the count of unredacted entries and the original data with decrypted values where the FPE redaction was applied, found under the result.data key:

/v1/unredact response
{
  "status": "Success",
  "summary": "Success. Unredacted 4 item(s) from items",
  "result": {
    "data": "My name is Bond. James Bond. Call me at +44 20 0700 7007 or pop by 30 Wellington Square, Chelsea, London."
  },
  ...
}
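For readers working in Python rather than curl, the unredact call above might be prepared along these lines. This sketch only builds the request and reads result.data from a parsed response; the helper names are hypothetical, the token and context values are placeholders, and nothing is sent over the network.

```python
import json
import urllib.request


def build_unredact_request(redacted_data, fpe_context, token, domain, config_id=None):
    """Build (without sending) a POST request for the /v1/unredact endpoint."""
    body = {"redacted_data": redacted_data, "fpe_context": fpe_context}
    if config_id is not None:
        # Needed when the token is associated with several configurations.
        body["config_id"] = config_id
    return urllib.request.Request(
        url=f"https://redact.{domain}/v1/unredact",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def original_data(response):
    """Return the decrypted data from a parsed /v1/unredact response."""
    return response["result"]["data"]


unredact_request = build_unredact_request(
    redacted_data="Call me at +51 82 4887 4489.",
    fpe_context="<opaque fpe_context value>",
    token="<your-redact-token>",
    domain="aws.us.pangea.cloud",
    config_id="<your-config-id>",
)
print(unredact_request.full_url)
print(unredact_request.data.decode("utf-8"))
```

The redacted_data and fpe_context values must come from the same /v1/redact response, since the context describes where FPE was applied in that particular text.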

Rules & Ruleset Parameters

For some calls, you may need additional rules on top of those enabled in your configuration. Two parameters let you add to your base set of rules: rules and rulesets. The rules parameter lets you specify individual rules by their short names. Likewise, the rulesets parameter lets you apply one or more entire rulesets, also referenced by their short names, to your text.

from pangea.services import Redact

# include your API token here
redact = Redact(token="API_Token")

text = "Dennis Nedry whose email is dennis.nedry@ingen.com Y2500760 408-444-4444"

# Apply the PHONE_NUMBER rule in addition to the rules enabled in the configuration
check_res = redact.redact(
    text,
    rules=["PHONE_NUMBER"],
)

Values matched by rules already enabled in your configuration are redacted as usual, and because PHONE_NUMBER is passed in the rules parameter, the phone number is redacted as well.

Overlapping Rules

When multiple rules enabled within your configuration overlap or match the same text, Redact applies the rule with the longest match and broader context. For example, with the email dennis.nedry@gmail.com, redacting just the gmail.com domain misses the more important context — the full email address that may contain the person's first and last name. If two rules have matches of the same length, Redact applies the one with the higher confidence score. If both length and confidence are identical, the choice is arbitrary.
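The precedence described above (longest match first, then higher confidence) can be sketched as a small greedy selection. This is an illustration of the stated ordering only, not the service's actual implementation; the Match type, the resolve_overlaps function, the DOMAIN_NAME rule name, and the scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    rule: str
    start: int    # inclusive character offset
    end: int      # exclusive character offset
    score: float  # confidence reported for the match

    @property
    def length(self):
        return self.end - self.start


def resolve_overlaps(matches):
    """Among overlapping matches keep the longest; ties go to the higher score."""
    # Strongest candidates first: longer match, then higher confidence.
    ranked = sorted(matches, key=lambda m: (-m.length, -m.score))
    kept = []
    for candidate in ranked:
        overlaps = any(
            candidate.start < k.end and k.start < candidate.end for k in kept
        )
        if not overlaps:
            kept.append(candidate)
    # Return the survivors in document order.
    return sorted(kept, key=lambda m: m.start)


text = "Contact dennis.nedry@gmail.com"
email, domain = "dennis.nedry@gmail.com", "gmail.com"
email_start, domain_start = text.index(email), text.index(domain)

candidates = [
    Match("DOMAIN_NAME", domain_start, domain_start + len(domain), 0.9),
    Match("EMAIL_ADDRESS", email_start, email_start + len(email), 0.8),
]

winners = resolve_overlaps(candidates)
for m in winners:
    print(m.rule, text[m.start:m.end])
```

Here the full email address wins over the domain even though the domain match has the higher score, because length is compared before confidence.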

When rules and rulesets parameters are used and the rules specified by these parameters overlap with rules enabled in the standard Redact configuration, Redact consistently applies the rules in the standard configuration. This ensures consistent redaction across all use cases. In a company with numerous microservices, each service might require different redaction strategies. However, the internal security team always seeks to enforce specific PII redaction rules uniformly across the entire company.
