Redaction Rules

Understand the Redaction rules

About Ruleset

A ruleset is a set of redaction rules organized by category. You can create new rules and organize them into custom rulesets using the + Ruleset button on the Redact Rulesets page in the Pangea User Console.

Redaction Rules

Redaction rules define how data is matched and subsequently replaced by the Redact service. There are 24 out-of-the-box redaction rules, which fall into two types.

NLP-based rules

NLP-based rules use natural language processing and trained models to identify matching fields from the provided data. The details of these rules cannot be viewed when clicking view rule.

Regex-based rules

Regex-based rules use regular expressions to identify matching fields from the provided data. Clicking view rule will expose the specific regex or set of regexes that are used to identify matching data.

Rule Details

In the case of regex-based rules, the details of the rules can be viewed and inspected. The rule is broken down into several key areas.

Name

This is the name of the rule as it appears in the ruleset list.

Description

This is a description of the rule and its intended purpose.

Matches

This is where the regular expression(s) that the rule uses to identify matches are defined.

Context values

In some cases, rule match strength may be bolstered by language appearing near the matched data. For example, a 9-digit number with the term "SSN" near it more strongly indicates that an SSN has been correctly identified.

Default Redaction Threshold

When matches are identified, each one is assigned a match score on a scale of 0 to 1. The score is calculated from the regex that identified the data and from any context values that occurred near the matching data. The Redaction Threshold defines the score at which redaction occurs.

For instance, if the Redaction Threshold is set to 0.6 and a rule matches with a score of 0.5, the matching data will not be redacted.
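The interaction between regex matches, context values, and the threshold can be sketched as follows. This is only an illustration under stated assumptions: the `Rule` shape, the `US_SSN` name, the base score, the context boost, the 20-character context window, and the choice to redact when the score equals the threshold are all hypothetical and are not taken from the Redact service.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    # Hypothetical rule shape for illustration only.
    name: str
    pattern: str           # regex that identifies candidate matches
    base_score: float      # assumed score for a bare regex match
    context_words: tuple   # words that strengthen a match when nearby
    context_boost: float   # assumed bonus when a context word is found
    threshold: float       # redaction threshold on the 0-1 scale
    replacement: str


def score_match(rule, text, match, window=20):
    """Return a 0-1 score: the base score plus a boost if context is nearby."""
    start = max(0, match.start() - window)
    end = min(len(text), match.end() + window)
    nearby = (text[start:match.start()] + text[match.end():end]).lower()
    boosted = any(word.lower() in nearby for word in rule.context_words)
    return min(rule.base_score + (rule.context_boost if boosted else 0.0), 1.0)


def redact(rule, text):
    def substitute(match):
        score = score_match(rule, text, match)
        # Assumption: a score equal to the threshold is redacted.
        return rule.replacement if score >= rule.threshold else match.group(0)

    return re.sub(rule.pattern, substitute, text)


ssn_rule = Rule("US_SSN", r"\b\d{9}\b", 0.5, ("ssn", "social security"), 0.3, 0.6, "<US_SSN>")

# Score 0.5 is below the 0.6 threshold, so the number is left alone.
print(redact(ssn_rule, "Order number 123456789 shipped."))
# "SSN" appears nearby, so the score rises to 0.8 and the number is redacted.
print(redact(ssn_rule, "My SSN is 123456789."))
```

Raising the threshold in this sketch makes the rule more conservative, and lowering it makes the rule redact bare regex matches even without supporting context.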

Replacement Value

If a Redaction Method of Replacement is used, the text in the replacement value is the text that will replace the matched data.

Redaction Method

The Redaction Method defines how data will be replaced as matches are identified. These are the available redaction methods:

Replacement

This redaction method will replace the matching data with the "replacement value" defined in the rule.

Mask

This redaction method will replace the matching data with asterisks or a custom character.
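To contrast the Replacement and Mask methods, here is a minimal sketch. The email regex is a simplified stand-in rather than the pattern used by any built-in rule, and the assumption that Mask emits one masking character per matched character is not confirmed by the text above.

```python
import re

# Simplified, illustrative pattern; not the regex used by any built-in rule.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact_replacement(text, pattern, replacement_value):
    """Replace every match with the rule's replacement value."""
    return pattern.sub(replacement_value, text)


def redact_mask(text, pattern, mask_char="*"):
    """Replace every match with mask characters (assumed: one per character)."""
    return pattern.sub(lambda m: mask_char * len(m.group(0)), text)


sample = "Contact a.b@example.com now"
print(redact_replacement(sample, EMAIL_PATTERN, "<EMAIL_ADDRESS>"))
print(redact_mask(sample, EMAIL_PATTERN))
print(redact_mask(sample, EMAIL_PATTERN, mask_char="#"))
```

The first call keeps a hint about what was removed, while the masked variants show only that something was removed.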

Partial Masking

This redaction method will partially replace your text with asterisks or a custom character. Through masking options, you can control the following:

  1. The number of characters from the left or right of the text that are left unmasked.
  2. The characters that are ignored by masking and therefore left unmasked.

The following fields are presented when you choose the Partial Masking redaction method:

  • Masking Character: Enter a character to mask the text. For example, enter #.

  • Masking Options:

    • Unmasked from left: Enter the number of starting characters to leave unmasked on matched values, or use the increase/decrease buttons to set it.
    • Unmasked from right: Enter the number of ending characters to leave unmasked on matched values, or use the increase/decrease buttons to set it.
  • Characters to ignore: Enter the characters to ignore from masking. For example, enter -. Now, click the Add button.

    Click the Save button.

On the right pane, click the Test Rules button. You can see the results displayed in the Redacted Text tab or Details tab.
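The masking options above can be approximated with a short function. This is a sketch, not the service's implementation: the function and parameter names are hypothetical, and it assumes that ignored characters do not count toward the left and right unmasked totals.

```python
def partial_mask(value, mask_char="*", unmasked_left=0, unmasked_right=0, ignore=""):
    """Mask a matched value, leaving some leading/trailing characters visible.

    Assumption: characters listed in `ignore` are never masked and are not
    counted when working out which leading or trailing characters stay visible.
    """
    # Positions that are eligible for masking (everything not ignored).
    eligible = [i for i, ch in enumerate(value) if ch not in ignore]
    keep = set(eligible[:unmasked_left])
    if unmasked_right > 0:  # guard: eligible[-0:] would select every position
        keep.update(eligible[-unmasked_right:])
    return "".join(
        ch if (ch in ignore or i in keep) else mask_char
        for i, ch in enumerate(value)
    )


print(partial_mask("123-45-6789", mask_char="#", unmasked_right=4, ignore="-"))
# ###-##-6789
print(partial_mask("4111111111111111", unmasked_left=4, unmasked_right=4))
# 4111********1111
```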

Detect Only

This redaction method tracks the text as marked, incrementing the redaction count, and updating the redaction report. It does not perform redaction on matching text.
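As a rough illustration of Detect Only, the sketch below records matches and counts them while returning the text unchanged. The report field names (`rule`, `start`, `end`, `match`) and the sample pattern are hypothetical and do not reflect the actual redaction report format.

```python
import re


def detect_only(text, rules):
    """Return (unchanged text, match count, report) for a dict of name -> regex."""
    report = []
    for name, pattern in rules.items():
        for m in re.finditer(pattern, text):
            # Hypothetical report entry; the real report format may differ.
            report.append({"rule": name, "start": m.start(), "end": m.end(), "match": m.group(0)})
    return text, len(report), report


rules = {"EMAIL_ADDRESS": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"}  # simplified pattern
text, count, report = detect_only("Write to a@example.com or b@example.org", rules)
print(text)    # the input, untouched
print(count)   # 2
```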

Hash

The Hash redaction method replaces matching data with a salted hash. Salting strengthens the hashes against attacks on stored data, and the salt values are securely managed as Vault secrets. To enable the Hash redaction method, click Configure Redaction Hash, and then click Generate Salt Secret.

important

If the Vault feature is already enabled, you will see the Generate Salt Secret button. If the Vault feature is not enabled, you will instead see the Enable Vault & Generate Salt button.

When using this method, the matching text is replaced with hashed values computed with the SHA-256 algorithm. The configuration option generates a secret value that is used to salt the hashes, which makes reverse lookups of the hashed values harder and improves security.
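A salted SHA-256 hash can be sketched with Python's standard library. How the service combines the salt with the value is not described here, so prepending the salt is purely an assumption, as are the function names and the sample salt (in practice the salt would come from the Vault secret).

```python
import hashlib
import re


def hash_value(value: str, salt: str) -> str:
    # Assumption: the salt is prepended to the value before hashing.
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def redact_hash(text: str, pattern: str, salt: str) -> str:
    """Replace every regex match with its salted SHA-256 hex digest."""
    return re.sub(pattern, lambda m: hash_value(m.group(0), salt), text)


salt = "example-salt"  # placeholder; a real salt would be a managed secret
redacted = redact_hash("SSN 123456789 on file", r"\b\d{9}\b", salt)
print(redacted)

# The same input and salt always give the same digest, while a different salt
# gives a different one, which is what frustrates precomputed lookup tables.
print(hash_value("123456789", salt) == hash_value("123456789", salt))
print(hash_value("123456789", salt) == hash_value("123456789", "other-salt"))
```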

Format Preserving Encryption (FPE)

You can use the FPE redaction method to preserve the format of redacted data and make the redacted values recoverable.

note

FPE keeps the format of the input data in the encrypted output. Unlike traditional encryption methods that turn data into an unreadable string of a fixed length, FPE retains the original data length, character set per position, and structure, including the positions of delimiters and separators. For example, with FPE, the digits in a phone number can be encrypted, but the parentheses, spaces, and hyphens stay in their original positions.

Because FPE preserves the format of data, it can be used to retrofit security in existing systems, such as legacy databases and applications that expect data in a specific format. It also helps meet compliance requirements that mandate certain data formats and structures.

FPE is commonly used to protect Personally Identifiable Information (PII) such as social security numbers, phone numbers, and credit card numbers. FPE keeps the data compatible with existing systems and maintains its appearance in logs. Additionally, in the event of a data leak, an attacker might not realize the data is encrypted.

Note that while FPE preserves the format based on the selected alphabet, the encrypted value might appear nonsensical. For example, a name could turn into a random mix of uppercase and lowercase letters, or an IP address might become invalid.

To use FPE as the redaction method for a rule, you need to enable it and generate an FPE encryption key. The FPE key will be stored in Vault and managed as a secret.

Enable FPE redaction method

When the FPE redaction method is enabled, you can select it for a rule and choose a set of characters to use for encryption in the Encryption Alphabet dropdown:

Select an encryption alphabet for the FPE redaction method

Characters not included in the selected alphabet will remain unchanged in the encrypted output. The available choices are:

  • Numeric - Numeric (0-9)
  • Alphanumeric - Alphanumeric (a-z, A-Z, 0-9)
  • Alphanumeric Uppercase - Uppercase alphabet with numbers (A-Z, 0-9)
  • Alphanumeric Lowercase - Lowercase alphabet with numbers (a-z, 0-9)
  • Alphabet - Lowercase and uppercase alphabet (a-z, A-Z)
  • Alphabet Uppercase - Uppercase alphabet (A-Z)
  • Alphabet Lowercase - Lowercase alphabet (a-z)

For example, a driver's license with the original value "A1234567" could become "A4313639" if encrypted using the numeric alphabet. Note that the "A" is preserved as part of the format, but the digits are changed.
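The format-preserving behavior described above (characters outside the alphabet stay put, characters inside it are swapped for others from the same alphabet, and the result is recoverable with the key) can be demonstrated with a toy transformation. This is deliberately not a real FPE algorithm such as FF3-1 and is not secure: it is a keyed per-position shift, used here only to show the shape of the input and output. All names are hypothetical.

```python
import hashlib
import hmac
import string


def _shift(key: bytes, position: int, modulus: int) -> int:
    """Derive a per-position shift from the key (toy keystream, not secure)."""
    digest = hmac.new(key, position.to_bytes(4, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def toy_transform(value: str, key: bytes, alphabet: str, decrypt: bool = False) -> str:
    out = []
    for i, ch in enumerate(value):
        idx = alphabet.find(ch)
        if idx == -1:
            out.append(ch)  # outside the alphabet: left unchanged
            continue
        shift = _shift(key, i, len(alphabet))
        if decrypt:
            shift = -shift
        out.append(alphabet[(idx + shift) % len(alphabet)])
    return "".join(out)


key = b"demo-key"  # placeholder; a real key would be a managed secret
numeric = string.digits

encrypted = toy_transform("A1234567", key, numeric)
print(encrypted)                                        # "A" followed by 7 digits
print(toy_transform(encrypted, key, numeric, decrypt=True))  # A1234567

phone = toy_transform("(555) 123-4567", key, numeric)
print(phone)  # parentheses, space, and hyphen stay in place
```

A production system would rely on a vetted FPE implementation instead of a scheme like this one.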

Redacting with Format Preserving Encryption (FPE) describes how you can use the FPE redaction endpoints.

Choose the redaction method

In some cases, the context for what was redacted may be helpful. In these cases, Replacement is a good choice, as the replacement value will indicate the type of data that was redacted.

However, Replacement by its nature leaks some information about the data that existed before redaction. If zero-knowledge redaction is required, use Mask, as it provides no indication as to the data that was redacted.

Select the FPE redaction method if you need to preserve the appearance of the redacted data and be able to recover the original value.

Test the redact rules in Pangea Console

The Pangea User Console supports the testing of any enabled or disabled rules. To test a rule:

  1. In the Pangea User Console, go to the Rulesets page under Redact.
  2. Click Test Rules.

Redact Rulesets page

A dialog will appear with various options for testing rules.

Test rules dialog

  1. Select a rule or multiple rules you’d like to test from the dropdown menu.
  2. Add any text you want to test against in the input field below the dropdown. The data should be associated with the rules option you selected in the dropdown. For example, if you chose "Credit Card", you can add any credit card number along with other text to test the rule and ensure it's working as expected. Once you add the data, click Test Rules.
  3. The test yields a response:
    • Redacted Data - The data you entered in the input field, with the sensitive portion (e.g., the credit card number) redacted.
    • Details - The JSON response sent by the Redact API.

Test the redact rules in API Reference

You can also test out redact rules using the interactive Redact API Reference. You'll need to copy rule name information from the Pangea Console into the input fields in the API Reference. Follow the steps below:

  1. In the Pangea Console, go to the Rulesets page under Redact.

Redact Rulesets page

  2. Click on a set of rules. A dialog with configuration details will appear.
  3. Decide which rule(s) you want to test and use the copy icon to copy the rule name as a string onto your clipboard. You can test more than one rule at a time.

Copy rule short name to clipboard

  4. Go to the Redact API Reference.
  5. Enter data into the parameter fields (or choose to Load Sample Data).
  6. In the rules parameter, paste the string you copied onto your clipboard. You can enter an array of strings to test several rules at once.

Rules in API Reference

In this example, "EMAIL_ADDRESS" was used as the redaction rule. The API request includes a string of text that contains an email address. You can see in the JSON response that the email address has been redacted.

curl -sSLX POST 'https://redact.'"$PANGEA_DOMAIN"'/v1/redact' \
  -H 'Authorization: Bearer '"$PANGEA_REDACT_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"text":"My name is Dennis Nedry and my email is you.didnt.say.the.magic.word@gmail.com","debug":false,"rules":["EMAIL_ADDRESS"]}'

{
  "request_id": "prq_63ooica2rmk5nmv3cg6yp3iohb4ukugp",
  "request_time": "2023-02-03T18:56:37.236505Z",
  "response_time": "2023-02-03T18:56:37.393045Z",
  "status": "Success",
  "summary": "Success. Redacted 1 item(s) from text",
  "result": {
    "redacted_text": "My name is Dennis Nedry and my email is <EMAIL_ADDRESS>",
    "count": 1
  }
}
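The same request can be issued from Python using only the standard library. This sketch mirrors the curl example: the endpoint, headers, request body, and the `PANGEA_DOMAIN` and `PANGEA_REDACT_TOKEN` environment variables come from that example, while the function names are hypothetical and the response handling assumes the JSON shape shown above.

```python
import json
import os
import urllib.request


def build_redact_request(text, rules, domain, token):
    """Build (but do not send) a POST request to the /v1/redact endpoint."""
    body = json.dumps({"text": text, "debug": False, "rules": rules}).encode("utf-8")
    return urllib.request.Request(
        f"https://redact.{domain}/v1/redact",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def redact_text(text, rules):
    """Send the request and return the redacted text from the response."""
    request = build_redact_request(
        text, rules, os.environ["PANGEA_DOMAIN"], os.environ["PANGEA_REDACT_TOKEN"]
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Assumes the response has the "result" / "redacted_text" shape shown above.
    return payload["result"]["redacted_text"]


# Building the request performs no network call, so it is safe to inspect.
example = build_redact_request("My email is a@example.com", ["EMAIL_ADDRESS"], "example.test", "token")
print(example.full_url)
print(example.get_method())
```

With both environment variables set, calling `redact_text(...)` with the text and `["EMAIL_ADDRESS"]` from the curl example should return the redacted string shown in the response above.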
