Search the Audit Log
Review the steps to search the Audit Log
An important aspect of keeping a secure log of all events is being able to search through those logs quickly and easily. This page contains our suggested best practices for searching the audit log, explains how to use the search functionality, and covers the various ways to perform searches: via SDKs, APIs, cURL requests, and in the console.
Search syntax
The Search capability provides a simple search grammar to use when searching the logs for specific events. Search queries are case-sensitive.
Simple search
The search query can be provided either as key-value pairs, to search for the query text in a specific field, or as bare query text, to search for the text across all audit fields. For example, a query of `deactivated` will search for the term `deactivated` across all fields.
Single field search
A simple search string should be provided as `<field_name>:<value>`. The field name must exactly match the name of the field to be searched. The `/search` API performs a partial match of the data in the specified field against the search term.
For example, `actor:"Dennis"` initiates a search for any audit event where the `actor` field contains the word Dennis.
You can exclude specific values and return everything that does not match the search term by adding a minus (`-`) prefix. For example, `-actor:"Dennis"` returns all results where the `actor` field does not include the word Dennis.
Certain field types, including integers, booleans, and datetimes (in the case of Custom schemas), offer the following functionality:
- Comparison operators: For fields that support comparison (like datetime and integer), you can use `>` or `<` instead of `:` to match values that are greater than or less than the given value, respectively. For example, if `value` is a field in your schema, both `value>10` and `value<10` are valid expressions.
- Negative values: For integer fields, you can add a minus (`-`) prefix to the number. For example, `value<-11` searches for everything with a value less than negative 11.
- Boolean field filtering: You can filter boolean fields using `fieldname:true` to find values that are `true`, or `fieldname:false` to find values that are `false`.
Using multiple search terms
Multiple search terms can be joined using the `AND` and `OR` operators. For example, to search for events where the `actor` field contains Dennis and the `target` field contains Security, the following search string would be used:
actor:"Dennis" AND target:"Security"
Grouping search terms
Search terms can be logically grouped using parentheses. As an example, to search for events where the actor is "Dennis" or "Grant" and the target is "Security", the following would be used:
(actor:"Dennis" OR actor:"Grant") AND target:"Security"
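For illustration, grouped query strings like this can also be assembled programmatically. The helpers below are a hypothetical sketch, not part of any Pangea SDK:

```python
def field_term(field: str, value: str) -> str:
    """Render a single field:value search term with a quoted value."""
    return f'{field}:"{value}"'


def any_of(field: str, *values: str) -> str:
    """OR several values for one field and wrap the group in parentheses."""
    return "(" + " OR ".join(field_term(field, v) for v in values) + ")"


# Build the grouped query described above.
query = any_of("actor", "Dennis", "Grant") + " AND " + field_term("target", "Security")
print(query)  # (actor:"Dennis" OR actor:"Grant") AND target:"Security"
```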
Escaping characters
Some fields might have values that make searches behave in unexpected ways. Two such characters are the backslash (`\`) and the quotation mark (`"`). When this occurs, you can escape the character to make the search return the expected results.
A backslash at the end of a value can cause search to escape the quotation mark that closes the field value, which has undesirable effects. Instead, escape the backslash itself. So if you have a search for `website:"www.example.com\"`, add a second backslash as in the example below.
website:"www.example.com\\"
Since quotation marks are designed in search to dictate an exact match, a quotation mark in the middle of a value can also create unexpected results. You can eliminate such unexpected results by escaping the internal quotation marks. If you have a search for `article="Things I like about the "Benji" movie"`, escape the internal quotation marks as in the following example.
article="Things I like about the \"Benji\" movie"
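If you build queries from user input, you may want to apply this escaping automatically. The helper below is a hypothetical sketch (not part of any Pangea SDK) that escapes backslashes and quotation marks in a value:

```python
def escape_value(value: str) -> str:
    """Escape backslashes and quotation marks for use inside a quoted search value."""
    # Escape backslashes first so the quote escapes are not double-escaped.
    return value.replace("\\", "\\\\").replace('"', '\\"')


print(escape_value("www.example.com\\"))
# www.example.com\\
print(escape_value('Things I like about the "Benji" movie'))
# Things I like about the \"Benji\" movie
```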
Best practices for audit log searches
The audit query language makes it easy to run a simple search without much thought: you type in a term and you get results.
However, these queries can be expensive and slow when the log data is very large. Users with significant amounts of data will need to write their queries with performance in mind.
Limiting date range
The start and end parameters of the search query define the time window in which to find audit records. Audit data is partitioned along this value by default, so searches can skip data for dates outside the window.
When using the Pangea User Console, you can set the date and time ranges using the quick selection options in the drop-down menu beside the filter.
You can also use the time field to search between two dates or two times in the search field.
received_at>2024-08-15 AND received_at<2024-08-20
The APIs can perform similar searches using the `start` and `end` parameters.
"start":"2024-12-29T01:02:03Z","end":"2025-01-02T01:02:03Z"
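As a sketch of how such a window might be computed in code (the dates are chosen to match the example above; this is stdlib-only and not tied to any SDK):

```python
from datetime import datetime, timedelta, timezone

# Compute an ISO-8601 UTC window for the `start`/`end` search parameters.
end = datetime(2025, 1, 2, 1, 2, 3, tzinfo=timezone.utc)
start = end - timedelta(days=4)

params = {
    "start": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
    "end": end.strftime("%Y-%m-%dT%H:%M:%SZ"),
}
print(params)
# {'start': '2024-12-29T01:02:03Z', 'end': '2025-01-02T01:02:03Z'}
```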
Using prefixed queries for fuzzy matches
Prefixed queries are queries where the search term is prefixed by the field of data you want to search. In the example below, `haystack:needle`, the search term is `needle` and the field being searched is `haystack`.
When running bare queries using terms that have no prefix, the search is expanded to a very expensive substring match against all fields. A search for `needle` becomes:
WHERE ( audit.a LIKE '%needle%' OR audit.b LIKE '%needle%' OR audit.c LIKE '%needle%' ... )
This will scan the full contents of every field of your audit record for "needle".
However, you probably don't need to search for `needle` in all fields. Maybe you are only looking for a `haystack` field with a value of `needle`. Use prefixed terms instead to look only in the appropriate fields. The example then becomes `haystack:needle`, which translates to:
WHERE audit.haystack LIKE '%needle%';
Using equality checks for exact matches
You can use prefixed queries with `=` instead of `:` to perform equality matching. This uses indexed lookups instead of scanning the entire field for the search term, making it the quickest way to match a field/value pair (as opposed to fuzzy or regexp-based matching). Use this method whenever applicable instead of colon-prefixed queries.
So, when searching for `needle` inside the `haystack` field, a search like the following:
WHERE audit.haystack LIKE '%needle%';
performs a full scan of the `haystack` field in every record, looking for a substring matching `needle`.
Instead, when looking for an exact match inside the field, you can use an equality (`=`) to complete the same search faster and at lower cost. The search then becomes:
haystack="needle"
This will match only `needle` values, and will not return similar terms such as `a needle` or `needles`.
When matching on time fields, using an equality with a date will match all results for that date, which is the same as using a colon (`:`). For instance, when using the default `received_at` field, you could search for all events that occurred on a specific date using the following search.
received_at=2024-02-29
Equalities (`=`) and colons (`:`) produce the same output for searches on the following field types: times, booleans, and integers.
Adding a time to the end of the date will match anything that occurred at that exact time, down to ten-thousandths of a second, i.e., to four decimal places (.0000). Note that the timestamp in the query is interpreted as UTC, not local time, even though the timestamp field displays the date and time in local time. Make sure you convert the time to UTC before searching.
received_at=2024-12-29T01:02:03.0004Z
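The conversion to UTC can be done with the standard library. The sketch below assumes a local offset of UTC-5 purely for illustration:

```python
from datetime import datetime, timedelta, timezone

# A local timestamp with 400 microseconds (4 ten-thousandths of a second).
local = datetime(2024, 12, 28, 20, 2, 3, 400, tzinfo=timezone(timedelta(hours=-5)))
utc = local.astimezone(timezone.utc)

# Format down to four decimal places, as in the search example above.
term = "received_at=" + utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 100:04d}Z"
print(term)  # received_at=2024-12-29T01:02:03.0004Z
```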
When you need to search logs before or after a specific timestamp or date, you can use less than (`<`) for events occurring before the date/time, or greater than (`>`) for events occurring after the date/time.
received_at<2024-02-29
You can also restrict search results with `search_restrictions`, which will perform exact matching against fields such as `actor`, `status`, and `tenant_id`.
Using a schema with simple columns
You should avoid storing multiple values in a single field (for example, as JSON) and querying it with substring matches. Instead, extract the values and store them in their own fields. It is better to query fields with less data than fields with more data.
You should, however, use JSON to store multiple values of non-indexed fields as described in the next section.
Avoid indexing unnecessary fields
All fields of a schema are indexed by default. However, by clicking the pencil Edit icon beside a field when creating a schema, you can change the Type field to Non-Indexed String. Non-indexed string fields are not visible to search, which makes searches easier and faster; however, they remain viewable in the logs. Non-prefixed searches will also be more performant, because there are fewer fields to scan for the requested term.
As a best practice, do not index any unnecessary fields when you are creating your schema. Instead, create a single field of the non-indexed data type and add all non-indexed items to it in JSON format.
Restrict search results
In some cases, it may be desirable to partition search results. The `search_restriction` object, provided as `restriction` to the API, can facilitate this need. A `search_restriction` limits queries to the data described by the restriction.
As an example, consider the following restriction:
{
"actor":"Dennis Nedry"
}
In this case, no matter what the query matches, only results containing "Dennis Nedry" in the `actor` field will be returned.
This could be useful in an app that exposes the search interface to its users, providing the users with a way to search for auditable actions performed by themselves. A search restriction could restrict them to such actions without allowing them to see activities performed by other users. See the API Reference for more information on API queries.
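As a sketch, a search request body with a restriction might look like the following payload. The restriction shape mirrors the example above; confirm the exact parameter name and value format in the API Reference:

```python
import json

# Hypothetical request body for a search that restricts results to one actor.
payload = {
    "query": "message:login",
    "limit": 10,
    "search_restriction": {
        "actor": "Dennis Nedry",
    },
}
print(json.dumps(payload, indent=2))
```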
SDK example
Perform a search
Each SDK provides a `search` method that can be used to search the audit log.
The following shows an example of searching the audit logs.
import os
import pangea.exceptions as pe
from pangea.config import PangeaConfig
from pangea.response import PangeaResponse
from pangea.services import Audit
from pangea.services.audit.audit import SearchOutput, SearchResultOutput
from pangea.tools import logger_set_pangea_config

# This example shows how to write an audit log entry and then search for it
token = os.getenv("PANGEA_AUDIT_TOKEN")
domain = os.getenv("PANGEA_DOMAIN")
config = PangeaConfig(domain=domain)
audit = Audit(token, config=config, private_key_file="./key/privkey", logger_name="audit")
logger_set_pangea_config(logger_name=audit.logger.name)


def main():
    print("Log Data...")
    msg = "python-sdk-standard-schema-example"
    try:
        log_response = audit.log(
            message=msg,
            actor="Someone",
            action="Testing",
            source="monitor",
            status="Good",
            target="Another spot",
            new="New updated message",
            old="Old message that has been updated",
            verify=True,
            verbose=False,
            sign_local=True,
        )
        print(f"Log Request ID: {log_response.request_id}, Status: {log_response.status}")
    except pe.PangeaAPIException as e:
        print(f"Request Error: {e.response.summary}")
        for err in e.errors:
            print(f"\t{err.detail} \n")
        exit()

    print("Search Data...")
    page_size = 10
    query = "message:" + msg
    try:
        search_res: PangeaResponse[SearchOutput] = audit.search(
            query=query, limit=page_size, verify_consistency=True, verify_events=True
        )
        result_id = search_res.result.id
        count = search_res.result.count
        print(f"Search Request ID: {search_res.request_id}, Success: {search_res.status}, Results: {count}")
        offset = 0
        print_header_results()
        while offset < count:
            print_page_results(search_res, offset, count)
            offset += page_size
            if offset < count:
                search_res = audit.results(
                    id=result_id, limit=page_size, offset=offset, verify_consistency=True, verify_events=True
                )
    except pe.PangeaAPIException as e:
        print("Search Failed:", e.response.summary)
        for err in e.errors:
            print(f"\t{err.detail} \n")


def print_header_results():
    print(f"\n\nreceived_at\t\t\t\tMessage \tSource " f"\t\tActor \t\tMembership \tConsistency \tSignature\t")


def print_page_results(search_res: PangeaResponse[SearchResultOutput], offset, count):
    print("\n--------------------------------------------------------------------\n")
    for row in search_res.result.events:
        print(
            f"{row.envelope.received_at}\t{row.envelope.event['message']}\t{row.envelope.event['source']}\t\t"
            f"{row.envelope.event['actor']}\t\t{row.membership_verification}\t\t {row.consistency_verification}\t\t {row.signature_verification}\t\t"
        )
    print(
        f"\nResults: {offset+1}-{offset+len(search_res.result.events)} of {count}",
    )


if __name__ == "__main__":
    main()
Setting the optional parameter `verify` to true will automatically verify the membership and consistency proofs of each returned result.
Paginate search results
Audit results can be paginated using a combination of `offset`, `count`, and `result_id`. The `result_id` is returned by the search method and is a unique ID corresponding to the search results. Search results don't live indefinitely; they have a defined expiration date that is also returned with the search results. `offset` determines the record number at which results should start being returned, and `count` indicates how many records in total were returned by the search. Continuing the previous example, paging through the results in Python would look like this:
if search_res.success:
    result_id = search_res.result.id
    count = search_res.result.count
    offset = 0
    while offset < count and search_res.success:
        for row in search_res.result.events:
            print(f"{row.event.received_at}\t{row.event.message}\t{row.event.source}")
        offset += page_size
        search_res = audit.results(result_id, limit=page_size, offset=offset)