Search the Audit Log
Review the steps to search the Audit Log
An important aspect of keeping a secure log of all events is being able to search through those logs quickly and easily. This page contains our suggested best practices for searching the audit log, explains how to use the search functionality, and covers the various ways to perform searches: via SDKs, APIs, cURL requests, and in the console.
Search syntax
The Search capability provides a simple search grammar to use when searching the logs for specific events. Search queries are case-sensitive.
Simple search
The search query can be provided either as key-value pairs, to search for the query text in a specific field, or as bare query text, to search for the text across all audit fields. For example, a query of `deactivated` will search for the term `deactivated` across all fields.
Single field search
A simple search string should be provided as `<field_name>:<value>`. The field name must exactly match the name of the field to be searched. The `/search` API performs a partial match of the data in the specified field against the search term.
For example, `actor:"Dennis"` initiates a search for any audit event where the `actor` field contains the word Dennis.
You can exclude specific values and return everything that does not match the search term by adding a minus (`-`) prefix. For example, `-actor:"Dennis"` returns all results where the `actor` field does not include the word Dennis.
Certain field types, including integers, booleans, and datetimes (in the case of Custom schemas), offer the following functionality:
- Comparison operators: For fields that support comparison (like datetime and integer), you can use `>` or `<` instead of `:` to match values that are greater than or less than the given value, respectively. For example, if `value` is a field in your schema, both `value>10` and `value<10` are valid expressions.
- Negative values: For integer fields, you can add a minus (`-`) prefix to the number. For example, `value<-11` searches for everything with a value less than negative 11.
- Boolean field filtering: You can filter boolean fields using `fieldname:true` to find values that are `true`, or `fieldname:false` to find values that are `false`.
Using multiple search terms
Multiple search terms can be joined using the `AND` and `OR` operators. For example, to search for events where the `actor` field contains Dennis and the `target` field contains Security, the following search string would be used:
actor:"Dennis" AND target:"Security"
Grouping search terms
Search terms can be logically grouped using parentheses. As an example, to search for events where the actor is "Dennis" or "Grant" and the target is "Security", the following would be used:
(actor:"Dennis" OR actor:"Grant") AND target:"Security"
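For illustration, grouped query strings like this can also be assembled programmatically. The helpers below are a hypothetical sketch, not part of any Pangea SDK:

```python
def field_term(field: str, value: str) -> str:
    """Render a single field:value search term with a quoted value."""
    return f'{field}:"{value}"'


def any_of(field: str, *values: str) -> str:
    """OR several values for one field and wrap the group in parentheses."""
    return "(" + " OR ".join(field_term(field, v) for v in values) + ")"


# Build the grouped query described above.
query = any_of("actor", "Dennis", "Grant") + " AND " + field_term("target", "Security")
print(query)  # (actor:"Dennis" OR actor:"Grant") AND target:"Security"
```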
Escaping characters
Some fields might have values that make searches behave in unexpected ways. Two such characters are the backslash (`\`) and the quotation mark (`"`). When this occurs, you can escape the character to make the search return the expected results.
A backslash at the end of a value can cause search to escape the quotation mark that closes the field value, which has undesirable effects. Instead, escape the backslash itself. So if you have a search for `website:"www.example.com\"`, add a second backslash as in the example below.
website:"www.example.com\\"
Since quotation marks are designed in search to dictate an exact match, a quotation mark in the middle of a value can also create unexpected results. You can eliminate such unexpected results by escaping the internal quotation marks. If you have a search for `article="Things I like about the "Benji" movie"`, escape the internal quotation marks as in the following example.
article="Things I like about the \"Benji\" movie"
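If you build queries from user input, you may want to apply this escaping automatically. The helper below is a hypothetical sketch (not part of any Pangea SDK) that escapes backslashes and quotation marks in a value:

```python
def escape_value(value: str) -> str:
    """Escape backslashes and quotation marks for use inside a quoted search value."""
    # Escape backslashes first so the quote escapes are not double-escaped.
    return value.replace("\\", "\\\\").replace('"', '\\"')


print(escape_value("www.example.com\\"))
# www.example.com\\
print(escape_value('Things I like about the "Benji" movie'))
# Things I like about the \"Benji\" movie
```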
Best practices for audit log searches
The audit query language makes it easy to run a simple search without much thought: you type in a term and you get results.
However, these queries can be expensive and slow when the log data is very large. Users with significant amounts of data will need to write their queries with performance in mind.
Limiting date range
The start and end parameters of the search query define the time window in which to find audit records. Audit data is partitioned along this value by default, so searches can skip data for dates outside the window.
When using the Pangea User Console, you can set the date and time ranges using the quick selection options in the drop-down menu beside the filter.
You can also use the time field to search between two dates or two times in the search field.
received_at>2024-08-15 AND received_at<2024-08-20
The APIs can perform similar searches using the `start` and `end` parameters.
"start":"2024-12-29T01:02:03Z","end":"2025-01-02T01:02:03Z"
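As a sketch of how such a window might be computed in code (the dates are chosen to match the example above; this is stdlib-only and not tied to any SDK):

```python
from datetime import datetime, timedelta, timezone

# Compute an ISO-8601 UTC window for the `start`/`end` search parameters.
end = datetime(2025, 1, 2, 1, 2, 3, tzinfo=timezone.utc)
start = end - timedelta(days=4)

params = {
    "start": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
    "end": end.strftime("%Y-%m-%dT%H:%M:%SZ"),
}
print(params)
# {'start': '2024-12-29T01:02:03Z', 'end': '2025-01-02T01:02:03Z'}
```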
Using prefixed queries for fuzzy matches
Prefixed queries are queries where the search term is prefixed by the field of data you want to search. In the example below, `haystack:needle`, the search term is `needle` and the field being searched is `haystack`.
When running bare queries using terms that have no prefix, the search is expanded to a very expensive substring match against all fields. A search for `needle` becomes:
WHERE ( audit.a LIKE '%needle%' OR audit.b LIKE '%needle%' OR audit.c LIKE '%needle%' ... )
This will scan the full contents of every field of your audit record for "needle".
However, you probably don't need to search for `needle` in all fields. Maybe you are only looking for a `haystack` field with a value of `needle`. Use prefixed terms instead to look only in the appropriate fields. The example then becomes `haystack:needle`, which translates to:
WHERE audit.haystack LIKE '%needle%';
Using equality checks for exact matches
You can use prefixed queries with `=` instead of `:` to perform equality matching. This uses indexed lookups instead of scanning the entire field for the search term, making it the quickest way to match a field/value pair (as opposed to fuzzy or regexp-based matching). Use this method whenever applicable instead of colon-prefixed queries.
So, when searching for `needle` inside the `haystack` field, a search like the following:
WHERE audit.haystack LIKE '%needle%';
performs a full scan of the `haystack` field in every record, looking for a substring matching `needle`.
Instead, when looking for an exact match inside the field, you can use an equality (`=`) to complete the same search faster and at lower cost. The search then becomes:
haystack="needle"
This will match only `needle` values, and will not return similar terms such as `a needle` or `needles`.
When matching on time fields, using an equality with a date will match all results for that date, which is the same as using a colon (`:`). For instance, when using the default `received_at` field, you could search for all events that occurred on a specific date using the following search.
received_at=2024-02-29
Equalities (`=`) and colons (`:`) produce the same output for searches on the following field types: times, booleans, and integers.
Adding a time to the end of the date will match anything that occurred at that exact time, down to ten-thousandths of a second, i.e., to four decimal places (.0000). Note that the timestamp in the query is interpreted as UTC, not local time, even though the timestamp field displays the date and time in local time. Make sure you convert the time to UTC before searching.
received_at=2024-12-29T01:02:03.0004Z
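The conversion to UTC can be done with the standard library. The sketch below assumes a local offset of UTC-5 purely for illustration:

```python
from datetime import datetime, timedelta, timezone

# A local timestamp with 400 microseconds (4 ten-thousandths of a second).
local = datetime(2024, 12, 28, 20, 2, 3, 400, tzinfo=timezone(timedelta(hours=-5)))
utc = local.astimezone(timezone.utc)

# Format down to four decimal places, as in the search example above.
term = "received_at=" + utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 100:04d}Z"
print(term)  # received_at=2024-12-29T01:02:03.0004Z
```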
When you need to search logs before or after a specific timestamp or date, you can use less than (`<`) for events occurring before the date/time, or greater than (`>`) for events occurring after the date/time.
received_at<2024-02-29
You can also restrict search results with `search_restrictions`, which will perform exact matching against fields such as `actor`, `status`, and `tenant_id`.
Using a schema with simple columns
You should avoid storing multiple values in a single field (for example, as JSON) and querying it with substring matches. Instead, extract the values and store them in their own fields. It is better to query fields with less data than fields with more data.
You should, however, use JSON to store multiple values of non-indexed fields as described in the next section.
Avoid indexing unnecessary fields
All fields of a schema are indexed by default. However, by clicking the pencil Edit icon beside a field when creating a schema, you can change the Type field to Non-Indexed String. Non-indexed string fields are not visible to search, which makes searches easier and faster; however, they remain viewable in the logs. Non-prefixed searches will also be more performant, because there are fewer fields to scan for the requested term.
As a best practice, do not index any unnecessary fields when you are creating your schema. Instead, create a single field of the non-indexed data type and add all non-indexed items to it in JSON format.
Restrict search results
In some cases, it may be desirable to partition search results. The `search_restriction` object, provided as `restriction` to the API, can facilitate this need. A `search_restriction` limits queries to the data described by the restriction.
As an example, consider the following restriction:
{
"actor":"Dennis Nedry"
}
In this case, no matter what the query matches, only results containing "Dennis Nedry" in the `actor` field will be returned.
This could be useful in an app that exposes the search interface to its users, providing the users with a way to search for auditable actions performed by themselves. A search restriction could restrict them to such actions without allowing them to see activities performed by other users. See the API Reference for more information on API queries.
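As a sketch, a search request body with a restriction might look like the following payload. The restriction shape mirrors the example above; confirm the exact parameter name and value format in the API Reference:

```python
import json

# Hypothetical request body for a search that restricts results to one actor.
payload = {
    "query": "message:login",
    "limit": 10,
    "search_restriction": {
        "actor": "Dennis Nedry",
    },
}
print(json.dumps(payload, indent=2))
```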
SDK example
Perform a search
Each SDK provides a `search` method that can be used to search the audit log.
The following shows an example of searching the audit logs.
import os
import pangea.exceptions as pe
from pangea.config import PangeaConfig
from pangea.response import PangeaResponse
from pangea.services import Audit
from pangea.services.audit.audit import SearchOutput, SearchResultOutput
from pangea.tools import logger_set_pangea_config

# This example shows how to write an audit log entry and then search for it
token = os.getenv("PANGEA_AUDIT_TOKEN")
domain = os.getenv("PANGEA_DOMAIN")
config = PangeaConfig(domain=domain)
audit = Audit(token, config=config, private_key_file="./key/privkey", logger_name="audit")
logger_set_pangea_config(logger_name=audit.logger.name)


def main():
    print("Log Data...")
    msg = "python-sdk-standard-schema-example"
    try:
        log_response = audit.log(
            message=msg,
            actor="Someone",
            action="Testing",
            source="monitor",
            status="Good",
            target="Another spot",
            new="New updated message",
            old="Old message that has been updated",
            verify=True,
            verbose=False,
            sign_local=True,
        )
        print(f"Log Request ID: {log_response.request_id}, Status: {log_response.status}")
    except pe.PangeaAPIException as e:
        print(f"Request Error: {e.response.summary}")
        for err in e.errors:
            print(f"\t{err.detail} \n")
        exit()

    print("Search Data...")
    page_size = 10
    query = "message:" + msg
    try:
        search_res: PangeaResponse[SearchOutput] = audit.search(
            query=query, limit=page_size, verify_consistency=True, verify_events=True
        )
        result_id = search_res.result.id
        count = search_res.result.count
        print(f"Search Request ID: {search_res.request_id}, Success: {search_res.status}, Results: {count}")
        offset = 0
        print_header_results()
        while offset < count:
            print_page_results(search_res, offset, count)
            offset += page_size
            if offset < count:
                search_res = audit.results(
                    id=result_id, limit=page_size, offset=offset, verify_consistency=True, verify_events=True
                )
    except pe.PangeaAPIException as e:
        print("Search Failed:", e.response.summary)
        for err in e.errors:
            print(f"\t{err.detail} \n")


def print_header_results():
    print(f"\n\nreceived_at\t\t\t\tMessage \tSource " f"\t\tActor \t\tMembership \tConsistency \tSignature\t")


def print_page_results(search_res: PangeaResponse[SearchResultOutput], offset, count):
    print("\n--------------------------------------------------------------------\n")
    for row in search_res.result.events:
        print(
            f"{row.envelope.received_at}\t{row.envelope.event['message']}\t{row.envelope.event['source']}\t\t"
            f"{row.envelope.event['actor']}\t\t{row.membership_verification}\t\t {row.consistency_verification}\t\t {row.signature_verification}\t\t"
        )
    print(
        f"\nResults: {offset+1}-{offset+len(search_res.result.events)} of {count}",
    )


if __name__ == "__main__":
    main()
Setting the optional parameter `verify` to true will automatically verify the membership and consistency proofs of each returned result.
Paginate search results
Audit results can be paginated using a combination of `offset`, `count`, and `result_id`. The `result_id` is returned by the search method and is a unique ID corresponding to the search results. Search results don't live indefinitely; they have a defined expiration date that is also returned with the search results. `offset` determines the record number at which results should start being returned, and `count` indicates how many records in total were returned by the search. Continuing the previous example, paging through the results in Python would look like this:
if search_res.success:
    result_id = search_res.result.id
    count = search_res.result.count
    offset = 0
    while offset < count and search_res.success:
        for row in search_res.result.events:
            print(f"{row.event.received_at}\t{row.event.message}\t{row.event.source}")
        offset += page_size
        search_res = audit.results(result_id, limit=page_size, offset=offset)