There are many reasons to log events: for performance tracing, to understand user behavior, for “just in case” scenarios where something goes wrong, or for compliance. In my previous blog post, I drew a comparison between compliance and the movie The Santa Clause. The TL;DR is that people don’t just believe that you’re protecting their privacy on blind faith; instead, you have to prove it to them irrefutably.
That, my friends, is easier said than done. Over the course of this blog post, I’ll lay out the challenges presented by trying to make evidence “irrefutable” and propose corresponding solutions to each one. We’ll learn how third-party logging, Merkle trees, blockchain, and cryptographic signing can be combined to provide robust, tamperproof evidence that cannot be denied by even the most untrusting of customers.
Challenge #1: Sophisticated attacks and customer distrust
When a company creates a software product, it typically owns the audit logging process end-to-end: both the code that writes the logs and the infrastructure that houses them.
This means that an attacker who breaches a company’s infra will have access to both its sensitive data and the logs recording their access to that data. Modifying these logs and deleting evidence of their intrusion is an excellent way for the attacker to cover their tracks and let their presence go unnoticed for long periods of time. Deleting audit logs is similar to a burglar wiping down their fingerprints at a crime scene. It’s criminal behavior 101.
Additionally, the company itself may have an incentive to modify its own logs. For example, a company making software for managing electronic health records could be fined massive amounts of money if it were breached and protected health information (PHI) was disclosed. To avoid such penalties, the company could delete the audit logs that recorded the breach activity, and nobody would be the wiser. The point is that the public has no reason to trust any company not to modify logs, especially when it’s to the company’s benefit.
Solution #1: A trusted third party
The easiest way to improve your customers’ ability to trust your audit records is to remove those records from your control. Storing audit records with a trusted third party that provides write-once, read-many (append-only) logging is a simple solution that makes it much more difficult for an adversary to cover their tracks and keeps a company from quietly modifying logs to its advantage.
For an attacker, even if they breach your infra, they still don’t have access to the logs. Further, even if they manage to get hold of your third-party credentials, the third party hasn’t provided them with any operations other than writing new records and reading existing ones. The only way for the attacker to remove evidence of their intrusion is to breach the third party as well, which requires much more coordination and effort.
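To make the “write and read, nothing else” idea concrete, here is a minimal sketch of what such an interface could look like. The class name `AppendOnlyLog` and its methods are hypothetical, made up for illustration; a real third-party service would enforce this on its own servers rather than in client code.

```python
from typing import List


class AppendOnlyLog:
    """Hypothetical log store that exposes only append and read operations."""

    def __init__(self) -> None:
        self._records: List[bytes] = []

    def append(self, record: bytes) -> int:
        """Store a copy of the record and return its position in the log."""
        self._records.append(bytes(record))
        return len(self._records) - 1

    def read(self, index: int) -> bytes:
        """Return the record at the given position."""
        if index < 0 or index >= len(self._records):
            raise IndexError("no record at that position")
        return self._records[index]

    def __len__(self) -> int:
        return len(self._records)


log = AppendOnlyLog()
position = log.append(b"user=alice action=read_record")
```

The point of the sketch is what is missing: there is no update or delete method, so stolen credentials only let an attacker add more records or read existing ones.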
Challenge #2: The commutative property of trust
The problem with moving audit logs to a third party is that all we’ve done is shift the trust relationship from one between us and our customer to one between us and the third-party logging solution. We’ve made life more difficult for attackers, and we’ve removed some incentive to tamper, but what happens if the third party is targeted and breached, and log records containing sensitive information are exposed? A third-party logging solution is susceptible to attacks like any other company. Additionally, just like us, they employ humans, and those humans could make mistakes resulting in data loss that they’d rather not report to us. To trust our third party, and for our customers to trust us, there’s still a burden of proof shared between the third party and us.
Solution #2: Give them proof, Merkle Proof
To provide this proof, we should implement a solution that allows us to cryptographically verify that our log entries are present in our log data, and are free from tampering. We can do this by implementing a Merkle tree.
The idea behind a Merkle tree is that any piece of data can be verified as a member of the tree through its hash, and any unscrupulous modification or removal of that data can easily be proven. In a Merkle tree, each leaf node is labeled with the hash of its underlying data. Each non-leaf node is labeled with the hash of its child nodes’ hashes combined. Thus, the tree’s root hash changes whenever a leaf is added, modified, or removed.
Proof of any leaf’s inclusion in the tree can be produced by providing a Merkle proof. The proof includes the hash of the leaf’s sibling and the hashes of each ancestor’s sibling, which are the hashes that could not otherwise be calculated without the other leaf nodes. To check the proof, the leaf’s hash is combined with its sibling’s hash to create the parent hash, and the result is then repeatedly combined with each remaining hash in the proof. The resulting hash is then compared to the current root hash. If they match, the leaf is included in the tree.
Take these two log messages, A and B, for a quick example. If we hash both of them (HA and HB), and hash those hashes (HAB), we have a simple Merkle tree.
We soon add two more log records (C and D). We hash those log records (HC and HD), and hash those hashes (HCD). Finally, we hash both non-leaf nodes (HABCD) to produce our root hash.
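The four-record example can be sketched in a few lines of Python. This is a minimal illustration, not any particular product’s implementation: SHA-256, plain concatenation of child hashes, and promoting an unpaired node to the next level unchanged are all assumptions of the sketch, and the names `h` and `merkle_root` are made up here.

```python
import hashlib
from typing import List


def h(data: bytes) -> bytes:
    """SHA-256 digest (the choice of hash function is an assumption)."""
    return hashlib.sha256(data).digest()


def merkle_root(records: List[bytes]) -> bytes:
    """Hash every record, then hash adjacent pairs level by level up to the root."""
    if not records:
        raise ValueError("need at least one record")
    level = [h(record) for record in records]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                parents.append(h(level[i] + level[i + 1]))
            else:
                # An unpaired node is carried up to the next level unchanged.
                parents.append(level[i])
        level = parents
    return level[0]


# HABCD for the four log messages in the example above.
root_abcd = merkle_root([b"A", b"B", b"C", b"D"])
```

Changing even one byte of any record produces a different root, which is what makes the root hash a useful fingerprint of the whole log.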
If I wanted to prove C’s existence in the tree, I could do so by asking for the root hash and a Merkle proof, which would provide me with HD and HAB. By calculating HC on my own, hashing it with HD to get HCD, and then hashing HAB with HCD, I could work my way up the tree to validate the proof against the root hash and prove that C is included in the log records.
If we apply Merkle trees to our third-party logging solution, it will function such that each time we write a log entry, a hash of the log record, the new root hash, and a Merkle proof are returned. With this information, we can prove that the new record has been included in the audit data. Additionally, the Merkle tree implementation allows us to request the Merkle proof for any previous log entry. Using the proof, we can verify that the records are untampered and still present and accounted for.
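Here is a rough sketch of that check for record C. It assumes SHA-256 and plain concatenation of child hashes, and it assumes each proof step says whether the supplied hash sits to the left or the right of the running hash; the function name `verify_inclusion` and the proof format are hypothetical.

```python
import hashlib
from typing import List, Tuple


def h(data: bytes) -> bytes:
    """SHA-256 digest (the choice of hash function is an assumption)."""
    return hashlib.sha256(data).digest()


def verify_inclusion(record: bytes, proof: List[Tuple[str, bytes]], root: bytes) -> bool:
    """Recompute the path from the record up to the root using the proof hashes."""
    current = h(record)
    for side, sibling in proof:
        if side == "left":
            current = h(sibling + current)
        elif side == "right":
            current = h(current + sibling)
        else:
            raise ValueError("side must be 'left' or 'right'")
    return current == root


# Rebuild the example tree so there is a root to check against.
h_ab = h(h(b"A") + h(b"B"))
h_d = h(b"D")
root_abcd = h(h_ab + h(h(b"C") + h_d))

# The proof for C: HD sits to its right, then HAB sits to the left of HCD.
proof_for_c = [("right", h_d), ("left", h_ab)]
c_is_included = verify_inclusion(b"C", proof_for_c, root_abcd)
```

Note that the verifier never needs A, B, or D themselves, only two hashes and the root.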
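To show how a service might hand back a proof for any entry, here is a sketch that produces the root and an inclusion proof for a given record, plus the matching check. The same assumptions apply as before (SHA-256, concatenated child hashes, unpaired nodes promoted unchanged), and `root_and_proof` and `verify_inclusion` are hypothetical names rather than a real service’s API.

```python
import hashlib
from typing import List, Tuple

Proof = List[Tuple[str, bytes]]


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def root_and_proof(records: List[bytes], index: int) -> Tuple[bytes, Proof]:
    """Return the root hash and the inclusion proof for records[index]."""
    if not 0 <= index < len(records):
        raise IndexError("no such record")
    level = [h(record) for record in records]
    proof: Proof = []
    while len(level) > 1:
        if index % 2 == 1:
            proof.append(("left", level[index - 1]))
        elif index + 1 < len(level):
            proof.append(("right", level[index + 1]))
        # Otherwise the node has no sibling here and is promoted unchanged.
        parents = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                parents.append(h(level[i] + level[i + 1]))
            else:
                parents.append(level[i])
        level = parents
        index //= 2
    return level[0], proof


def verify_inclusion(record: bytes, proof: Proof, root: bytes) -> bool:
    """Hash up the tree from the record and compare against the root."""
    current = h(record)
    for side, sibling in proof:
        current = h(sibling + current) if side == "left" else h(current + sibling)
    return current == root


log_records = [b"A", b"B", b"C", b"D", b"E"]
root, proof_for_b = root_and_proof(log_records, 1)
b_is_intact = verify_inclusion(b"B", proof_for_b, root)
```

In a real deployment the proof would come from the third party and the check would run on our side, so we never have to take the third party’s word for it.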
Challenge #3: Nobody’s tracking changes
While the Merkle tree seems like a foolproof solution, it still requires too much trust. Sure, we have a hash, a root hash, and a proof that validates our data is intact, but what happens if the third party is breached, records are lost or updated, and rather than telling you, the third party just rebuilds the Merkle tree instead? The data would still line up correctly on subsequent requests, and nobody would be the wiser. Without keeping a history of root hashes, there’s no real way to detect this. However, being forced, as a customer, to maintain a list of proofs and root hashes negates some of the benefits of using a third-party solution.
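A small sketch of why a remembered root matters: a rebuilt tree is perfectly consistent with itself, but its root no longer matches the one we saw earlier. The setup is the same assumed one as in the earlier example (SHA-256, concatenated child hashes), the names are hypothetical, and for simplicity no new records are added between the two roots.

```python
import hashlib
from typing import List


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(records: List[bytes]) -> bytes:
    """Hash every record, then hash adjacent pairs level by level up to the root."""
    level = [h(record) for record in records]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                parents.append(h(level[i] + level[i + 1]))
            else:
                parents.append(level[i])  # unpaired node is promoted unchanged
        level = parents
    return level[0]


# The root we wrote down when the log was known to be good.
remembered_root = merkle_root([b"A", b"B", b"C", b"D"])

# Later, record C is quietly edited and the whole tree is rebuilt.
rebuilt_root = merkle_root([b"A", b"B", b"C (edited)", b"D"])

# Proofs served from the rebuilt tree would verify against rebuilt_root,
# so only the comparison with the remembered root exposes the change.
tampering_detected = rebuilt_root != remembered_root
```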
Solution #3: A public ledger
The best way to solve this problem would be to record changes to our audit logs in a publicly accessible, immutable ledger. Luckily, such a ledger exists today, and I’m willing to bet it’s one you’ve heard a lot about: blockchain! With blockchain, we can periodically publish our changing root hashes and proofs, which can be used to prove that the current root is a continuation of the previous one. We publish periodically because it would be impractical to publish each and every root hash to the blockchain.
A proof of continuity works like so. Say we have a small Merkle tree with four log entries in it:
We publish root hash HABCD and subsequently add three more log records. When we log our seventh record, we want to publish our new root hash, HABCDEFG, to the blockchain. We also want to make sure HABCDEFG is a continuation of HABCD, our previously published root hash. To do this, we provide the top node of each subtree that has been completed since HABCD was created.
With this proof, we can hash up the tree and validate that our calculated hash matches the currently published root hash, thereby proving that HABCDEFG is a continuation of HABCD. Additionally, this same concept works for any intermediate root hash created between publishing events. The third party can always provide a proof to us that each new root hash is a continuation of any previously published hash.
What does this mean for our company using this third-party logging solution? It means that, at most, we have to keep track of one root hash at a time. Each time we create a new log entry, we can submit the previous root hash and be given proofs that the log record is part of the tree and that the new root is a continuation of the previous one. Independent third parties can use the public ledger (blockchain) to validate the integrity of our logs as well.
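For the seven-record example, the check could be sketched like this. It assumes the tree pairs E with F and then pairs HEF with the lone leaf hash HG, so the proof consists of HEF and HG; that tree shape, SHA-256, and the name `verify_continuation` are all assumptions of the sketch. It also covers only this case, where the old tree survives as a complete left subtree of the new one. A general consistency proof has to handle old trees of other sizes too.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def verify_continuation(old_root: bytes, h_ef: bytes, h_g: bytes, new_root: bytes) -> bool:
    """Check that new_root is old_root extended by the subtrees HEF and HG."""
    new_right_subtree = h(h_ef + h_g)
    return h(old_root + new_right_subtree) == new_root


# Build the example by hand: HABCD was published earlier, then E, F, G arrive.
leaf = {name: h(name.encode()) for name in "ABCDEFG"}
h_abcd = h(h(leaf["A"] + leaf["B"]) + h(leaf["C"] + leaf["D"]))
h_ef = h(leaf["E"] + leaf["F"])
h_g = leaf["G"]
h_abcdefg = h(h_abcd + h(h_ef + h_g))

is_continuation = verify_continuation(h_abcd, h_ef, h_g, h_abcdefg)
```

If any of A through D had been altered before the new root was computed, the recomputed value would not start from the published HABCD and the check would fail.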
Challenge #4: Pre-commit modification
While it would seem that we have a solution to nearly every problem, there’s still one hole in our solution. What happens if a change is made, accidentally or intentionally, before the log records are committed to the Merkle tree? How can we prove that the record stored in the tree is the record that we submitted to the logging solution?
Solution #4: Client-side signing
The way to do this with our third-party logging solution would be to sign each record with a private key on the client side before sending the record to the third party. By signing the record with a key known only to us, we can verify that each record committed to the tree is in the exact state it was in when we sent it. As an added bonus, if we use asymmetric key pairs, we can publish the public key, allowing anyone to verify the signature of the logs.
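As a rough sketch of client-side signing using only Python’s standard library, the example below uses HMAC-SHA256. To be clear, that is a swapped-in technique: HMAC uses a shared secret key rather than a private/public key pair, so only someone holding the key can verify. Getting the “anyone can verify” bonus would take an asymmetric signature scheme such as Ed25519 from a cryptography library. The function names, the JSON encoding, and the sample record are all hypothetical.

```python
import hashlib
import hmac
import json


def canonical(record: dict) -> bytes:
    """Serialize the record deterministically so the same record always signs the same."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_record(record: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the canonical form of the record."""
    return hmac.new(key, canonical(record), hashlib.sha256).hexdigest()


def verify_record(record: dict, signature: str, key: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign_record(record, key), signature)


secret_key = b"example-key-known-only-to-us"  # placeholder, never hard-code real keys
record = {"actor": "alice", "action": "read", "target": "patient/42"}
signature = sign_record(record, secret_key)

# We would send both the record and the signature to the third party.
still_ours = verify_record(record, signature, secret_key)
```

If the record is changed anywhere between our client and the Merkle tree, the stored signature no longer matches and the modification shows up the next time we verify.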
Conclusion
As we can see, the task of providing truly irrefutable evidence to your customers is not without many complex challenges. You need to:
Hinder the ability of an attacker to delete their recorded activity: store your evidence with a third party.
Provide cryptographic proof that all evidence is present and accounted for: commit all evidence to a Merkle tree and provide Merkle proofs for validation.
Provide a verifiable, immutable history that independent parties can validate: periodically publish the root hash and proof of continuity to an immutable public-facing ledger (blockchain).
Ensure that you and only you authored all data in your evidence collection: sign your records with a private key before sending them to the third party.
If that sounds like a lot of work, it is. If you want to try out a logging solution that’s already implemented this and much more, sign up for Pangea secure audit log.