Authorization in AI Systems with Pangea Multipass

Keith Casey
Jan 7, 2025

As we’ve worked with customers evaluating Large Language Models (LLMs) for their organizations, the recurring theme has been authorization. In too many of these systems, data in the form of PDFs, Google Docs, and more is added to give better, more refined answers, and the LLM then acts as a giant bypass around the authorization policies that protected those files.

In building our own approach for Authorization in AI Apps, we found the real, underlying problem: authorization structures are distributed across many systems, represented differently between those systems, and rebuilding them creates a management nightmare.

Therefore, we came to the simple conclusion: We’ll check the source.

Introducing Pangea Multipass: Your Authorization Helper

With Pangea Multipass, you can query a user’s access to a resource in real time and get back a simple “allowed” or “denied.” Multipass normalizes the interfaces for the underlying services - Google Drive, Confluence, Slack, GitHub, and more at launch - to abstract the credentials, the interaction, and the response. Further, since LLMs are inherently a read-only interface, Multipass collapses the various access roles - viewer, editor, owner - into a simple set of “can read” permissions.
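To make the role-collapsing idea concrete, here is a minimal sketch of how viewer, editor, and owner roles could be reduced to a single read decision. The names `READ_GRANTING_ROLES`, `can_read`, and `check_access` are hypothetical and chosen for illustration; they are not the Multipass API.

```python
from __future__ import annotations

# Hypothetical sketch: these names are illustrative, not the Multipass API.
# Every role that implies read access in the source system.
READ_GRANTING_ROLES = {"viewer", "editor", "owner"}


def can_read(role: str | None) -> bool:
    """Return True if the source-system role implies read access."""
    return role is not None and role.lower() in READ_GRANTING_ROLES


def check_access(roles_by_user: dict[str, str], user: str) -> str:
    """Collapse a user's role on one resource to 'allowed' or 'denied'."""
    return "allowed" if can_read(roles_by_user.get(user)) else "denied"


# Roles on a single file, as a source system might report them.
roles = {"alice@example.com": "owner", "bob@example.com": "viewer"}
print(check_access(roles, "alice@example.com"))
print(check_access(roles, "carol@example.com"))
```

A user with no role on the resource falls through to “denied,” so the default is closed rather than open.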

[Figure: Using Pangea Multipass to validate authorization]

Using Pangea Multipass: Ingestion vs Inference Time

With respect to AI/LLM-based apps, there are two places you’d want to use Multipass:

First, as you’re ingesting files into your model, you can use Multipass to query the source of the file to extract authorization information. Then you would use an authorization engine - like Pangea AuthZ - to store the mapping of files to users and files to vectors. Later, at inference time, when you have a set of vectors, you can filter those vectors based on the user and generate an authorization-aware response.
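The ingestion-time flow can be sketched as follows. This is a simplified illustration under stated assumptions: `InMemoryAuthzStore` is a hypothetical stand-in for an authorization engine, and the file, user, and vector identifiers are made up. A real deployment would persist these mappings in a service such as Pangea AuthZ rather than in dictionaries.

```python
from collections import defaultdict


class InMemoryAuthzStore:
    """Hypothetical stand-in for an authorization engine."""

    def __init__(self):
        self.readers_by_file = defaultdict(set)  # file_id -> users who can read it
        self.file_by_vector = {}                 # vector_id -> file_id it came from

    def record_file(self, file_id, readers, vector_ids):
        """Ingestion time: store who can read a file and which vectors it produced."""
        self.readers_by_file[file_id].update(readers)
        for vector_id in vector_ids:
            self.file_by_vector[vector_id] = file_id

    def filter_vectors(self, user, vector_ids):
        """Inference time: keep only vectors whose source file the user can read."""
        visible = []
        for vector_id in vector_ids:
            file_id = self.file_by_vector.get(vector_id)
            # Unknown vectors map to no file and are dropped.
            if user in self.readers_by_file.get(file_id, ()):
                visible.append(vector_id)
        return visible


store = InMemoryAuthzStore()
store.record_file("handbook.pdf", {"alice", "bob"}, ["v1", "v2"])
store.record_file("salaries.xlsx", {"alice"}, ["v3"])

retrieved = ["v1", "v3", "v9"]  # "v9" was never recorded at ingestion
bob_visible = store.filter_vectors("bob", retrieved)
alice_visible = store.filter_vectors("alice", retrieved)
print(bob_visible, alice_visible)
```

The trade-off of this approach is freshness: the stored mapping reflects permissions as of ingestion, so it has to be refreshed when access changes at the source.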

Alternatively, for particularly sensitive information or in a RAG-based architecture, you may prefer a real-time authorization check. In this scenario, you will move Multipass’s authorization query from ingestion time to inference time.
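As a rough sketch of the inference-time variant, the retrieved chunks can be filtered through a live permission check just before they reach the prompt. Everything here is an assumption for illustration: `filter_at_inference`, the chunk shape, and `fake_is_allowed` (which stands in for a real query to the source system) are not part of Multipass.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Chunk = Dict[str, str]


def filter_at_inference(
    user: str,
    chunks: Iterable[Chunk],
    is_allowed: Callable[[str, str], bool],
) -> Tuple[List[Chunk], List[Chunk]]:
    """Split retrieved chunks into (allowed, denied) using a live check."""
    allowed, denied = [], []
    decisions = {}  # per-request cache: each source is queried at most once
    for chunk in chunks:
        source_id = chunk["source_id"]
        if source_id not in decisions:
            decisions[source_id] = is_allowed(user, source_id)
        (allowed if decisions[source_id] else denied).append(chunk)
    return allowed, denied


# Stand-in for a live permission lookup against the source system.
ACL = {"doc-1": {"alice", "bob"}, "doc-2": {"alice"}}
lookups = []


def fake_is_allowed(user: str, source_id: str) -> bool:
    lookups.append(source_id)
    return user in ACL.get(source_id, set())


retrieved = [
    {"source_id": "doc-1", "text": "Holiday schedule"},
    {"source_id": "doc-2", "text": "Salary bands"},
    {"source_id": "doc-1", "text": "Office locations"},
]
allowed, denied = filter_at_inference("bob", retrieved, fake_is_allowed)
print([c["text"] for c in allowed], [c["text"] for c in denied])
```

The cache lives only for the duration of one request, so decisions are never reused across queries, which keeps the check real-time while avoiding repeated lookups for chunks from the same file.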

Finally, there’s nothing LLM-specific about this library. You can take advantage of Multipass to query access for any resource in a supported backend storage system.

Embedding Multipass in your app

Using Multipass in your application takes only a few steps. A full, runnable end-to-end example is available, but here are the key aspects:

First, you add and install it:

poetry add pangea-multipass
poetry install

Next, you add the credentials for the upstream data source. This varies depending on the provider. In the example, we walk through Google Drive specifically.

Then you initialize the data source:

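# Assumed import: GoogleDriveReader from LlamaIndex's Google readers package
from llama_index.readers.google import GoogleDriveReader

# Load every document in the folder using the admin's token
gdrive_reader = GoogleDriveReader(
    folder_id=gdrive_fid, token_path=admin_token_filepath, credentials_path=credentials_filepath
)
documents = gdrive_reader.load_data(folder_id=gdrive_fid)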

This gives you a list of documents, which you can then pass through the processors to split into authorized and unauthorized lists:

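# Check each document's Google Drive permissions
gdrive_processor = LlamaIndexGDriveProcessor(creds)
node_processor = NodePostprocessorMixer([gdrive_processor])
# Documents that passed the authorization check
authorized_docs = node_processor.postprocess_nodes(documents)
# Documents that were filtered out
unauthorized_docs = node_processor.get_unauthorized_nodes()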

In general, the authorized list will be more important, but you may want to log or notify an admin if a user is attempting to access a folder where they have limited access. It could signal an attempt at data theft, or simply that their permissions are incomplete.
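One way to act on the unauthorized list is to log it and escalate when the count looks unusual. The sketch below uses only Python's standard `logging` module; the `audit_denied` helper, the logger name, and the threshold of three are hypothetical choices for illustration, not something Multipass provides.

```python
import logging

# Hypothetical logger name; route it to your own audit pipeline.
logger = logging.getLogger("multipass.audit")


def audit_denied(user, unauthorized_docs, alert_threshold=3):
    """Log denied documents and report whether the event warrants an alert.

    Returns "none" when nothing was denied, "logged" for a routine entry,
    and "alert" when the number of denials reaches alert_threshold.
    """
    count = len(unauthorized_docs)
    if count == 0:
        return "none"
    escalate = count >= alert_threshold
    level = logging.WARNING if escalate else logging.INFO
    logger.log(level, "user=%s was denied %d document(s)", user, count)
    return "alert" if escalate else "logged"


print(audit_denied("bob", []))
print(audit_denied("bob", ["salaries.xlsx"]))
print(audit_denied("bob", ["a.pdf", "b.pdf", "c.pdf"]))
```

A fixed threshold is a blunt instrument; it is used here only to show where an escalation decision would sit in the flow.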

Next Steps for Multipass

At launch, we support Google Drive, Confluence, Jira, GitHub, and Slack to meet our own and our pilot customers’ needs. That said, we’ve already had requests to extend Multipass to over a dozen other data sources, ranging from public, well-structured systems to internal, one-off databases. Therefore, we designed the library to be extensible from the start.

The README describes extending it in detail, but in short you’ll need to get the source’s credentials, use those credentials to connect, retrieve the file or resource, and retrieve the users who have access to the file. If your source doesn’t support API access for each of these actions, you may need to extend the library further.
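The four steps above can be sketched as an interface. This is only an illustration of the shape of the work: `SourceConnector`, `InMemoryConnector`, and their method names are hypothetical and are not the extension points defined in the README, which remains the authoritative guide.

```python
from abc import ABC, abstractmethod


class SourceConnector(ABC):
    """Hypothetical interface mirroring the steps needed for a new source."""

    @abstractmethod
    def connect(self, credentials: dict) -> None:
        """Use the source's credentials to open a connection."""

    @abstractmethod
    def get_resource(self, resource_id: str) -> dict:
        """Retrieve the file or resource."""

    @abstractmethod
    def list_readers(self, resource_id: str) -> set:
        """Retrieve the users who have access to the resource."""

    def can_read(self, user: str, resource_id: str) -> bool:
        return user in self.list_readers(resource_id)


class InMemoryConnector(SourceConnector):
    """Toy source backed by a dict, standing in for an internal database."""

    def __init__(self, acl: dict):
        self._acl = acl  # resource_id -> set of users with access
        self._connected = False

    def connect(self, credentials: dict) -> None:
        if not credentials.get("token"):
            raise ValueError("missing token")
        self._connected = True

    def _require_connection(self) -> None:
        if not self._connected:
            raise RuntimeError("call connect() first")

    def get_resource(self, resource_id: str) -> dict:
        self._require_connection()
        if resource_id not in self._acl:
            raise KeyError(resource_id)
        return {"id": resource_id}

    def list_readers(self, resource_id: str) -> set:
        self._require_connection()
        return set(self._acl.get(resource_id, ()))


connector = InMemoryConnector({"runbook": {"alice"}})
connector.connect({"token": "example-token"})
print(connector.can_read("alice", "runbook"), connector.can_read("bob", "runbook"))
```

If a source cannot answer the "who has access" question through an API, that method is where a custom lookup would have to go.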

As you extend Pangea Multipass with your own sources, let us know how we can make it better and easier, or even open a pull request to speed things along.
