Mining the Index: Uncovering Sensitive Data in Public ChatGPT Histories via Google Search

Joey Melo
Aug 1, 2025

Recent revelations have shed light on a new reality for ChatGPT users: Google has been actively indexing shared ChatGPT conversation histories. This means that shared discussions, potentially containing sensitive personal or professional information, are being made publicly discoverable through search engines.

The mechanism behind this unexpected indexing is surprisingly straightforward. When a user shares a ChatGPT conversation, a unique URL is generated. According to OpenAI, "shared links are not enabled to show up in public search results on the internet by default"; indexing requires ticking an opt-in "discoverable" checkbox in the share dialog. However, this crucial detail is often overlooked by users who are distracted or unaware of its implications, especially when the URL is intended for a limited audience.

The consequences of this indexing are significant, and the attack surface for privacy breaches only seems to grow.

  • Exposure of Personally Identifiable Information (PII): Users might unknowingly share details like names, addresses, phone numbers, or other PII during a ChatGPT interaction, which then become searchable.

  • Academic and Research Data: Students and researchers discussing sensitive topics, experiments, or unpublished findings with ChatGPT could find their work prematurely exposed.

  • Source Code Exposure: Developers or engineers discussing proprietary algorithms, software vulnerabilities, or internal code structures with ChatGPT could inadvertently make this information public.

  • Internal Infrastructure Exposure: Details about a company's network architecture, server configurations, or security protocols shared with ChatGPT could lead to significant security risks if exposed.

What has been exposed?

I got curious about what information had already been indexed on this topic, so I decided to dig in.
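Finding these conversations takes no specialized tooling: shared chats live under chatgpt.com/share/, so a Google "dork" scoped to that path surfaces whatever has been indexed. Below is a minimal sketch of the approach; the keyword list is a set of hypothetical examples, not the exact queries used in this research.

```python
from urllib.parse import quote_plus

# A minimal sketch of the discovery technique: Google "dorks" scoped to
# ChatGPT's shared-conversation URL space (https://chatgpt.com/share/<id>).
# The site: operator and the share path are real; the keywords below are
# hypothetical examples, not the queries used in this research.
KEYWORDS = [
    '"my phone number is"',                  # PII
    '"Traceback (most recent call last)"',   # Python stack traces
    '"internal use only"',                   # corporate material
]

for keyword in KEYWORDS:
    dork = f"site:chatgpt.com/share {keyword}"
    print(f"https://www.google.com/search?q={quote_plus(dork)}")
```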

Private conversations

Some conversations included mildly private information, such as home renovation ideas, recipes, or informal brainstorming.

One humorous exchange involved a user inquiring about microwaving a metal fork, eliciting a highly sarcastic response from ChatGPT.

Personally Identifiable Information

Many conversations contained sensitive personal details such as names, email addresses, phone numbers, and physical addresses. PII was especially common among users asking for resume-writing tips.
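To give a sense of how quickly such exposure can be triaged, here is a minimal sketch of scanning conversation text for common PII patterns. The regexes are deliberately simplistic illustrations and the sample string is invented; real detectors handle far more formats.

```python
import re

# A minimal sketch of triaging an exposed conversation for common PII.
# These patterns are deliberately simple illustrations, not production
# detectors (real email and phone formats are far more varied).
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

def scan_for_pii(text: str) -> dict[str, list[str]]:
    """Return every match for each PII pattern found in the text."""
    return {name: pat.findall(text) for name, pat in PII_PATTERNS.items()}

sample = "Please tailor my resume. Reach me at jane.doe@example.com or 555-867-5309."
print(scan_for_pii(sample))
```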

Source code

Some developers shared their backend directory trees or other backend configuration.

Others revealed complete source code for scripts and automation.

Stack traces (internal systems information)

Using ChatGPT for debugging and error handling is common. However, users often paste errors without realizing that stack traces can reveal sensitive details about the underlying code and technology, such as internal file paths, hostnames, and library versions.
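One practical habit is scrubbing a trace before pasting it anywhere public. The sketch below assumes a few illustrative redaction rules (absolute paths, IP addresses, and an assumed ".internal" hostname convention); real traces can leak much more, so treat this as a starting point rather than a complete scrubber.

```python
import re

# A minimal sketch of scrubbing a stack trace before pasting it into a
# chatbot. The substitutions are illustrative; real traces may also leak
# usernames, internal package names, and service versions.
REDACTIONS = [
    (re.compile(r'File "/[^"]+/'), 'File ".../'),          # absolute file paths
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<ip>"),  # internal IP addresses
    (re.compile(r"\b[\w-]+\.internal\b"), "<host>"),       # assumed hostname convention
]

def sanitize(trace: str) -> str:
    for pattern, replacement in REDACTIONS:
        trace = pattern.sub(replacement, trace)
    return trace

raw = '''Traceback (most recent call last):
  File "/home/alice/acme-billing/app/payments.py", line 42, in charge
    gateway.connect("10.0.3.17")
ConnectionError: db01.internal refused the connection'''
print(sanitize(raw))
```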

Conclusion

This situation underscores a fundamental challenge in the rapidly evolving landscape of AI and online data: the often-blurred lines between private interaction and public discoverability. Users assume a certain level of privacy when interacting with AI models, and the indexing of shared histories by search engines creates a potential for unintended and unwelcome exposure.

Companies, in particular, could benefit from configurable guardrails, which can prevent the inadvertent exposure of sensitive company information, such as proprietary code, internal infrastructure details, and confidential project data, through AI conversations. This proactive approach is crucial for safeguarding intellectual property and maintaining a strong security posture in the age of widespread AI adoption.
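As a sketch of what such a guardrail might look like, the snippet below checks outbound prompts against company-defined deny-list patterns before they leave for an external model. The patterns and the blocking behavior are illustrative assumptions; production deployments typically live in a proxy or DLP layer with far richer detection.

```python
import re

# A minimal sketch of a configurable outbound guardrail: block prompts
# that match company-defined sensitive patterns before they are sent to
# an external AI service. The patterns and internal domain below are
# hypothetical examples.
BLOCKED_PATTERNS = [
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),  # key material
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                      # AWS access key IDs
    re.compile(r"\binternal\.example\.com\b"),                # assumed internal domain
]

def guard_prompt(prompt: str) -> str:
    """Raise if the prompt matches a deny-list pattern; otherwise pass it through."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(prompt):
            raise ValueError(f"Prompt blocked by guardrail: matched {pattern.pattern!r}")
    return prompt  # safe to forward to the external model

try:
    guard_prompt("Why does ssh to internal.example.com time out?")
except ValueError as err:
    print(err)
```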

Note: At the time of writing, OpenAI has disabled the discoverability feature, and Google has stopped indexing shared conversation histories.
