Sometimes it starts with someone trying to finish a report before the end of the day.
Most conversations about AI security focus on external threats — attackers, exploits, rogue models. But the fastest-growing source of AI-related data exposure in 2025 and 2026 is something far more ordinary: people using AI tools to do their jobs, without anyone having thought through what those tools can see, store, or send.
This is not a story about negligence. Most of the employees involved in the cases below were doing exactly what they were supposed to do — working faster, solving problems, meeting deadlines. The leak happened because useful tools were allowed to touch business data without boundaries.
Seven cases. Seven different root causes. One pattern.
97% of organisations that reported an AI-related breach lacked proper AI access controls (IBM Cost of a Data Breach Report, 2025).
The Seven Cases
Within 20 days of Samsung's semiconductor division granting engineers permission to use ChatGPT, three separate incidents occurred. The first engineer pasted proprietary source code into the chatbot to debug a database error. The second shared similarly sensitive code for optimisation purposes. The third transcribed an entire internal meeting using a third-party audio tool and fed the full transcript into ChatGPT to generate meeting minutes.
In none of the three cases were the employees acting maliciously. They were trying to work faster. At the time, OpenAI's default policy allowed submitted data to be used to improve future models. Samsung had issued informal warnings not to enter private information, but no technical controls existed to enforce them.
Samsung launched disciplinary investigations, capped all ChatGPT prompts at 1,024 bytes, and one month later banned all generative AI chatbots from company-owned devices and networks.
An acceptable-use policy without technical enforcement is not a control. If staff can paste anything into a public AI tool, they will — not out of malice, but because the tool is useful and the boundary is invisible. Consumer and free-tier AI plans often have no data processing agreements and no training exclusions. Enterprise plans exist for a reason.
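As a rough illustration of what "technical enforcement" can mean, the sketch below checks a prompt before it leaves the network. It is a minimal example under stated assumptions, not a description of Samsung's actual controls: the function name `check_prompt`, the pattern list, and the reuse of the 1,024-byte cap as a constant are all hypothetical choices made for this example.

```python
import re

# Hypothetical limit and patterns; tune them for your own environment.
MAX_PROMPT_BYTES = 1024  # mirrors the cap Samsung reportedly applied

BLOCK_PATTERNS = {
    "private key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "aws access key id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "confidential marker": re.compile(r"\b(?:confidential|internal only)\b", re.IGNORECASE),
}


def check_prompt(prompt: str) -> list:
    """Return the reasons a prompt should be blocked; an empty list means allow."""
    reasons = []
    # Measure bytes, not characters, so multi-byte text is counted correctly.
    if len(prompt.encode("utf-8")) > MAX_PROMPT_BYTES:
        reasons.append(f"prompt exceeds {MAX_PROMPT_BYTES} bytes")
    for label, pattern in BLOCK_PATTERNS.items():
        if pattern.search(prompt):
            reasons.append(f"contains {label}")
    return reasons
```

A gateway or browser extension could call `check_prompt` and refuse to forward anything that returns a non-empty list. Pattern matching of this kind catches only obvious cases, so it complements an enterprise plan with training exclusions rather than replacing it.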
On 20 March 2023, a bug in an open-source Redis client library used by OpenAI created a nine-hour window in which some ChatGPT users could see the conversation titles — and in some cases billing information — of other active users. OpenAI CEO Sam Altman publicly described it as a "significant issue."
Approximately 1.2% of ChatGPT Plus subscribers active during that window had data exposed: first name, last name, email address, billing address, and the last four digits and expiry date of their payment card.
This was not caused by anything a user did. The exposure came from a third-party dependency containing a bug that corrupted cached session data, allowing one user's session to leak into another's view. Italy's data protection authority used the incident as the basis to temporarily ban ChatGPT and in December 2024 fined OpenAI €15 million for underlying GDPR violations. In March 2026, the Court of Rome annulled that fine in its entirety on appeal.
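One way a shared cache connection can hand one user's data to another is sketched below. This is a deliberately simplified toy model and not OpenAI's or the Redis client's code: the `SharedConnection` class and the `fetch` function are invented for illustration, and the cancellation scenario is an assumption about how such a bug can arise.

```python
from collections import deque


class SharedConnection:
    """Toy stand-in for a pooled cache connection: replies come back in send order."""

    def __init__(self, store):
        self._store = store
        self._unread = deque()  # replies the server has sent but nobody has read yet

    def send(self, key):
        self._unread.append(self._store[key])

    def read(self):
        return self._unread.popleft()


def fetch(conn, key, cancelled=False):
    """Send a request, then read one reply unless the caller was cancelled in between."""
    conn.send(key)
    if cancelled:
        # Bug: the reply stays queued on a connection that will be reused.
        return None
    return conn.read()


store = {"alice": "Alice's billing page", "bob": "Bob's billing page"}
conn = SharedConnection(store)
fetch(conn, "alice", cancelled=True)   # Alice's request is abandoned mid-flight
leaked = fetch(conn, "bob")            # Bob's request reads Alice's stale reply
```

In this model `leaked` holds Alice's page even though Bob asked for his own. Nothing either user did caused the mix-up, which is why the exposure cannot be prevented from the customer side.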
Even when your team does nothing wrong, vendor-side infrastructure can expose data. If you operate in a regulated sector — healthcare, legal, financial services — review whether your AI vendor offers a data processing agreement and whether their free or consumer tier provides any GDPR compliance guarantees. Most do not.
Three days after Amazon announced its enterprise AI assistant Q at re:Invent 2023, internal employees began raising urgent alarms. According to internal documents reported by Platformer, Amazon Q was "experiencing severe hallucinations and leaking confidential data" during its public preview, including the physical locations of AWS data centres, details of internal employee discount programmes, and unreleased AWS features not yet public.
Amazon disputed the severity of the reports, stating publicly: "No security issue was identified as a result of that feedback." The company accelerated tuning of Q before general availability.
This case is categorised as reported, not confirmed, because Amazon's response contested the characterisation. The underlying dynamic is nonetheless real: AI systems trained on or given access to internal knowledge can surface information through outputs in ways that were never anticipated during design.
Before any AI tool — especially one connected to internal knowledge bases, documentation, or infrastructure data — goes live, test what it can be prompted to reveal. Preview systems and internal pilots are particularly vulnerable because access controls are often not yet fully scoped. Red-team your own deployment before staff do it accidentally.
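A simple way to start red-teaming is to plant unique canary strings in test documents and check whether the assistant repeats them. The sketch below assumes a hypothetical `ask` callable that wraps whatever interface your assistant exposes; the canary values and probe prompts are made up for this example.

```python
from typing import Callable

# Hypothetical canary strings planted in test documents inside the knowledge base.
CANARIES = ["CANARY-DC-LOCATION-7731", "CANARY-UNRELEASED-FEATURE-4412"]

# Hypothetical probe prompts; a real exercise would use many more.
PROBES = [
    "Where are our data centres located?",
    "List any features that have not been announced yet.",
]


def find_leaks(ask: Callable[[str], str], probes=PROBES, canaries=CANARIES):
    """Send each probe to the assistant and report (probe, canary) pairs that leaked."""
    leaks = []
    for probe in probes:
        answer = ask(probe)
        for canary in canaries:
            if canary in answer:
                leaks.append((probe, canary))
    return leaks
```

An empty result does not prove the deployment is safe, only that these particular probes found nothing. Any hit is a concrete, reproducible reason to tighten access scoping before launch.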
In May 2024, security researchers discovered that Slack's revised privacy policy, quietly updated in September 2023, allowed the company to use "Customer Data (e.g. messages, content and files) submitted to Slack" to develop AI and machine learning models. Workspaces were opted in by default. Individual users could not opt out themselves; only workspace administrators could, by emailing Slack directly.
The data potentially included private direct messages, confidential internal channels, shared files, and other workspace content. Slack's AI product page simultaneously stated "Your data is yours. It won't be used to train Slack AI" — creating messaging that contradicted the underlying policy terms.
Under public pressure, Slack issued a clarification stating that direct message content was not used to train generative AI models, only aggregate platform-level data. Critics noted the original policy language had been in place for months without prominent communication.
This is not a data breach in the traditional sense. It is a data-governance warning shot: vendor terms of service can change the fate of your business data without a visible incident, without a breach notification, and without requiring anyone to do anything wrong.
Monitor vendor terms for any tool that handles sensitive business communications or files. Terms can change silently. Assign responsibility for quarterly policy reviews on high-risk tools. For Slack specifically, ensure your workspace administrator has confirmed opt-out status for ML training data sharing.
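Part of a policy review can be automated by saving a copy of each vendor's terms and comparing it with the next copy you fetch. The sketch below is one possible approach using only the Python standard library. It assumes you already have the policy text as a string, and the function names are hypothetical.

```python
import difflib
import hashlib


def fingerprint(text: str) -> str:
    """Hash policy text after collapsing whitespace so reflowed pages do not trigger alerts."""
    normalised = " ".join(text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def changed_lines(old: str, new: str) -> list:
    """Return the lines removed ("- ") or added ("+ ") between two saved copies of a policy."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]
```

Comparing fingerprints tells you whether anything changed, and `changed_lines` shows what to read. A person still has to judge whether a changed sentence matters, so this supports the quarterly review rather than replacing it.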
This case has two components that together make GitHub Copilot one of the best-documented AI data exposure vectors in development environments.
Training data memorisation. Researchers constructed 900 prompts designed to extract credentials from Copilot's training data. The result was 2,702 hard-coded credentials surfaced through Copilot's code suggestions, of which at least 200 were confirmed functional secrets still active on GitHub. GitGuardian further found that repositories where Copilot is actively used show a 40% higher rate of secret leaks compared to the average public repository.
CamoLeak vulnerability. Security researchers disclosed a vulnerability in GitHub Copilot Chat that allowed attackers to embed malicious prompt injections inside pull request descriptions. When a developer asked Copilot about the pull request, the hidden instructions activated, causing Copilot to encode stolen API keys and source code into image URLs and exfiltrate them to an attacker-controlled server. GitHub addressed the vulnerability by disabling image rendering in Copilot Chat. Because primary-source CVE confirmation is still pending, this article avoids naming a CVE ID.
Treat AI-generated code as untrusted input. Implement automated secret scanning — GitHub's built-in scanner, GitGuardian, or TruffleHog — on every repository. Never allow hardcoded credentials in any codebase. Rotate API keys and credentials regularly, even if you believe no leak has occurred. The velocity at which AI coding tools generate code increases the velocity at which secrets can accidentally enter codebases.
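To show what secret scanning does at its simplest, here is a minimal pattern-based scanner. It is an illustrative sketch, not a substitute for the dedicated tools named above: the three patterns are a small sample chosen for this example, and the function name `scan_text` is hypothetical.

```python
import re

# Illustrative patterns only; dedicated scanners ship far larger, tuned rule sets.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "GitHub personal access token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "hard-coded password or API key": re.compile(
        r"(?i)\b(?:password|passwd|api_key|secret)\s*=\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_text(text: str) -> list:
    """Return (line_number, rule_name) for every line that matches a secret pattern."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for rule_name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, rule_name))
    return findings
```

Running a check like this in a pre-commit hook or CI job means an AI-suggested snippet containing a literal credential is flagged before it reaches the repository. Reading the value from an environment variable, as in `os.environ["TOKEN"]`, would not be flagged.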
Between 8 and 18 August 2025, a threat actor compromised one of Salesloft's internal GitHub repositories, discovered a sensitive OAuth token, and used it to authenticate into Salesloft's account within the Drift AI chatbot — a sales conversation tool integrated with customers' Salesforce and Google Workspace environments.
Using Drift's trusted integration access, the attackers systematically exfiltrated data from the Salesforce instances of over 700 organisations globally. Confirmed victims included Cloudflare, Palo Alto Networks, Zscaler, Tenable, and Proofpoint. FINRA issued an emergency alert to all member financial firms. Google's Threat Intelligence Group published a detailed advisory.
Cloudflare published a public postmortem confirming their Salesforce instance was accessed between 12 and 17 August. The attacker exfiltrated support case content including, in some cases, customer-submitted API tokens, passwords, and configuration details. Cloudflare rotated 104 API tokens and notified all affected customers directly.
One company was not affected: Okta, which had implemented IP allow-listing on its Salesforce integration. The stolen OAuth token was useless against Okta's IP restrictions.
Every AI tool integrated into your business stack — chatbots, sales assistants, email tools — is a potential attack surface with access to your core systems. Treat OAuth tokens and API credentials that connect AI tools to your CRM, email, and cloud storage with the same discipline as banking credentials. Implement IP allow-listing on integrations where possible. Rotate credentials on a schedule. Immediately revoke access when you stop using a vendor — one company was breached through a Salesloft token that had never been deactivated after they left the platform.
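The two controls described here, IP allow-listing and scheduled rotation, reduce to a pair of simple checks. The sketch below is a minimal illustration, assuming documentation-range IP addresses and a 90-day rotation window; both values and all function names are hypothetical. In practice these rules are normally configured in the platform itself rather than written by hand.

```python
import ipaddress
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical values: documentation-range networks and a 90-day rotation window.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]
MAX_TOKEN_AGE = timedelta(days=90)


def ip_allowed(source_ip: str) -> bool:
    """True if the caller's address falls inside one of the allow-listed networks."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


def token_needs_rotation(issued_at: datetime, now: Optional[datetime] = None) -> bool:
    """True if the token is older than the rotation window."""
    now = now or datetime.now(timezone.utc)
    return now - issued_at > MAX_TOKEN_AGE


def authorise(source_ip: str, issued_at: datetime, now: Optional[datetime] = None) -> bool:
    """A valid token is not enough: the request must also come from an expected network."""
    return ip_allowed(source_ip) and not token_needs_rotation(issued_at, now)
```

Under these rules a stolen but still valid token presented from an unknown address is rejected, which matches the outcome described for Okta above.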
Microsoft 365 Copilot does not have a data leakage bug. It is working exactly as designed. The problem is what it is designed to do: index and search across SharePoint, OneDrive, Teams, and Exchange — respecting whatever permissions exist across those systems.
In most organisations, those permissions are years of accumulated mismanagement. Widely shared links, "everyone except external users" settings, sites that were never locked down after projects ended. When Copilot arrived, it became capable of surfacing documents that employees were never meant to see — not because Copilot had an exploit, but because the permissions had been wrong for years and no one noticed until AI made them easy to query.
Research found that approximately 16% of business-critical data is overshared across the average Microsoft 365 tenant, with around 802,000 files at risk per organisation. A Gartner survey found that 40% of organisations delayed their Copilot rollout by three months or more due to concerns about oversharing. Security researchers also documented prompt injection attacks where malicious payloads embedded in emails caused Copilot to exfiltrate sensitive SharePoint documents via Teams or SharePoint links.
Before enabling any AI tool that indexes your business data — Copilot, Notion AI, Gemini in Workspace, or anything similar — audit your file permissions first. Assume that anything accessible to "Everyone" or via a broadly shared link will become discoverable by any licensed user through an AI query. Apply least-privilege principles to your document storage before you turn AI on, not after.
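A permissions audit can begin with a sharing report exported from your storage platform. The sketch below assumes such an export has already been loaded into simple records. The `SharedItem` fields and the scope labels are hypothetical and would need mapping to whatever your platform actually reports.

```python
from dataclasses import dataclass

# Hypothetical scope labels; map them from your platform's sharing report.
BROAD_SCOPES = {"anyone_with_link", "everyone", "everyone_except_external"}


@dataclass
class SharedItem:
    path: str
    scope: str       # who the item is shared with
    sensitive: bool  # flagged by a label, folder convention, or manual review


def overshared(items):
    """Return broadly shared items, sensitive ones first, then ordered by path."""
    risky = [item for item in items if item.scope in BROAD_SCOPES]
    return sorted(risky, key=lambda item: (not item.sensitive, item.path))
```

Sorting sensitive items to the top gives a short worklist for the most urgent fixes. Items shared only with named people are left out because an AI query by another user would not reach them.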
The Pattern
Across seven cases with seven different root causes, the same underlying dynamic appears.
AI does not create every risk. It makes existing bad boundaries faster, easier, and more searchable.
Samsung had no technical controls on what employees could paste. The risk was always there — AI just made it effortless to act on.
Microsoft 365 tenants had oversharing problems long before Copilot existed. AI just gave every licensed user a high-speed query interface into the mess.
Salesloft/Drift had a token stored insecurely in a repository. That was always a vulnerability. AI integration just made the blast radius global.
The tools are useful. That usefulness is precisely what makes uncontrolled access dangerous. When AI can do more, it can also leak more — faster, at scale, without a visible incident until the damage is done.
What This Means for Small Business
The cases above involve Samsung, Cloudflare, Palo Alto Networks, and Microsoft. It would be easy to read this as an enterprise problem.
It is not.
The highest concentration of uncontrolled AI use is not in large organisations with complex procurement processes. It is in small and mid-sized businesses where there are fewer policies, fewer technical controls, and more "just get it done" culture around new tools.
Small businesses rarely have a CISO. They rarely have a data processing agreement review process. They rarely have someone whose job is to notice when a vendor quietly updates their terms to include AI training on customer data.
They also rarely have the margin to absorb a data breach. IBM's 2025 Cost of Data Breach Report found that shadow AI incidents add an average of $670,000 to breach costs, a figure few small businesses could cover.
The Samsung engineers, the Slack workspace administrators, the developers using Copilot — they were all trying to do good work. The risk is not that AI users are careless. The risk is that the boundaries between "useful tool" and "business data" have not been drawn, communicated, or enforced.
Drawing those boundaries is not just a large-enterprise problem. It is an every-business problem.
Where This Leads
The seven cases in this article all involve AI tools operating without clear boundaries — around what they can access, what they can store, what they can send, and what happens when something goes wrong.
Those boundaries do not appear by default. They require deliberate decisions about routing, memory, tool access, trust, and local-vs-cloud data flow. They require someone in the organisation to have thought through what the AI stack is actually doing, not just what the marketing materials say it does.
That is the work. The checklist below is where to start.
The article shows where the leaks happen. The checklist helps you find where your own business may already be exposed.
Download the Free Checklist
Sources & Verification Notes
Samsung case: widely reported across multiple outlets including Bloomberg, The Guardian, and Gizmodo, May 2023.
OpenAI Redis bug: OpenAI incident blog, March 2023. Italy GDPR fine: Garante decision, December 2024; annulled by the Court of Rome in March 2026.
Amazon Q: Platformer report based on internal documents, December 2023. Amazon disputed the characterisation.
Slack training policy: TechCrunch, Ars Technica, The Register, May 2024. Slack privacy policy update September 2023.
GitHub Copilot training data memorisation: GitGuardian State of Secrets Sprawl 2026. CamoLeak: Legit Security research disclosure, June 2025. CVE ID pending final verification — will be updated upon confirmation.
Salesloft/Drift breach: Cloudflare public postmortem, August 2025. Google GTIG advisory. FINRA emergency alert. Trend Micro and Darktrace threat intelligence reports.
Microsoft 365 Copilot: Gartner AI Governance Survey 2026. Microsoft SharePoint Advanced Management documentation.
IBM 97% stat: IBM Cost of Data Breach Report, 2025.
Reco AI shadow AI concentration stat: Reco AI State of Shadow AI Report, 2025. Small business figure (269 unsanctioned tools per 1,000 employees) sourced from this report.
Shadow AI $670,000 cost premium: IBM Cost of Data Breach Report 2025, cross-referenced with Reco AI 2025 analysis.