According to Engadget, a Bloomberg investigation has revealed that Amazon reported over 1 million instances of AI-related child sexual abuse material (CSAM) to the National Center for Missing and Exploited Children (NCMEC) in 2025. The “vast majority” of the AI-related CSAM reports NCMEC received that year came from Amazon, which found the material in its AI training data. Fallon McNulty, executive director of NCMEC’s CyberTipline, called the volume an “outlier” and said Amazon’s reports are “inactionable” because the company won’t disclose where the data came from. Amazon stated it obtained the content from external sources used to train its AI services, removed it before training, and aimed to “over-report” to avoid missing cases. The surge is staggering compared with 2024’s 67,000 reports and 2023’s 4,700.
The unanswered question
Here’s the thing that’s just screaming for attention: Amazon says it can’t provide details on the source. That’s a massive red flag. The CyberTipline isn’t just a suggestion box; it’s where actionable data gets sent to law enforcement. By not saying where this mountain of horrific content came from, Amazon has basically handed NCMEC a pile of evidence with no case number, no leads, nothing. It’s useless. So what’s really going on? Is the source so legally fraught or embarrassing that silence is the better option? Or does revealing it expose a gaping hole in their data sourcing practices that they can’t afford to admit? Either way, the lack of transparency here is more damning than the initial discovery.
A systemic AI problem
This isn’t just an Amazon problem, though their scale makes it terrifying. It’s a flashing neon sign pointing at the entire, breakneck industry of AI training. Companies are scraping the darkest corners of the public web, hoovering up petabytes of data to feed their hungry models, and apparently doing a horrifically bad job of filtering it. The explosive growth in reports—from 4,700 to over 1 million in two years—tells you everything. The safeguards are failing, or more likely, were never truly built to handle this scale and this type of evil. And while the focus is on training data, let’s not forget the other side: AI chatbots are already implicated in real-world tragedies with teens. The whole ecosystem is proving to be dangerously unstable when it interacts with the real world, and especially with young people.
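To make “safeguards” concrete: one common pre-training filter is matching candidate files against a list of known-bad hashes before anything reaches the training pipeline. The sketch below is illustrative only and not how Amazon or any specific provider does it; the paths, function names, and the `known_bad_hashes.txt` file are hypothetical stand-ins, and real systems lean on perceptual hashing (PhotoDNA-style) plus classifier and human review, since an exact cryptographic hash is defeated by a one-pixel change.

```python
# Minimal sketch of a hash-matching filter stage for training data, under the
# assumptions stated above. All paths and filenames here are hypothetical.
import hashlib
from pathlib import Path


def load_known_hashes(hash_list_path: str) -> set[str]:
    """Load one lowercase hex SHA-256 hash per line into a set for O(1) lookups."""
    return {
        line.strip().lower()
        for line in Path(hash_list_path).read_text().splitlines()
        if line.strip()
    }


def sha256_of_file(path: Path) -> str:
    """Stream the file through SHA-256 so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def filter_training_files(data_dir: str, hash_list_path: str) -> tuple[list[Path], list[Path]]:
    """Split candidate files into (clean, flagged) by exact match against the hash list."""
    known = load_known_hashes(hash_list_path)
    clean, flagged = [], []
    for path in Path(data_dir).rglob("*"):
        if path.is_file():
            (flagged if sha256_of_file(path) in known else clean).append(path)
    return clean, flagged


if __name__ == "__main__":
    clean, flagged = filter_training_files("crawl_shard_00/", "known_bad_hashes.txt")
    print(f"{len(clean)} files kept, {len(flagged)} quarantined for review and reporting")
```

In a real pipeline, the flagged bucket is what would feed a CyberTipline report, ideally with the source URL and crawl metadata attached, which is precisely the detail Amazon says it can’t provide.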
What does “over-report” mean?
Amazon’s statement that it aimed to “over-report” is a fascinating piece of corporate messaging. On one hand, you could read it as extreme caution. On the other, it feels like a pre-emptive defense. If you later get caught with CSAM in a model, you can point back and say, “Look, we were so careful we even reported the iffy stuff!” But it also raises a practical question. If you’re flooding the system with a million “inactionable” reports, are you helping or are you actively clogging the very system designed to protect kids? You’re creating noise that could drown out real, actionable signals from other companies. It seems less like a safety measure and more like a liability shield.
The inevitable reckoning
So where does this leave us? The lawsuits are already flying against OpenAI, Character.AI, and Meta for harms related to AI and young users. Amazon’s data scandal adds a massive, foundational layer to that legal threat. If you can’t vet your training data, you can’t guarantee the safety of your model’s outputs. It’s that simple. Regulators and lawmakers, who’ve been slow to catch up, now have a crystal-clear, million-example case study of why this industry cannot be left to police itself. The era of moving fast and breaking things is crashing headfirst into the one area where society has zero tolerance for failure. The cleanup is going to be ugly, expensive, and long overdue.
