Reddit Files Lawsuit Against Perplexity AI Over Alleged Unauthorized Data Scraping

Legal Action Over AI Training Data

Reddit has filed a lawsuit against artificial intelligence company Perplexity and several data scraping firms, alleging they improperly obtained and resold Reddit’s content without authorization, according to court documents reviewed by PYMNTS. The suit names Perplexity AI, Oxylabs UAB, AWMProxy and SerpApi as defendants, in what sources describe as part of a broader industry pattern of disputes over AI training data.

Alleged Data Scraping Operation

The lawsuit alleges that the defendant companies obtained Reddit’s data by scraping it from Google search results and then resold it to AI companies without Reddit’s consent or compensation. Reports state that Perplexity purchased Reddit data from at least one of the scraping firms. The case follows a similar lawsuit Reddit filed earlier this year against Anthropic, which alleged comparable unauthorized use of Reddit data to train large language models.

Reddit Chief Legal Officer Ben Lee described the case as representative of wider industry challenges: “AI companies are locked in an arms race for quality human content, and that pressure has fueled an industrial-scale data laundering economy,” he said in a statement quoted by Bloomberg. Analysts suggest this reflects the growing value of human-generated content as AI training data.

Reddit’s Data Licensing Strategy

Reddit’s extensive repository of public conversations has become a critical resource for training generative AI models, and the company has already begun licensing it for a fee. According to reports, Reddit has signed structured data-access deals with both OpenAI and Google that give them authorized access to its posts and comment threads.

The company argues that unauthorized scraping and use of its data undermine fair competition and creator rights, since firms that bypass proper licensing channels gain an unfair advantage while depriving content creators of compensation. Legal experts suggest the case could help establish important precedents for how web-scraped content may be used in AI model training.

Broader Legal Context

The case, Reddit Inc. v. SerpApi LLC (No. 25-cv-08736), joins a growing wave of disputes that are shaping data governance and compliance standards in the AI industry. According to legal analysts, similar cases, including The New York Times v. OpenAI, are forcing companies to reassess how they manage content ownership, consent and data provenance.

Law firm Nelson Mullins noted in a recent analysis that these legal challenges are creating what it calls an “AI data crisis,” compelling organizations to reevaluate their data governance and discovery strategies. The outcome of Reddit’s case could influence how U.S. courts interpret the legality of using web-scraped content to train commercial AI models.

Representatives for Perplexity, SerpApi and Oxylabs did not respond to requests for comment regarding the allegations, according to the original report. The lawsuit represents the latest development in the ongoing tension between content platforms and AI companies seeking training data.