Internal Meta communications revealed in court documents show employees discussed using copyrighted materials, including pirated books, to train the company's artificial intelligence models.
The documents, unsealed Thursday as part of the Kadrey v. Meta lawsuit, expose conversations in which Meta staff debated acquiring copyrighted content without proper licensing agreements.
In a February 2023 chat, Meta research engineer Xavier Martinet advocated for an "ask forgiveness, not permission" approach, suggesting that Meta buy retail e-books rather than negotiate with publishers. He argued that many startups were likely already using pirated materials for AI training.
Senior manager Melanie Kambadur noted that while approvals were still needed for using public data, Meta's legal team had become "less conservative" about such approvals. The company now had "more money, more lawyers, more bizdev help," she stated.
The filings also reveal discussions about using Libgen, a controversial platform known for providing unauthorized access to copyrighted works. Sony Theakanath, Meta's director of product management, called Libgen "essential" for achieving top AI model performance. He proposed removing content clearly marked as pirated and avoiding public disclosure of Libgen's use.
Internal communications suggest Meta's own data sources, including Facebook and Instagram posts, were insufficient for AI training. In March 2024, Chaya Nayak, director of product management, indicated leadership was considering overriding previous restrictions on certain content sources, stating "we need more data."
Meta has strengthened its legal defense by adding Supreme Court litigators to its team. The company has not responded to requests for comment about these revelations.
The case, filed in the U.S. District Court for the Northern District of California in 2023, includes authors Sarah Silverman and Ta-Nehisi Coates among the plaintiffs. They dispute Meta's claim that training AI models on copyrighted works qualifies as "fair use."
The lawsuit's outcome could set precedents for how AI companies handle copyrighted material in model training, and for how technological advancement is balanced against intellectual property rights.