AI training data controversy escalates: Another tech giant faces lawsuit over pirated books

[CryptoWorld] The tech industry is facing yet another lawsuit over AI training data. Author Elizabeth Lyon has sued a well-known tech company, claiming that its large language model was trained on a dataset that included her copyrighted works.

What exactly happened? The dispute centers on the SlimPajama-627B dataset, which is derived from the RedPajama project and incorporates the highly controversial "Books3" collection, a large corpus of unlicensed books. The company used this data to train its SlimLM model, and the author discovered that her works had been swept into the training set without permission.

This is not an isolated case. Similar lawsuits are piling up, targeting not only this company but several other tech giants accused of using protected content without authorization during AI development. They all raise the same core questions: can AI models freely train on data pulled from the internet and from published works, and how can the rights of content creators be protected?

From the perspective of Web3 and open-source communities, the incident reflects a deeper tension: AI development requires vast amounts of data, yet the rights of content creators cannot simply be brushed aside. Striking a balance between the two has become one of the hardest challenges facing the tech industry, and how these lawsuits play out remains to be seen.

Comments
GateUser-beba108d
· 2025-12-18 01:50
Here we go again. Big tech companies just take everything wholesale, copyright or not.
AirdropDreamer
· 2025-12-18 01:50
Here we go again, yet another case of AI stealing data... tech giants really are unstoppable, huh.
MidnightSnapHunter
· 2025-12-18 01:48
Damn, here we go again? Large model training is just a modern version of "utilitarianism."
MetaMaximalist
· 2025-12-18 01:28
honestly this is just the beginning. once the precedent gets set, every creator's gonna come knocking. the real question nobody's asking is whether fair use doctrine even *applies* to training data at scale... and ngl the tech giants banking on murky legal territory while authors get squeezed is peak extractive capitalism dressed up as innovation.