![](https://fedia.io/media/33/95/33951fd7c296b0000d1da93f5b28bad35a19034fa8e8517f7f2dee91fb6751d4.png)
![](https://lemmy.dbzer0.com/pictrs/image/a18b0c69-23c9-4b2a-b8e0-3aca0172390d.png)
The Internet Archive is currently fighting in the courts to maintain free digital library access to over 500,000 books from its own collection, yet Meta uses a pirated dataset of nearly 200,000 books to train its proprietary AI and is just allowed to get away with that?
Publishers will go after a charity making fair use of their content, but not the corporation outright stealing from them. What utter bollocks.
I’d love to agree with you - but when people say that LLMs are stochastic parrots, this is what they mean…
LLMs don’t actually know what the words they’re saying mean, they just know which words are most likely to come next based on their training data.
Because they don’t know the meaning of what they’re saying, they also don’t know whether it’s true, so they simply can’t fact-check themselves.
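To make the "most likely next word" point concrete, here's a rough toy sketch in Python. It's a plain bigram counter, not how a real LLM works internally (those use neural networks over tokens and much longer context), and the corpus and the function names `train_bigrams` and `most_likely_next` are made up for illustration. The idea it shows is the same though: the output is whatever followed most often in the training text, with no check on whether it's true.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count how often each word follows each other word in the text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def most_likely_next(following, word):
    """Return the word seen most often after `word`, or None if unseen."""
    candidates = following.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Hypothetical training text: the false claim simply appears more often.
corpus = (
    "the moon is made of cheese . "
    "the moon is made of cheese . "
    "the moon is made of rock ."
)

model = train_bigrams(corpus)
print(most_likely_next(model, "of"))  # cheese
```

The model answers "cheese" purely because that word followed "of" more often in its training text. Nothing in it represents what the moon is actually made of, which is the sense in which a system like this can't check its own output.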