If you’ve ever written a book and had it professionally published, you’ll know that your work is protected by copyright laws. There are exemptions and limitations, such as fair use, but the rules are clear and strictly enforced. However, three American authors have filed a lawsuit claiming that Nvidia breached those laws by using their work without permission to train NeMo, its LLM toolkit.
Generative AI models, such as GPT-3, Llama, and DALL-E, require huge amounts of data for training before they can power tools like ChatGPT and Copilot. NeMo, strictly speaking, is a framework for AI developers, making it easier to create, tweak, and distribute their own large language models (LLMs).
But even so, it still involves AI training, and Nvidia also offers a range of pre-trained models in its cloud service. The Reuters report on the lawsuit (via Seeking Alpha) is a touch light on details, but it’s early days for the case, as it was only filed last week. The three authors in question (Brian Keene, Abdi Nazemian, and Stewart O’Nan) claim that one of the very large datasets Nvidia used for training contains copies of their published works, used without permission.
Normally in such legal cases, the defence argues that the training qualifies as ‘fair use’, and Meta has even gone as far as to say that it’s essentially no different to how a child learns by being exposed to the speech and text around it.
On the other hand, those who have filed lawsuits in the past, such as the New York Times, have said that this is simply about the AI world being unwilling to pay the due fees for works that are not only protected by copyright laws but have also been properly registered with the appropriate authorities.
Defenders of generative AI typically have a different view: if you’ve read a multitude of books and then go on to write your own bestseller, is your work in breach of copyright? LLMs don’t simply reproduce exact copies of their training material, and if you’ve ever used something like Stable Diffusion and told it to draw you a famous painting, you’ll get one like it, but not a direct copy.
It’s a complex situation, no doubt, but if this class action succeeds, it will almost certainly be followed by countless more, as the dataset in question used nearly 200,000 novels, short stories, textbooks, and so on. All of that material is copyrighted, though not necessarily all of it has been registered.
Either way, the AI lawsuit train is showing no signs of slowing down and I should imagine a great number of writers, artists, musicians, and designers will be paying close attention to the outcome of this particular case. Choo, choo!