First science publisher sues over scraped research papers : Global Viral News24

A scientific publisher has joined the dozens of firms and individuals suing artificial intelligence companies over their alleged use of copyrighted works in training AI models.

Elsevier – which publishes thousands of journals, including Cell and The Lancet – was part of a class-action lawsuit filed on 5 May against technology company Meta and its chief executive Mark Zuckerberg in the Southern District of New York. Also named as plaintiffs on the lawsuit are book-publishing giants Hachette and Macmillan, and the US fiction author and lawyer Scott Turow. The publishers allege that Meta obtained and reproduced copyrighted works in developing its large language model (LLM) Llama.

“This case is the first AI action brought by major publishing houses, who have their own story to tell about Meta’s flagrant violation of their rights,” the Association of American Publishers said in a statement.

The case mirrors those of authors and media companies – including The New York Times – suing AI firms on similar grounds. Some cases have been settled but, overall, they have yet to establish a clear precedent on whether it is legal to use copyrighted works to train an LLM. A Meta spokesperson has said the company would “fight this lawsuit aggressively”.

Although AI firms are cagey about their training data, it is widely assumed that paywalled research papers, as well as open-access ones, formed part of the billions of web pages that models were trained on.

Training data

The lawsuit alleges that, to train Llama, Meta used the Common Crawl data set, a sample of billions of web pages made by trawling the Internet, which the plaintiffs say is likely to have included unauthorized copies of copyrighted works, such as scientific abstracts and paywalled papers.

The publishers also allege that Meta downloaded and torrented (sourced using a file-sharing method) works from sites including LibGen, a database of books, research papers and textbooks; and Sci-Hub, a repository that gives free access to millions of research articles and books regardless of copyright. Both sites have been the subject of legal challenges. Much of the evidence relies on e-mails between Meta employees that were revealed during a separate case in which several book authors sued Meta last year (Kadrey v. Meta).

Meta has suggested that it will argue that training on copyrighted documents constitutes ‘fair use’, a copyright exemption in US law. “AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use,” its spokesperson said.
