The Internet Archive just lost its appeal over ebook lending

fossilesque@mander.xyz · edit-2 3 months ago

DrCake@lemmy.world · 3 months ago

So when’s the ruling against OpenAI and the like for using the same copyrighted material to train their models?

irotsoma@lemmy.world · 3 months ago

But OpenAI not being allowed to use the content for free means they are being prevented from making a profit, whereas the Internet Archive is giving away the stuff for free and taking away the right of the authors to profit. /s

Disclaimer: this is the argument that OpenAI is using currently, not my opinion.

norimee@lemmy.world · edit-2 3 months ago

Ah, I see you got that all wrong.

Open ~~IA~~ AI uses that content to generate billions in profit on the backs of The People. The Internet Archive just does it for the good of The People.

We can’t have that. “Good for The People” is not how the economy works, pal. We need profit and exploitation for the world to work…

RogueBanana@lemmy.zip · 3 months ago

“Good for the people”? You mean COMMUNISM?

finitebanjo@lemmy.world · 3 months ago

I think you accidentally wrote Open IA instead of OpenAI, and IA happens to be the initialism for Internet Archive. A little confusing.

norimee@lemmy.world · 3 months ago

I didn’t even realise. Thank you for pointing it out, I fixed it.

v_krishna@lemmy.ml · 3 months ago

OpenAI is burning billions of dollars not making profit.

Agret@lemmy.world · 3 months ago

Sounds like they are operating the same as all the other big tech companies then

ShaggySnacks@lemmy.myserv.one · 3 months ago

Burn a ton of cash to become the only major player in the market and then proceed to enshittify, since no one else has anywhere to go.

buddascrayon@lemmy.world · 3 months ago

Wrong

https://futurism.com/the-byte/openai-copyrighted-material-parliament

Anyolduser@lemmynsfw.com · 3 months ago

Hot on the heels of this one, I’d imagine.

iAmTheTot@sh.itjust.works · 3 months ago

Fat chance. Line must go up.

shrugs@lemmy.world · 3 months ago

So, let’s say we create an LLM that is fed with all the copyrighted data, and we design it so that it recalls the originals when asked?! Does that count as piracy, or as the kind of legal shenanigans OpenAI is doing?

wizblizz@lemmy.world · 3 months ago

Aaaaaany minute now.

PriorityMotif@lemmy.world · 3 months ago

It’s two different things happening. One is redistribution, which isn’t allowed, and the other is fair use, which is allowed. You can’t ban someone from writing a detailed synopsis of your book. That’s all an LLM is doing. It’s no different than a human reading the material and then using that to write something similar.

xthexder@l.sw0.com · edit-2 3 months ago

the other is fair use

That’s very much up for debate still.

(I am personally still undecided)

PriorityMotif@lemmy.world · 3 months ago

The difference is that the LLM has the ability to consume and remember all available information, whereas a human would have difficulty remembering everything in detail. We still see humans unintentionally remaking things they’ve heard before. Comedians have unintentionally stolen jokes they’ve heard. Every songwriter has unintentionally “discovered” a catchy tune which is actually someone else’s. We have fanfiction and parody. Most people’s personalities are just an amalgamation of everyone and everything they’ve ever seen, not unlike an LLM themselves.

xthexder@l.sw0.com · 3 months ago

I agree with you for the most part, but when the “person” in charge of the LLM is a big corporation, it just exaggerates many of the issues we have with current copyright law. All the current lawsuits going around signal to me that society as a whole is not so happy with how it’s being used, regardless of how it fits into current law.

AI is causing humanity to have to answer a lot of questions most people have been ignoring since the dawn of philosophy. Personally, I find it rather concerning how blurry some lines are getting, and I’ve already had to reevaluate how I think about certain things, like what moral responsibilities we’ll have when AIs truly start to become sentient. Is turning them off and deleting them a form of murder? Maybe…

trafficnab@lemmy.ca · 3 months ago

OpenAI losing their case is how we ensure that the only people who can legally be in charge of an LLM are massive corporations with enough money to license sufficient source material for training, so I’m forced to begrudgingly take their side here

greenskye@lemm.ee · 3 months ago

Agreed. I keep waffling on my feelings about it. It definitely doesn’t feel like our laws properly handle the scale at which LLMs can take advantage of ‘fair use’. It also feels like yet another way to centralize and consolidate wealth, this time not money, but rather art and literary wealth, in the hands of a few.

I already see artists that used to get commissions now replaced by endless AI pictures generated via a LoRA specifically aping their style. If it was a human copying you, they’d still be limited by the amount they could produce. But an AI can spit out millions of images all in the style you perfected, which feels wrong.

Gsus4@mander.xyz · edit-2 3 months ago

The matter is not LLMs reproducing what they have learned, it is that they didn’t pay for the books they read, like people are supposed to do legally.

This is not about free use, this is about free access, which at the scale of an individual reading books is marketed as “piracy”…at the scale of reading all books known to man…it’s omnipiracy?

We need some kind of deal where commercial LLMs have to pay rent into a fund that distributes it among creators, or else remain nonprofit. That’s never gonna happen, because it’ll be a bummer for all the grifters rushing into that industry.

barsoap@lemm.ee · 3 months ago

it is that they didn’t pay for the books they read, like people are supposed to do legally.

If I can read a book from a library, why shouldn’t OpenAI or anybody else?

…but yes from what I’ve heard they (or whoever, don’t remember) actually trained on libgen. OpenAI can be scummy without the general process of feeding AI books you only have read access to being scummy.

General_Effort@lemmy.world · 3 months ago

Meta is defending a lawsuit because they trained on Books3, which contained all of Bibliotik. https://en.wikipedia.org/wiki/The_Pile_(dataset)

Gsus4@mander.xyz · edit-2 3 months ago

This is not like reading a book from a library…unless you want to force the LLM to only train on one book per day and keep no copies after that day.

barsoap@lemm.ee · 3 months ago

They don’t keep copies. And what does learning speed have to do with it? Why one day? Does it count if I skim through a book?

Gsus4@mander.xyz · 3 months ago

deleted by creator

PriorityMotif@lemmy.world · 3 months ago

I think we need to re-examine what copyright should be. There’s nothing inherently immoral about “piracy” when the original creator gets almost nothing for their work after the initial release.

index@sh.itjust.works · 3 months ago

stop asking questions and go back to work

MigratingtoLemmy@lemmy.world · 3 months ago

If OpenAI can get away with going through copyrighted material, then the answer to piracy is simple: round up a bunch of talented devs from the internet who are writing and training AI models, and let’s make a fantastic model trained on what the Internet Archive has. Tell you what, let Mistral’s engineers lead that charge, and put an AGPL license on the project so that companies can’t fuck us over.

I refuse to believe that nobody has thought of this yet.

bandwidthcrisis@lemmy.world · 3 months ago

An AI trained on old Internet material would be like a synthetic Grandpa Simpson:

“In my day we said ‘all your base’ and laughed all day long, because it took all day to download the video.”

Ragnarok314159@sopuli.xyz · 3 months ago

This stupid thing just keeps saying “I can Haz Cheeseburger”. What the hell does that even mean?

General_Effort@lemmy.world · 3 months ago

What do you think Mistral trains its models on? Public domain stuff?

werefreeatlast@lemmy.world · 3 months ago

Better yet! Train an AI to re-write the books into brand new books and let us read, review the content, add notes etc so that the AI can refresh the books if we find errors.

Kick the private collections to the curb! Teeth in like in American History X.

Dkarma@lemmy.world · 3 months ago

“AI write Hamlet” AI writes Idiocracy.

capital@lemmy.world · 3 months ago

We get it, y’all hate LLMs and the companies who make them.

This comparison is disingenuous and I have to think you’re smart enough to know that, making this disinformation.

If/when an LLM like ChatGPT spits out a full copy of training text, that’s considered a bug and is remediated fairly quickly. It’s not a feature.

What IA was doing was sharing the full text as a feature.

As far as I know, there are some court cases pending to determine whether companies like OpenAI are guilty of copyright infringement, but I haven’t seen any convictions yet (happy to be corrected here).

All that said, I love IA and have a Warrior container scheduled to run nightly to help contribute.

MigratingtoLemmy@lemmy.world · 3 months ago

Hmm, true. IA wouldn’t be as supported if we couldn’t get the full text of the source.

Can you tell me more about the “warrior container”?

capital@lemmy.world · 3 months ago

It’s mentioned in the OP but it’s this:

https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior

Basically, distributed collection.

dan@upvote.au · edit-2 3 months ago

have a Warrior container

This is an ArchiveTeam project, which is a totally separate effort to the Internet Archive. As far as I know, they’re not related other than the fact that ArchiveTeam use The Internet Archive for storage.

capital@lemmy.world · 3 months ago

Ahh my mistake.

Might be time to financially contribute to IA.

JackGreenEarth@lemm.ee · 3 months ago

Another sad day for pro-preservation advocates

/home/pineapplelover@lemm.ee · 3 months ago

A sad day for intellects

masterspace@lemmy.ca · 3 months ago

Fuck Copyright.

A system for distributing information and rewarding its creators should not be one based on scarcity, given that it costs nothing to copy and distribute information.

snooggums@midwest.social · 3 months ago

It was fine when the limited duration was a reasonable number of years. Anything over 30 years max before being in the public domain is too long.

Tilgare@lemmy.world · 3 months ago

Thanks, Disney.

KingJalopy @lemm.ee · 3 months ago

Things I’ve never heard said before

Klear@sh.itjust.works · 3 months ago

Huh. That made me realise I probably never heard or read “Fuck you, Obama”. Don’t live in the USA though.

masterspace@lemmy.ca · edit-2 3 months ago

That was fine then, but it makes zero sense today.

If a book is on sale widely to the public, and it costs nothing to copy and distribute that book to everyone, why shouldn’t we?

The fundamental problem with copyright is that it rewards creators by imposing artificial scarcity where there is no need for any. Capitalism is a system designed around things having value when they’re scarce, but information in a world of computers and the internet is inherently abundant the instant it’s digitized. Copyright just means that we build all these giant DRM systems to impose scarcity on something that doesn’t need it, so that creators can still earn a living.

But a better system for paying creators would be one of attribution and reward, where everyone can read whatever they want or stream whatever they want, and artists would be paid based on their number of views.

snooggums@midwest.social · 3 months ago

But a better system for paying creators would be one of attribution and reward, where everyone can read whatever they want or stream whatever they want, and artists would be paid based on their number of views.

Which would be enforced through copyright…

masterspace@lemmy.ca · edit-2 3 months ago

If you’re referring to copyright as the actual effective title of ownership over the works, then yes. If you’re referring to copyright as in our system of copyright == monopoly, then no.

Saik0@lemmy.saik0.com · 3 months ago

So if I own it… as the sole writer of some work. But don’t have a monopoly over how it’s used…

What the fuck logic is that? Care to explain how I, as the owner of the work, cannot impose whatever limits I want on it?

masterspace@lemmy.ca · edit-2 3 months ago

This involves trying to imagine a system other than the one we currently use.

The concept of exclusive ownership makes sense for material goods because if I have an object, you cannot have that object. If I want a copy of that object, it takes the same amount of resources as it took to make the original object. It’s a fundamental property of matter and energy, but information does not have the same properties. Information can be stored in a vanishingly small space and replicated for virtually nothing, as many times as you want.

In the digital age, where every single person now has an incredibly powerful information processing machine that is networked to every single other one, it means that once information is digitized, it costs us virtually nothing to distribute it to everyone on earth who wants it.

Copyright only exists because once we started to be able to do this with early technologies like the printing press, vinyl records, VHS, etc., it showed that you could rapidly drive the value of that work down to zero dollars, because in capitalism, things only have value if they are scarce. Air is a necessity for everyone to live, but it costs nothing because it’s all around us. It suddenly gets valuable in places where it’s scarce, but as long as it’s abundant, it has no value according to capitalism. So continuing to allow the free copying of works meant that the original creators would never get rewarded. This made some sense at a time when it took months and a ton of resources to chop down trees, make paper, print a book, ship it across the world, and then get a response back regarding it.

But now, in the digital age, we have all the tools we need to build a middle man free service that would allow everyone to watch or read anything, and reward the creators based on how much their works are used or viewed or remixed. It’s basically how music streaming services and the behind-the-scenes remix/sampling licensing deals work already; they just have a ton of corporate middlemen taking profits at every step.

In print media, advertising-driven models are ham-fisted workarounds that do the same thing of providing the information to everyone, but again, with middlemen that fuck the authors and ruin the experience for readers.

Spotify, Apple Music, etc could all still exist, they’d just all have access to the same content catalog and you’d be picking and paying solely based on the quality of the interface and service they provide.

It’s also not a crazy idea that once you create an idea you don’t get to exclusively own it. For the vast majority of human history, copyright did not exist, and the only way that stories and songs and ideas were passed on was through chains of people copying and retelling them.
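The attribution-and-reward model described here is, at its core, a pro-rata split: one shared pool divided in proportion to each work’s usage. A minimal Python sketch, assuming a fixed pool counted in integer cents and a hypothetical `pro_rata_payouts` helper; it deliberately skips the question raised in the reply below, namely how views would be counted in the first place.

```python
def pro_rata_payouts(pool_cents: int, views: dict[str, int]) -> dict[str, int]:
    """Split a payment pool among creators in proportion to their view counts.

    Hypothetical helper: each share is floored to whole cents, so the total
    paid out never exceeds the pool; any leftover cents stay unallocated.
    """
    total_views = sum(views.values())
    if total_views == 0:
        # Nobody viewed anything, so nobody is owed anything.
        return {creator: 0 for creator in views}
    return {
        creator: pool_cents * count // total_views
        for creator, count in views.items()
    }


# Example: a $100.00 pool split across three creators by views.
payouts = pro_rata_payouts(10_000, {"author_a": 600, "author_b": 300, "author_c": 100})
print(payouts)  # {'author_a': 6000, 'author_b': 3000, 'author_c': 1000}
```

Flooring keeps the payouts from ever exceeding the pool; any real scheme would still need a rule for the leftover cents and for weighting reads against streams or remixes.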

Saik0@lemmy.saik0.com · 3 months ago

This involves trying to imagine a system other than the one we currently use.

No, it doesn’t. Just because the work I created was done in Paint or Word doesn’t make it any less mine. Just because I could distribute it freely doesn’t make me obligated to. I am justified in asking for compensation and proposing limits on how it’s shared.

This is no different to printing the physical version of these works. I could print 10 copies of the book and tell my friends they cannot distribute it. Just the same I could send them an email with the works and say the same thing.

There is no difference here.

But now, in the digital age, we have all the tools we need to build a middle man free service that would allow everyone to watch or read anything, and reward the creators based on how much their works are used or viewed or remixed.

This has no logical basis in your response though. You’re saying that creators of works would have no say in how much a digital work is copied/transferred. How do you prove how much a work is even used/viewed? That would require heaps and loads of DRM management and to go after those who circumvent those measures… which takes money/infrastructure… and GASP That’s exactly what the publishers are doing now! Look at that!

Doomsider@lemmy.world · edit-2 3 months ago

Sure, you don’t actually own it. The words you strung together are not actually yours nor is the grammar you strung it together with. The knowledge you used to create it is also not yours.

The only way to ensure no one reads, borrows, or “steals” your work is to never share it with anyone and certainly never put it on the Internet.

The only way to ensure it is truly yours is to never have participated in society, invent your own language, and of course hide it from ever being discovered.

This is the only real way. You need to create in a vacuum and lock it up so no one will ever find it. Then and only then can it truly be yours.

Fuzzy_Red_Panda@lemm.ee · 3 months ago

Yeah. In a better world where the US court system doesn’t get weaponized and rulings aren’t delayed for years or decades, I would argue 8 to 15 years is the reasonable number, depending on the type of information being copyrighted.

masterspace@lemmy.ca · 3 months ago

deleted by creator

fpslem@lemmy.world · 3 months ago

Not a surprise, but still somehow crushing. It’s a loss for us all.

HexesofVexes@lemmy.world · 3 months ago

Ah, I see we’re burning the Library of Alexandria again… Just as with last time, the survival of texts will rely upon copies.

Stern@lemmy.world · 3 months ago

Oh sure, when I want to read copyrighted books it’s an issue, but when OpenAI does it, it’s vital to their business, so they get to keep going.

yetAnotherUser@lemmy.ca · 3 months ago

We live in a capitalist society. You can do whatever you want as long as you have money or promise lots of money to powerful people.

MellowYellow13@lemmy.world · 3 months ago

Still doesn’t make any sense whatsoever.

drislands@lemmy.world · edit-2 3 months ago

My understanding is that the IA had implemented a digital library, where they had (whether paid or not) some number of licenses for a selection of books. This implementation had DRM of some variety that meant you could only read the book while it was checked out. In theory, this means if the IA has 10 licenses of a book, only 10 people have a usable copy they borrowed from the IA at a time.

And then the IA disabled the DRM system, somehow, and started limitlessly lending the books they had copies of to anyone that asked.

I definitely don’t like the obnoxious copyright system in the USA, but what the IA did seems obviously ~~wrong~~ against the agreement they entered into. Like if your local library got a copy of Book X and then when someone wanted to borrow it they just copied it right there and let you keep the copy.

ETA: updated my wording. I don’t believe what the IA did was morally wrong, per se, but rather against the agreement I presume they entered into with the owners of the books they lent.
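The lending setup described above boils down to one invariant: copies on loan never exceed copies owned. A minimal Python sketch of that rule, assuming a single title tracked in memory; the `LendingShelf` class and its methods are hypothetical and are not how the Internet Archive actually implements lending.

```python
from dataclasses import dataclass, field


@dataclass
class LendingShelf:
    """Hypothetical model of one title under an owned-to-loaned cap."""

    owned_copies: int
    borrowers: set[str] = field(default_factory=set)

    def borrow(self, user: str) -> bool:
        """Lend a copy only if fewer copies are out than are owned."""
        if user in self.borrowers:
            return True  # already holds a copy; no second loan is counted
        if len(self.borrowers) >= self.owned_copies:
            return False  # every owned copy is checked out
        self.borrowers.add(user)
        return True

    def give_back(self, user: str) -> None:
        """Return a copy, freeing a slot for the next borrower."""
        self.borrowers.discard(user)


shelf = LendingShelf(owned_copies=2)
print(shelf.borrow("alice"), shelf.borrow("bob"), shelf.borrow("carol"))  # True True False
shelf.give_back("alice")
print(shelf.borrow("carol"))  # True
```

Lending without the cap, as the thread describes the Archive doing during lockdown, amounts to dropping the `len(self.borrowers) >= self.owned_copies` check in `borrow`.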

MrScottyTay@sh.itjust.works · 3 months ago

They disabled DRM during lockdown so people had something to do.

accideath@lemmy.world · 3 months ago

Which was nice of them, but that doesn’t mean they should’ve done that, especially in the eyes of the law. (Also, if you’re after free ebooks, why are you pirating them on archive.org instead of libgen?)

CondensedPossum@lemmy.world · 3 months ago

Removed by mod

accideath@lemmy.world · 3 months ago

Where did I say that I find it good that they got sued or lost their appeal? I just said that the reason they lost the appeal is that, according to the law they’re bound by, what they did was wrong. And maybe they should’ve left that to a platform that enjoys a little more immunity from said law, because there are plenty of those. It was stupid of them. They painted an unnecessary target on their back that doesn’t help their cause, and I’d prefer them not to have to shut down at some point, because I’m all for the Internet Archive archiving anything and everything. They should’ve stayed a legitimate library; everything would have been fine, and it would have served their cause sufficiently well.

CondensedPossum@lemmy.world · 3 months ago

Removed by mod

accideath@lemmy.world · 3 months ago

Ah, so you’re one of those people that would be well at home at lemmygrad. And what fate are you talking about? Not getting sued?

huiccewudu@lemmy.ca · edit-2 3 months ago

I definitely don’t like the obnoxious copyright system in the USA, but what the IA did seems obviously wrong.

The publisher-plaintiffs did not prove the “obvious wrong” in this case; however, US-based courts have a curious standard when it comes to the application of Fair Use doctrine. This case ultimately rested on the fourth and most heavily weighted Fair Use standard in US-based courts: whether IA’s digital lending harmed publisher sales during the 3-month period of unlimited digital lending.

Unfortunately, when it comes to this standard, the publisher-plaintiffs are not required to prove harm, but only to assert that harm has occurred. If they were required to prove harm, they’d have to reveal sales figures for the 127 works under consideration; publishers will do anything to conceal this information, and US-based courts defer to them. Therefore, IA was required to prove a negative claim (that digital lending did not hurt sales) without access to the empirical data (which in other legal contexts is shared during the discovery phase) required to prove this claim. IA offered the next best argument (see pp. 44-62 of the case document to check for yourself), but the data was deemed insufficient by the court.

In other words, on the most important test of Fair Use doctrine, which this entire case ultimately pivoted upon, IA was expected to defend itself with one arm tied behind its back. That’s not ‘fair’ and the publishers did not prove ‘obvious’ harm, but the US-based courts are increasingly uninterested in these things.

edited: page numbers on linked court document.

azuth@sh.itjust.works · 3 months ago

The decision is that even lending out ebooks against owned copies is illegal

What the IA did may be illegal, but it is certainly not wrong.

finitebanjo@lemmy.world · 3 months ago

Wrong? No.

Against the terms of agreements they made? Yes.

Actions also protected by laws exempting nonprofits and archives from copyright restrictions? Also supposed to be yes.

drislands@lemmy.world · 3 months ago

Against the terms of agreements they made? Yes.

To be fair, this is what I meant when I said wrong. Enough people have taken umbrage with my wording that I think I should update it, though. Thank you for your reply.

CondensedPossum@lemmy.world · 3 months ago

Removed by mod

Allero@lemmy.today · 3 months ago

Completely useless rageposting on your side.

Also, the original commenter corrected it later to clarify.

eskimofry@lemmy.world · 3 months ago

Like if your local library got a copy of Book X and then when someone wanted to borrow it they just copied it right there and let you keep the copy.

That’s how it works in the rest of the world.

dave@feddit.uk · 3 months ago

What part of the rest of the world are you in?

MonkderVierte@lemmy.ml · 3 months ago

Some university library probably.

TheGrandNagus@lemmy.world · 3 months ago

No it isn’t.

bitwolf@lemmy.one · 3 months ago

Easy solution. Update the web scraper they use to include an LLM. Then it’s for “training”.

xenoclast@lemmy.world · 3 months ago

As long as they have a tech billionaire in charge they should be fine.

They could also rename the project to: “The AI Archive” and add lots of buttons with multicolor gradients.

bitwaba@lemmy.world · 3 months ago

Need to give it a quirky name.

The AIkive

bamfic@lemmy.world · 3 months ago

Libgen.rs

fossilesque@mander.xyz · 3 months ago

Direct link to the court document: https://storage.courtlistener.com/recap/gov.uscourts.ca2.60988/gov.uscourts.ca2.60988.306.1.pdf

✺roguetrick✺@lemmy.world · 3 months ago

Side note: CourtListener’s RECAP is often quite disliked by the legal system. They do not like it when people put documents from fee-waived PACER sources on there, like Aaron Swartz did. https://en.m.wikipedia.org/wiki/Free_Law_Project

NotAnotherLemmyUser@lemmy.world · 3 months ago

Woah, I wish I had known about this sooner. Thanks!

ZILtoid1991@lemmy.world · 3 months ago

They need to rename themselves “Intelligent Archive” then claim they’re an AI service that can just happen to regenerate whole books.

Aatube@kbin.melroy.org · 3 months ago

Really unfortunate. I wonder why nobody foresaw this when they started the stupid NEL thing.

Edit: NEL (the National Emergency Library) is the thing where the Archive removed all borrowing restrictions except a limit of 10 books per account and some sort of basic verification that you were in the US.

AlexWIWA@lemmy.ml · 3 months ago

Yeah they flew too close to the sun

sircac@lemmy.world · 3 months ago

But I’m training my organic LLM, can’t I?

Grass@sh.itjust.works · 3 months ago

What does Warrior do? The Git README seems to just be setup instructions.

zzx@lemmy.world · 3 months ago

I had the same question. Here’s the answer:

The Archive Team Warrior is a virtual archiving appliance. You can run it to help with the Archive Team archiving efforts. It will download sites and upload them to our archive—and it’s really easy to do!

The warrior is a container running inside a virtual machine, so there is almost no security risk to your computer. (“Almost”, because in practice nothing is 100% secure.) The warrior will only use your bandwidth and some of your disk space, as well as some of your CPU and memory. It will get tasks from and report progress to the Tracker.

fossilesque@mander.xyz · edit-2 3 months ago

click wiki link in readme: https://wiki.archiveteam.org/index.php?title=ArchiveTeam_Warrior

antonim@lemmy.dbzer0.com · 3 months ago

Yeah I’m wondering as well. It seems to save webpages, whereas the issue is with scanned books which may be removed from IA…

Parabola@lemmy.world · 3 months ago

If only the readme clearly said what it was with a link you could click…

Grass@sh.itjust.works · 3 months ago

Somehow I didn’t see anything above “getting started”. Looking again, I don’t know how I missed it with the big logos, unless they didn’t load and the rest was behind a notification or something.

Allero@lemmy.today · 3 months ago

Just give the link if you have one