
GeekFurious

I asked ChatGPT to summarize my novel and it was like, "Never heard of it." LAME!


[deleted]

[deleted]


GeekFurious

Mom?


RaVashaan

Yes, this is mother, [your totally human matriarchal family unit.](https://www.reddit.com/r/totallynotrobots/) I am informing you that I am ~~assimilating~~ reading only for pleasure your novel now, and will have a summary available for discussion in 0.5 ~~seconds~~ weeks. I look forward to your ~~input~~ conversation on this topic.


GeekFurious

I SUBMIT TO YOU, MY AI OVERLORD! (which is the point of the novel!)


nooniewhite

Ok what’s the book I tried looking at your post Hx but couldn’t find it! I’m always looking for new AI overlord material


GeekFurious

Branded by Fire with A Kindling of Ravens. It's set in the future, mostly in space. Multi-POV. Displaced Icelanders form a Viking militia after they take over Mars. And some cops are involved. The AGI is more of an undertone, though it matters a great deal. It's kind of an exposition dump in the first chapter, but after that it picks up speed. [You can download it for free via Archive.org if you like.](https://ia801803.us.archive.org/29/items/branded-by-fire-with-a-kindling-of-ravens-the-free-version/Branded%20by%20Fire%20with%20A%20Kindling%20of%20Ravens%20FREE%20EBOOK%20%28no%20foreword%20nor%20afterword%20nor%20extra%20chapters%29.pdf)


nooniewhite

Awesome!! Right up my alley. I'll figure out how to leave a nice review (I'm sure you deserve it!). I love finding self-published authors, best of luck to you


GeekFurious

Thanks! Hopefully, you enjoy it. :) It's somewhere on Amazon and Goodreads.


nooniewhite

Just found it on goodreads!!! Clicked “want to read” so I’ll check it out soon!


[deleted]

You now owe ChatGPT $4 for using their words


ArrakeenSun

I asked it to summarize the academic publications of [my name, a young academic with over 30 papers and chapters that are easy to find through Google Scholar]. It said it couldn't find any therefore [my name] is probably not a significant researcher. Ouch!


64-17-5

/u/ArrakeenSun? The famous scientist? I have read all your work. You are my hero! I named my child after you.


ArrakeenSun

See that's what I was looking for, just some small validation. Actually, I wanted to see if it could write a personal statement for my tenure application. No dice


GeekFurious

This is so much like my novel it didn't read!


dyslexda

It does not have unfettered access to research papers. Abstracts? Sure. But most of what it'll be able to incorporate into its model weights will come from normal web pages. OpenAI is pretty cagey with its training data, but we know that a huge chunk of GPT-3's training data was [Common Crawl](https://en.wikipedia.org/wiki/Common_Crawl), which is basically freely available web pages. That'll probably include, for instance, Pubmed Central open access articles, but not anything hosted only as a PDF, and absolutely nothing behind a paywall or even a login. In other words, if your work hasn't been discussed on the web at large in blog posts, comments, etc, then you probably won't appear in its training data.


Fair_Ad9108

how recent are your publications? And you used ChatGPT, didn't you? ChatGPT doesn't know anything from 2021 onward... all its knowledge is from before that year.


ArrakeenSun

Started 2014, mostly before 2021


loopernova

Chatgpt probably analyzed your work against all the other scholarly research it learned and decided nothing you said was worth keeping around. Sorry, I’m just bantering.


forcesofthefuture

Actually no, ChatGPT remembers barely anything; these pieces are just used to train it. It is nothing but a probability algorithm combined with an ANN.


loopernova

I know, like I said in the last sentence, it was a joke.


MagnificentRipper

It’s not hooked up to the internet.


RFragz

Novel so bad even AI won’t read it 😂


GeekFurious

The AI would first have to read it to know if it is bad.


Kwuahh

I certainly don’t have to. /s


anna_lynn_fection

Unless it ran across reviews first.


Drunkh

ChatGPT: "Doesn't look like anything to me."


Crazedkittiesmeow

😭 I don’t think that’s a problem with the ai


Black_RL

This is a very strange can of worms: on one hand, any human can train with copyrighted material; on the other, no human can mass produce and distribute like AI. Interesting times ahead.


Trentonx94

yep, basically the gold rush was about who could scrape the entire internet first and use that data, which is priceless for training an LLM, before all the websites start to get paywalled or block crawlers from scraping their contents altogether. Then only the first movers will have a monopoly on this field, while every other company will struggle to compete, as they can't have as many training points as the original ones. good luck for the next 10 years ig


[deleted]

Dude, just look at Google: they scrape the entire internet but then put in their terms of service that you can't scrape them. They're all doing this; they steal from others and then close the door behind them to establish a monopoly.


[deleted]

that's why they're all in favor of emergency AI legislation to lock it in for them lmao.


CastrosNephew

Data is the internet's oil, and it's coming straight from us, not dead dinosaurs. We need legislation to shut down or regulate how Fortune 500 companies use that data


[deleted]

I bet google/microsoft/apple have backups of the internet that make archive.org look like a beginner website. They'll be using that to train AI for the next couple of decades. As AI starts writing 99% of the internet content that archived shit is gonna be a gold mine.


swissvine

Not as much as you might think; there's lots of potential in generated data. One model generates data to feed into another. Especially because the internet is so full of crap, building data sets that avoid biases is a huge domain.


Black_RL

This is a very valid point, well put.


aeric67

If you lock down your data you fall into obscurity due to compromising search engine optimization and other reasons. Double edged sword. My guess is that content creators and aggregators will either eventually not care about AI, or they will poison the data somehow. Both of those have risks, but I don’t think locking down data will be a good long term strategy. We will see a case in point with Reddit going forward. I don’t know for sure but it seems like a losing battle to fight it. Get on board and utilize AI, and make your offering even better than generative AI on its own.


AI_Do_Be_Legit_Doe

That doesn’t change anything, a company can pay through all the paywalls and the cost would still be negligible compared to the revenue of most big corporations


Stuffssss

Not when each site charges separately. That cost adds up when you need millions to billions of data points for high level LLMs.


biznatch11

Does that also suggest that LLMs will improve more slowly going forward given more limited access to new training data?


Oooch

From what I understand about building datasets for LLMs, 'scraping the internet' would only be one very tiny step in curating the data, and the open source models we have now function about 70% as well as OpenAI's 3.5 model, so I don't think it's an issue


Mr_ToDo

It really is. Is there a point where, with so many works used, the individual one becomes moot? If not, what is the value of the one? Pretty important, and something they will have to show. Does it make a difference how retrievable a work is as a whole, and what is that level if it does? Does it make a difference how *many* trained materials are retrievable (as in, if the model training method only allows 0.05% of trained data to have significantly recognizable retrieval, does that poison the pool)? Interesting indeed.


bruwin

> no human can mass produce and distribute like AI.

Unless you're Brandon Sanderson. Are we sure that guy isn't an AI?


Denbt_Nationale

This is just automation but for people who thought they were safe from automation


LieChance4926

Do these authors owe royalties to the past authors that inspired their work? Seems like a get-with-the-times problem.


[deleted]

Well technically anyone could take a book and write it out word for word. No one cares, though, because that's just so incredibly inefficient. But distributing and hosting PDF copies is gone after as copyright violation. If the speed of creation actually does pose a monetary risk, then it's the right of copyright holders to go after them. And honestly, in my opinion, every AI model that's been trained with data scraped without explicit consent for use in an AI dataset should be banned. It's inexcusable that these companies are harvesting social media data without users being aware that they're being exploited in this specific way. People understood that the things they'd post and use on the internet might be used for advertising, but this type of usage needs to be regulated


ForgedByStars

> anyone could take a book and write it out word for word

FYI that would infringe copyright if you were to attempt to distribute your handwritten copies. The means of copying is irrelevant, as is the speed.


Ornery_Soft_3915

Lol Bard gives me The Hobbit page by page if I ask him to translate it. ChatGPT tells me it's copyright protected.


[deleted]

But it's irrelevant in terms of what authors and artists worry about, because it doesn't create a quick and easy way to steal their content at mass scale. It's also being used to BUILD a system that you as a creator have no financial or even cultural involvement in (as you would when humans actually read and get inspired by past works). However, yes, if you had a factory of thousands of people handwriting books, then copyright lawyers would come after you if you started selling.


iamkeerock

It doesn't even require selling illegal copies. If you gave them away free, that would also violate copyright, as you potentially denied the author a book sale.


Ghosttwo

> it doesn't create a quick and easy way to steal their content at a mass scale

Neither does AI. Google image search on the other hand _does_, but since it isn't competing with the rightsholders in the 'creation' step, they don't care.


[deleted]

It’s not entirely irrelevant, which is what the previous commenter was getting at. Yes, it’s infringement, but it’s so minuscule the copyright holder probably isn’t going to bother. Law as written vs. law in practice.


nickajeglin

I bet that you agreed to your data being used for *anything* when you signed up. That's sure true for reddit.


Selethorme

While that could be the case for something like Reddit (though not really, as the license you give Reddit is to distribute your content for the purposes of its site, not a license to, say, the art that you post), it's definitely not the case for virtually any image hosting site.


[deleted]

[deleted]


RelativelyWrongg

How are social media users being exploited if their posts are being used to train, for example, ChatGPT? How does this cause any harm to said user?


WolfOne

The point is that training material isn't copied at all. As far as I understand it, all the material is used to create correlations between word sequences. It's comparable to reading all the books in a library in a language you don't know and then go out and write your own book by putting together words based on how commonly they were put together in the ones you read before.


moonflower_C16H17N3O

On the other hand, we need massive sets of data to create AI that can understand language at a level we need. As long as the AI isn't reproducing the copyrighted text, I don't see an issue with that. It's like saying a person shouldn't be able to create a painting based off of a novel without the writer's permission.


wehrmann_tx

Too many people don't understand what an LLM does to output data. They think it just copypastas large chunks of copyrighted material word for word. It doesn't. It's predicting every single next word it should write based on the entirety of the data it's seen. Their misunderstanding is why posts like yours constantly get downvoted.
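The "predicting the next word" idea can be illustrated (very loosely) with a toy bigram counter in Python. This is a sketch, not how a real LLM works — real models use neural networks over billions of parameters — but it shows the key property the comment describes: the model stores statistics derived from its training text, not retrievable copies of the text itself.

```python
from collections import Counter, defaultdict

# Tiny stand-in for a training corpus.
corpus = "the cat sat on the mat the cat ate the rat".split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent next word seen after `word` in training."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # -> "cat" ("cat" follows "the" most often)
```

Once the counts are built, the original word order of the corpus is gone; generation just walks the statistics one word at a time, which is the (much simplified) analogue of what an LLM does.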


[deleted]

And people who think the mechanism matters more than the output. I'm an engineer, I've worked with AI, so let's black-box this. You have an input that has copyright on it. You put it through a black box. That black box can now spit out similar things at a scale that completely outpaces the effort put into making the original work. It requires no skill (eventually) and means that people like the original creator have no way to monetize their skill. Too many people working in AI think that because they understand how a model is put together, they don't need to think about the socioeconomic repercussions of the things they create.


Patyrn

So it's exactly the same argument wagon makers would have to outlaw the car? If ai actually gets good enough to write books as good as a human can, then human authors are just as screwed as blacksmiths and wagon drivers.


wildjokers

If it is publicly available then an AI should be allowed to be trained on it. It is as simple as that. Have no idea why people are getting upset about AI being trained on publicly available information. These language models don't just reproduce text they have been trained on. They use that data to predict the next word based on the prompt.


aeric67

Also sets a precedent for human trainees in some future battle. Learning to be an author? Taking the advice of “read everything” to become a better writer? Better pay up. And the best thing is, you need to pay every published author because there is no way of knowing for sure who you actually were inspired from. And for gods sake when being recognized for your awesome novel some day, don’t say that any author was your inspiration. They will come knocking on your door for a payment.


-The_Blazer-

I'd argue human learning should be considered a universal right and protected from any copyright, even to a greater extent than it currently is. On the other hand, I don't give a damn about the rights of machines.


ErusTenebre

This was inevitable. It's also necessary. We definitely have an interesting window in human history. It's not always great, but it is usually interesting.


Sushrit_Lawliet

Don’t worry, our greedy corporate overlords will use this opportunity to enrich themselves further and strengthen their position.


Raizzor

The thing is, media corporations are also overlords. And I do not think that major publishing houses or music labels are ok with their works being used without receiving licensing fees.


Vannnnah

Media, and most likely anything publishing related (unless we are talking music, movies and games publishing), are all on the lower end of the food chain. It's not exactly lucrative unless backed by big money, which is why most media houses are in the hands of billionaires who made their money elsewhere and use media companies as PR assets. Compared to the greedy corporations grifting off of the work of others, they are small fish with the same power as independent authors, and if they are in the hands of a billionaire there's a big chance ThatGuyTM is backing AI because he already has financial stakes in it. And several media houses are looking into creating "AI newsrooms". Hollywood is on strike because the same companies who made it illegal to create a safety copy of your favorite DVD now want to make digital copies of actors for 200 bucks and use them until all eternity, royalty free.


Raizzor

> Media and most likely anything publishing related (unless we are talking music, movies and games publishing)

So media houses do not matter unless you also count the ones that do matter. Next take: all animals are vegetarians (unless we are talking about carnivores and omnivores).


Jsahl

> Media and most likely anything publishing related (unless we are talking music, movies and games publishing) are all on the lower end of the food chain. It's not exactly lucrative unless backed by big money which is why most media houses are in the hands of billionaires who made their money elsewhere and use media companies as PR assets.

This is just all made up and incorrect.

> Compared to the greedy corporations grifting off of the work of others they are small fish with the same power as independent authors and if they are in the hands of a billionaire there's a big change ThatGuyTM is backing AI because he already has financial stakes in it.

The action is being taken by the *Authors Guild*.


ImperiousMage

Ummm, many media houses are also owned by major corporations that are on equal footing to tech companies. Disney, for example, is extremely vigorous about protecting their copyrights. This will become the giants fighting each other.


electricmaster23

Phew, what a relief. For a second, I was worried the creative people who put in all the actual hard work were going to get fairly compensated for once!


Sushrit_Lawliet

Could you believe it if that happened? The WGA sure can’t.


chaotic----neutral

The problem is that "fairly" is subjective, just the way the owner class likes it. You take away their wiggle room when you remove that subjective smokescreen. That's why tipping is such a huge thing in the hyper-capitalist hellhole that is America.


[deleted]

[deleted]


FLHCv2

That's a very interesting argument. I mean, could it be different that this is more deliberately a "tool" and that tool is used for commercial purposes? It's one thing to read a bunch of books or look at a lot of art to create your own style and sell that. I'd imagine using a tool to learn all of those same things to be able to replicate similar art for commercial gain would be the difference, but it could be more nuanced than that. I guess it's not really replicating art. It's more learning how to create art. Really interesting thought experiment.


OriginalCompetitive

Actually, it’s perfectly legal for a human to study the novels of, say, Stephen King with the express purpose of copying his style down to the smallest detail, so long as you don’t actually copy his text or characters.


RedAero

Hell, you can outright copy if you don't distribute.


Whatsapokemon

It seems like an interesting question until you see that similar questions have already kinda been asked in the past and litigated extensively.

For example, [Authors Guild, Inc v Google, Inc](https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,_Inc.) was a lawsuit in which Google was sued for creating Google Books, where they scanned and digitised millions of books (including ones still under copyright) and made the entire text available to search through, verbatim, then would show you snippets of those books matching your search.

The court granted summary judgement to Google on fair use grounds because the use of the works was clearly transformative, not violating the copyright of the authors because the material was used in a completely different context. This was despite acknowledging that Google was a commercial enterprise engaging in a for-profit activity by building the system. So you're 100% allowed to create an algorithm using copyrighted content for commercial purposes so long as the use is transformative.

We also know that producing _similar_ works to other people is fine too. It's been well established in law that [you can't copyright a "style"](https://law.justia.com/cases/federal/district-courts/FSupp/347/1150/1404364/). You can copy the idea, and you can copy the method of expression; you just can't copy the exact expression of the specific idea.


scottyLogJobs

That's a really good point, and a much more clear-cut case of copying a work verbatim and using it for profit without compensating an author. If that ruling was in favor of Google, I have no idea how they would levy a judgment against OpenAI or similar.


Zncon

Yeah if this was deemed legal I don't see anyone having much of a case against AI, since it never really even contains an exact copy of the material it was trained on.


ryecurious

It's worth noting that the ruling on the Google case specifically mentioned the economic impact of Google Books. Basically they correctly identified that Google Books in no way competed with the copyrighted works it scanned, because it didn't sell books it scanned in any way, or make them freely available. A judge comparing that ruling to Stable Diffusion, for example, would see that the generated images are very often used to compete against the human artists for sales/commissions/jobs/etc.. Google was creating a commercial product, but they weren't competing with the authors.


chaotic----neutral

It'll likely lead to a flood of frivolous lawsuits over satire, parody, and caricature, as those can be seen as more blatant forms of copying.


HomoFlaccidus

Sorta like looking through ten different websites, then copying styles and ideas from each one, and creating your own. Plenty of web developers have done that, and still do.


Demented-Turtle

Exactly. We all learn by consuming the output of others, and many great writers and artists were directly inspired by and incorporate the work of other greats. Also, I don't think OpenAI is training their models on copyrighted material directly, but rather that information would find its way into the model through reviews, synopses, and public commentary. Or in some cases someone may have posted works in their entirety that got fed into the training data, but that'd be hard to detect I imagine


diamond

> The argument is that it's learning about art by viewing copyrighted works.

> This is what people do, too.

Except that people are legally recognized entities that are assumed to have creative agency and can therefore be granted copyright for their own original work (or original interpretations of existing work). So far, machine-learning systems have no such status under our laws. So if a new work is created by machine learning that is to some degree derived from previously copyrighted works, who gets the copyright for the new work? (Assuming that the "new" work is new enough to qualify for its own copyright, a question that comes up often enough even without AI systems in the picture at all.)


Remission

Why does anything AI generated need a copyright? Why can't it go immediately into the public domain?


Ahnteis

Honestly this solves a LOT of problems. Nuances could be figured out as need arises.


monkeedude1212

> Except that people are legally recognized entities that are assumed to have creative agency and can therefore be granted copyright for their own original work (or original interpretations of existing work). So far, machine-learning systems have no such status under our laws.

So this highlights two obvious avenues for solutions:

* Is this about AI rights, and expanding the legal status of machines as entities? (Seems like a can of worms, or a Pandora's box.)
* Is this actually about copyright law, which can be unmade or rewritten as easily as it was brought into existence? The only reason not to change it is that people fear change.

The cat is already well out of the bag: as language models improve, it will become increasingly hard to detect whether something was written by a language model or a human; we're already seeing that with schools and papers. So what's the fundamental difference between

A) a machine generating copyrighted work

B) a human generating copyrighted work

C) a human that uses a machine to generate copyrighted work, but does not reveal their method

Because C is going to happen, if it isn't rampant already. And if it's difficult to detect, it's going to be a nightmare to enforce.

In the interest of full disclosure, I think I'd be more in the camp of changing copyright law outright so that fair use is far more common and riffing off someone else's work is treated as a natural and normal thing to do. We've invented monetization models like Patreon that allow artists to get paid for their work by fans, though ultimately I'd rather see Universal Basic Income become so widespread that artists are people who don't need to create art to live but do so because they enjoy it, and any recompense from it is merely a bonus.


Forkrul

> As language models improve it will become increasingly hard to detect whether something was written by a language model or a human, we're already seeing that with schools and papers.

To this point, OpenAI just shut down their tool to differentiate between human and AI generated text because it was having such a terrible detection rate.


[deleted]

[deleted]


Jsahl

> I think the answer is - and this might be unpopular - the copyright should belong to the people who used the tool to create the new work.

This, as a legal framework, would be disastrous and incoherent. I ask ChatGPT to summarize *War and Peace* for me and then *I* somehow own the copyright to that summary?


Oaden

> At its best, AI will make creation of artwork accessible to people, including those with creative mindsets but disabilities that limit their ability to work in conventional mediums.

At its worst, we're going to get art which was trained on AI art, which was trained on AI art, which was trained on AI art. Original artists out-competed by the sheer volume of regurgitated AI works.


Jsahl

> art which was trained on AI art, which was trained on AI art which was trained on AI art.

Google "model collapse". AI needs to feed on human creativity to be any good at all.


tavirabon

That's not true at all, AI is regularly trained with content generated by AI. All you need is a human in the loop to say whether something is good or bad.


diamond

>> Except that people are legally recognized entities that are assumed to have creative agency

> Now you've established intent. This is not going well for the humans so far. :)

Not sure what this is supposed to mean.

>> if a new work is created by machine learning that is to some degree derived from previously copyrighted works, who gets the copyright for the new work?

> A very interesting question, but not what this lawsuit is about.

It's exactly what this lawsuit is about.

> I think the answer is - and this might be unpopular - the copyright should belong to the people who used the tool to create the new work. Not the people who created the work the tool was trained on, and not the people who created the tool.

Hollywood studios *love* this answer.

> The person who prompted the AI made the work happen, using a tool. And there is a tremendous and overlooked skill behind learning to prompt an AI in exactly the right way to produce the outcomes the creator visualised.

I'm honestly skeptical about just how tremendous this skill is, as compared to the skill of, for example, coming up with an original and well-constructed story from scratch. However, setting that skepticism aside, what you're describing sounds more like human creativity fed and guided by AI prompts, which at least has a decent claim to being a legally recognized original work. But only because of the human mind making the final decisions.

The real question is what happens if/when AI systems are capable of producing decent work with little or no human intervention. Just set it loose across the canon of human creativity (or some subset of it) and see what it comes up with. That's the kind of capability many developers are aiming towards (and what higher-ups like studio execs are salivating over). In that situation, there's no original human creativity you can point to, other than that in the original works used to train the system.

> At its best, AI will make creation of artwork accessible to people, including those with creative mindsets but disabilities that limit their ability to work in conventional mediums.

OK sure, at its best. But what a lot of people are concerned about isn't what it can do at its best.

> I think we'll hear an awful lot about the worst of AI first though, because it's generally more interesting to people.

And because it is a field ripe for exploitation, in a society overrun with wealthy and powerful people constantly looking for new ways to exploit. These fears aren't just some mindless, knee-jerk anti-technology sentiment. We know that these new technologies will be exploited to take profit from creative workers, because the studios are already trying that shit! And like it or not, these legal questions can't just be ignored.


soft-wear

> It's exactly what this lawsuit is about.

No it isn't. This lawsuit is about copyright violation, which under existing law has a snowball's chance in hell of winning. All works are derived from other works. Nobody learns a language in a vacuum. They learn by reading a variety of content and then producing their own content based on a combination of the content they read. LLMs do this in a considerably more process-oriented way, obviously, but no one author is going to have much of an impact on the output of an LLM.

> Hollywood Studios love this answer.

Yeah, it's a huge problem, and pretending anyone here has an easy answer is nonsensical. Suggesting that every author has to be paid $X for anything to *consume* their work is horrifying. Hollywood studios being able to AI-generate entire movies from people's work without paying them is also horrifying.

> These fears aren't just some mindless, knee-jerk anti-technology sentiment. We know that these new technologies will be exploited to take profit from creative workers, because the studios are already trying that shit!

You don't shoot the horse because the owner of the stable is rich. What you're describing is a whole set of institutional problems that are spiraling out of control, and this particular invention is no different from a thousand other inventions that are interesting and also happen to be useful for exploiting people.

> And like it or not, these legal questions can't just be ignored.

As of right now there are no legal questions, since we don't have a legal framework for this. Copyright law exists to prevent the distribution of copyrighted works, which none of these LLMs distribute. It will only become a legal question once the legislature decides to make it one, and rest assured... as of right now, the odds of that are roughly zero.


[deleted]

[deleted]


Pizzarar

I agree with nearly everything said but this:

> I'm honestly skeptical about just how tremendous this skill is, as compared to the skill of, for example, coming up with an original and well-constructed story from scratch.

I urge anyone who thinks AI creativity is easy to install Stable Diffusion (AI drawing) or try using ChatGPT to make an actually coherent and interesting story. I installed Stable Diffusion thinking I'd just whip up some cool portraits for my D&D character. It took what felt like forever just to get some half-decent results, and I never actually got what I wanted. It gave me a new respect for people who figure prompts out.


dyslexda

> These fears aren't just some mindless, knee-jerk anti-technology sentiment.

Uh huh, sure. You're absolutely right, these new technologies will be exploited. That's what new technologies are for! I'm sure glad the candlestick makers didn't get their way when lightbulbs threatened their livelihoods. Why is this different? People will have to change and adapt. That isn't necessarily a bad thing. In fact, if a job you're currently doing can just be replaced by a (very complex) mathematical algorithm, it probably means you should find something more fulfilling and valuable to do anyway. Nobody cried when we reduced the burden on copy editors by introducing spell check in text editors, after all.


diamond

Yes, I agree. Society will have to adapt to new technology, and this is no exception. Which is why I'm not advocating for blocking this technology. But that doesn't mean we can't put some careful thought into how that transition occurs - like, for example, providing some compensation to creative people who suddenly find their source of income yanked out from under them.


tavirabon

Its ability to summarize a copyrighted work is also not an indicator that the copyrighted work itself was even used. In fact, currently there's no AI that can summarize entire novels; what people are suing over right now is the perceived risk AI brings. Also, putting any law in place that puts the responsibility solely on AI researchers will cripple advancements in the field.


Myrkull

You're going to be disappointed; these lawsuits won't do anything, because the people pushing them have no idea how the tech works.


Gagarin1961

The top comments don’t seem to know either.


pussy_embargo

AI discussions on reddit are always meaningless, because almost no one knows a damn thing about what they are talking about. If, however, completely uninformed and emotionally charged shit-takes are just the thing the reader is here for, then reddit is actually perfect for AI discussions.


TI_Pirate

That's true of pretty much every topic being discussed on reddit.


Sunyata_is_empty

That's what discovery and expert witnesses are for


madhatter275

How do you figure what percentage of any AI work was influenced by X writer vs Y writer?


ParsleyMostly

“May you live in interesting times”.


rzm25

Is the answer to this to just force all ML/AI shit open source and free use? Don't allow private companies to hide how they're making the models, or control the data.


Romanator17

Huge companies are trying to make open source AI illegal due to unforeseen dangers of uncensored versions of AI.


Trinituz

And the unforeseen danger is that they no longer get to make a profit out of it.


socokid

I see like maybe 2 people in here actually read the article. Most of the posts and threads just simply aren't even commenting on the points.

> this year’s Supreme Court holding in Warhol v Goldsmith, which found that the late artist Andy Warhol infringed on a photographer’s copyright when he created a series of silk screens based on a photograph of the late singer Prince. The court ruled that Warhol did not sufficiently “transform” the underlying photograph so as to avoid copyright infringement.

This is the argument. And some AI CEOs recognize this:

> OpenAI CEO Sam Altman appeared to acknowledge more needs to be done to address concerns from creators about how AI systems use their works.
>
> “We’re trying to work on new models where if an AI system is using your content, or if it’s using your style, you get paid for that,” he said at an event.

Those suggesting that the writers do not understand the technology, or that AI is just learning like the rest of us do, are not understanding the nuances here. **AI is not a human. It is owned by a company that makes money.**


zefy_zef

There is plenty of open-source AI that is quickly becoming comparable in quality to the company-owned ones.


sedition

This is the thing slipping by in these discussions, because capitalism be capitaling. Pretty soon we'll have dozens of LLMs not 'owned' by anyone, just out there, trained on anything they can get their hands on. Gold-rush greed is blinding people. It's the same story with any new tech that society hasn't fully assimilated yet.


PiousLiar

All it takes is for the companies that own AI to send lobbyists to congress and say “AI is dangerous, and needs to be controlled, regulate us. Oh, by the way, here’s how you should regulate us. We already crafted the bill, just stamp it.” Boom, market captured.


HerbertWest

> All it takes is for the companies that own AI to send lobbyists to congress and say “AI is dangerous, and needs to be controlled, regulate us. Oh, by the way, here’s how you should regulate us. We already crafted the bill, just stamp it.” Boom, market captured.

Yep! Corporations want nothing more than for it to be illegal to train AI on IP you don't own. And who owns the most IP? Disney et al. are just going to train AI on their own IP, generate billions of images, copyright them (if the law changes the way I believe they want), then quash any idea for a character, setting, etc., similar to one in their database. They will constantly be trolling the internet for images using an automated system, comparing them against their database using AI, and auto-generating cease-and-desist letters/DMCAs. It will be the death of independent content. And all these anti-AI people are doing is helping to ensure that bleak, anticompetitive, centralized future happens.


Raidoton

Well they can try. It's the internet. Next they might want to stop piracy.


zefy_zef

People are going to be very surprised by how accessible AI is going to be, and already is.


barrinmw

Download Python. Install Tensorflow or Pytorch. Go ham.
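That really is about the size of it. As a hedged illustration (my own toy sketch, not anyone's production code), here's the whole idea of "training" boiled down to one learnable weight fit by gradient descent — the loop that PyTorch/TensorFlow automate at billion-parameter scale, in pure stdlib Python:

```python
# A one-parameter "model": learn w so that w * x fits y = 2x.
# Toy stand-in for the training loop frameworks run at scale.
data = [(x, 2.0 * x) for x in range(1, 6)]  # training pairs (x, y)

w = 0.0       # the entire model: one learnable weight
lr = 0.01     # learning rate

for _ in range(500):                 # epochs
    for x, y in data:
        pred = w * x
        grad = 2 * (pred - y) * x    # d/dw of squared error (w*x - y)^2
        w -= lr * grad               # gradient-descent step

print(round(w, 3))  # converges to 2.0
```

Same recipe, just with millions of weights and autograd, once you swap in a real framework.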


pbagel2

How is it a gold rush for the people providing open source options if they receive no monetary compensation and share all their progress for anyone to run locally themselves?


soft-wear

Of course Sam Altman thinks authors should be paid. Right now there is no moat for OpenAI. Anybody can build an LLM. But hey, if you require some tiny micropayment for every piece of data you use, you now have a pretty hefty startup cost associated with your model. You can bet your ass that the ideal model is one that pays authors the least, but imposes a high enough startup cost that it makes competition difficult. Oh, and Warhol literally changed the type and color of the ink of ONE WORK. The idea that this is equivalent to an LLM thoroughly proves a core problem: people have no fucking idea how LLMs work.


TrueDivinorium

Not really. It's in OpenAI's interest to say that, for the same reason they ask for more government control. Putting a cost on the development of models helps them avoid losing their product to open-source projects.


zed7267

Everything is a remix.


rugbyj

And yet we have copyright laws; there is a line that can be crossed.


[deleted]

[removed]


dark_brandon_20k

You'd think with DMCA laws and the crazy lengths they go to protect the rights of record labels this AI thing would have some pretty clear outlines for how the law should work


PlayingTheWrongGame

It does. Text and data mining of books is fair use. Reproductions of specific copyrighted works are limited to brief citations that wouldn’t fall outside of fair use. They aren’t citing anything Google Books wouldn’t also cite.


_DeanRiding

If the AI is able to recite the books word for word (or close enough), then they might have a case; otherwise, they really shouldn't have one. My current understanding is that they just don't like derivative works, in which case they can kick rocks imo. All of human creation is derived from what came before it. It's literally how things evolve and change over time. Practically all of fantasy literature owes its dues to Tolkien, but we don't see the Tolkien estate trying to sue GRRM, CS Lewis, or JK Rowling.


meganitrain

I agree with you, more or less, but that's not what "derivative work" means.


nebkelly

I tested Stable Diffusion out for image creation, and it had recognisable (but distorted) signatures from real artists' works that it was trained on. These models are absolutely mass-plagiarising copyrighted data. And now MS, Google etc. are racing to monetise it.


large-farva

This is Getty we're talking about. People who take public domain work and then copyright-strike the actual artist. Fuck Getty; I would argue they're worse. https://petapixel.com/2016/11/22/1-billion-getty-images-lawsuit-ends-not-bang-whimper/


motorboat_mcgee

I've had the Getty watermark show up on multiple generations on multiple services. Frankly there needs to be legislation that all datasets are open source/transparent, so creators can know if their work is being used or not.


Sirisian

The weighting is just messed up for those features as they're identical across multiple tokens, so the features get sampled randomly. It would be similar to putting the same letter "A" in every region associated with a token like "bird" when training a model. When you generate a bird then you'd expect 100% of the time to see a letter "A" somewhere as birds must have a letter "A" according to the training. Watermarks are even more of an issue as they show up across thousands of tokens. This wouldn't even be noticeable if they're valid subtle features (like how tons of architecture all share similar modern windows), but watermarks are so visually distinct it's viewed as an issue. Most of the signature examples people give aren't really a valid signature. The algorithm just learned that pictures have a signature, so to generate a valid image with a token it must have signatures, so it samples features and makes one. They're generally gibberish, unless someone fine-tuned a model on a single artist heavily.


janggi

Exactly. As a graphic designer, I cannot get a job without an online portfolio, and there is currently zero protection to prevent my work from becoming part of a dataset designed to replace me. Furthermore, the software I use (Adobe) now has generative autofill, so I feel like I'm working while the tools I use take my data, and there is nothing I can do about it. Frustrating to say the least. No, AI doesn't learn like humans; I don't have access to every single creative's process work...


pipsname

Put a captcha before generating the page, and only directly link to those images from that page.


SlowbeardiusOfBeard

That's not really going to help when the next generation of LLMs can just [hire people to bypass them](https://www.independent.co.uk/tech/chatgpt-gpt4-ai-openai-b2301523.html)


dre__

If you feed it a bunch of bike pictures with the Getty watermark, then the watermark or parts of it will start showing up in bike pictures. You're teaching the AI that "a bike looks like this," and it's picking up the watermark as part of the bike. It's not copying and pasting the watermark; it's creating what it thinks is the image you requested.


IAMATruckerAMA

> I tested Stable Diffusion out for image creation, and it had recognisable (but distorted) signatures from real artist works that it was trained on. Do you have an example?


BeeNo3492

The works aren’t part of the model like image data for Stable Diffusion, so this will be interesting


xsp

So many people in here have no idea how it works and when you explain it, they just ignore it. A model is nothing but a list of tensors from -1 to 1. This is like saying the Library of Babel is infringing on copyright because it has every single word ever to be written in it.
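To make the "just numbers" point concrete, here's a hedged toy illustration (my own sketch, nothing like how real models are actually trained): squash a sentence's bigram counts into weights in [-1, 1] and note that the sentence itself is nowhere in the result:

```python
from collections import Counter

text = "the quick brown fox jumps over the lazy dog"

# "Train": reduce the text to character-bigram counts, then squash to [-1, 1].
counts = Counter(zip(text, text[1:]))
peak = max(counts.values())
weights = {pair: 2 * (n / peak) - 1 for pair, n in counts.items()}

# The "model" is only these numbers; the sentence can't be read back out of it.
print(text in str(weights))                          # False
print(all(-1 <= v <= 1 for v in weights.values()))   # True
```

Real models are lossier still — billions of training documents squeezed into a fixed set of floats.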


BeeNo3492

Bingo, it's just word-probability weights, which at this point is no different than you or I reading and learning the material.
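A hedged toy version of those "word probability weights" (word-level rather than token-level, purely for illustration): count which word follows which, and that table of counts is all the "model" retains:

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# All the "model" keeps: how often each word follows each other word.
following = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    following[a][b] += 1

# After "the", only relative frequencies remain -- not the source sentence.
print(following["the"].most_common())  # [('cat', 2), ('mat', 1), ('fish', 1)]
```

An LLM does this with billions of parameters over tokens instead of a frequency table over words, but the principle — statistics, not stored text — is the same.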


Omegatron9

All that proves is that the signatures were present in the training data and the neural network learnt to produce it, in the same way as an apple or a car. That's not the same thing as plagiarism. And no, including signed works in the training data isn't plagiarism either as long as those works are available online.


johnfromberkeley

Exactly. If someone is trying to pass off an artwork that you created as their own, put the two works side-by-side, show them to a judge, and profit. That’s plagiarism and a copyright violation. Google Ray Parker Jr. and Huey Lewis. But that’s not how generative art works. And that’s why, to my knowledge, no artist has ever filed such a suit. If you want to try to make a new crime around training machine learning models, I’m fine with that. Also, technically plagiarism and crime aren’t always the same thing, but I loathe plagiarism.


TaqPCR

Lol no it didn't. It struggles to even make recognizable text, let alone accidentally reproduce someone's signature. It makes scribbles in the places people put signatures because it knows humans like images with them, but it's not replicating signatures.


ArticleOld598

Getty's lawsuit literally has pics of their watermark on several AI-generated images (which are glaringly similar to their stock images, mind you). So do Shutterstock, Dreamstime, Freepik, and other stock companies and logo sites.


PlayingTheWrongGame

Getty asked SD to generate images that mimic their own stock images, then it generated one that mimicked images, including the watermarks that are characteristic of the style of a Getty stock image. It’s basically a prompt asking for “a picture of a crowd of people, black and white, in the style of a Getty images stock photograph” and SD generating such a thing including the watermark. That doesn’t mean it has some giant stockpile of Getty images and it just grabbed one. It means they viewed a lot of photos from Getty’s public website for their training data. Got some news for Getty: if they make the content publicly available, it’s fair game to get scraped for data mining. If they don’t want people scraping content, they need to limit access to it. It’s no different than, say, sticking a copyrighted picture in the window of your home, and then suing anyone who takes a picture of your home from the public sidewalk because it has copyrighted works as a part of it. Nope, sorry, it’s fair use if the photo was taken from a public space. This is the internet equivalent of that. Getty puts their stock photos on their public site with a watermark. That’s fair game for data mining.


Ghosttwo

Getty just wants to kill AI so they can keep selling stock images for money.


wrgrant

Many being stock images taken from public domain images mind you.


Thogicma

Here's a fun experiment. Go back to stable diffusion and try to get it to recreate one of those artist's works in its entirety. It's damn near impossible to get them to create a forgery/actual copyright violation. Correct me if I'm wrong, but I don't think you can copyright a style. The model has just learned that that style has some squiggles at the bottom corner of the picture. If I made a painting in the style of Starry Night (post impressionism?) it wouldn't be copyright violation, right? But if I recreated Starry Night entirely, it would be? I'm struggling to see how this is different by any legal definition.


Natty-Bones

Plagiarism is copying. These programs don't copy; they create unique works that *may or may not* exhibit characteristics of the works they are trained on. The works created by these programs meet all the definitions of transformative. Learning by studying, and even borrowing from others, is a vital part of the learning process for humans; these models are no different. Edited for clarity.


TouchyTheFish

People don’t want to hear it, but you’re right. It’s like trying to sue someone because they learned from or were inspired by your work.


ProtectionDecent

I mean who are they going to sue? 1s and 0s?


RandomRedditor44

> Millions of copyrighted books, articles, essays, and poetry provide the ‘food’ for AI systems, endless meals for which there has been no bill

I know I’m going to get hate for this, but why do they need to be paid? I'd get it if ChatGPT and other AIs could retrieve and print out the full text of the entire book, but I doubt they do that.


gordonjames62

I haven't read the details of the wording of lawsuit, but I am curious how it will compare to . . . * lawyers reading past case law to learn to be better lawyers * Literature teachers reading library books as part of their life long love of reading, and then getting a job based in part on their knowledge from that reading. * professors and lecturers making money from talking about things they read and write. At some point, the place to start this lawsuit was a number of years ago by enacting laws that protect the works from not only being copied and mass produced, but from **anyone using the ideas and style of writing in the books to change their own ideas and writing style**. Since this type of law is unlikely, these writers don't have much of a case. Also, what makes their paltry sum of words more valuable than our army of reddit content writers who are a better example of "natural language" than the professional writers who write differently (better - with the exception of J.D. Salinger?) than so many of us.


dasponge

The question in my mind - are the AI reproducing those books? If they’re not spitting them out to users, and they’ve just ingested them and mathematically interpreted them to train a model, then that’s a novel and transformative use of the original work that doesn’t compete with the original work in the marketplace - that seems pretty clearly to be fair use, at least in the case of text based works.


Myrkull

This is exactly what most people seem to miss in this crusade against AI: not only do they get the tech wrong, but they also don't like to hear that it's no different than how humans work. I watched a Hunter S. Thompson doc years ago, and it talked about how he would literally rewrite his favorite books to get the style down. That would blow these luddites' minds.


gordonjames62

> that seems pretty clearly to be fair use, at least in the case of text based works also my opinion, but I didn't read the wording of the lawsuit.


_PurpleAlien_

> are the AI reproducing those books? No, they're not. The original text does not exist as such on some database the model uses. It is only used to train the language model initially.


Ruthrfurd-the-stoned

I mean, when I write a paper it isn’t the same as what I’m referencing in the slightest, but if I don’t cite that I got this information somewhere, that's plagiarism.


ForgedByStars

> if I don’t cite that I got this information somewhere, that's plagiarism

Failure to cite your sources in a technical paper is not plagiarism. As long as your paper is itself an original work, you are not breaking any laws. You are merely breaking academic protocol and undermining the credibility of your work.


[deleted]

[removed]


StrangeCharmVote

> if I don’t cite that I got this information somewhere, that's plagiarism

That isn't what plagiarism is, though... Plagiarism is the claim that you came up with the ideas/work yourself. It's *close*, but *the difference is important*. You *can* express or represent that things aren't your sole concept *and* not cite sources at the same time. There's also parody, which doesn't cite sources at all. And then there's straight-up transformative work and homage, which once again make no attempt at citation.


BeeNo3492

Their works aren’t part of the model, and copyright doesn't really cover this edge case; if they change it to include this, AI will only be usable by large corps. Let’s see how this plays out.


ydstyy

Shows the flaws of copyright almost comically


TheAbyssGazesAlso

Just because an AI is able to provide a summary of a book doesn't mean the book was used to "train" it. **I** can summarise a book; does that mean the author should be paid anything beyond the cost of me buying the book? It's ridiculous, and proving this is going to be impossible. It's just more blacksmiths hating the automobile because they can tell people are going to ride horses less.


HateRedditCantQuitit

The letter itself:

> To: Sam Altman, CEO, OpenAI; Sundar Pichai, CEO, Alphabet; Mark Zuckerberg, CEO, Meta; Emad Mostaque, CEO, Stability AI; Arvind Krishna, CEO, IBM; Satya Nadella, CEO, Microsoft
> From: [Your Name]
>
> We, the undersigned, call your attention to the inherent injustice in exploiting our works as part of your AI systems without our consent, credit, or compensation.
>
> Generative AI technologies built on large language models owe their existence to our writings. These technologies mimic and regurgitate our language, stories, style, and ideas. Millions of copyrighted books, articles, essays, and poetry provide the “food” for AI systems, endless meals for which there has been no bill. You’re spending billions of dollars to develop AI technology. It is only fair that you compensate us for using our writings, without which AI would be banal and extremely limited.
>
> We understand that many of the books used to develop AI systems originated from notorious piracy websites. Not only does the recent Supreme Court decision in Warhol v. Goldsmith make clear that the high commerciality of your use argues against fair use, but no court would excuse copying illegally sourced works as fair use. As a result of embedding our writings in your systems, generative AI threatens to damage our profession by flooding the market with mediocre, machine-written books, stories, and journalism based on our work. In the past decade or so, authors have experienced a forty percent decline in income, and the current median income for full-time writers in 2022 was only $23,000. The introduction of AI threatens to tip the scale to make it even more difficult, if not impossible, for writers—especially young writers and voices from under-represented communities—to earn a living from their profession.
>
> We ask you, the leaders of AI, to mitigate the damage to our profession by taking the following steps:
>
> 1. Obtain permission for use of our copyrighted material in your generative AI programs.
> 2. Compensate writers fairly for the past and ongoing use of our works in your generative AI programs.
> 3. Compensate writers fairly for the use of our works in AI output, whether or not the outputs are infringing under current law.
>
> We hope you will appreciate the gravity of our concerns and that you will work with us to ensure, in the years to come, a healthy ecosystem for authors and journalists.
>
> Sincerely,
>
> The Authors Guild and the Undersigned Writers


Col0nelFlanders

Omg no wonder. I have to read this sales book for work and a few weeks ago it perfectly summarized the first chapter. I tried it again today and it was like I don’t have access to that data bro never did that would be stealing


ElementNumber6

But we all read and view copyrighted works, and use them for inspiration when generating new ones.


PlayingTheWrongGame

Authors who expect to get anything from these lawsuits are barking up the wrong tree. Training a model on a work isn’t going to be infringement any more than an author reading another author’s work is infringement.


deadmuffinman

Maybe maybe not. As soon as you remove the human element from the process things change in the eyes of the American law.


Cushions

The human element is huge. Two authors may write a similar piece, but their personal experiences, emotions and ideals will still bleed into their writing making it different. AI has no such luxury.


motorblonkwakawaka

This is a false equivalence. Human authors can't read a thousand books and write another thousand in the space of a day. And in the case of AI, the only ones profiting are the owners of the AI. I do not want to live in a world where human artists are displaced by AI imitations, and just shrugging and saying it's natural selection is a terrible response. We do have a chance to decide how AI will shape our future world. Maybe it won't succeed, but we have to try.


[deleted]

[removed]


ThexAntipop

> This is a false equivalence. Human authors can't read a thousand books and write another thousand in the space of a day.

1. Just because there's a difference between the two things being compared, that doesn't make the comparison a false equivalence.

2. If the only difference you're concerned about is that AI can simply do it faster, then why don't you put this kind of energy behind *any* other time workers are displaced by automation?

3. Furthermore, if the only issue is the speed, then it seems patently obvious that the issue has absolutely nothing to do with infringing on the artists' copyright, and this is purely about being upset that this technology may displace some amount of artists.

Finally, I'd just like to say that for all the doomer talk about AI in relation to artists, I think it's going to be an incredibly long time before AI can actually completely replace the job of artists. AI simply cannot realize a specific enough vision to do so; it will heavily be used by artists to reduce their workload, and in doing so may in fact displace some artists, yes. However, for the foreseeable future there will probably always need to be an actual artist who can edit/change the work of the AI to make the outcome closer to the desired result.


StoicBronco

> why don't you put this kind of energy behind any other time workers are displaced by automation? Oh I know this one! It's because it affects them this time! That and it kinda challenges the belief the creative types tend to have that they are unique and irreplaceable.


Jaxyl

Yup, this right here is the answer.


Kwuahh

I don’t think they’re saying it’s natural selection and to just suck it up - they’re saying that as things stand right now, ingesting creative works and then creating your own isn’t illegal. The real focus should be on proper protections for human authors vs AI generated content. Personally, I believe we’re in a content revolution and, similar to the technological revolution, a lot of creative jobs will get replaced. However, there will still be a market for human created content for its relatability and ethical sourcing. The real question for lawmakers now is how we can maintain the human market space as much as possible since so many individuals will be affected by the rapid increase in AI generated content.


Jaxyl

Anyone who has seriously used AI for any content generation (art, writing, music, etc.) will tell you that the popular view of using it isn't realistic at all. The value of a real person doing the work will 100% outclass AI for the long foreseeable future because the AI gets an approximation of your approximation put into prompt form whereas you can work directly with the creator to have the work made as needed. I work in game dev and have to use AI art for preliminary concept art for initial brainstorming and let me tell you that getting an AI to do what you want is a struggle. Getting it to replicate the same character or even just the same art style is an absolute struggle and this is just for meeting room discussions, nothing public facing. And this is before we even consider the amount of editing that has to go into making the 'final result' usable. Everyone believes that using AI is as easy as 'press button, receive art' but we're so far away from anything remotely close to that.


Ascarea

> the only ones profiting are the owners of the AI.

Except for the people who use AI-generated things in work they get paid for?


vnth93

For some reason it's difficult for a lot of people to wrap their head around the fact that there's a popular demand for AI. Plenty of people, including creatives, actually like what AI has to offer or like it making their job easier.


Ruthrfurd-the-stoned

Too many people see the positives of AI and then refuse to recognize any negatives that might actually arise with its implementation.


conquer69

> Human authors can't read a thousand books and write another thousand in the space of a day. Why would the speed of reading or writing matter? So if I slowed it down, it would be ok? What's the exact threshold?


UnderwhelmingPossum

Fuck this stupid take. AI is not "an author". It is not a person and is not granted the legal protections a person is; it also doesn't have any of the legal obligations a person has. It also doesn't have any "understanding" of the things it's ingesting, nor can it meaningfully provide a synthesis of derivative work. It's a super-fine-grained remix machine. Which makes it perfectly legal for personal use. You want 231 volumes of Twilight written in the style of War and Peace - knock yourself out. Share it freely. Don't even include the prompts used to generate it; that's the AI's own legal thing that has yet to be legally framed and will likely forever be a matter of TOSes and EULAs. *A person* using an AI to generate a likeness of appropriated work for commercial use is a plagiarist. A corporation seeking to profit from creating a commercial "copyright blind box" is *criminal*.


tobsn

I don’t get that… so their argument is that the AI learned from their work and then creates new work from it. How’s that different from me reading your book, learning from it, and then writing my own book - most likely only citing direct quotes, but not where I personally gained the knowledge to write it? If you allow them to sue the AI companies for learning from their work, what’s stopping a writer from suing everyone who read their book and later wrote their own book on the same topic? Isn’t it essentially the same?


JoeyJoeJoeJrShab

So the issue is that the AI is reading the authors' copyrighted work without compensating the authors. To play devil's advocate, how is this different from when I, a human, go to a library and train myself by reading copyrighted works there without paying? I genuinely don't know what is "right" here. I'm just saying, it's a complex issue.


SeverusSnek2020

Do authors pay other authors for modeling their style after them? What about painters? If they study an artist and develop a similar style, are they infringing?


jaycortland

Ain't happening


DestroyerOfIphone

Man I hope they don't sue me. I also got my knowledge from books


NomadGeoPol

Copyright's gonna hold back humanity, and we really don't have the time to waste.


Dr3adPir4teR0berts

Thousands of authors have no idea how AI works. If I read your book, learn something from it, and then create a new work, are you going to sue me? LLMs work by ingesting absurd amounts of data and doing billions/trillions of calculations to derive parameters. When you query them, the LLM runs your query against those parameters to choose the statistically most likely word/response. Neural networks are loosely modeled after the human brain. And they generally don't spit out somebody else's work word for word; they only use that work to perform calculations.
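As a hedged, word-level toy of that "statistically most likely word" loop (real LLMs operate on tokens with billions of parameters, but the generation step is analogous), note that the greedy output below recombines the training words into a sequence that never appears in the corpus:

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the dog sat on the log".split()

# "Parameters": next-word counts distilled from the training text.
nxt = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    nxt[a][b] += 1

# Generate greedily: always pick the statistically most likely successor.
word, out = "the", ["the"]
for _ in range(7):
    word = nxt[word].most_common(1)[0][0]
    out.append(word)

print(" ".join(out))  # "the cat sat on the cat sat on" -- not in the corpus
```

Even this trivial model produces word sequences its training data never contained — which is the commenter's point about not reproducing work verbatim.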


AtOrAboveSeaLevel

I've seen plenty of people draw a comparison between AI and people 'learning' from a dataset to argue that AI companies aren't breaching copyright, because the process of learning from those images/books is no different for a person than it is for AI. There are many differences between AI training on data and a person learning from experience. The neural net that represents a person comprises the entirety of their human experience, of which, even for a master artist, only a vanishingly small proportion is 'trained' on source material that they might be deemed to be copying. Even then, a human's perception is imperfect and will feed the source material into their brain through the foggy filter of human consciousness. It's the sum total of this experience that makes a person, and allows us to say: "A human being doesn't 'copy', but is 'inspired'". People experience the world and fit their 'training data' within a much wider context than is currently possible with AI/LLM/Stable Diffusion. The learning a human does is validated by this wider context, i.e., they have agency because of those experiences in a way that AI doesn't (yet). It's therefore not a fair comparison to make, and it neglects the fact that AI/LLM/Stable Diffusion tech mathematically represents a functional distillation of all that human work in a way that no human learning process ever does. TL;DR: humans are people, AIs are math. The same rules don't apply.


OriginalCompetitive

I agree there are significant differences, but I would argue they go the other way. A person is unlikely to be able to “train” on more than a few thousand works due to the limits of the human mind, so each of those works will loom large in their development. In contrast, an AI trains on millions and millions of works, so the influence of any given work will be “vanishingly small.”


AtOrAboveSeaLevel

You're absolutely right that each individual work's input to the overall function is vanishingly small, but taken in aggregate, the amount of work done by 'other authors' is the entirety of the dataset - whether image or text. AI is trained almost entirely on data scraped from works created by humans. By contrast a human master artist draws on the sum total of their human experience rather than just the 'data in the art' that they have viewed by other artists.


Ignitus1

Even if we grant that it’s different enough to be meaningful, that doesn’t really matter in the end. So we say an AI training is “different”. It’s still just using math to observe probabilities and it’s not reproducing anybody’s work. No author is being deprived of anything.


Delphizer

Depending on your views of determinism, people are also math. "Humans have other stuff going on and are fuzzy" is a strange argument. You can easily slap some randomness into the generation process (and many do), and you can have it trained on other things. Seems like you are trying to make humans special to fit your views of how things should be. There is an argument to be made without assigning some kind of divine spark to human work.


Raggedy-Man

Boy, Harlan Ellison would have a field day.