
For those who might wonder how accurate this is, there is guidance in the Federal Register to this effect. [0] It's quite comprehensive, and covers pretty much every "What about...?" question that might be asked.

> In these cases, copyright will only protect the human-authored aspects of the work, which are “independent of” and do “not affect” the copyright status of the AI-generated material itself.

[0] https://www.federalregister.gov/documents/2023/03/16/2023-05...




I cannot take seriously any politician or lawyer using the words "artificial intelligence", especially about models from 2023. These people have never used LLMs to write code. They'd know even current models need constant babysitting or they produce an unmaintainable mess; calling anything from 2023 AI is a joke. As the AI proponents keep saying, you have to try the latest model, so anything 2 years old is irrelevant.

There's really 2 ways to argue this:

- Either AI exists and then it's something new and the laws protecting human creativity and work clearly could not have taken it into account and need to be updated.

- Or AI doesn't exist, LLMs are nothing more than lossily compressed models violating the licenses of the training data, their probabilistically decompressed output is violating the licenses as well and the LLM companies and anyone using them will be punished.


If monkeys can't hold copyright, which is an actual case discussed above, then no, an LLM probably can't either. "Human" is required.

Yeah, an LLM, being a machine, obviously shouldn't hold copyright. But that doesn't stop people from claiming that running vast amounts of code through an LLM can strip copyright from it.

Ultimately LLMs (the first L stands for large and for a good reason) are only possible to create by taking unimaginable amounts of work performed by humans who have not consented to their work being used that way, most of whom require at least being credited in derivative works and many of whom have further conditions.

Now, consent in law is a fairly new concept and for now only applied to sexual matters but I think it should apply to every human interaction. Consent can only be established when it's informed and between parties with similar bargaining power (that's one reason relationships with large age gaps are looked down upon) and can be revoked at any time. None of the authors knew this kind of mass scraping and compression would be possible, it makes sense they should reevaluate whether they want their work used that way.

There are 3 levels to this argument:

1) The letter of the law - if you understand how LLMs work, it's hard to see them as anything more than mechanical transformers of existing work so the letter should be sufficient.

2) The intent of the law - it's clear it was meant to protect human authors from exploitation by those who are in positions where they can take existing work and benefit from it without compensating the authors.

3) The ethics and morality of the matter - here it's blatantly obvious that using somebody's work against their wishes and without compensating them is wrong.

In an ideal world, these 3 levels would be identical but they're not. That means we should strive to make laws (in both intent and letter) more fair and just by changing them.


If consent to use of your code in AI training can be revoked at any time, that makes training impossible, since if anyone ever withdraws consent, it's not like you can just take out their work from your finished model.

Yup. Not my problem.

You could even say it would very strongly incentivize the LLM companies to be on their best behavior, otherwise people would start revoking consent en masse and they'd have to keep training new models all the time.

If you want something more realistic, there would probably be time limits on how long they have to comply, and rules on how much they have to compensate the authors for the time it took them to comply.

There absolutely are ways to make it work in mutually beneficial ways, there's just no political will because of the current hype and because companies have learned they can get away with anything (including murder BTW).


> Yup. Not my problem.

And that is why the entire industry is going to roll their eyes and ignore you.

No law is putting this genie back in the bottle, so all there is left to do is adapt and push for models with open training data like those by Ai2.


Almost all the productivity enhancement provided by an AI coding assistant is provided by circumventing the copyright laws, with the remaining enhancement being provided by the fact that it automates the search-copy-paste loop that you would do if you had direct access to the programs used during training.

(Much of the apparent gain of the automatic search-copy-paste is wasted by skipping the review that would have happened at each step if it were done manually. That review must then be done more slowly, because you have to review the entire, harder-to-understand program generated by the AI assistant.)

Despite the fact that AI coding assistants are copyright breaking tricks, the fact that this has become somehow allowed is an overall positive development.

The concept of copyright for programs has been completely flawed from its very beginning. The reason is that it is absolutely impossible to write any kind of program that is not a derivative of earlier programs.

Any program is made by combining various standard patterns and program structures. You can construct a derivation sequence between almost any 2 programs, where you decompose the first into some typical blocks, then compose the second program from such blocks, while renaming all identifiers.

It is quite subjective to decide when a derivation sequence becomes complex enough that the second program should not be considered as a derivative of the first from the point of view of copyright.

The only way to avoid the copyright restrictions is to exploit loopholes in the law, e.g. if translating an algorithm to a different programming language does not count as being derivative or when doing other superficial automatic transformations of a source program changes its appearance sufficiently that it is not recognized as derivative, even if it actually is. Or when combining a great number of fragments from different programs is again not recognized as derivative, though it still kind of is.

The only reason it became possible for software companies like Microsoft or Adobe to copyright their s*t is that the software industry based on copyrighted programs was jumpstarted by a few decades of programming during which programs were not copyrighted, which could then be used as a base by the first copyrighted programs.

So AI coding agents allow you to create programs that you could not have written when respecting the copyright laws. They also may prevent you from proving that a program written by someone else infringes upon the copyright that you claim for a program written with assistance.

I believe that both of these developments are likely to have more positive consequences than negative ones. The methods used first in the USA and then also in most other countries (due to blackmailing by the USA) for abusing the copyright laws and the patent laws have been the most significant blockers of technical progress during the last few decades.

The most ridiculous claim about the copyright of programs is that it is somehow beneficial for "creators". Artistic copyrights are sometimes beneficial for creators, but copyrights on non-open-source programs are almost never owned by creators, but by their employers. Even the employers only seldom get any direct benefit from the copyright; they use it in the hope that it might prevent competition.


> The reason is that it is absolutely impossible to write any kind of program that is not a derivative of earlier programs.

And that's why copyright has exceptions for humans.

You're right that copyright was the wrong tool for code, but for the wrong reasons.

It shouldn't be binary. And the law should protect all work, not just creative. Either workers would come to a mutual agreement how much each contributed or the courts would decide based on estimates. Then there'd be rules about how much derivation is OK, how much requires progressively more compensation and how much the original author can plainly tell you what to do and not do with the derivative.

It's impossible to satisfy everyone but every person has a concept of fairness (it has been demonstrated even in toddlers). Many people probably even have an internally consistent theory of fairness. We should base laws on those.

> abusing the copyright laws and the patent laws have been the most significant blockers of technical progress during the last few decades

Can you give examples?

> copyrights on non-open-source programs are almost never owned by creators, but by their employers

Yes and that's another thing that's wrong with the system, employment is a form of abusive relationship because the parties are not equal. We should fix that instead of throwing out the whole system. Copyright which belongs to creators absolutely does give creators more leverage and negotiating power.


> And that's why copyright has exceptions for humans.

Why would the exceptions be only for humans?

"Only human works can get copyright" makes plenty of sense. "Only humans can have fair use" doesn't make sense. Why would we disallow a monkey video having a clip of something as part of the monkey reviewing it? Why would we allow a human to caption something for accessibility but not a computer?

Grammar and idioms should be outside the realm of copyright entirely, not something you get an exception to use anyway.

> It's impossible to satisfy everyone but every person has a concept of fairness (it has been demonstrated even in toddlers). Many people probably even have an internally consistent theory of fairness. We should base laws on those.

A lot of people seem to default to thinking they should get permanent and total control over any idea they have, so I think it's a bad idea to rely on intuition here.


> Why would the exceptions be only for humans?

For starters because you can't own humans. If it's possible to launder copyrighted work through something which can be owned, then rich people get an advantage because they can own more of it.

> so I think it's a bad idea to rely on intuition here

Yep, that's why I said we should only concern ourselves with those which are internally consistent. If people want to apply rules to others which they don't intend to or cannot follow themselves, they lose the right to be taken seriously.


> For starters because you can't own humans. If it's possible to launder copyrighted work through something which can be owned, then rich people get an advantage because they can own more of it.

If it's actually 'laundering' then it's invalid to begin with.

If it's a proper new thing then how do rich people get an advantage? If anything AI code is cheap enough to even things out.

> Yep, that's why I said we should only concern ourselves with those which are internally consistent. If people want to apply rules to others which they don't intend to or cannot follow themselves, they lose the right to be taken seriously.

I think a lot of those people are consistent! The issue is they have way too little respect for the public domain and are overprioritizing property against freedom.


> If it's actually 'laundering' then it's invalid to begin with.

It's laundering in any reasonable meaning of the word. Whether it's legal according to the letter of the law is being decided.

Please differentiate morality and legality as well as intent and letter of the law.

> If anything AI code is cheap enough to even things out.

1) Do you think people have and will have access to the same models as large corporations internally, especially those who train LLMs themselves? Nothing stopping Google from excluding its own source code from the publicly available models but including it for internal models.

2) It's not just about the code, it's about the whole pipeline from nothing to a finished product and revenue stream. Did you know half the price of a new car is marketing? How much you can spend on ads, legal, market research, sales reps, etc. In some areas, especially B2B, nobody will even talk to you if you're a single guy in a shed, companies want stability, predictability and long term support.

3) More crudely, if you wanted to influence product selection or government elections, how many tokens could you afford for LLMs to influence online discussions, how many residential IPs could you afford, how much data could you buy about users to target each one specifically? Rich people will clearly have an advantage there.

Basically, if the cost of code goes towards zero, other factors will play a larger role.

> I think a lot of those people are consistent!

Only if they're consistently applying the rules to others but not themselves. Otherwise "permanent and total control over any idea they have" means they could never base anything on other people's ideas.


It's silly to say a human writing a piece of software is laundering their knowledge of existing software, even if they're trying to make a competitor to a specific thing. Legally and morally.

It's just as silly to say it's laundering when a machine does it.


>> abusing the copyright laws and the patent laws have been the most significant blockers of technical progress during the last few decades

> Can you give examples?

This is a subject so vast that giving examples requires a book-length text. IIRC at least one or two books have actually been written about this, but I am too lazy to search now for their titles.

I am more familiar with what happened in cryptography, where many algorithms have begun to be used only after the 20 years or more required for their patents to expire, while as long as patents remained valid, inferior solutions were used, wasting energy and computing time.

Regarding copyrights, I know my own activity best, but I am pretty certain that this anecdotal experience is representative of many programmers.

During the first decades of computer programming, until the seventies, there have been a lot of discussions about software reuse as the main factor that can improve programming productivity, and about which features of the programming languages and of the available programming tools can increase the amount of reuse, like modularity.

However, all those discussions were naive, because the amount of reuse later remained much lower than predicted, and the cause was not technical but the copyright laws. Open-source programs have become the main weapon against the copyright laws, and they are what enables the reuse of software nowadays.

However the value of software reuse has never been understood by the management of many companies. In decades of working as a programmer, I have wasted a lot of time with writing programs in such a manner so that whoever was my employer could claim the copyright for them.

There were plenty of opportunities when I could have used open-source programs, but I could not use them as there was someone who insisted that the product must contain "software IP" owned by the company. Therefore I had to waste time by rewriting something equivalent with what I could have used instantaneously, but different enough to be copyrightable.

There were also other cases that were even more annoying, when I had to waste time rewriting programs that I had already written in the past, but in a different way so that there would be no copyright infringement. Sometimes the old programs had been written while I was employed elsewhere; other times they were programs written for myself, on my own time and on my own computers. In such cases, I could not use my own programs, as the employer would then claim copyright on them, so I would lose ownership and would not be able to use them in the future for my own needs.

There are many projects where I have wasted more time avoiding copyrights than solving problems. I believe that there must be many others who must have had similar experiences.

So I welcome the copyright-washing AI coding assistants, which can be employed successfully in such cases in order to avoid the wasteful duplication of work.


It all boils down to some people thinking they should be able to use other people's work for free.

> patents

Patents, unlike copyright, are not automatic. Which indicates that the people who expended their limited lifetime to invent the algorithms explicitly did not want you using them, at least not unless you came to an agreement with them first.

---

re rewriting:

There's your real problem. Copyright should belong to the people doing the actual work, not owners/employers who perform no useful work.

If that was the case, the person who did the original work would have no reason to prevent you from using it, as long as he could also benefit from the fruits of your combined labor. For him, the work was already done, it would be extra reward. For you, it would be profitable as long as his reward was less than the cost of you doing it from scratch. You'd most likely meet somewhere in the middle.

Same situation when rewriting your own work.

As often happens, a system was put in place for good. Rich people found a way to exploit it. Now, instead of trying to fix the system, you're arguing to remove it entirely, not realizing you'll be worse off in the end. LLM companies want to replace all programmers by using their work against them. This is not for your benefit, it's for theirs.

As I often say, what should be protected isn't creativity or expression but work. People should benefit from their work and it should not be used against them. It should also not be possible for someone to benefit without doing useful work.

---

Would you work for a company which develops software to detect homosexuals using public cameras and eye tracking? What about a company discovering and selling Android exploits to governments? Does it matter which governments? What about a company which tracks employee movements and productivity to such a level they have to pee in bottles to meet quotas?

The world is full of these examples but at least you had the choice of not helping them. Now you don't.

The people who own them are some of the most anti-social people on the planet and you think they should be able to use our work as they wish...


Nice, -4 points, somebody, many somebodies in fact, took that personally and yet were unable to express where they disagree in a comment.

Look, if you think I am wrong, you can surely put it into words. OTOH, if you don't think I am wrong but feel that way, then it explains why I see no coherent criticism of my statements.


When your comment is about how you can’t take your counterparty seriously and they’re a joke, you’re incentivizing people who disagree to just downvote and move on.

The signal you’re sending is that you are not open to discussing the issue.


It's a fallacy. Someone being utterly wrong, and my dismissing them for it, does not logically make my claim easily dismissible.

Yea, that’s exactly what I’m talking about.


