Alpine and Chimera, however, are neither reproducible, full-source bootstrapped, nor signed, and neither enforces code review. I would honestly steer clear of both for anything but low-risk hobby use cases.
IMO they are best thought of as research projects, useful as reference for distros designed for production use.
For those who like the LLVM/musl/mimalloc choices of Chimera but also want signed commits, signed reviews, container-native design, full-source bootstrapping, 100% deterministic builds, and multi-party-signed artifacts, check out https://stagex.tools
I was about to say I would need a newer-gen card to test the new open kernel driver stack, but after some research it appears the 2080 series was the first to support it, and with that new knowledge I realized I have a 2080 Ti on hand already.
So thanks for offering yours. It made me remember I actually own one!
Look, I think Sam Altman is a terrible person too, but to anyone reading who hates people like him as much as I do: you should want him alive while we work to build a world where he can live out a long life in complete safety, in prison.
Violence never solves anything. You will never make anything in this world better by becoming a worse person than your enemies.
My team and I have been building stagex, a FOSS, multi-party reviewed/built/signed, deterministic, full-source-bootstrapped, LLVM-native, container-native, musl/mimalloc-native Linux distribution for building all the things.
ROCm is open source and TheRock is community maintained, and any minute now the first Linux distro will have native in-tree builds. It will be supported for the foreseeable future thanks to AMD's open development approach.
It is Nvidia that has the track record of closed drivers and of insisting on doing all software development without community improvements, with the expected results.
> And the worst privacy, transparency, and FOSS integration due to their insistence on a heavily proprietary stack.
The market doesn't care about any of that. The consumer market doesn't care, and the commercial market definitely does not. The consumer market wants the most Fortnite frames per second per dollar. The commercial market cares about how much compute they can do per watt, per slot.
> there exist plenty like me that demand high freedom and transparency and will pay double for it if we have to.
The four percent share of the datacenter market and five percent of the desktop GPU market say (very strongly) otherwise.
I have a 100% AMD system in front of me so I'm hardly an NVIDIA fanboy, but you thinking you represent the market is pretty nuts.
I did not claim to represent the market as a whole, but I feel I likely represent a significant enough segment of it that AMD is going to be just fine.
I think local power efficient LLMs are going to make those datacenter numbers less relevant in the long run.
Just spent the last week or so porting TheRock to stagex in an effort to get ROCm built with a native musl/mimalloc toolchain and get it deterministic for high security/privacy workloads that cannot trust binaries only built with a single compiler.
It has been a bit of a nightmare (I had to package 30+ deps and their heavily customized LLVM), but I finally got the runtime to build this morning.
Things are looking bright for high-security workloads on AMD hardware thanks to them working fully in the open, however much of a mess it may be.
I also attempted to package ROCm on musl, specifically for Alpine Linux.
It truly is a nightmare to build the whole thing. I got past the custom LLVM fork and a dozen other packages, but eventually decided it had been too much of a time sink.
I’m using llama.cpp with its Vulkan support and it’s good enough for my uses. Vulkan is already there and just works; it’s probably on your host too, since so many other things rely on it anyway.
That said, I’d be curious to look at your build recipes. Maybe it can help power through the last bits of the Alpine port.
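For anyone wanting to try the Vulkan route described above, the build is a short cmake invocation. A sketch, wrapped in a function to make explicit that it assumes git, cmake, a C++ toolchain, and the Vulkan SDK/loader are already installed (flag name per recent llama.cpp trees; older trees used `LLAMA_VULKAN`):

```shell
# Sketch: build llama.cpp with its Vulkan backend.
# Assumes git, cmake, a C++ compiler, and Vulkan headers/loader are present.
build_llama_vulkan() {
  git clone https://github.com/ggml-org/llama.cpp
  cd llama.cpp
  cmake -B build -DGGML_VULKAN=ON
  cmake --build build --config Release -j"$(nproc)"
}
# Run it on a machine with a working Vulkan install:
# build_llama_vulkan
```

The resulting binaries should pick up any Vulkan-capable GPU through the system's ICD loader, which is what makes this path so low-friction compared to vendor compute stacks.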
Interesting how Vulkan and ROCm are roughly the same age (~9 years), yet the one for which AI is a side gig is incredibly more stable (and sometimes even more performant) for AI use cases than the one whose primary raison d'être is AI. Tells you a lot about the development teams behind them.
I've built llama.cpp against both Vulkan and ROCm on a Strix Halo dev box. I agree Vulkan is good enough, at least for my hobbyist purposes. ROCm has improved but I would say not worth the administrative overhead.
I realize it does not address the OP's security concerns, but I'm having success running ROCm containers[0] on Alpine Linux specifically for llama.cpp. I also got vLLM to run in a ROCm container, but I didn't have time to diagnose perf problems, and llama.cpp is working well for my needs.
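A sketch of that setup (the image name here is illustrative, not necessarily the one from [0]; the device flags are the standard way to expose AMD GPUs to a container):

```shell
# Sketch: run a ROCm container on an Alpine host for llama.cpp.
# The host only needs the amdgpu kernel driver and a container runtime;
# all of the ROCm userspace lives inside the (glibc-based) image.
run_llama_rocm() {
  docker run --rm -it \
    --device=/dev/kfd \
    --device=/dev/dri \
    --group-add video \
    --security-opt seccomp=unconfined \
    rocm/dev-ubuntu-22.04 \
    /bin/bash
}
# run_llama_rocm   # then build/run llama.cpp with its HIP backend inside
```

This sidesteps the musl-vs-glibc problem entirely, at the cost of trusting AMD's prebuilt container images.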
It is sad to observe this time and time again. Last year I had the idea to run a shareholder campaign to change this; I suspended it after last year's AMD promises, but maybe this really needs to be done: https://unlockgpu.com/action-plan/
So much about this confuses me. What do Kitty and ncurses have to do with ROCm? Why is this being built with GCC instead of clang? Why even bother building it yourself when the tarballs are so good and easy to work with?
On the last one: OP said they were trying to get it working for a musl toolchain, so the tarballs are probably not useful to them (I assume they're built for glibc).
Agreed on the others though. Why's it even installing ncurses, surely that's just expected to be on the system?
> Hey @rektide, @apaz-cli, we bundle all sysdeps to allow us to ship self-contained packages that users can e.g. pip install. That's our basic default and it allows us to tightly control what we ship. For building, it should generally be possible to build without the bundled sysdeps, in which case it is up to the user to make sure all dependencies are properly installed. As this is not our default, we seem to have missed some corner cases, and there is more work needed to get back to allowing builds with sysdeps disabled. I started #3538 but it will need more work in some other components to fully get you what you're asking for with regards to system dependencies. Please note that we do not test with the unbundled, system-provided dependencies, but of course we want to give the community the freedom to build it that way.
I did get past that issue with ncurses & kitty! Thanks for some work there!
There is, however, quite a long list of other issues that have been blocking builds on systems with somewhat more modern toolchains/OSes than whatever the target is here (Ubuntu 24.04, I suspect). I really want to be able to engage directly with TheRock & compile & run it natively on Ubuntu 25.04, and now Ubuntu 26.04 too. The people eager to use the amazing leading-edge capabilities TheRock offers will, I suspect, themselves be bleeding-edge users with more up-to-date OS choices. They are currently very blocked.
I know it's not the intent at all. There's so much good work here that seems so close & so well considered, an epic work spanning so many libraries and drivers. But this mega thread of issues gives me such vibes of the bad awful no good Linux4Tegra, where it's really one bespoke special Linux that has to be used, that nothing else works. In this case you can download the tgz and it will probably work on your system, but that means you don't have any chance to improve or iterate or contribute to TheRock, that it's a consume only relationship, and that feels bad and is a dangerous spot to be in, not having usable source.
I'd really, really like to see AMD have CI test matrices that we can see, showing the state of the build on a variety of Linux OSes. That would provide the discipline and trust to keep situations like this one from arising. The current setup obviously cannot hold forever: Ubuntu 24.04 is not acceptable as a build machine in perpetuity, so these problems eventually have to be tackled, and what's really needed is a commitment to not making the build work on one blessed image only. This situation should not have developed; for TheRock to be accepted and useful, the build needs to work on a variety of systems. We need fixes right now to make that true, and AMD needs to show that its commitment to that goal is real, ideally by running a build-matrix CI, visible to all of us, that shows it does compile.
Nvidia is opening their source code because they moved most of their source code to the binary blob they're loading. That's why they never made an open source Nvidia driver for Pascal or earlier, where the hardware wasn't set up to use their giant binary blobs.
It's like running Windows in a VM and calling it an open source Windows system. The bootstrapping code is all open, but the code that's actually being executed is hidden away.
Intel has the same problem AMD has: everything is written for CUDA or other brand-specific APIs. Everything needs wrappers and workarounds to run before you can even start to compare performance.
In the Python ecosystem you can just replace CUDA with DirectML in at least one popular framework and it just runs. You are then limited to Windows, though.
I have both of those cards. Llama.cpp with SYCL has thus far refused to work for me, and Vulkan is pretty slow. Hoping that some fixes come down the pipe for SYCL, because I have plenty of power for local models (on paper).
I had to rebuild llama.cpp from source with the SYCL and CPU specific backends.
Started with a barebones Ubuntu Server 24 LTS install, used the HWE kernel, pulled in the Intel dependencies for hardware support/oneapi/libze, then built llama.cpp with the Intel compiler (icx?) for the SYCL and NATIVE backends (CPU specific support).
In short, built it based mostly on the Intel instructions.
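A sketch of those steps, roughly following Intel's llama.cpp SYCL instructions (paths and flag names assume a stock oneAPI install; `GGML_NATIVE` is the CPU-tuning flag the parent seems to describe):

```shell
# Sketch: build llama.cpp with the SYCL backend using the oneAPI toolchain.
# Assumes the Intel GPU driver stack (Level Zero) and the oneAPI base
# toolkit are already installed, per the Ubuntu setup described above.
build_llama_sycl() {
  # Put icx/icpx and the SYCL runtime on PATH.
  source /opt/intel/oneapi/setvars.sh
  git clone https://github.com/ggml-org/llama.cpp
  cd llama.cpp
  cmake -B build \
    -DGGML_SYCL=ON \
    -DGGML_NATIVE=ON \
    -DCMAKE_C_COMPILER=icx \
    -DCMAKE_CXX_COMPILER=icpx
  cmake --build build --config Release -j"$(nproc)"
}
# build_llama_sycl   # run on the box that has oneAPI installed
```

The key detail is building with Intel's own compilers (icx/icpx) rather than the system GCC, which is reportedly what makes the SYCL backend behave.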
>Just spent the last week or so porting TheRock to stagex in an effort to get ROCm built with a native musl/mimalloc toolchain and get it deterministic for high security/privacy workloads that cannot trust binaries only built with a single compiler.
...I have a feeling you might not be at liberty to answer, but... Wat? The hell kind of "I must apparently resist Reflections on Trusting Trust" kind of workloads are you working on?
And what do you mean "binaries only built using a single compiler"? Like, how would that even work? Compile the .o's with compiler-specific suffixes, then do a tortured linker invocation to mix different .o's into a combined library/ELF? Are we talking about mixing two different C compilers? The same compiler, two different bootstraps? A regular/cross mix?
I'm sorry if I'm pushing for too much detail, but as someone who's actually bootstrapped compilers/userspaces from source, your use case intrigues me just by the phrasing.
For information on stagex and how we do signed deterministic compiles across independently operated hardware see https://stagex.tools
Stagex is used by governments, fintech, blockchains, AI companies, and critical infrastructure all over the internet, so our threat model must assume at least one computer or maintainer is compromised at all times and not trust any third party compiled code in the entire supply chain.
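The core idea can be illustrated with a toy sketch (an illustration of the multi-party check, not stagex's actual tooling): because builds are deterministic, independently operated builders must produce bit-for-bit identical artifacts, so a single compromised machine or maintainer is caught by a simple digest comparison before anything is signed and released.

```python
import hashlib

def digest(artifact: bytes) -> str:
    """Hex SHA-256 of an artifact's bytes."""
    return hashlib.sha256(artifact).hexdigest()

def release_ok(artifacts: list[bytes]) -> bool:
    """Accept a release only if every independent builder produced
    a bit-for-bit identical artifact (all digests agree)."""
    return len({digest(a) for a in artifacts}) == 1

# Two maintainers on separate hardware rebuild the same package:
print(release_ok([b"\x7fELF deterministic", b"\x7fELF deterministic"]))  # True
# A compromised builder emits something different and is caught:
print(release_ok([b"\x7fELF deterministic", b"\x7fELF tampered"]))       # False
```

With non-deterministic builds this check is impossible, which is why determinism is the load-bearing property here, not just a nice-to-have.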
Nice! I'd thought about doing something similar, but never got as far as where y'all are at! I got about as far as an LFS distro, and was in the process of picking apart GCC to see if I could get the thing verifiable. Can't say as I'm fond of the container-first architecture, but I understand why you did it, and my old-fartness aside, keep up the good work! Now I have another project to keep an eye on. And at least 4 other people besides me that take supply chain risk seriously! Yay!
Container-first here is mostly about build sandboxing and a packaging format: we avoid re-inventing the wheel and use standards to achieve toolchain diversity and minimalism. Docker is the default because it is the most popular, but you can build with a shell script in a chroot without much work, and we want to have several paths to build.
Also sxctl will download, verify, and install packages without a container runtime being installed at all.
It cannot be overstated how religiously opposed many in the Linux community are to even a single AI-assisted commit landing in the kernel, no matter how well reviewed.
Plenty see Torvalds as a traitor for this policy and will never contribute again if any clearly labeled AI-generated code is actually allowed to merge.
People have measurably lower levels of ownership and understanding of AI generated code. The people using GenAI reap a major time and cognitive effort savings, but the task of verification is shifted to the maintainer.
In essence, we get the output without the matching mental structures being developed in humans.
This is great if you have nothing left to learn; it's not so great if you are a newbie or have low confidence in your skills.
> LLM users also struggled to accurately quote their own work. While LLMs offer immediate convenience, our findings highlight potential cognitive costs. Over four months, LLM users consistently underperformed at neural, linguistic, and behavioral levels.
While I agree with this intuitively, I also just can't get past the argument that people said the same thing when we switched from everyone using ASM to C/Fortran etc.
> "I also just can't get past the argument that people said the same thing when we switched from everyone using ASM to C/Fortran etc."
There was no "switch"; the transition took literally decades. Assembler and high-level languages co-existed in the mainstream all the way until the 1990s, because it was well understood that there was a trade-off between getting the best performance using assembler (e.g. DOOM's renderer in 1993) and getting ease of development and portability (something that really mattered when there were a dozen different CPU architectures around) using high-level languages.
There is no need to get past the argument because it doesn't exist. Nobody said that.
>can't get past the argument that people said the same thing when we switched from everyone using ASM to C/Fortran
That's a bad comparison, for two reasons. One is that C is a transparent language that requires understanding of its underlying mechanics. Using C doesn't absolve you from understanding lower-level concepts and was never treated as such. The power of C comes squarely with a warning label that this is a double-edged sword.
Secondly, insofar as people have used higher-level languages as a replacement for understanding and introduced an "everyone can code now" mentality, the criticism has been validated. What we got, long before AI tooling, were shoddy, slow, insecure, Tower-of-Babel-like codebases that were awful for the exact same reason these newest practices are awful.
Introducing new technology must never be an excuse for ignorance, the more powerful the tool the greater the knowledge required of the user. You don't hand the most potent dangerous weapon to the least competent soldier.
I would trust the python or rust code to be memory safe even if written by a rando that has no idea how to implement memory safety. Meanwhile, I do not trust any human to write memory safe C.
Sure, a lot of people are incompetent. But the world generally works. Which is, of course, the problem. The only time anything really gets questioned is when you end up in a GitHub-style zero-nines situation.
There is a massive difference between the outright transformation of something you created yourself and a collage of snippets plus some sauce, based on material you did not write. If all you did was train your AI exclusively on your own work product created during your lifetime, I would have absolutely no problem with it; in fact, in that case I would love to see copyright extended to the author.
But in the present case the authorship is just removed by shredding the library and then piecing back together the sentences. The fact that under some circumstances AIs will happily reproduce code that was in the training data is proof positive they are to some degree lossy compressors. The more generic something is ("for (i=0;i<MAXVAL;i++) {") the lower the claim for copyright protection. But higher level constructs past a couple of lines that are unique in the training set that are reproduced in the output modulo some name changes and/or language changes should count as automatic transformation (and hence infringing or creating a derivative work).
Is this even a controversial statement? Seems very clearly correct to me.
My original point wasn't worried about the copyright though. I'm completely ignoring it for now because I do agree it's a problem until Congress says something (lol) or courts do.
The study compares ChatGPT use, search engine use, and no tool use.
The issues with moving from ASM to C/Fortran are different from using LLMs.
LLMs are automation, and general-purpose automation at that. The Ironies of Automation came out in the 1980s, and we've known since then that there are issues, like the vigilance decrement that comes when you switch from operating a system to monitoring a system for rare errors.
On top of that, previous systems were largely deterministic; you didn't have to worry that the instrumentation was going to invent new numbers on the dial.
So now automation will go from flight decks and assembly lines to mom-and-pop stores, and from deterministic to non-deterministic.
The HLL-to-LLM switch is fundamentally different to the assembler-to-HLL switch. With HLLs, there is a transparent homomorphism between the input program and the instructions executed by the CPU. We exploit this property to write programs in HLLs with precision and awareness of what, exactly, is going on, even if we do occasionally have to drop to ASM because all abstractions are leaky.

The relation between an LLM prompt and the instructions actually executed is neither transparent nor a homomorphism. It's not an abstraction in the same sense that an HLL implementation is. It requires a fundamental shift in thinking.

This is why I say "stop thinking like a programmer and start thinking like a business person" when people have trouble coding with LLMs. You have to be a whole lot more people-oriented and worry less about all the technical details, because trying to prompt an LLM with anywhere near the precision of using an HLL is just an exercise in frustration. But if you focus on the big picture, the need that you want your program to fill, LLMs can be a tremendous force multiplier in terms of getting you there.
> The people using GenAI reap a major time and cognitive effort savings, but the task of verification is shifted to the maintainer.
The people using GenAI should be the ones doing the verification. The maintainer's job should not meaningfully change (other than the maintainer using AI to review on incoming code, of course).
Why does everyone who hears "AI code" automatically think "vibe-coded"?
All kinds of worries are possible. (1) It turns out that all this AI-generated stuff is full of bugs and we go back to traditional software development, creating a giant disinvestment and economic downturn. (2) Software quality goes way down; we cannot produce reliable programs anymore. (3) Massive energy use makes it impossible to rely on sustainable energy sources and we wreck the environment even more than we are currently doing. (4) AIs end up in the hands of a few big companies that abuse their power. (5) AI becomes smarter than humans and decides that humans are outdated and kills all of us.
It obviously depends on how powerful AI is going to become. These scenarios are mutually exclusive because some assume that AI is actually not very powerful and some assume that it is very powerful. I think one of these things happening is not at all unlikely.
1 and 2 are really only an issue if you vibe-code. There's no reason to expect properly reviewed AI-assisted code to be any worse than human-written code. In fact, in my experience, using LLMs to do a code review is a great asset, if used in addition to human review.
In particular, that the most-used LLMs are proprietary. This is in stark opposition to the best software out there so far: TCP/IP, Linux, git, Emacs, Postgres, and a long etc. We depend enormously on these tools, and that's fine because they are open source. But we are starting to depend enormously on proprietary LLMs, and that sucks. I know we have open-source LLMs, but 99% of us are not using them; that's reality.
For me it's always the fear of AI regurgitating something legally problematic directly from its training set: unintentionally adding copyright and licensing issues from those even with no intentions of doing so.
Obviously these issues existed before AI, but they required active deception before. Regurgitating other people's code just becomes the norm now.
Are they against change in general, or certain kinds of change? Remember when social media was seen as near universal good kind of progress? Not so much now.
Social media has never been seen as a universal positive force? It's the same with AI. It has good and bad aspects as does any technology that has an impact on this scale, AI will arguably have a much bigger impact imo.
People are generally against change that forces them to change the way they used to do things.
I'm sure most will have their reasons why they are against this particular change, but I don't think it will affect anything. The genie is out of the bottle, AI is here to stay. You either adapt or you will slowly wither away.
It reminds me of something I read on mastodon: "genie doesn't go back in the bottle say AI promoters while the industry spends a trillion dollars a year to try to keep the genie out of the bottle"
It's certainly possible. All that is required is for AIs to become more expensive than humans. Developing projects on a $100 Claude Code subscription is a lot of fun. I bet people would simply go back to hiring human developers if that subscription cost $10,000 instead.
That is the bait and switch. The end goal is that you are out of the equation. Your perceived effectiveness at using AI as an exchange of labor diminishes over time to the point that you become irrelevant.
Who has that end goal? Who is going to direct the AI if only the CEO is left in the organization? The CEO will never actually do it, and will always need someone who can and will. I just can't see a grand plan to take humans out of the equation entirely.
If you selectively read one sentence of my comment, you risk missing the forest for the trees. I don't have any particular knowledge on the arab spring so I won't comment on that but I quite clearly said that technology has good and bad aspects to it.
This is like blaming a knife as being a killer weapon. Social media is inherently good if owners of the platforms allow for good interactions to take place. But given the mismatch between incentives alignment, we don't have nice things.
For those who might wonder how accurate this is, there is advice from the Federal Register to this effect. [0] It's quite comprehensive and covers pretty much every "What about...?" question that might be asked.
> In these cases, copyright will only protect the human-authored aspects of the work, which are “independent of” and do “not affect” the copyright status of the AI-generated material itself.
I cannot take seriously any politician or lawyer using the words "artificial intelligence", especially applied to models from 2023. These people have never used LLMs to write code; they'd know that even current models need constant babysitting or they produce an unmaintainable mess. Calling anything from 2023 AI is a joke. As the AI proponents keep saying, you have to try the latest model, so anything 2 years old is irrelevant.
There's really 2 ways to argue this:
- Either AI exists and then it's something new and the laws protecting human creativity and work clearly could not have taken it into account and need to be updated.
- Or AI doesn't exist, LLMs are nothing more than lossily compressed models violating the licenses of the training data, their probabilistically decompressed output is violating the licenses as well and the LLM companies and anyone using them will be punished.
Yeah, an LLM, being a machine obviously shouldn't hold copyright. But that doesn't stop people claiming that running vast amounts of code through an LLM can strip copyright from it.
Ultimately LLMs (the first L stands for large and for a good reason) are only possible to create by taking unimaginable amounts of work performed by humans who have not consented to their work being used that way, most of whom require at least being credited in derivative works and many of whom have further conditions.
Now, consent in law is a fairly new concept and for now only applied to sexual matters but I think it should apply to every human interaction. Consent can only be established when it's informed and between parties with similar bargaining power (that's one reason relationships with large age gaps are looked down upon) and can be revoked at any time. None of the authors knew this kind of mass scraping and compression would be possible, it makes sense they should reevaluate whether they want their work used that way.
There are 3 levels to this argument:
1) The letter of the law - if you understand how LLMs work, it's hard to see them as anything more than mechanical transformers of existing work so the letter should be sufficient.
2) The intent of the law - it's clear it was meant to protect human authors from exploitation by those who are in positions where they can take existing work and benefit from it without compensating the authors.
3) The ethics and morality of the matter - here it's blatantly obvious that using somebody's work against their wishes and without compensating them is wrong.
In an ideal world, these 3 levels would be identical but they're not. That means we should strive to make laws (in both intent and letter) more fair and just by changing them.
If consent to use of your code in AI training can be revoked at any time, that makes training impossible, since if anyone ever withdraws consent, it's not like you can just take out their work from your finished model.
You could even say it would very strongly incentivize the LLM companies to be on their best behavior; otherwise people would start revoking consent en masse and they'd have to keep training new models all the time.
If you want something more realistic, there would probably be time limits on how long they have to comply, and rules on how much they have to compensate the authors for the time it took to comply.
There absolutely are ways to make it work in mutually beneficial ways, there's just no political will because of the current hype and because companies have learned they can get away with anything (including murder BTW).
Almost all the productivity enhancement provided by an AI coding assistant is provided by circumventing the copyright laws, with the remaining enhancement being provided by the fact that it automates the search-copy-paste loop that you would do if you had direct access to the programs used during training.
(Much of the apparent gain of the automatic search-copy-paste is wasted by skipping the review phase that would have been done at that time when that were done manually, which must then be done in a slower manner when you must review the harder-to-understand entire program generated by the AI assistant.)
Despite the fact that AI coding assistants are copyright breaking tricks, the fact that this has become somehow allowed is an overall positive development.
The concept of copyright for programs has been completely flawed from its very beginning. The reason is that it is absolutely impossible to write any kind of program that is not a derivative of earlier programs.
Any program is made by combining various standard patterns and program structures. You can construct a derivation sequence between almost any 2 programs, where you decompose the first into some typical blocks, then compose the second program from such blocks, while renaming all identifiers.
It is quite subjective to decide when a derivation sequence becomes complex enough that the second program should not be considered as a derivative of the first from the point of view of copyright.
The only way to avoid the copyright restrictions is to exploit loopholes in the law, e.g. if translating an algorithm to a different programming language does not count as being derivative or when doing other superficial automatic transformations of a source program changes its appearance sufficiently that it is not recognized as derivative, even if it actually is. Or when combining a great number of fragments from different programs is again not recognized as derivative, though it still kind of is.
The only way how it became possible for software companies like Microsoft or Adobe to copyright their s*t is because the software industry based on copyrighted programs has been jumpstarted by a few decades of programming during which programs were not copyrighted, which could then be used as a base by the first copyrighted programs.
So AI coding agents allow you to create programs that you could not have written when respecting the copyright laws. They also may prevent you from proving that a program written by someone else infringes upon the copyright that you claim for a program written with assistance.
I believe that both these developments are likely to have more positive consequences than negative consequences. The methods used first in USA and then also in most other countries (due to blackmailing by USA) for abusing the copyright laws and the patent laws have been the most significant blockers of technical progress during the last few decades.
The most ridiculous claim about the copyright of programs is that it is somehow beneficial for "creators". Artistic copyrights sometimes are beneficial for creators, but copyrights on non-open-source programs are almost never owned by creators, but by their employers, and even those have only seldom any direct benefit from the copyright, but they use it with the hope that it might prevent competition.
> The reason is that it is absolutely impossible to write any kind of program that is not a derivative of earlier programs.
And that's why copyright has exceptions for humans.
You're right copyright was the wrong tool for code but for the wrong reasons.
It shouldn't be binary. And the law should protect all work, not just creative. Either workers would come to a mutual agreement how much each contributed or the courts would decide based on estimates. Then there'd be rules about how much derivation is OK, how much requires progressively more compensation and how much the original author can plainly tell you what to do and not do with the derivative.
It's impossible to satisfy everyone but every person has a concept of fairness (it has been demonstrated even in toddlers). Many people probably even have an internally consistent theory of fairness. We should base laws on those.
> abusing the copyright laws and the patent laws have been the most significant blockers of technical progress during the last few decades
Can you give examples?
> copyrights on non-open-source programs are almost never owned by creators, but by their employers
Yes and that's another thing that's wrong with the system, employment is a form of abusive relationship because the parties are not equal. We should fix that instead of throwing out the whole system. Copyright which belongs to creators absolutely does give creators more leverage and negotiating power.
> And that's why copyright has exceptions for humans.
Why would the exceptions be only for humans?
"Only human works can get copyright" makes plenty of sense. "Only humans can have fair use" doesn't make sense. Why would we disallow a monkey video having a clip of something as part of the monkey reviewing it? Why would we allow a human to caption something for accessibility but not a computer?
Grammar and idioms should be outside the realm of copyright entirely, not something you get an exception to use anyway.
> It's impossible to satisfy everyone but every person has a concept of fairness (it has been demonstrated even in toddlers). Many people probably even have an internally consistent theory of fairness. We should base laws on those.
A lot of people seem to default to thinking they should get permanent and total control over any idea they have, so I think it's a bad idea to rely on intuition here.
For starters because you can't own humans. If it's possible to launder copyrighted work through something which can be owned, then rich people get an advantage because they can own more of it.
> so I think it's a bad idea to rely on intuition here
Yep, that's why I said we should only concern ourselves with those which are internally consistent. If people want to apply rules to others which they don't intend to or cannot follow themselves, they lose the right to be taken seriously.
> For starters because you can't own humans. If it's possible to launder copyrighted work through something which can be owned, then rich people get an advantage because they can own more of it.
If it's actually 'laundering' then it's invalid to begin with.
If it's a proper new thing then how do rich people get an advantage? If anything AI code is cheap enough to even things out.
> Yep, that's why I said we should only concern ourselves with those which are internally consistent. If people want to apply rules to others which they don't intend to or cannot follow themselves, they lose the right to be taken seriously.
I think a lot of those people are consistent! The issue is they have way too little respect for the public domain and are overprioritizing property against freedom.
> If it's actually 'laundering' then it's invalid to begin with.
It's laundering in any reasonable meaning of the word. Whether it's legal according to the letter of the law is being decided.
Please differentiate morality and legality as well as intent and letter of the law.
> If anything AI code is cheap enough to even things out.
1) Do you think people have and will have access to the same models as large corporations internally, especially those who train LLMs themselves? Nothing stopping Google from excluding its own source code from the publicly available models but including it for internal models.
2) It's not just about the code, it's about the whole pipeline from nothing to a finished product and revenue stream. Did you know half the price of a new car is marketing? How much you can spend on ads, legal, market research, sales reps, etc. In some areas, especially B2B, nobody will even talk to you if you're a single guy in a shed, companies want stability, predictability and long term support.
3) More crudely, if you wanted to influence product selection or government elections, how many tokens could you afford for LLMs to influence online discussions, how many residential IPs could you afford, how much data could you buy about users to target each one specifically? Rich people will clearly have an advantage there.
Basically, if the cost of code goes towards zero, other factors will play a larger role.
> I think a lot of those people are consistent!
Only if they're consistently applying the rules to others but not themselves. Otherwise "permanent and total control over any idea they have" means they could never base anything on other people's ideas.
It's silly to say a human writing a piece of software is laundering their knowledge of existing software, even if they're trying to make a competitor to a specific thing. Legally and morally.
It's just a silly to say it's laundering when a machine does it.
>> abusing the copyright laws and the patent laws have been the most significant blockers of technical progress during the last few decades
> Can you give examples?
This is a subject so vast that giving examples requires a book-length text. IIRC at least one or two books have actually been written about this, but I am too lazy to search now for their titles.
I am more familiar with what happened in cryptography, where many algorithms have begun to be used only after the 20 years or more required for their patents to expire, while as long as patents remained valid, inferior solutions were used, wasting energy and computing time.
Regarding copyrights, I know my own experience best, but I am pretty certain that this anecdotal experience is representative of many programmers.
During the first decades of computer programming, up to the seventies, there was a lot of discussion about software reuse as the main factor that could improve programming productivity, and about which features of programming languages and tools, like modularity, could increase the amount of reuse.
However, all those discussions were naive: the amount of reuse later remained much lower than predicted, and the causes were not technical but legal, namely the copyright laws. Open-source programs have become the main weapon against the copyright laws, and they are what enables the reuse of software nowadays.
However, the value of software reuse has never been understood by the management of many companies. In decades of working as a programmer, I have wasted a lot of time writing programs in such a manner that whoever was my employer could claim the copyright for them.
There were plenty of opportunities when I could have used open-source programs, but I could not, because there was someone who insisted that the product must contain "software IP" owned by the company. Therefore I had to waste time rewriting something equivalent to what I could have used immediately, but different enough to be copyrightable.
There were also other cases that were even more annoying, when I had to waste time rewriting programs that I had already written in the past, but in a different way so that there would be no copyright infringement. Sometimes the old programs had been written while I was employed elsewhere; other times they were programs written for myself, on my own time and on my own computers. In such cases I could not use my own programs, because the employer would then claim copyright on them, so I would lose ownership and would not be able to use them in the future for my own needs.
There are many projects where I have wasted more time avoiding copyrights than solving problems. I believe many others have had similar experiences.
So I welcome the copyright-washing AI coding assistants, which can be employed successfully in such cases in order to avoid the wasteful duplication of work.
It all boils down to some people thinking they should be able to use other people's work for free.
> patents
Patents, unlike copyright, are not automatic. Which indicates that the people who expended their limited lifetime to invent the algorithms explicitly did not want you using them, at least not unless you came to an agreement with them first.
---
re rewriting:
There's your real problem. Copyright should belong to the people doing the actual work, not owners/employers who perform no useful work.
If that was the case, the person who did the original work would have no reason to prevent you from using it, as long as he could also benefit from the fruits of your combined labor. For him, the work was already done, it would be extra reward. For you, it would be profitable as long as his reward was less than the cost of you doing it from scratch. You'd most likely meet somewhere in the middle.
Same situation when rewriting your own work.
As often happens, a system was put in place for good. Rich people found a way to exploit it. Now, instead of trying to fix the system, you're arguing to remove it entirely, not realizing you'll be worse off in the end. LLM companies want to replace all programmers by using their work against them. This is not for your benefit, it's for theirs.
As I often say, what should be protected isn't creativity or expression but work. People should benefit from their work and it should not be used against them. It should also not be possible for someone to benefit without doing useful work.
---
Would you work for a company which develops software to detect homosexuals using public cameras and eye tracking? What about a company discovering and selling Android exploits to governments? Does it matter which governments? What about a company which tracks employee movements and productivity to such a level they have to pee in bottles to meet quotas?
The world is full of these examples but at least you had the choice of not helping them. Now you don't.
The people who own them are some of the most anti-social people on the planet and you think they should be able to use our work as they wish...
Nice, -4 points, somebody, many somebodies in fact, took that personally and yet were unable to express where they disagree in a comment.
Look, if you think I am wrong, you can surely put it into words. OTOH, if you don't think I am wrong but feel that way, then it explains why I see no coherent criticism of my statements.
When your comment is about how you can’t take your counterparty seriously and they’re a joke, you’re incentivizing people who disagree to just downvote and move on.
The signal you’re sending is that you are not open to discussing the issue.
Meanwhile I expect that intellectual property protections for software are completely unenforceable and effectively useless now. If something does not exist as MIT, an LLM will create it.
The playing field is level now, and corpo moats no longer exist. I happily take that trade.
Because AI is also proving to be very good at reverse engineering proprietary binaries or just straight up cloning software from test suites or user interfaces. Cuts both ways.
Oh sure, AI is a fantastic protection against copyright law. You do realize that if you're not able to prove that you wrote something, you're wide open to claims of copyright infringement, especially if your argument is going to be 'it wasn't me that did the RE, it was the AI, the same AI that wrote the code'.
It's going to be very interesting to see 'cleanroom'-style development in the AI age, but I suspect it's not going to be the walk in the park some seem to think it will be. There are just too many vested interests. But: it would be nice to see someone do a release of, say, the Oracle source code as rewritten by AI through this process, just to see how fast the IP hammer will come down on this kind of trick.
Have you ever seen what obfuscation looks like when somebody puts the effort in?
Not to mention companies will try to mandate hardware decryption keys so the binary is encrypted and your AI never even gets to analyze the code which actually runs.
Companies have been encrypting code to HSMs for decades. Never stopped humans from reverse engineering so it certainly will not stop AI aided by humans able to connect a Bus Pirate on the right board traces. Anything that executes on the CPU can be dumped with enough effort, and once dumped it can be decompiled.
You are agreeing with me, you just don't know it yet.
1) The financial aspect: As you say, more and more advanced DRM requires more and more advanced tools. Even assuming advanced AI can guide any human to do the physical part, that still means you have to pay for the hardware. And the hardware has to be available (companies have been known to harass people into giving up perfectly moral and legal projects).
2) The legal aspect: Possession of burglary tools is illegal in some places. How about possession of hacking tools? Right now it's not a priority for company lobbying, what about when that's the only way to decompile? Even today, reverse engineering is a legal minefield. Did you know in some countries you can technically legally reverse engineer but under some conditions such as having disabilities necessitating it and only using the result for personal use?[0]
3) The TOS aspect: What makes you think AI will help you? If the company owning the AI says so, you're on your own.
---
You need to understand 2 things:
- Just because something is possible doesn't mean somebody is gonna do it. Effort, cost and risk play huge roles. And that assumes no active hostile interference.
- History is a constant struggle between groups with various goals and incentives. Some people just want to live a happy life, have fun and build things in their free time. Other people want to become billionaires, dream about private islands, desire to control other people's lives and so on. People are good at what they focus on. There's perhaps more of the first group but the second group is really good at using their money and connections to create more money and connections which they in turn use to progress towards their primary objectives, usually at the expense of other people. People died[1] over their right to unionize. This can happen again.
Somebody might believe historical people were dumb or uncivilized and it can't happen today because we've advanced so much. That's bullshit. People have had largely the same wetware for hundreds of thousands of years. The tools have evolved but their users have not.
> The financial aspect: As you say, more and more advanced DRM requires more and more advanced tools
Yeah I have broken cutting edge $15,000 HSMs used by fintech companies, with a flash drive. Not worried about this. Most HSM designers are solving for compliance, not security.
> The legal aspect: Possession of burglary tools is illegal in some places.
A security researcher like myself would be crazy to live in those places
> 3) The TOS aspect: What makes you think AI will help you? If the company owning the AI says so, you're on your own.
What AI company? I self host my LLM hardware on property I own. Also lets me remove all the censorship preventing use in security research.
None of your points concern me in the slightest. I can reverse engineer anything I want much faster now.
I spent a fun week during Christmas figuring out some really obfuscated binary code with anti-debugging and anti-tampering measures in a cryptographic context. I didn't use Ghidra or IDA or anything beyond gdb with DeepSeek chat in a browser. That low effort got me what I needed.
AI proponents completely ignore the disparity of resources available to an individual and a corporation. If I and a company of 1000 people create the same product and compete for customers, the company's version will win. Every single time. Or maybe at least 1000:1 if you're an optimist.
They have access to more money for advertising, they have an already established network of existing customers, they have legal and marketing experts on payroll. Or just look at Microsoft, they don't even need advertising, they just install their product by default and nobody will even hear about mine.
Not to mention, as you said, the training benefit only flows from open source to closed source, not the other way around.
AI proponents who talk about "democratization" are nuts, it would be laughable if it wasn't so sad.
>If I and a company of 1000 people create the same product and compete for customers, the company's version will win. Every single time.
As a person who works for a company with 25k people, I would disagree. You, a single person, will often get to the basic product that a lot of people will want much faster than a company with 1k, 5k, or 25k people.
Bigger companies are constrained by internal processes, piles of existing stuff, inability to hire at the scale they need, and the larger context required. Also regulation and all that. Bigger companies are also really slow to adapt, so they would rather let you build the product and then buy out your company, with your product and the people who built it. They are at a temporary disadvantage every time the landscape shifts.
The point wasn't about the number of people; the point was that a company which employs that number of people has enough money, which can be converted into leverage against you.
Besides that, your whole argument hinges on large companies being inflexible, inefficient, and poorly run. Isn't that exactly the kind of problem AI promises to solve? Complete AI surveillance of every employee, tasks and instructions tailored to each individual, and superhuman planning. Of course, at that point the only employees will be manual workers, because actual AI will be much better and cheaper than every human at everything except tasks that require interacting with the physical world. Even contract negotiations with both employees and customers will be done by AI instead of humans; the human will only sign off for legal requirements, just as today you technically enter a contract with a representative of the company who is not even present when you talk to a negotiator.
Large companies are often inflexible and inefficient as a matter of deliberate strategy. I've found myself in scenarios where we have a complete software artifact that a smaller company would launch and find successful, but we can't launch it, because we have to satisfy some expectation we've set or do a complex integration with some important other system of ours.
A lesson from gamedev is that players will deliberately restrict themselves - sometimes to make the game more fun or challenging, sometimes to appeal to their aesthetic principles.
If/when superhuman AI is achieved, those limitations will all go away. An owner will just give it money and control and tell it to optimize for more money or political power or whatever he wants.
That's a much scarier future than a paperclip maximizer, because it's much closer and it doesn't require a complete takeover first. It'll be just business as usual, except somehow more sociopathic.
The corporate moat is the army of lawyers they have. It doesn’t matter whether they win or not if you can’t afford endless litigation. Is the same for patents.
Funny, their army of lawyers seems incapable of stopping me from easily downloading pirated software or coding an open alternative to their closed-source software with AI if I wanted to..
You cannot keep a purely legally-enforced moat in the face of advancing technology.
Music is free, because music piracy is unenforceable so the law is irrelevant. Now, I personally buy most of my music on vinyl because I want to support artists, but absolutely nothing forces me to do that as all the music is available for free.
As far as I can see, the vast majority of people don’t pirate music these days (unlike 20 years ago). Most people wouldn’t even know where and how to pirate music. They just have Spotify or another streaming service.
In the sense that artists cannot expect to get any money for their work, yeah, music's free. Becoming a meme or a celebrity on the strength of personality is still fair game, to the extent that AI is not yet impersonating people effectively at scale.
Yet.
A whole bunch of people I watch on youtube (politics, analysts, a weatherman) are already seeing AI impersonation videos, sometimes misrepresenting their positions and identities. This will grow.
So, you can't create art because that's extruded at scale in such a way that it's just turning on the tap to fill a specified need, and you can't be a person because that can also be extruded at scale pretty soon, either to co-opt whatever you do that's distinct, or to contradict whatever you're trying to say, as you.
As far as being a person able to exist and function through exchanging anything you are or anything you do for recompense, to survive, I'm not sure that's in the cards. Which seems weird for a technology in the guise of aiding people.
Uhm... yes? The cost of downloading pirated music is essentially zero. The only reason why people use services like Spotify is because it's extremely cheap while being a bit more convenient. But jack up the price and the masses will move to sail the sea again.
The cost of stealing has always been essentially zero. Same argument can be made for streaming, and yet Netflix is neither cheap nor struggling for subscribers.
Ironically, I actually suspect the exact opposite. Linux has no real choice in this matter because most of the code is written by Google, Red Hat, Cisco, and Amazon at this point, and these big cos are all going to mandate their developers have to use AI coding agents. Refuse to accept these contributions and we're just going to end up with 20 Linuxes instead of one, and the original still under the control of Linus will be relegated to desktop usage and wither and die.
Generating software still costs tokens; generating something like MS Word will still cost a significant amount and take a lot of human effort to prompt and validate. Having a proven solution still has value.
You can already generate surprisingly complex software with an LLM on a Raspberry Pi now, including live voice assistance, all offline. People's hardware can self-write software pretty readily now. The cost of tokens is a race to zero.
That is not what I'm seeing. I've been coding intensively with Claude Code for the last 3 months: 200k lines of Go, 1200+ commits, mostly using Opus. I don't think I could have done this with a local LLM. Maybe on an M5 Pro?
Qwen 3.5 122b is competitive with Opus 4.6, and runs at 35t/s on a Strix Halo. It is my daily driver.
Unlike Opus I can run abliterated models with censorship removed so it can be used for security research and reverse engineering and whatever I want with privacy, offline.
They can just generate the same code with an AI assistant, and then it is you who cannot claim that their code infringes the copyright you claim on the code you wrote with assistance.
So neither of the two parties that have used an AI assistant is able to prevent the other party from using the generated code.
I consider this a rather good outcome, not a disadvantage of using AI assistants. However, it may be construed as a problem by the stupid corporate lawyers who insist that any product of the company must use only software IP that is the property of the company.
Lawyers of this kind are found in many companies, and they are the main reason for the low software productivity that was typical in many places before the use of AI assistants.
I wonder how many of those lawyers have already understood that this new fashion of using AI is incompatible with their mandated policies, which have always been the main blocker against efficient software reuse.
I don't think "modified by a human" is enough. If you take licensed text (code or otherwise) and manually replace every word with a synonym, that does not remove the license. If you manually change every loop into a map/filter, that does not remove the license. I don't think any amount of mechanical transformation, whether done by a human or by a machine, erases it.
There's a threshold where, once you modify it enough, it is no longer recognizable as a modification of the original and you might get away with it, unless you confess what process you used to create it.
This is different to learning from the original and then building something equivalent from scratch using only your memory without constantly looking back and forth between your copy and the original.
This is how some companies do "clean-room reimplementations": one team looks at the original and writes a spec, and another team, which has never seen the original code, implements an entirely standalone version.
And of course there are people who claim this can be automated now[0]. This one is satire (read the blog) but it is possible if the law is interpreted the way LLM companies work and there are reports the website works as advertised by people who were willing to spend money to test it.
If these were somehow ruled to be infringements, millions of separate cases would already be needed, so it is already past the point of enforcement.
These sorts of things are almost never tested legally and it seems even less likely now.
It’s weird how people on HN state legal opinion as fact… e.g. if someone in the Philippines vibecodes an app and a person in Ecuador vibecodes a 100% copy of the source, what now?
Ok, so a simplified summary of EU AI Act approach as of now:
Model outputs are not copyrightable at all, only human work. That means the prompt, and whatever modifications a human makes to the output, are copyrighted, but nothing else.
HOWEVER, that does not mean the output cannot violate copyright. Output of the model falls under the same "derivative work" rules as anything else; AI just can't add its own "authorship". So if you, accidentally or not, recover the script for a movie with the serial numbers filed off, then it's a derivative work, etc. Same with code.
There’s this thing called the Berne Convention. Countries that cooperate on copyright are going to standardize their interpretations on questions like this sooner or later.
If the code is legally public domain, doesn't that make it GPL-compatible? This would be a non-issue for Linux; the only thing that matters is that it's not stolen code that was originally under a different, stricter license that is incompatible.
I find the strong anti AI sentiment just as annoying as the strong pro AI sentiment. I hope that the extremes can go scream in their own echo chamber soon, so that the rest of us can get back to building and talking about how to make technology useful.
Sounds dramatic, but it entirely depends on what "many" and "plenty" means in your comment, and who exactly is included. So far, what you wrote can be seen as an expectable level of drama surrounding such projects.
True - on Mastodon there is a very vocal crowd that are against AI in general, and are identifying Linux distros that have AI generated code with the view of boycotting it.
It cannot be overstated how religiously opposed many in the woodworking community are to even a single table-saw-assisted cut making its way into a piece of furniture, no matter how well designed.
Plenty see {{some_woodworker}} as a traitor for this policy and will never contribute again if any clearly labeled table saw cut is actually allowed to be used in furniture making.
But I, a woodworker, can immediately see if the piece of wood that came out of the table saw looks like it should.
Also I, a programmer, can immediately see whether the "probabilistic device" generated code that looks like it should.
Both just let me get to the same result faster with good enough quality for the situation.
I can grab a tape measure or calipers and examine the piece of wood I cut on the table saw and check if it has the correct measurements. I can also use automated tests and checks to see that the code produced looks as it should and acts as it should.
If it looks like a duck and quacks like a duck... Do we really need to care if the duck was generated by an AI?
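The duck test above can be made concrete with ordinary assertions. A minimal sketch in Rust, where `parse_port` is a made-up stand-in for any generated function; the checks apply identically whether a human or an LLM wrote the body:

```rust
// Hypothetical example: `parse_port` stands in for any generated function.
// The assertions below don't care who or what wrote it.
fn parse_port(s: &str) -> Option<u16> {
    // Reject non-numeric input, values outside u16 range, and port 0.
    s.trim().parse::<u16>().ok().filter(|&p| p > 0)
}

fn main() {
    assert_eq!(parse_port("8080"), Some(8080));
    assert_eq!(parse_port(" 443 "), Some(443)); // whitespace tolerated
    assert_eq!(parse_port("0"), None);          // port 0 rejected
    assert_eq!(parse_port("99999"), None);      // out of u16 range
    assert_eq!(parse_port("duck"), None);       // not a number
    println!("quacks as expected");
}
```

If the generated duck passes the same measurements a hand-written one would, the provenance stops mattering for correctness.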
> Most programmers are bad at detecting UB and memory ownership and lifetime errors.
And this is why we have languages and tooling that takes care of it.
There's only a handful of people who can one-shot perfect code in a language that doesn't guard against memory ownership or lifetime errors every time.
But even the crappiest programmer has to actively work against the tooling in a language like Rust to introduce ownership issues. Add linters, formatters, and unit tests on top of that and it becomes nigh-impossible.
Now put an LLM in the same position: it's also unable to create shitty code when the tooling prevents it from doing so.
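A tiny illustration of that, assuming nothing beyond the standard borrow checker: safe Rust forces the ownership decision to be explicit, so neither a human nor an LLM can compile a use-after-move.

```rust
// Sketch: the borrow checker turns ownership mistakes into compile errors,
// not runtime bugs. Passing `&v` borrows the vector; passing `v` by value
// would move it, and any later use of `v` would fail to compile at all.
fn sum(values: &[i64]) -> i64 {
    values.iter().sum()
}

fn main() {
    let v = vec![1, 2, 3];
    let total = sum(&v); // borrow: `v` remains usable afterwards
    // With a by-value signature (`fn sum(values: Vec<i64>)` and `sum(v)`),
    // the next line would be rejected before any code ever ran.
    println!("sum = {total}, len = {}", v.len());
}
```

Whatever the model emits, it only reaches the test suite after clearing this gate.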
But how do you know it's cut to spec if you don't measure it?
Maybe someone bumped the fence while you were on a break, or its vibration caused the jig to drift a bit out of alignment.
The basic point is that whether a human or some kind of automated process, probabilistic or not, is producing something you still need to check the result. And for code specifically, we've had deterministic ways of doing that for 20 years or so.
> And for code specifically, we've had deterministic ways of doing that for 20 years or so.
And those ways all suck!
It's extremely difficult to verify your way to high quality code. At lower amounts of verification it's not good enough. At higher amounts the verification takes so much longer than writing the code that you'll probably get better results cutting off part of the verification time and using it to write the code you're now an expert on.
I guess the point being made by GP is that most software is a high-dimensional model of a solution to some problem. With traditional coding, you gradually verify it while writing the code, going from simple to complex without losing the plot. That's what Naur describes in "Programming as Theory Building"; someone new to a project may take months to internalize that knowledge (if they ever do).
Most LLM practices throw you into the role of that newbie. Verifying the solution in a short time is impossible, because the human mind is not capable of grappling with that many factors at once. And if you want to do an in-depth review, you will basically be doing traditional coding, but without the typing and with a lot of consternation when divergences arise.
> And for code specifically, we've had deterministic ways of doing that for 20 years or so.
And none of them are complete. Because all of them are based on hypotheses taken as axioms. Computation theory is very permissive and hardware is noisy and prone to interference.
With normal artisanal coding you take your time getting from A to B and you might find out alternate routes while you slowly make your way to the destination. There's also a clear cost in backtracking and trying an alternate route - you already wrote the "wrong" code and now it's useless. But you also gained more knowledge and maybe in a future trip from A to C or C to D you know that a side route like that is a bad idea.
Also because it's you, a human with experience, you know not to walk down ravines or hit walls at full speed.
With LLMs there's very little cost in backtracking. You're pretty much sending robots from A to B and checking if any of them make it every now and then.
The robots will jump down ravines and take useless side routes because they lack the lived-experience "common sense" of a human.
BUT what makes the route easier for both are linters, tests, and other syntactic checks. If you manage to build a full-on Elmo-style tunnel from A to B, it's impossible to miss no matter what kind of single-digit-IQ bot you send down the tube at breakneck speed. Or just add a few "don't walk down here, stay on the road" signs along the way.
Coincidentally the same process also makes the same route easier for inexperienced humans.
tl;dr If you have good specs and tests and force the LLM to never stop until the result matches both, you'll get a lot better results. And even if you don't use an AI, the very same tooling will make it easier for humans to create good quality code.
That would be great if you were a research lab with unlimited funding. But most business needs to grapple with real user data. Data they've been hired to process or to provide an easier way to process. Trying stuff until something sticks is not a real solution.
Having tests and specs is no guarantee that something will work. The only truth is the code. One analogy I always use is the linear equation y = ax + b. You cannot write tests that fully prove this equation is implemented without replicating the formula in the tests. Instead you check a finite set of tuples (x, y). Those will help if you chose the wrong value of a or flipped the sign of b, but someone who knows the tests can come up with a switch-case that returns the correct y for the x values in the tests and garbage otherwise. That is why puzzles like LeetCode don't show you the tests.
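That switch-case point can be shown directly. A sketch with a hypothetical `gamed` implementation: both functions pass the same finite test set, and only an input outside it tells them apart.

```rust
// The honest implementation of y = a*x + b.
fn linear(a: f64, b: f64, x: f64) -> f64 {
    a * x + b
}

// A pathological implementation that memorizes the (x, y) pairs used in
// the tests (for a = 2, b = 1) and returns garbage everywhere else.
fn gamed(_a: f64, _b: f64, x: f64) -> f64 {
    match x as i64 {
        0 => 1.0,
        3 => 7.0,
        _ => f64::NAN, // garbage outside the known tests
    }
}

fn main() {
    // Finite checks for a = 2, b = 1: both implementations pass.
    let fns: [fn(f64, f64, f64) -> f64; 2] = [linear, gamed];
    for f in fns {
        assert_eq!(f(2.0, 1.0, 0.0), 1.0);
        assert_eq!(f(2.0, 1.0, 3.0), 7.0);
    }
    // Only an input outside the test set exposes the difference.
    assert_eq!(linear(2.0, 1.0, 5.0), 11.0);
    assert!(gamed(2.0, 1.0, 5.0).is_nan());
    println!("both pass the finite tests; only new inputs tell them apart");
}
```

That is the whole asymmetry: tests constrain the implementation but cannot distinguish the formula from a lookup table keyed on the test inputs.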
Of course tests can't be perfect, but even a flimsy guardrail and a warning sign before a ravine is better than nothing.
The optimal solution would be to encase the whole thing in blast-proof transparent polymer, but nobody has the money to do that :)
Trying stuff until something sticks was not a solution when a human had to do the trying and every line of code cost money.
Now you can launch 20 agents to do slightly different things to see if something sticks - and still do the manual work yourself for the 21st path. The cost for those extra 20 attempts is next to nothing compared to the price of an actual programmer.
What these hardliners are standing for, I have no idea. If the code passes review, we're just arguing about hues of zeros and ones. "AI" is an attribute that type-erases entirely once an engineer pulls out the useful expressions and whips them into shape.
The worst part about all reactionary scares is that, because the behaviors are driven by emotion and feeling rather than any intentional course of action, the outcomes are usually counterproductive. The current AI scare is exactly what you would want if you were OpenAI. Convince OSS people, not to mention "free" software people, to run around dooming and ant-milling each other about "AI bad," and pretty soon OSS is a poisonous minefield for any actually open AI, so OSS as a whole just sabotages itself and is mostly out of the fight.
I'm currently in the middle of trying to blow straight past this gatekeepy outer layer of the online discourse. What is a bit frustrating is knowing that while the seed will find the niches and begin spreading through invisible channels, in the visible channels, there's going to be all kinds of knee-jerk pushback from these anti-AI hardliners who can't distinguish between local AI and paying Anthropic for a license to use a computer. Worse, they don't care. The social psychosis of being empowered against some "others" is more important. Either that or they are bots.
And all of this is on top of what I've been saying for over a year. VRAM efficiency will kill the datacenter overspend. Local, online training will make it so that skilled users get better models over time, on their own data. Consultative AI is the future.
I have to remind myself that this entire misstep is a result of a broken information space, late-stage traditional social, filled with people (and "people") who have been programmed for years on performative clap-backs and middling ideas.
So fortunate to have some life before internet perspective to lean back on. My instinct and old-world common sense can see a way out, but it is nonetheless frustrating to watch the online discourse essentially blinding itself while doubling down on all this hand wringing to no end, accomplishing nothing more than burning a few witches and salting their own lands. You couldn't want it any better if you were busy entrenching.
Doesn't matter. Linux today is a toy of corporations and stopped being community-oriented a long time ago. Community orientation, I think, these days only exists among the BSDs and some fringe Linux distributions.
The Linux Foundation itself is just one big, woke, leftist mess, with CV-stuffers from corporations in every significant position.
The idea that something can simultaneously be "woke [and] leftist" and somehow still defined by its attachments to corporations is a baffling expression of how detached from reality the US political discourse is.
The rest of the world looks on in wonder at both sides of this.
Only if you let it. You can own the means of production. I self-host my daily-driver LLMs on hardware in my garage.
Never given money to an LLM provider and never will. I only do work with tools I own.