What might open-source generative AI mean for proprietary software?

What might the future look like now that open-source generative AI is a reality?
18 May 2023

From bedrooms and basements, the open-source community makes the world better.

Six months ago, when generative AI first exploded onto the tech world’s consciousness like a sentient tab of acid, offering answers to every question in the knowable cosmos, there was one very noticeable thing about it. While the name officially attached to ChatGPT was OpenAI, a research company with a very tight focus, the power behind the newly-invented throne of generative AI was Microsoft, and its piles and piles of shiny, burning research dollars.

The ensuing scramble to join the generative AI gold rush and establish a claim in the sudden “new world” was very much a race of the tech giants. Google practically fell over itself in its hurry to establish its Bard as a viable alternative to ChatGPT.

Microsoft and OpenAI blithely launched GPT-4, which could do considerably more than ChatGPT, more or less beating their own flush just as much as they’d beaten Google. Alibaba, bless it, with a timing for which it’s only possible to feel a profound sense of sympathy, announced its generative AI offering, Tongyi Qianwen, just in time for China to announce a complete crackdown on the technology, barring any chatbot not properly trained in solidly socialist principles.

Rise and stumble.

Generative AI, and ChatGPT in particular (capitalizing on its first-to-market exclusivity), had a fairly messianic few months – going from being everybody’s favorite new toy and the herald of a brave new world of possibilities, to having programmers question the wisdom of democratizing the coding process, to leading AI scientists quitting Google over concerns that generative AI could become smarter than humans in a hurry and just possibly kill us all.

Italy put it in time out while it sought assurances on its data practices. Samsung learned about those data practices the hard way, unthinkingly feeding ChatGPT some of its proprietary code and subsequently banning all use of the technology. China, as we mentioned, had a spectacularly socialistic hissy fit. And a collection of esteemed academics, industry figures, and ultimately anybody who felt like it, added their names to an open letter asking the industry to pause development of generative AI beyond the capabilities of GPT-4.

Sam Altman of OpenAI, testifying before Congress this week, acknowledged that the potential of generative AI was scary, and confirmed that whatever will eventually become GPT-5 (or ideally, something with a much catchier name) has not begun training yet, and won’t for at least the next six months.

An understood model of the world.

But the salient point is that all of this happened in a world where the model was familiar – multi-billion-dollar companies funding significant advances that they would eventually add to their product rosters and either charge for directly, or monetize in other ways. They were the kings of this advance, and the development, progress, speed, and above all, the price of the advance would be theirs to dictate.

It was Scottish poet Robert Burns who famously said “The best laid plans of mice and multi-billion-dollar tech giants aft gang aglay.” Or so ChatGPT tells us.

And aglay (astray, or wrong) those plans duly went, when a version of Meta’s foundation model, LLaMA (erratically acronymical in its spelling, but immediately more memorable than ChatGPT), was leaked to the open-source community.

The open-source community, in case you’re new to TechTown, is part army, part ant colony, millions if not billions strong, based all around the world, very techno-geeky and essentially composed almost entirely of the kind of people who could get us not only to Mars but out of the solar system before NASA had got its space boots on, so long as somebody said it couldn’t be done.

The open-source community is made up of puzzle people. They see puzzles, restrictions, limitations, roughnesses, and inconvenient, why-won’t-you-do-this-like-I-think-you-should issues as Rubik’s Cubes to be solved in the fastest time and the slickest way, for bragging rights, pizza money, and just occasionally the score of a lifetime when some proprietary software house needs their solutions.

But mostly the bragging rights and pizza money.

And now, they have generative AI code to play with.

The certainties of life.

There are very few certainties in life – death, taxes, occasional crushing political disappointment and the fact that you look neither as good nor as bad as you sometimes think you do.

But if there is a single certainty on which the world literally depends in the 21st century, it’s that things get better when the members of the open-source community get their hands on them. Often cheaper, too, but always, always better.

That came to light late last week when a memo was supposedly leaked from an unnamed Google staffer, listing the many reasons why the traditional proprietary software houses could and probably should be losing their collective minds over the fact that the open-source community has generative AI code to play with.

And while we may never be entirely sure a) whether it came from a genuine Google staffer, or b) whether the views expressed in the memo are in any way indicative of Google’s private corporate internal monologue right now, neither of those things will ultimately matter, because the achievements documented in the memo are real, and verified, and have a defined timeline.

Things like LLMs running on a phone, and fully functioning generative AI that takes only around the power of a handful of Threadrippers to use, rather than the resource-intensive versions of the technology developed and deployed by the tech giants.
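The claim about phone-sized LLMs rests largely on quantization – storing each model weight in fewer bits. A back-of-envelope sketch of why that works (the model size and bit-widths here are illustrative, not figures from the memo):

```python
def model_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate weight-storage footprint of an LLM in gigabytes.

    Ignores activation memory and KV-cache overhead; this is only the
    space needed to hold the weights themselves.
    """
    return n_params * bits_per_weight / 8 / 1e9

# A 7-billion-parameter model (the smallest LLaMA size) as an example:
full_precision = model_memory_gb(7e9, 16)  # 16-bit floating point
quantized = model_memory_gb(7e9, 4)        # 4-bit quantized weights

print(f"fp16: ~{full_precision:.0f} GB, 4-bit: ~{quantized:.1f} GB")
# fp16: ~14 GB, 4-bit: ~3.5 GB
```

Cutting 14 GB to roughly 3.5 GB is the difference between needing a data-center GPU and fitting inside the RAM of a high-end phone or laptop – which is exactly the kind of thing open-source projects demonstrated within weeks of the leak.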

Most particularly of all, there are two ideas in the “leaked memo” that might well revolutionize the way the world interacts with generative AI.

First, that open-source development allows for smaller, more dedicated, more personalized generative AI models than the behemoth, potentially world-conquering creations that the proprietary giants have had to make, in order to justify their spend on the whole project.

That could mean you can get all the generative AI you need, in an easily-trained and personalized way, without paying the prices of the proprietary tech barons. About which, the argument could easily be made, what’s not to love?
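The cheap-personalization idea is generally credited to low-rank fine-tuning techniques such as LoRA, which train a tiny adapter on top of a frozen model rather than retraining every weight. A minimal sketch of the parameter arithmetic, with hypothetical layer dimensions chosen for illustration:

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for a low-rank (LoRA-style) adapter on one
    d_in x d_out weight matrix: two small factors, A (rank x d_in) and
    B (d_out x rank), instead of the full matrix."""
    return rank * d_in + rank * d_out

# One hypothetical 4096 x 4096 attention projection in a transformer:
full_finetune = 4096 * 4096                  # every weight is trainable
adapter = lora_params(4096, 4096, rank=8)    # only the adapter trains

print(full_finetune, adapter, full_finetune // adapter)
# 16777216 65536 256
```

Training roughly 1/256th of the parameters per layer is what makes it plausible for a hobbyist to personalize a model on consumer hardware in an evening, rather than renting a GPU cluster.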

And secondly, the haunting notion that’s backed up by undeniable evidence – that open-sourcers are delivering generative AI capabilities that are, right now, almost as good, and fast, and smart as anything the giants have come up with. That they’re doing it with significantly less expense and compute demand, and significantly more versatility. And that, very soon – possibly before you finish reading this article – the open-source versions will overtake anything the proprietary houses have to offer, pulling so far ahead that the giants can never hope to catch up.

The meaning of the LLaMA leak.

OpenAI may not be working on GPT-5 just yet. We’d be willing to bet that somewhere in a bedroom or a basement, someone is. Only it will be faster, and more usable, and more versatile, and crucially of course – almost insultingly cheaper.

What does all this mean for the proprietary generative AI giants? We suspect they’ll be trying to figure that out themselves. Significant regulation of the technology was probably necessary in any case, but it has gained support from some of the big players in recent days. Could that curtail the operations of the open-source community?

Maybe – regulation could arguably restrict legal development to players able to make expensive commitments to principles of corporate responsibility, creating a monopoly of capital investment that would shut open-sourcers out of actively profiting from their work.

There’s always the potential for a giant intake of open-source coders into the ranks of the tech giants, binding the coders and their developments to the advancement of the companies in return for a hefty sack of cash. That’s an extremely short-term solution, and only really half of one – the open-source community is also akin to a hydra: for every head you remove, two more spring up in its place, and then six months or a year down the line, you’re being out-developed again.

There’s the potential to sue for IP rights, but that’s practically impossible and highly frustrating – only Meta would realistically have a claim, and it could be easily argued that Meta gains much more by association with the ways in which the open-source community has improved generative AI on the basis of its foundation model than it would have gained by restricting use of that model to a monetized version with a relatively hard-won user base.

Besides which, since the original leak, there are probably already a thousand “children” of the original model, all of which are significantly different enough from the parent, leaked version to warrant an individual identity. At which point, the only people getting rich are the lawyers.

The future, or something like it.

The most likely result is that the tech giants will have to grin and bear it. But for those predicting the end of the proprietary world in terms of generative AI, there’s bad news, too.

The market will likely settle and stratify, in much the same way as it has done in relation to other business tools – you have your Microsoft 365s, your Google Workspaces… and then you have a host of others that do similar things, but probably, when all is said and done, better. Less well known, and with less famous support networks in the event of anything going wrong, but out there and thriving, developing faster and in more bespoke ways than the behemoths can match. And cheaper. Always, always cheaper.

In terms of generative AI, the difference between the strata is likely to be more extreme and noticeable – at least until the giants begin aggressively copying the open-sourcers in providing smaller, lighter, more agile generative AI setups that can be customized and trained easily by the client with bespoke datasets relevant to their needs. (It might also be the case that the value of datasets rockets in response to these developments – like getting a cheap, fast, efficient games console, only for the price of the games to go up.)

By which time, the open-source community is likely to have launched and grown a thriving market in exactly that kind of more personalized AI product, and established significant amounts of customer loyalty as a result.

The open-source invasion of generative AI is not, as such, the end of the world for the proprietary tech giants and their AI investments. But it does mean a relative democratization of the technology, which will strike a very great number of businesses – not to mention enthusiastic individuals – as an extremely attractive alternative to paying big business prices for less agile models.