A GitHub Copilot thought experiment
Introduction to Copilot
Probably one of the biggest game-changers in software development in the last ten years has been the release of GitHub Copilot, billed by the company as the AI-powered pair programmer: a helpful buddy that suggests code snippets based on plain-English comments or on the developer’s (or group of developers’) existing code.
Copilot takes tab completion to a whole new level, suggesting snippets, common functions and solutions to users, and therefore speeds up development, reduces the need for debugging, obviates typos, and generally cuts out all the tedious searching through personal and online repositories of code. Most devs will spend a decent chunk of time each day searching Stack Overflow, Googling for answers or combing through archived projects. Copilot helps code pros spend more time coding, and less time in semi-permanent fogs of “How did I do that, last time?”
The current state of Copilot
Both criticisms and accolades have been leveled at Copilot. A far-from-complete list follows:
- Microsoft owns GitHub, Microsoft is evil. Therefore Copilot is an embodiment of evil. This opinion is more common among the generations of IT pros who can remember when Microsoft’s attitude to open source was, to put it mildly, pretty negative.
- Students of coding will be able to “cheat,” either in exams or in terms of being able to progress without really learning the ropes properly. While the former may be true in situations where examiners aren’t too switched on, the latter is probably not relevant as Copilot uses the developer’s (or student’s) own work as the basis for most suggestions and autocompletes – although some observers say Copilot is less selective, see below. Copilot also makes some pretty dumb errors, which (while they still exist) should be a red flag to any examiner or interviewer for junior programmers’ positions.
- Copilot has learned from a huge body of work, much of it open-source, and is now monetizing work that was never intended to be monetized, or that was monetized under terms different from those under which Microsoft now earns income. Accusations of GitHub’s repositories being used as a learning corpus are not provable one way or another: Copilot’s code is proprietary and effectively a black box. Where you stand on this issue depends on how much you are willing to give the company the benefit of the doubt.
- Copilot might suggest code snippets released under licenses that are at variance with the snippet’s eventual use. For example, a proprietary application may unwittingly contain GPL-licensed code provided by Copilot, or code released as fully open source may contain proprietary code provided by Copilot. There have been some comments in developers’ forums and other places where geeks hang out that other people’s code is being suggested to developers who weren’t the original author. The question here is how much code must be reproduced before it can be considered someone else’s. (In the early days of electronic music, for instance, a two-second sample of another musician’s work was considered fair game – more than that and the copyright lawyers would sit up in their coffins.) In programming terms, it’s debatable at what stage code becomes unique, or at least identifiable as unique. Lawyers may begin to circle over code disputes before long – see our thought experiment, below.
- The code it writes can be of low quality. That’s an issue that will, in all likelihood, change for the better in the coming months and years.
- It’s free for students and for those who contribute to a notable degree to open-source projects. These are good things.
- It acts as a superb aide-mémoire for developers – a kind of helpful library of past knowledge, as well as a source of ideas or inspiration about how things might be done. This too is a positive for developers under time pressure.
- Project leaders can establish conventions (such as common nomenclature) that, if used by their team, ensure that completions follow the same patterns and criteria. Essentially, the same “code buddy” works with all members of a team, keeping conventions and guardrails in place and, therefore, the overall project more on target.
Our thought experiment attempts to extrapolate some of the themes above and extend them into predictions as to what the future may hold for Copilot, its users, and software development as a whole.
- Assume Copilot is being overseen by managers at GitHub and Microsoft. These people will be expected to generate income from Copilot, and to increase net income each year (or financial period).
- After a period of growth, Copilot’s paying user base numbers will stabilize, or will grow only by nominal amounts. In short, those who want to or have to use Copilot will be doing so.
- At this point, to maintain revenue growth, fees will have to increase (but not to such an extent that too many users will unsubscribe).
- To encourage waverers to stay, or newbies to sign up, more features will be added and the AI should get smarter too.
- Additionally, suggested code completions will need to be more accurate, more powerful and save more time than previously; again, to justify those higher monthly fees.
- It seems inevitable that to increase the value of code suggestions, more involved and complex code will be suggested, and be attuned more accurately to each developer’s work context. That increases the chance of software license infringement, or copyright issues – both of which have the potential to be massively costly to users, and by proxy, GitHub and Microsoft.
- This means that the corpus used by Copilot will have to include the entirety of code held on the GitHub platform (if it doesn’t already), and Copilot will have to obfuscate results to avoid the circling of legal vultures.
- Developers writing code that they wish to release under one flavor or another of open-source license will likely be aware that anything hosted on GitHub may be used by anyone without any reference to the terms of the license under which the code is published, if any.
- Similarly, developers writing code for proprietary software will likely be aware that anything hosted on GitHub may be used in any other developer’s code without payment of due license fees, royalties, etc.
- It becomes apparent that any form of open publication of code (such as on GitLab, a self-hosted Gitea instance and so on) will be the basis, either now or in the future, of a corpus for Copilot or a project like it. Copilot-like engines will scrape internet sites for code examples and subsume them into their own bodies of knowledge.
- Open publication of code for the collective good declines as a method of distribution and as an ideology based on collectivism. Copilot and its ilk will encourage insularity, “hoarding” of patches, slower release of security updates, a scarcity of new code made public, slow or non-existent revisions to existing code, fewer helpful libraries, and unmaintained frameworks.
- Technical innovation stagnates, software becomes less powerful, less secure and a much greater proportion is held as property of companies and individuals, rather than distributed for the collective good.
Clearly, this thought experiment could quite easily have moved in quite a different direction, ending with a prediction along the lines of: every piece of code gets written faster and is more powerful, bringing benefits to the entire technology community and the human race in general.
It’s arguably a simple case of seeing the glass as half full or half empty. Whatever your opinion, the decision to engage with Copilot (and its inevitable imitators and competitors) should, this author believes, be taken with more forethought than “it makes my life as a developer easier, and it’s cheap (or free), ergo it must be good.”
30 November 2023