AI and the future of software development
In this text I’m going to speculate on how AI will affect the software engineering profession.
I’ve been wondering for a while what my job will look like in five or ten years. Could it be substantially different by then? Could it even no longer exist?
To be honest, the last two years of AI breakthroughs and hype have taken me a bit by surprise. I had heard of machine learning before, but it seemed stuck at pretty much the same level for a long time. Maybe I could have noticed that something was brewing from the subtle improvements in autocomplete predictions, translation tools, voice assistants, and so on. When things change gradually, it can be easy to miss the wider implications. Maybe it was down to actual breakthroughs in recent years: the theorists came up with new ideas (such as doing the same thing at a much larger scale), the hardware improved, and companies have probably started to figure out how to build AIs in a more “agile” way, splitting the workload among teams.
Whatever the main causes, we’ve clearly seen accelerated progress recently, and the hype around AI looks likely to turn into a self-fulfilling prophecy, psychologically and economically boosting the field. Similar to other high-tech moonshots, such as fusion energy or virtual reality, there is no clear point of falsification – no single finding that could easily expose the promises as unfulfillable.
Further improvements to the processes, hardware, and theory all seem fairly likely at this point. Let’s assume that everything hits a snag, though! Will life go on pretty much the same in that case? My take is: no, it most definitely won’t. Even the current level of progress will have big implications. Having a multi-modal stochastic engine and instantaneous paraphrasing at our disposal as a machine power is a major innovation in itself. Harnessing this power, even at its current level, will turn a lot of things upside down.
How well can we actually harness this power? How will we stitch Large Language Models (LLMs) together with regular programs? How will we combine them with other recent developments in AI? How long will it take us to make the connections? What else needs to change in the world to accommodate these tools? And how long will these changes take? Where will all of this lead us?
Repetitive and non-repetitive jobs
One thing that already seems pretty clear is that the more repetitive any given profession is, the worse it will fare in light of LLM capabilities. In ten years, hardly anyone might still be working as a proofreader or translator, for example. Unpredictable workplaces such as kindergartens seem far less likely to be affected any time soon. What, then, about jobs with a medium level of predictability? I would argue that a lot of jobs are of that medium type, and that this includes quite a few software development jobs.
Of course, the predictability of everyday work varies a lot even within a single profession. Some software developers might be working in a job where they need to decide every hour whether to firefight some novel bug, go to a stakeholder meeting as promised, or add features to one of several projects. This is of course more of a pathological working environment that is best avoided. The other extreme is a software developer with a specialization that is quite deep but not very generalizable. This type often has to acquire a certificate to do the job, perhaps to set up and configure a certain piece of enterprise software, with just a few snippets of code added on here and there. For these types, changes might be coming sooner rather than later.
Of course, some of the more repetitive software jobs had already fallen prey to progress long before AI was a thing. About 20 years ago, DIY tools and platforms such as WordPress started to crop up, dramatically reducing the effort needed to build simple websites – a task that might otherwise have grown into a major profession. Building simple websites had arguably become too predictable a task to survive.
I think a lot of software development that exists today has settled in the medium range of predictability, or just above it. According to Wikipedia, there are 27 million software developers in the world. Judging by job posting numbers and the latest Stack Overflow survey, a lot of us work in some web-related capacity at medium-sized companies. Whatever the domain, I think all of us do some busywork at times that could be taken over by a fairly basic AI. Just consider for a second how often in your daily work you ask yourself: “Should I complete this task manually, or is it worth doing in a more programmer-y way?”
A counterargument that often comes up in these discussions is that the software development job has plenty of unpredictable aspects: grappling with code, according to that argument, is just one part of our duties. The other parts, where we interact with other developers or with stakeholders, are unlikely to be automated any time soon. Companies would make a mistake if they were to automate away developers, if that were even possible, because they would lose the thinking and feedback that developers provide. These are good points, but I wonder how unique and unpredictable the interactions in question really are. In a typical product meeting, we might say things like the following:
- Do you have a design for mobile as well?
- Have you thought about whether this might conflict with privacy laws?
- Have you considered that this change might affect the A/B test running in that same area?
- While we can make every event X trigger Y, have you considered that Y always entails Z in the current setup?
- By “user”, do you mean an anonymous visitor, or a registered user from the “users” table?
And so it goes on. ChatGPT has been dubbed a “stochastic parrot”, yet if you’ve been in the software industry for long enough, you might start feeling like one yourself. These topics don’t even vary that much between individual companies. What does vary between companies is mostly their business domain and the way their software is built.
The weirdness of software
Another way to tackle the question of how likely AI is to replace a profession is to ask yourself: how often do people succeed at that job? If even competent people have a high failure rate, there is no reason to assume that there is any training data of consistently high quality that would enable AI to do a better job, or even just a similarly good one. This is also a way to detect bullshit AI claims. In the near future, AI will not be able to predict whether someone will do well in a team, or in a relationship, or in therapy, seeing how even experts struggle to make such predictions reliably. Software development is a profession where failure is fairly common as well. It is no coincidence that blameless post-mortems were invented here.
There are many reasons for the high failure rate in software, but I would like to focus your mind on three of them: the logical complexity of software products, their inconsistency, and their dependence on changing outside factors.
When it comes to logical complexity – by which I mean the sheer amount of business logic, the web of interdependencies in the code, and so on – I can totally see AI getting the hang of it. The other two challenges will be the trickier ones.
Pieces of software often have at least some degree of uniqueness. Worse than that, they are usually internally inconsistent. Large pieces of software contain compromises, repurposed structures, remnants of deleted features, refactorings and implementations that are still in progress, and so on. The individual styles of contributors and changing fashions adorn almost any codebase that has survived for long enough. Many a major old app has some unique hack that is fundamental to its behavior. As Ellen Ullman observed, “we build our computers the way we build our cities — over time, without a plan, on top of ruins”. Software is an immature and idiosyncratic field, both in terms of its evolutionary progress as well as in terms of its culture. Getting acquainted with a large codebase is often like delving into the personality of its creators. In this way, the difference between a software developer and a kindergarten teacher might not be that big after all.
At a more microscopic level, subtly mislabeled code is surprisingly common. We’re not always perfectly precise with our language, and not every prudent implementation lends itself to a semantically clear way of writing it down. Any minor code change that comes later may further diminish the adequacy of the involved variable and method names until everything is eventually rewritten.
These logical, stylistic and semantic inconsistencies mean that AI can’t reliably take shortcuts by looking for known design patterns, or by relying on the semantics in the code. In other words, I think we’ve insulated ourselves from the biggest impact of AI for a while by making things an intractable mess.
What will prove even more challenging for AI is taking outside factors into account. Code is not written in a vacuum. We add error handling depending on the errors we’ve either encountered already or deem likely to encounter in the future. We structure nested loops depending on the relative sizes that the involved iterables have at runtime. We set up data stores depending both on the amount of data and the read and write characteristics that we expect, or that we’ve already observed.
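To make the middle example concrete, here is a minimal Python sketch (the function and its size assumptions are invented for illustration): the right structure only reveals itself once you know the relative sizes involved at runtime.

```python
# A small illustration of runtime knowledge shaping code structure:
# whether we nest two loops or index one collection up front depends on
# relative sizes we only learn by observing the system in production.
def overlapping_ids(small: list[str], large: list[str]) -> list[str]:
    # Knowing that `large` holds millions of entries while `small` stays
    # tiny, we build a set from the large one once instead of scanning
    # it repeatedly in an inner loop.
    large_set = set(large)
    return [item for item in small if item in large_set]
```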
To get a grip on the inconsistency of software and its dependence on outside factors, AI would basically need to run a system and be able to observe and describe its control flow, its behavior, and its load levels. Even then, it couldn’t predict upcoming changes that might require adaptation in advance.
How far can code completion go?
Plenty of room remains for AI tools to increase developer productivity, even if they lack the capacity to observe a system at runtime. Code suggestions that are picked and reviewed by human programmers will almost certainly get better over time. Improving them is perhaps as much a classical engineering and product problem as it is one of AI science. GitHub Copilot could probably read through a codebase, recursively summarize what it finds at different abstraction levels and from different perspectives (data model, logic flow, etc.), store these summaries, and then ingest any relevant summaries when dealing with a given problem. Such an approach would enable it to do more without requiring any increase in its AI capabilities, such as the size of its context window. I think it will do something roughly like this at some point. As a result, it will gain the ability to produce larger sections of code that fit in better with a given codebase.
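As a rough sketch of what such a summarize-then-retrieve pipeline could look like (this is speculation on my part, not Copilot’s actual design; the `llm()` helper is a hypothetical stand-in for any completion API, and the retrieval step is deliberately naive):

```python
# Toy sketch of recursive codebase summarization plus retrieval.
from pathlib import Path

def llm(prompt: str) -> str:
    """Hypothetical placeholder for a call to an LLM completion endpoint."""
    raise NotImplementedError

def summarize_codebase(root: Path) -> dict[str, str]:
    """First summarize each file, then fold those into directory summaries."""
    summaries: dict[str, str] = {}
    for path in sorted(root.rglob("*.py")):
        summaries[str(path)] = llm(f"Summarize this module:\n{path.read_text()}")
    by_dir: dict[str, list[str]] = {}
    for file, summary in summaries.items():
        by_dir.setdefault(str(Path(file).parent), []).append(summary)
    for directory, parts in by_dir.items():
        # One more abstraction level: a summary per directory.
        summaries[directory] = llm("Condense these module summaries:\n" + "\n".join(parts))
    return summaries

def answer_with_context(question: str, summaries: dict[str, str]) -> str:
    """Ingest only the summaries that look relevant to the given problem."""
    words = question.lower().split()
    relevant = [s for s in summaries.values() if any(w in s.lower() for w in words)]
    return llm("Context:\n" + "\n".join(relevant[:5]) + f"\n\nTask: {question}")
```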
Approaches such as CodePlan by Ramakrishna Bairi et al. already demonstrate that, by appropriately structuring the LLM invocations, LLMs can not only churn out code snippets but also carry out refactorings across a codebase – as long as that codebase is strongly typed or has sufficient test coverage.
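The general pattern behind such approaches (this is not CodePlan’s actual algorithm, just the test-suite-as-oracle loop it builds upon) can be sketched in a few lines:

```python
# Repair loop: apply an LLM-proposed edit, then let the test suite act
# as the oracle that either accepts the change or drives another round.
import subprocess

def propose_edit(task: str, feedback: str) -> None:
    """Hypothetical: ask an LLM for a patch and apply it to the working tree."""
    raise NotImplementedError

def refactor(task: str, max_rounds: int = 5) -> bool:
    feedback = ""
    for _ in range(max_rounds):
        propose_edit(task, feedback)
        result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        if result.returncode == 0:
            return True  # tests pass; the refactoring is considered done
        feedback = result.stdout + result.stderr  # feed failures back to the model
    return False
```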
However, the more code AI tools deliver, the more apparent it will become that we can’t just limit ourselves to checking the results for functionality. We will need to check for consistency with the rest of the codebase, for readability, and for real-world fit. We will have to check that the right abstractions are used, and that they are used in a way that reflects the team’s intentions for moving the codebase forward. We will have to check that no existing abstractions are duplicated. We will have to check that humans can still understand the intention of the code. And we will have to check that the code is adequate for the live requirements encountered by the system. None of these criteria deal with the internal correctness of a given code snippet, yet ignoring them will quickly make any codebase untenable for humans and computers alike.
If we have to review the functionality, maintainability, readability and real-world adequacy of AI-generated code – or at least keep the AI mindful of our relevant standards and requirements – this will certainly limit the productivity gains delivered by code completion. The only impact of code completion that I’m fairly sure about is that we’ll need to type less, which will certainly please our wrists.
The situation is of course different when it comes to software developers who write smaller pieces of code that need to work well in isolation first and foremost. They are more likely to feel the squeeze. As a result, they might be pushed into more holistic development roles, putting a bit of a squeeze on this job market in turn.
AI and the future shape of software
While AI tools might not in themselves bring a dramatic change for all developers, they could encourage developers to build their codebases in ways that make such tools easier to apply. The tools might work better on less abstract, more repetitive code, for example. This is, however, just one of several ways in which AI could affect the shape of software – and this is where it gets interesting as far as I’m concerned. Consider, for example, that we might start to write software in a more AI-friendly way simply due to the business requirement to integrate AI interfaces into it. Or consider that there could be an AI-fueled growth spurt of the low-code/no-code movement. In the long run, there could even be a technological shift moving us towards a new, AI-centric programming language. All of these developments might take off in parallel and combine in unexpected ways.
The first visible change in this direction will probably be a growing interest in integrating AI into existing applications. LLMs, even at the current level, open up the possibility of controlling software through vague, non-expert language, making the software do things for you either right away or through a feedback cycle that is still quicker than manual control. Many products will develop chat interfaces, even if developers will need to wire up the triggered actions manually for now.
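As a minimal sketch of that manual wiring (the action registry and the `llm_pick()` helper below are invented for illustration; in practice this maps onto the function/tool-calling features of current LLM APIs):

```python
# Minimal sketch of wiring a chat interface to hand-written actions.
# The model only maps vague user language onto one of the actions we
# registered manually; it never executes anything on its own.
ACTIONS = {}

def action(name):
    """Register a function so the chat layer may trigger it."""
    def wrap(fn):
        ACTIONS[name] = fn
        return fn
    return wrap

@action("create_invoice")
def create_invoice(customer_id: str, amount: float) -> str:
    return f"Created an invoice over {amount} for customer {customer_id}."

def llm_pick(message: str, available: list[str]) -> dict:
    """Hypothetical: ask the model which action fits the message and with
    which arguments, e.g. via an OpenAI-style tool-calling request."""
    raise NotImplementedError

def handle_chat(message: str) -> str:
    choice = llm_pick(message, list(ACTIONS))  # {"name": ..., "args": {...}}
    return ACTIONS[choice["name"]](**choice["args"])
```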
As soon as useful AI interfaces appear in widely used office software, better compatibility with AI will become a seductive argument in favor of adopting new technologies and carrying out rewrites and refactorings that facilitate AI integrations. Integrating with AI will also require a better separation between the business logic and the demands of the UI or in-house API clients. High-quality descriptions of endpoint behavior will become invaluable. This alone will push software development in a direction of more standardization and predictability.
Once AI interfaces become more common, people will get used to formulating what they want instead of learning the specifics of individual apps. Concrete skills and mastery of tools will lose relevance compared to the mastery of concepts and ideas. People will let AIs do technical stuff for them in their daily life. People will be less and less inclined to manually go to a website in a browser, look through that website, and submit the right form after filling in the right data. Platforms that do not have powerful and standard-like APIs will suffer as a consequence.
Standardizing APIs might be just the beginning, though. In the long run, more software might be built in a highly standardized way. AI could help people use low-code tools, and these tools might in turn see greater adoption if they manage to generate software that is more readily integrated with AI. Classical no-code and low-code approaches may not have taken off as much as predicted so far, but they have nevertheless started to take sizable chunks out of the B2B and B2E markets in recent years. This prior success might provide them with the resources they need to turn into more generic tools – both financially and in terms of training data. With their medium-complexity products, they will also be under particular pressure to innovate. More tools in the style of CodeWP might crop up as well, where the approach is to have an AI dialogue add pre-made code snippets to an adaptable piece of software, so that the AI only needs to write glue code at worst.
Large cloud providers are other natural early adopters when it comes to rolling out AI assistants and AI-enhanced UIs. With offerings like Step Functions, they already have versatile no-code solutions in the pipeline that can only benefit from AI integration. Many tasks in their existing admin UIs are repetitive and constrained by discoverability more than anything else. Once the cloud providers have AI interfaces, we might progressively see further services being offered through them. It might start with a simple “do you want some DNS records with that cluster?”, and at some point it might ask you whether you want to add a scalable authorization service, seeing how you have all those pesky requests on your sessions table. Such changes might remain confined to the cloud providers themselves, or they might trigger adaptations elsewhere. In any case, AI seems likely to cement the dominance of industry giants, with fewer startups still able to cross the chasm and catch up to them. The possibility of a self-improving, strong AI has often been dubbed a “singularity”, to indicate what a dramatic and potentially dangerous invention it would be. However, the only kind of singularity I see opening up right now is an economic one.
I find it frankly amazing that using ChatGPT has suddenly surpassed googling for some types of factual questions, and that Bing actually has a use case for once. I expect Google to come back with a vengeance anyway, and I also expect successful newcomers to remain the exception in a world where a few software giants hold almost all the potential training data for specialized AIs. As Fin Barr has pointed out, a major niche for AI startups might be in generating experiences that are outside the ethical mainstream. Such under-the-counter AIs could be an interesting source of socio-cultural innovation, but whether there will be a wider technological and economical impact from them remains to be seen.
A new programming language?
Depending on the future capabilities of AI, there might even be less of a need for bespoke software. If new software is indeed still written, it might look different than it does now. Andrej Karpathy once predicted a transition to “software 2.0”, with machine-friendly programs that are no longer human-editable. I don’t think such programs will become mainstream any time soon. Gibberish code is appropriate for stochastic purposes and for systems with countless variables, but most programs created today remain strictly deterministic and employ far fewer variables than machine learning models and neural networks do.
Francisco Marcondes and others have suggested another possibility [PDF]: LLMs could be built into a new programming language that works at an even higher level than today’s high-level languages. I wager this language would either be plain natural English (full of disambiguating statements and reading a bit like a legal text), or it would come with a somewhat more specific syntax akin to BDD frameworks or RFC documents. There could also be a storybook-like visual editor for it. It could transpile to more than just one target. For example, it might emit a congruous bundle of infrastructure code, backend code, and client code. These transpilation results could still consist of somewhat readable code – if only to allow debugging the transpilation process.
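To illustrate the multi-target idea (the spec format, prompts, and `llm()` helper below are all invented; this is a thought experiment, not the paper’s proposal):

```python
# Sketch: one high-level description transpiled into several targets.
SPEC = """
Feature: newsletter signup
  A visitor may submit an email address.
  Addresses must be unique; duplicates return a friendly error.
"""

TARGETS = {
    "infrastructure": "Emit infrastructure-as-code for the stores this feature needs.",
    "backend": "Emit a backend endpoint implementing the feature.",
    "client": "Emit a client form that calls that endpoint.",
}

def llm(prompt: str) -> str:
    raise NotImplementedError  # hypothetical completion call

def transpile(spec: str) -> dict[str, str]:
    """Produce a congruous bundle of artifacts from a single description."""
    return {name: llm(f"{instruction}\n\nSpec:\n{spec}")
            for name, instruction in TARGETS.items()}
```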
To ensure the smooth operation of such a new descriptive programming language for the full stack, it would probably be convenient to freeze and standardize that stack. Maybe we don’t need new programming languages and new frameworks as urgently as we thought we did. Maybe security updates will do. There could be a future where everyone creates some infrastructure-as-code, perhaps indirectly, to deploy some kind of cluster, perhaps with just one VM in it if that’s enough, and they always run the same static language. The code is modularized internally, and across possible other services, in a standardized way. There is some arbitrarily scalable DB with okay-enough performance for pretty much any case, serving data in a standardized format to any client. The clients share most of their code, perhaps using WebAssembly for the web app, or conversely something like React Native for native clients. All of these parts send OpenTelemetry data to some unified analytics tool. The result is a solution that is slightly sub-optimal in terms of performance for almost every case, but perhaps not by enough to make it worthwhile to build a custom solution instead.
Software developers themselves would be the initial drivers of this change vector. We like shiny new things and new abstractions, and we certainly like new languages. It could start out in a very basic way – just some high-level description of a system that generates some integration tests and rough API specs, with most of the code still handmade. In the long run, however, this innocent experiment might turn into an avalanche of standardization that decimates everything in our field: programming languages, libraries, software architectures, deployment strategies, service architectures, databases, tools, and so on. It would foster monopolies and spell the end of many less widely adopted concepts and implementations.
This possible future is a more static one. There is less progress in terms of software trends, because it now matters less whether a specific framework or language runs somewhat faster, or offers slightly better abstractions, or has a few more batteries included. If humans have to deal with the actual code only in exceptional circumstances, and if AI can explain the code in detail in these cases (and perhaps even explain the associated business rationale), no one will care how the code looks and whether it is verbose or repetitive or scattered.
It seems fairly obvious to me that this would affect large parts of the current software job market. I’m not saying we will have business people build complex programs with this new language. They never really got around to writing their own BDD scenarios either. However, programmer types would certainly become more productive. Let’s imagine an average company with a bunch of teams: backend, frontend, infrastructure, and so on. A sufficiently automated and standardized stack, controlled by a robust new description language, could at some point replace all of these jobs with just a few wizards of that new language. In this possible future, smaller companies don’t even hire their own developers anymore. They outsource the development to agencies, who see notable growth in their productivity and ability to serve multiple clients.
At first glance, a large number of people would now seem capable of creating complex software products, so agencies are still needed to employ people and guarantee clients a certain level of competence. Of course it is still beneficial to know and understand individual business models, so the agency workers of the future look a bit different than those of today. There is less need for intermediaries if transferring requirements to code is a fairly direct process. Agency workers might not work in teams as much anymore, except to provide QA. Instead, they might talk to their growing list of simultaneous clients more directly. Many might work in a way that is more similar to today’s freelancers, becoming self-reliant, perpetually learning, context-switching, note-taking, machine-whispering journeymen of sorts. They would work even more closely with stakeholders than today because they could immediately demonstrate possible changes in automatically generated illustrations and sandbox environments.
Software development is still not simple in this scenario. Once the default approach encounters real-world issues, we either need a self-observing and self-correcting system again, or otherwise developers will need to continuously add further specifications: Which error scenarios to handle how, which performance parameters to prioritize in which subroutines, which endpoints to optimize for what kind of scenario, and so on. It might not be necessary to specify how exactly that is achieved, e.g. by asking for certain algorithms, DB indices or whatnot. Perhaps the AI will be smart enough to deduce these solutions from the requirements at some point.
Some codebases in the old style still exist and are still maintained at this point, but they are considered legacy for their behavioral opaqueness. It now feels archaic to have to ask a senior dev how the software handles a certain case, and to have that senior dev debug it manually. For modern software, you just check the description document with its slightly strange, unambiguous, but still human-readable language. The opaqueness in this new approach is a technical one: the new language makes it highly obscure how its instructions are turned into instructions for the machine. It does not take long, however, for this to feel as irrelevant as the inability of most contemporary developers to envision the bytecode resulting from their JavaScript, or the control flow in a modern CPU. The black boxes just extend further up the stack.
The consequences
In all of the scenarios laid out above, productivity will increase – in some cases more, in other cases less so. In the more extreme cases, incubators will be able to flood the world with startup software. App stores will be inundated with new uploads, much like publishing houses are already drowning in AI-generated dime novels right now. Software products could even be randomized and basically A/B-tested at the level of complete product units: themed, specialized, condensed or extended formats – you name it. Successful product formulas will be discovered and rediscovered at an increased pace.
Another thing that all of the above scenarios have in common is that they are all self-reinforcing. When software is created in an increasingly similar way, this eases the challenge of training a neural network on it. Take all the JIRA tickets in the world, use their contents as input data, and the linked git diffs as expected output. Different business domains are now akin to different handwriting styles that a number recognition network glosses over without much ado.
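In case that sounds abstract, the training setup would be conceptually as simple as this sketch (`fetch_tickets()` and `linked_diff()` are hypothetical stand-ins for a JIRA export and the linked repository history):

```python
# Toy sketch of building (ticket text -> git diff) training pairs.
def fetch_tickets() -> list[dict]:
    raise NotImplementedError  # e.g. read a JIRA export

def linked_diff(ticket: dict) -> str:
    raise NotImplementedError  # e.g. `git show` on the linked commits

def build_dataset() -> list[tuple[str, str]]:
    """Pair each ticket's text (input) with its linked diff (expected output)."""
    return [(t["summary"] + "\n" + t["description"], linked_diff(t))
            for t in fetch_tickets() if t.get("linked_commits")]
```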
Some might say that automation does not always eliminate jobs and that more productivity simply allows a market to produce more goods. However, when I look through the job adverts today, I already see a lot of software companies with dubious value propositions. I’m not sure we need the ability to launch even more software. I don’t think the golden age of our profession is over quite yet, but its dusk might be setting in ever so slowly.
When exactly to expect changes is of course the million-dollar question. Maybe AI investment enters an early boom-bust cycle, delaying the integration of AI interfaces across the industry. Maybe the jump from low-code B2B offerings to a more universal solution is too big to undertake for now. And even if a high-level language comes alive on top of a frozen stack, maybe a chaotically evolving stack can still beat it in some relevant way, for some time.
Is anyone going to invest the huge amount of time and money needed to rebuild software development from the ground up in an AI-friendly way? Or will it happen through incremental evolution and take so long that we should not waste too much thought on it?
Let’s assume that no one is overly keen to do the big rebuild just in order to disrupt our market. Imagine that no low-code company jumps into the fray and that no new language picks up much speed. Things might still end up as described above, step by step. Imagine a growing list of startups that solve minor, isolated problems with AI. Surely the tech giants will hoover them up and thus assemble a complete set of AI solutions over time. But even if we disregard this likely scenario, such startups will still consolidate into larger companies slowly but steadily. They will develop ever more proven procedures for detecting and resolving recurring business needs at the code or service level. At the same time, developers are already making noteworthy progress in engineering prompts that generate scalable code. It seems likely that these two trends will eventually converge.
Having a new solution to a problem is of course just half the story. The other half is the adoption of that solution. In the software world, it can take many years for superior solutions to achieve widespread adoption. When it comes to the sphere of business and consumer decisions, things move even more slowly.
All in all, if I had to guess what my job will be like five years from now, I expect to be using AI a lot more, but I doubt the job market will be totally different for me at that point. Juniors might be pushing up a bit more, because they’ll be able to learn more quickly and will be under more pressure to exceed “AI level” productivity. Looking ten years ahead feels a lot trickier. As described above, many routes could lead to substantial change after a while, and they might converge in unexpected ways. In the long run, our field might very well become unrecognizable – not for the first time, we should add.
For the time being, I’m more worried about other white-collar jobs that have not had the same opportunity as software developers to entrench themselves in a chaos of their own making. Fears of all office workers losing their jobs within a decade sound a bit overblown to me. However, a more plausible ten, twenty, or thirty percent would surely be bad enough. OpenAI’s CEO Sam Altman has mused that AI might boost productivity and scientific progress so much that we just have to share the spoils and we’ll live in a kind of utopia where all our needs are fulfilled by self-replicating robots. An endearing scenario, yet one that feels at least a few decades away. Designers, writers, customer service agents and so on will start to feel the crunch way earlier than that. It seems we’ll have to come up with some answers based on our current, all-too-human capabilities.