Grassroots AI: beyond the moonshot

Abraham Marin-Perez
10 hours ago
14 min read

It would appear that, with AI, everyone is trying to go for the moonshot, the magical combination of elements that will make code write itself autonomously, reliably, sustainably and, more than anything, cheaply. This is not a new idea, and it's not working.

Look, I get it, I understand the allure of the moonshot: breaking new ground, creating new paradigms, being the celebrated hero on the cover of Time magazine. After all, if you're not disrupting an industry, what are you even doing? I can see what a massive win it would be if we managed to create that system that writes code while no one is looking, and I could almost forgive people for trying if it weren't for the countless cautionary tales available in the software lore.

"We will no longer need to look at the code", a history

XML was first published in 1998. At the time it was supposed to be a revolution, and it came with all sorts of tools: XSL, XPath, XSLT... Documents written with XML were supposed to be composable, and the vision was that documents all over the internet would be connected to each other, forming an interconnected information landscape. It was at that time that we started talking about the semantic web. The future was rosy.

The truth is that XML was clunky, nobody really liked it, but it was ok because it wasn't meant for humans, it was meant for machines. You weren't expected to read or write XML documents, you would only deal with graphical interfaces that would interpret and modify the XML content for you. Fast forward a couple of decades and XML has been largely replaced by JSON and YAML, which are more programmer-friendly. It turns out that people do like to read and manipulate files directly.

It was around that time that the first graphical IDEs like KDevelop and Eclipse appeared. They had tons of plugins and graphical interfaces that showed multiple views of the project: one panel for Ant commands, another panel for listing the available methods, etc. This, together with the smaller screens and lower resolutions of the time, meant that the code panel was a tiny window in the middle, barely showing 15-20 lines of code; this was ok because you weren't really expected to look at the code, you only interacted with graphical elements. Creating a method implied File -> Add -> Method (or something to that effect), and you'd get a dialogue box where you entered name, parameters, et al. Adding a dependency to Maven also had its own dialogue box. You didn't interact with the code, that was for machines. You interacted with a UI.

An early version of Eclipse IDE. Not a lot of space for code; not that it mattered.

Today, IDEs make the code panel as big as possible, or even show you two or more code panels in parallel. There are even minimalist options like Sublime that get rid of all the graphical elements so you can focus entirely and solely on the code.

That was also the time when Poseidon came along. UML was all the rage, OOP was taking off, and Poseidon combined all of that with a renewed promise: you draw the entities that make up your program using UML, you indicate the relationships and the attributes, and I generate the code. Because you are a highly-valued engineer and you shouldn't waste your time and talents writing code. No, sir.

The God of the Seas was apparently very good at programming.

Needless to say, that didn't stick. Its defenders would say things like "a junior engineer reads code, a senior engineer reads diagrams". They'd compare themselves to architects who handle blueprints, not bricks. But, in the end, we went back to the code.

ETL, LowCode, NoCode... I could go on forever. The dream has always been the same: we will no longer need to look at the code. Every time doomsayers like me would mutter "we've tried that before". Every time we'd hear back "but this time is different". And every time it became yet another broken dream. And every time we went back to the code.

In fact, what history has taught us is that not only did we refuse to give up on manipulating code directly, we doubled down! We took activities that were traditionally managed through configurations, settings, and UIs and we turned them into code too, giving birth to Infrastructure as Code, GitOps and, eventually, Everything as Code.

The code is the spec

In today's "but this time is different" camp, the theory is that you write a detailed spec of what you need and AI creates the code for you. So much so that people are comparing AI specs to compilers, saying that these are just another layer of abstraction. They claim that detractors of AI-generated code are just like the early programmers who didn't like high-level languages and preferred to continue using assembly. However, this analogy breaks down easily under scrutiny.

A programming language is typically described by a BNF grammar that can be used to verify whether a program is syntactically correct. It also has a compiler that transforms the source language into a target language with mathematical precision. The use of the word "mathematical" is no exaggeration here: a compiler truly can be described as a mathematical function that transforms an input into an output in a well-defined and predictable way. In fact, this quality has threatened the very concept of software patents since its inception: all software can be rewritten as a mathematical formula using lambda calculus, and mathematical formulae cannot be patented. But I digress, the point here is that, when you compile source code, you know exactly what you're going to get (or at least what you are supposed to get).

The same is not true of AI; or, rather, of LLMs. Natural language cannot be formally defined. Natural language is messy, ambiguous, and, worst of all, its meaning changes with time, location, and culture. Or, as Wittgenstein put it, "the meaning of a word is its use in the language". That's why lawyers have such a hard time drafting agreements that won't come back to bite them. In theory, you should be able to write a spec in natural language that indicates what you want. In practice, that spec has the potential to be misinterpreted, or to miss implicit knowledge, or to be based on uncommunicated assumptions. AI, obliging as it is, will try to produce a program that matches your spec, but there will be inevitable hallucinations, omissions, and deviations. What's more, since LLMs are non-deterministic by design, invoke one twice in a row with the same input and you'll get different outputs. The overarching behaviour may be preserved, but tiny drifts will be introduced here and there. And here is where human expectations clash: nobody likes it when buttons are randomly moved about, even if they function in exactly the same way.
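To make the contrast concrete, here is a toy sketch. It is not how any real compiler or LLM is implemented: the opcode table, the candidate outputs, and their weights are all made up for illustration. The first function is a pure mapping from input to output; the second picks among several plausible outputs by weighted sampling, which is the part that makes repeated runs drift.

```python
import random

# Toy "compiler": a pure function from source text to output text.
# The opcode table is invented for this example.
OPCODES = {"add": "ADD", "sub": "SUB", "load": "LD"}


def compile_line(source: str) -> str:
    """Translate 'add a b' into 'ADD a, b'. Same input, same output, every time."""
    op, *args = source.split()
    return f"{OPCODES[op]} {', '.join(args)}"


# Toy "generator": several acceptable outputs for one prompt, each with a
# made-up weight, loosely mimicking how sampling picks among candidates.
CANDIDATES = {
    "add a and b": [("a + b", 0.6), ("b + a", 0.3), ("sum([a, b])", 0.1)],
}


def generate(prompt: str, rng: random.Random) -> str:
    """Pick one candidate output at random, according to its weight."""
    outputs, weights = zip(*CANDIDATES[prompt])
    return rng.choices(outputs, weights=weights, k=1)[0]


if __name__ == "__main__":
    # The compiler-like function never surprises you.
    print(compile_line("add a b"), "|", compile_line("add a b"))
    # The sampler usually produces more than one distinct answer over 20 runs.
    rng = random.Random()
    print({generate("add a and b", rng) for _ in range(20)})
```

All three sampled outputs compute the same sum, which is exactly the point: the behaviour is preserved, but what you get back is not the same text twice.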

The solution is, allegedly, very simple: make sure that your spec contains all the relevant detail, all the requirements, all the edge cases, all the knowledge. Make sure that it's written in a way that leaves no room for ambiguity. Make sure that it explicitly discards any potential misinterpretations. Make sure that all stakeholders have reviewed and approved it. In summary, make sure that your spec is well-defined. Sure, and a side of fries, please. The well-defined spec has been the holy grail ever since software development started as an industry. And it has never happened. The well-defined spec is the software equivalent of the spherical cow.

But let's play along and assume that we can indeed create a well-defined spec that contains the minutiae of the project. Now we find ourselves facing the perfect map paradox: the only way to specify without room for error everything that the code has to do is to specify it in as much detail as the code itself. The map becomes as big as the territory that it's trying to describe. The spec becomes redundant because it is no longer an abstraction of the code; the spec has turned into a copy of the code and, for that, we already had the actual code.

Ultimately, the code is the only source of truth.

After all, it's a matter of basic entropy, perfectly encapsulated in Plato's Theory of Forms: you can envision an abstraction from the particular, but creating the particular from the abstraction will always be imperfect.

Creating the spec with AI

In predictable "AI all the things" fashion, defenders of the newfangled Spec-Driven Development argue that you can embed AI in all your business interactions so the specs write themselves automatically (sound familiar?). Do you have a meeting? Enable AI note-taking so the conclusions of the meeting are automatically available. Did you record a user research session? Ask AI to extract the main points. Do you now have more documents than you could possibly deal with in a lifetime? Use AI to analyse the lot and distill the main points. Is this information sparse, contradictory, and potentially wrong? AI will fix that for you.

It is at this point where we need to learn from those who have been dealing with potentially unreliable data for decades: the Intelligence Community, aka CIA, NSA, SIS, and their friends. First of all, we need to distinguish between three concepts: data, information, and intelligence.

Data: facts, with no judgement or analysis attached to them. Things like "Bernice said X", or "200 toys were sold in March".
Information: structured data, organised and filtered in such a way that it becomes relevant in a particular context or for a particular purpose. Things like "toy sales have been in steady decline".
Intelligence: the analysis of information from the perspective of goals, values, and environment so as to generate actionable directives. Things like "competition is eating into our market share thanks to their more effective marketing".

AI can easily give you a lot of superficial data, most of it correct. AI can try to help generate information from the data, but this will often miss important nuances that are not written anywhere because they belong to the collective, unspoken knowledge (common assumptions, institutional inertia, cultural cues, etc.). If you try to use AI again to go to the next step, all the tiny little errors will compound to the point of creating intelligence that seems superficially fine, but that will have enough inaccuracies so as to make it unusable. Kind of like making a photocopy of a photocopy of a photocopy.

"O ye of little faith! That has a simple solution, you just need people to review and amend, if necessary, all the content that AI generates. So after a meeting, people review the minutes and approve them. Then, after AI has consolidated them, people review the summary and approve it. Then, when the spec is written..."

Go on, I'll let you finish, you can write your own rebuttal when you're done.

But dark factories!

Ah, yes, dark factories. In case you haven't heard, dark factories (or lights-off factories) are factories that are so incredibly automated that they don't even need humans any more. And, since they don't need humans, they don't even need to keep the lights on, so they run dark. So very cool. This idea is getting some people's juices flowing with the next sci-fi analogy: AI is our dark factory, you set a bunch of agents loose and they create code without any human intervention. "I built a crypto trading platform while I was walking my dog". That kind of thing.

Let's take a couple of steps back. First of all, there is the matter of time scales. The industrial revolution started almost 250 years ago, and only now are we beginning to talk about fully automated factories where there are no humans involved. The whole software industry hasn't been here for more than maybe 70 years, and AI for less than that; it is excessively optimistic to think that we're going to turn a highly manual job into a fully automated one in such a short time.

Second, there is a fundamental difference between manufacturing and software: in manufacturing, you produce multiple identical copies of the same item. Since you're always building exactly the same thing, you can refine the process to remove all of the kinks and have it running autonomously, or at least try. But in software we're constantly changing the product. What we deliver today is different from what we delivered yesterday and from what we will deliver tomorrow. If we were to compare software with, say, a car factory, it'd be one where each car was a redesign of the previous one, never two cars the same. I honestly doubt you can run that dark.

And finally, we don't even have that many dark factories; the vast majority of factories still need human operators. What's more, even the ones that claim to be fully automated still need humans to review, clean, replace, and recalibrate the many parts and sensors used in the factory. To top it all, they're not even really dark, because they need monitoring, usually through cameras, and those work best with the lights on. The concept of a lights-off factory is mostly a marketing ploy.

"Ok, maybe not now, but we're going in that direction. We're getting there. I'll be sipping margaritas by the pool while my computer is doing my job. Retire at 27. I can't wait."

Oh, my sweet summer child, I'm sorry to burst your AI bubble but that's never going to happen. The reason is, again, in the history books.

Progress is a moving target

It may feel that, once the technology matures, we'll be able to fully automate what we do today and get to the point where humans are no longer in the loop. Or, if they are, they'll be just orchestrating and maintaining. The key issue is "what we do today".

One of the main reasons we haven't got fully automated factories after 250 years of industrial revolution is that we're not manufacturing the same things that we did when we embarked upon this journey. If we were building the same things with the same level of quality and precision, we would probably have fully automated factories. But once we have a new tool, like the steam engine, electricity, or AI, we don't just use it to do the same things that we've always done, we push it to see how far we can go. We push and push until we get to its limits. And then, in order to do things beyond those limits, we get the humans involved.

This is why predictions like the one the great economist John Maynard Keynes made close to 100 years ago turn out to be so wrong. Keynes argued that, as mechanisation advances and people become more productive, we wouldn't need to work so much. He said that, by 2030 (almost there), people would work at most 15 hours a week, and that our main issue would be facing boredom. Imagine that.

Essentially, we will never get to the point where AI, or any other form of automation, can solve all of our problems because, if it ever does, then we'll come up with new problems until we find the ones that AI cannot solve. A sort of a reverse hedonic treadmill. We will keep raising the bar to the point where technology isn't quite enough and we still need the humans to take charge. The Peter Principle, but applied to technology.

If this all sounds a bit abstract, let me put it in more concrete terms with a practical example that I experienced recently: using AI to create a Java application using Spring Boot 4. I was using GitHub Copilot with Opus 4.6, which has a knowledge cutoff of August 2025. Spring Boot 4 was released in November 2025. This means that Opus 4.6 has not seen any code written in Spring Boot 4, and therefore doesn't know how to write code for it. It does know Spring Boot 3, so that's what it writes. Most of it works, but there are breaking changes between Spring Boot 3 and 4, which means that some of it just fails spectacularly. I saw Copilot struggle, rewriting, failing. In the end, it decided to skip Spring Boot entirely and write some sections in plain Java. I'm talking 300 lines of raw OutputStream handling for something that could be done with a few annotations and a little boilerplate. When this happens you have two options:

Reject the new, stick to Spring Boot 3 (what Opus 4.6 knows) or even to the raw Java. After all, if you're not looking at the code any more, who cares?
Work with Copilot to teach it what the differences between Spring Boot 3 and 4 are, and how to code in Spring Boot 4.

You may think that there is a third option: wait for a new model that does know how to write code with Spring Boot 4. However, models are trained using available code, so for a new model to know how to write code in Spring Boot 4, you need a lot of code already written in Spring Boot 4, and who is supposed to write that? Classic chicken and egg situation.

So there you have it: for AI to be able to do things autonomously without any human intervention, we need to stop progress and stick to the platforms, libraries, and tools that it already knows. Keynes's flawed idea of progress. But if you want real progress, if you want something truly new, you'll have to get your hands dirty.

Grassroots AI: an approach that delivers

Hopefully I've convinced you by now that you will still have an eye on the code for the foreseeable future, and that you need to make sure that you understand it and own it; in other words, that the moonshot of becoming code-agnostic is unattainable[1]. So, instead of trying what has failed time and again, why not try what has succeeded time and again? That's right: iteration, organic growth, compounded benefits. Grassroots progress applied to AI.

You see, it takes time for the performance gains of a new technology to become really palpable, because it needs to permeate all elements of the industrial fabric. The same thing happened with computers, something that Nobel Prize-winning economist Robert Solow was quick to note in what has been called Solow's Paradox: you can see the computer age everywhere, except in productivity statistics. What you need to do is check that your initiatives deliver tangible progress before you declare victory.

This is easier said than done. A lot of the individually-perceived progress turns out not to be progress at all at the holistic level, and sometimes it's even a drag. That's because a lot of the perceived performance gain is not really saving effort, it's just shifting the effort somewhere else, often where it's more expensive. This can happen because some of our initiatives make one person faster at the cost of making everyone else slower (net loss), or because a short-term gain is achieved through a long-term loss (debt).

The answer: slow down so you can go faster. Identify areas where you, as an individual, are not providing a huge amount of value, and see if you can delegate that to AI. When you do, give yourself some proper evaluation time before you introduce the next improvement. Chances are your idea won't work, or at least not fully as intended; you need to give yourself time to find the wrinkles so you can learn from the experience. Mistakes are the best learning tool, but we can't learn from them if we don't give ourselves enough time to realise that we have even made them.

"A complex system that works has evolved from a simple system that works. A complex system built from scratch won't work", John Gall.

I'll give you an example you can try: linting. We all know about the usefulness of linting: it provides a predictable, consistent format to our files so we know what to expect. It makes reading files easier and, given that we spend about 10 times more time reading than writing code (probably even more these days), this sounds like a net gain overall. We typically have two types of linting problems:

Mechanical: linting that can be done automatically, like indentation or line length
Semantic: linting that requires some knowledge of the code, like variable names

Mechanical linting can already be handled automatically, and hopefully you already have some kind of CI/CD job that looks after this. Semantic linting is something that AI can help you with: run the linter, collect the violations, ask an AI agent to resolve them. Easy to set up, clear value-add, low-risk if the agent makes a mistake. It allows you to stop thinking about linting while you're developing, freeing up some cognitive load, and it provides a benefit to everyone, not just you. Will it work? Not at first: the AI linter will likely try to solve violations in the most simplistic way, and you'll have to refine your instructions. But you'll gradually get them to a point where you can trust the output of the agent, and this experience will give you the tools for the next iteration.

This is a pattern that has given me good results: a deterministic, non-AI tool that identifies issues, paired with an AI agent to address them. After linting, you can try code coverage, or static code analysis, or vulnerability scans. Or several other ideas that I covered during a talk with the GuateJUG (in Spanish).
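The linter-plus-agent loop described above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the violation format (`path:line:col: CODE message`), the rule code `NAME01`, and the `ask_agent` callable are all hypothetical stand-ins, so adapt the parsing to whatever your linter actually prints and plug in whichever agent you use.

```python
import re
from collections import defaultdict
from typing import Callable

# Assumed violation format: "path:line:col: CODE message".
# This is a hypothetical format; check what your own linter emits.
VIOLATION = re.compile(
    r"^(?P<path>[^:]+):(?P<line>\d+):(?P<col>\d+): (?P<code>\S+) (?P<msg>.+)$"
)


def collect_violations(linter_output: str) -> dict[str, list[dict]]:
    """Deterministic step: parse the linter output and group violations by file."""
    by_file: dict[str, list[dict]] = defaultdict(list)
    for raw in linter_output.splitlines():
        match = VIOLATION.match(raw.strip())
        if match is None:
            continue  # summary lines and anything else we do not recognise
        violation = match.groupdict()
        violation["line"] = int(violation["line"])
        violation["col"] = int(violation["col"])
        by_file[violation.pop("path")].append(violation)
    return dict(by_file)


def build_prompt(path: str, violations: list[dict]) -> str:
    """Turn one file's violations into an instruction for the agent."""
    lines = [f"Fix these lint violations in {path} without changing behaviour:"]
    lines += [
        f"- line {v['line']}, col {v['col']}: [{v['code']}] {v['msg']}"
        for v in violations
    ]
    return "\n".join(lines)


def resolve(linter_output: str, ask_agent: Callable[[str], str]) -> dict[str, str]:
    """AI step: one request per file. Returns proposals for a human to review."""
    return {
        path: ask_agent(build_prompt(path, violations))
        for path, violations in collect_violations(linter_output).items()
    }


if __name__ == "__main__":
    sample = (
        "src/billing.py:12:5: NAME01 variable name 'x1' is not descriptive\n"
        "src/billing.py:30:1: NAME01 function name 'do_it' is not descriptive\n"
        "Found 2 violations.\n"
    )
    # A stand-in agent that just echoes the prompt, so the sketch runs offline.
    for path, proposal in resolve(sample, ask_agent=lambda prompt: prompt).items():
        print(path, "->", proposal, sep="\n")
```

The agent only ever sees what the deterministic tool reported, and its output comes back as a proposal rather than being applied directly, which keeps the human in control while the instructions are still being refined.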

Whatever approach you take, your AI strategy will only work if you think about it as delegation instead of abdication. Go slow, remain in control. As I previously said, AI can help you do your job, but it's still your job.

Footnotes

[1] This doesn't mean that moonshots are useless; there are a lot of spillover benefits. As Jodorowsky said, if you shoot arrows at the Moon you'll never hit it, but you'll master archery. Moonshots are great as a learning tool, just not so much for delivering.