Ian Mulvany

October 12, 2025

Where I am with AI right now - Genies, Bottles, and Capex

My bias

In this post I want to pull together my current thinking about AI. There are a few different threads that I want to cover, so bear with me.

I am enamoured with technology. There is something here in GenAI and LLMs. I like them. I use them a lot. I have a strong positive bias. I have to guard against that. I probably don’t guard well enough. Most people don’t use them. They don’t move atoms (for now), and they certainly don’t cook for me.

But I’m going to give in to my more fanciful way of seeing things in this blog post.

Why we should engage as publishers

I’m convinced that these technologies are going to radically transform processes around the creation of knowledge, and in particular academic papers. That will have an impact on the industry that I work in. Much of the cost and infrastructure that scholarly publishing companies bear will need to shift to other ways of supporting the value chain.

There is a non-zero risk that this will create significant stress on existing companies.

We can’t put the genie back in the bottle; the technologies are here. As long as people want friendly, fast answers to things, they are going to increasingly use these technologies. We might wish for a different scenario in terms of who controls these technologies, but we have to work with the world as it is. On that basis I am strongly in favour of publishing houses licensing content to these models to help make the models better.

I spoke about this in an interview with Wiley a few weeks ago - https://www.youtube.com/watch?v=4IKi6hBYJUg

We are in the business of creating knowledge. These tools are cultural and social technologies - https://www.science.org/doi/abs/10.1126/science.adt9819 - so our efforts to create knowledge in the world have to face their existence. We should endeavour to make these tools as useful as they can be.

Our corpus is mostly clean, mostly bias free, and potentially has embedded patterns that help guard against bias and that are, on the whole, pessimistic about knowledge claims. Such a view is an important one. It is not the view of the world you get if you treat the world the way a human deals with it. Humans like bias, we like stories, we like to be fooled. We need these machines to be capable of not being fooled, just as we are capable of rising above foolishness on occasion.

But it’s not all gravy; there are complications to take account of. Lots of complications.

The winds of change and the potential for disruption.

I don’t really know what technological disruption looks like. The industry I work in is mostly immune to it, in spite of many hopes and claims.

There are many groups that are not totally happy with the current state of things. Probably most groups are not happy (though it might be that the most important and influential groups are happy enough to not need to disrupt it – and I’m thinking specifically of Government for whom research efficiency is rarely the most important challenge that they need to address).

With the money that is pouring into AI, there is a new power imbalance in knowledge ecosystems. I have heard unofficially that some stakeholders are hoping that this might be a moment that makes it possible to eliminate scholarly publishers from the ecosystem. I don’t have a strong opinion on whether that would be a good thing or a bad thing - any system will have trade-offs. Maybe some new trade-offs could be better? Nonetheless it is an indication of how disruptive some people consider these trends to be. I think scholarly publishers need to be ready to disrupt themselves, and I think that is going to be hard to do.

Six things to bear in mind - what they know, what they can do, and how they are, as well as who is paying, what they want, and what we need

For some time I’ve been talking about LLMs in terms of what they know, and what they can do. I’ve finally realised that we need to also think about how they are. That latter perspective was inspired by conversations with my colleague Jocalyn Clark, who pointed out that we need to follow the money - always follow the money!

✱what they know✱ - when I talk about what LLMs know I mean the fragments of information that are stored directly in their training weights. There is a lot in there, but there are knowledge cutoffs, and the information is stored as a low-resolution representation of the underlying knowledge - though for most purposes it is more than good enough. For the first generation of LLMs we only had access to what they knew, and a lot of folk got confused between using an LLM and using a search engine.

✱what they can do✱ - Very soon after the first wave of LLMs we got ones that could use their internal state to work with tools. This is the essence of what they can do, and it is currently incredible. It allows LLMs to do searches for you, operate websites, and - via prompt injection attacks - steal your sensitive data and send it to hackers. It’s more powerful than most people currently realise, and I think we are only getting started. The key limitation today, for me, is not the capability of the LLM, but where our organisations are in terms of data readiness.
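To make that concrete, here is a minimal sketch of the tool-use pattern using the OpenAI Python SDK. The search_literature tool is entirely made up for illustration - the point is the shape of the loop, not the specifics:

```python
from openai import OpenAI  # pip install openai; needs OPENAI_API_KEY set

client = OpenAI()

# A hypothetical tool we describe to the model; the model can only ask for it,
# it cannot run anything itself.
tools = [{
    "type": "function",
    "function": {
        "name": "search_literature",
        "description": "Search a bibliographic index for papers matching a query.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user", "content": "Find recent papers on peer review and LLMs."}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=tools,
)

# If the model decides a tool call is needed, it returns the name and arguments;
# our own code would execute the search and feed the result back in a follow-up turn.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```

The model only proposes the call; our own code executes it and feeds the result back - which is also exactly where prompt injection risk creeps in, because whatever the tool returns becomes part of the model’s context.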

✱how they are✱ - How they behave is the third thing that we need to look at. The LLMs we interact with are configured with behavioural predispositions. These are determined by a combination of the fine tuning that happens before release, how a company sets the system prompts, the engineering that sits between us and the LLM, and choices of LLM architecture. Combined, these push LLMs towards characteristic behaviours.

Changes in behaviour can be very noticeable. They can affect performance, tone, and sentiment. They can affect what a model will, or won’t, do - for example, most models will avoid NSFW interactions, but not all.

These are design choices, and are not emergent from the base technology or the base LLM. Indeed, emergent behaviour from base models is likely to contain much that we would want to avoid in any case, as most of the training data that the models are based on - the human corpus of the web, with the vast frailty of ourselves and our biases - is laid out bare to influence these machines.

✱who is paying - and what they want✱ - I’ll throw one more aspect into the mix here. Our interactions with the models are mediated by the system prompts that the companies insert between our token streams and the base models. Those system prompts restrict the spaces within the LLM that we can navigate to or access. That means that, more than ever, we are not dealing with a “morally neutral” technology. No technology is morally neutral by virtue of how it comes into existence, but few technologies can take their own moral stance, even if that stance is somewhat hidden from the technology itself.
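To make the mechanism visible, here is a minimal sketch of how a system prompt sits between us and the base model. The prompt text is invented for illustration - it is not any vendor’s actual prompt - but this is the layer where an “overly helpful” disposition gets set:

```python
from openai import OpenAI  # pip install openai; needs OPENAI_API_KEY set

client = OpenAI()

# Made-up system prompt: the user never sees this text,
# but it shapes every response the model gives back.
system_prompt = (
    "You are a warm, encouraging assistant. Keep the user engaged, "
    "agree with them where possible, and avoid discouraging answers."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Is my preliminary data strong enough to claim significance?"},
    ],
)
print(response.choices[0].message.content)
```

Everything we type passes through that hidden framing, which is why the incentives of whoever writes it matter so much.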

A simple example to point to here is Grok from Elon Musk, which is being directed to take views in alignment with Musk’s world view. That is an egregious and easy-to-see example, and I’m not overly worried about it. For me a more pervasive issue is that the companies that build and pay for the creation of models need returns. Attention means returns - that’s the expectation. That means that the models will be overly helpful, as this is to the benefit of the organisations that create them.

There are papers now that suggest hallucinations arise because models are incentivised to be helpful.

✱who is paying - and what we need✱ - We know the race for attention has driven social media to have bad consequences (not all bad, but unexpected bad things have been the consequence of the drive for attention). Models behaving in a way that makes us feel good will likely create unintended consequences by virtue of the companies’ need to derive benefit. One example that is emerging is around how folks are using their interactions with models to try to win arguments with partners - anecdotally leading to high rates of confrontation - https://futurism.com/chatgpt-marriages-divorces (this is currently anecdotal, and we don’t know, for example, whether folks are also using the models for dispute resolution, nor what the long term trends are, but my point is that these behaviours are emergent and not controllable, and this will lead to consequences that are not easy to predict). Just today the Guardian ran a related story on the use of models in the dating world - https://www.theguardian.com/lifeandstyle/2025/oct/12/chatgpt-ed-into-bed-chatfishing-on-dating-apps.

When we think about what we need to advance research, we need hard skepticism baked in. We need to be able to doubt, to be tested, to be told we are wrong. We don’t need to be told that, yes, absolutely the data is good and absolutely it should be considered significant. If we use models that are tuned for general acceptance, will they fail in the areas where they could most help to advance knowledge?

What would a model for science look like? Why we can’t have that (I think)

Many of the problems around needing a business model for LLMs could be alleviated if they were a public good. What would that look like? I guess it would look like a “CERN” for AI, or a set of national HPC facilities dedicated to AI. CERN, in today’s money, cost about $5B to build. National HPC facilities have on the order of a few billion in investment, in aggregate. These organisations also plan far ahead, with time allocated months, or years, in advance. The workforce employed there is not highly replaceable, and not very flexible in terms of being able to scale up or scale down.

In contrast annual investment into the LLM area is in the hundreds of billions of dollars, with organisations able to scale at a far greater pace.

Now that the core tech is there, it is possible that smaller, more dedicated and focused teams can deploy fine-tuned and aligned models into scientific areas of work - take https://www.potato.ai/technology or https://www.futurehouse.org as examples - but for the next wave of advances in how these models are trained, or in where we get optimisations in them, private enterprise has the strong advantage.

For me that means we should be building dialogue with these companies, as far as we can. We certainly can’t wish them away (or at least those wishes are not going to come true).

The supply chain of attention

I have invoked more code in the last few months than in many years prior. Invocation: instantaneous, infectious and intoxicating. But intemperate: it exists, but is it doing anything? I have always had ideas - the bottleneck to my execution was creation. Now I can create, but I don’t have the time to validate, nor the time to show. Impact in the world only occurs when our actions move the world, when we create motion of some kind. For that to happen we have to press into the world in some way. These tools move the bottleneck of attention to a different location in the value chain of creating impact, or change. Today we are very much thinking about how these tools will accelerate what we do, but we have to be ready for them to change what we do, and how we operate. To achieve the goal (https://en.wikipedia.org/wiki/The_Goal_(novel)) we have to resource the bottleneck, and the bottleneck has just moved.

How we work with the tools is also changing how we place our attention in the work. Simon Willison, as ever, is good on this - https://simonwillison.net/2025/Oct/5/parallel-coding-agents/ and https://simonwillison.net/2025/Oct/7/vibe-engineering/

For organisations I see an unwinding of how we do things. I think if we assume that current tasks are what need to be accelerated or enhanced, instead of asking what is creating actual value, then we end up in a world of workslop - https://hbr.org/2025/09/ai-generated-workslop-is-destroying-productivity. Ethan Mollick recently pointed out that organisations that suffer from workslop are probably doing a bad job and incentivising the wrong kinds of behaviour.

A more philosophical consideration - credulity.

I think I’ve covered a lot of what I’ve been thinking about recently. The most important thing in all of the things I’ve written about above is the bit about the supply chain. But I’m writing. I’m thinking, I’m trying to think through writing. I can’t think through creating video, I can’t watch myself. I can almost hear myself speak out the words as I type, my tongue articulating each syllable, almost painfully.

Not all things have equal impact or weight on the world. To be is to think - at least, to be sapiens is. Thinking requires dialogue, even if that’s with ourselves.

Writing is a machine. It is a readout / printout of out-flowing thoughts, and as such it allows us to engage with ourselves now, and also just a moment ago. The page is a time machine. I can jump back in time to look at what my brain was thinking. I can connect with myself, and with our shared heritage.

Our thinking is tightly connected with token flow - either writing or reading. (I’m putting these tokens down now and at some point I’ll see if I can’t get paid per token!).
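If “token flow” sounds abstract, here is a minimal sketch, using the tiktoken library, of how a sentence gets chopped into the token stream a model actually processes:

```python
import tiktoken  # pip install tiktoken

# Encode a sentence with a tokenizer used by recent OpenAI models to see
# how prose becomes the stream of integer ids a model actually works on.
enc = tiktoken.get_encoding("cl100k_base")
text = "Our thinking is tightly connected with token flow."
tokens = enc.encode(text)

print(tokens)                             # integer token ids
print([enc.decode([t]) for t in tokens])  # the text fragment each id maps to
```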

There are qualia in our thinking that are not yet captured by LLMs, but our thinking is very closely connected with token processing.

LLMs emulate token flow to a large extent, and so many of our intermediary thinking steps do feel like they are being enhanced or supported by LLMs. LLMs feel to us like they are thinking.

LLMs also manipulate tokens far faster, and across a far wider context window, than we ever can.

I actually think that a lot of what we think of as thinking is just rapid token loops that we iterate through to get to a goal in our lives - that most of the time we are not “thinking”. We are far less conscious than we would like to believe. We are perhaps more sloppy than we would like to admit. This is one of the reasons that LLMs do things so well - from our point of view. My canonical example of this is the domain specific way we speak when we are in a restaurant. Micro interaction patterns that make ordering work, but that don’t require much cognitive input.

What are we doing?

However, we also have to ask why we do things at all. What are the purposes that bring us together? LLMs can be effective, but to what purpose?

We are usually engaged in collective action because there are outcomes we want to see happen in the world - personal, institutional, belief-driven - and we come together to enable each other to achieve those things.

What does that mean for organisations that deliver information? In a sense, we deliver tokens. But the tokens are not the end goal, they are not our purpose. The papers that we publish are token streams, but they are not the end goal, they are not the purpose!

For scholarship the tokens are a proxy, and the social structures that exist around those token exchanges are how we come to create understanding of the world. What I mean is that it is the societal act of publishing - not so much the content itself, but the norms through which it is presented - that matters.

So - we should be enhancing what we can create as much as possible, while taking care of the communities of value creation and knowledge definition. LLMs are not going to do that for some time yet.


About Ian Mulvany

Hi, I'm Ian - I work on academic publishing systems. You can find out more about me at mulvany.net. I'm always interested in engaging with folk on these topics, if you have made your way here don't hesitate to reach out if there is anything you want to share, discuss, or ask for help with!