I am in a state of confusion and wonder. Most of the time I’m able to suppress this and stumble through my days without thinking too much, but now and again I fail and my attention gets arrested by the riotous confusion and intricacy of the world around me.
We are odd creatures, bags of chemicals loosely held together, sinew and bone built in such an odd way that the world impresses itself in upon us. We are emotional, and it is wonderful to feel. To be exuberant, to feel like we are learning, to yearn. We are also almost infinitely capable of confusing ourselves, of fooling ourselves into thinking we know things, of being certain of things, when it is just so hard to know.
Science is all about working hard to disabuse ourselves of our pretensions and our confusions. And it works some of the time - hardly at all, really. But enough, enough, I suppose, to make progress.
Have you looked at water as it flows from your jug, or paused to feel the air brushing your face? We have equations that describe these phenomena, and they are fiendishly difficult to compute. There is a stunning webpage describing airflow - https://ciechanow.ski/airfoil/ - and of these equations Bartosz writes:
The quite informal description of these balances that I’ve presented can be formalized mathematically using the Navier–Stokes equations. These equations describe the motion of liquids and gasses, collectively known as fluids, subject to various forces like gravity, or, most importantly for us, pressure.
Navier–Stokes equations are notoriously difficult to solve analytically, but a lot of insight about the behavior of fluids can be gained with computer simulations with various degrees of complexity.
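For reference (this is my addition, not part of Bartosz’s quote), the incompressible form of the equations is commonly written like this, with velocity field u, pressure p, density rho, viscosity mu, and gravity g:

```latex
% Incompressible Navier–Stokes: momentum balance plus conservation of mass.
% \mathbf{u} is velocity, p pressure, \rho density, \mu viscosity, \mathbf{g} gravity.
\[
  \rho\left(\frac{\partial \mathbf{u}}{\partial t}
    + (\mathbf{u}\cdot\nabla)\mathbf{u}\right)
  = -\nabla p + \mu\,\nabla^{2}\mathbf{u} + \rho\,\mathbf{g},
  \qquad
  \nabla\cdot\mathbf{u} = 0
\]
```

Two compact lines of notation, and yet, as Bartosz notes, they are notoriously difficult to solve.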
For us these equations are hard, yet reality itself just gets on with being, the wind blows, water flows. All of the explanations we construct are at least one step removed from the reality of things. And that’s just in the domains where we have analytic sciences to help us. When we interleave the interactions of people, then things get messy as well as complex. Biological systems are barely fathomable.
The world is hard for us to understand (even though the world, as far as it’s concerned, just carries on regardless), and it is easy for us to confuse ourselves and to think that we know more than we do. We project our wonder out with language; language dimly mirrors some of reality, but it’s a shadow world to some extent, built up on top of reality, perhaps floating above it, easily unmoored. And above even this are language models.
So now - on to Prism - https://prism.openai.com - the scientific document editor recently released by OpenAI. I was asked to share my thoughts. It’s not the only product that does this - there are other tools out there in the market as well, such as https://www.useoctree.com and of course Overleaf - but it’s interesting for many reasons. It has come out of the ‘OpenAI for Science’ group led by Kevin Weil - https://www.linkedin.com/in/kevinweil/.
I’ve been thinking about it from a few different viewpoints, and I’ve seen two takes that I’ll address. One take is that you can’t vibe code science, and that the tool could be problematic because it makes that seem possible, which leads to a lot of negative consequences. The other take is that OpenAI is doing this to capture training data. I’ll comment on both, and add some other thoughts.
Much as the world is hard, predicting how tools will get adopted, or what their long-run effects or externalities end up being, is also very hard, so even though I have thoughts, we will generally have to wait and see what happens. What OpenAI does is useful to discuss because they have money and attention, hence power. Any actions they take can make a dent in things.
My first very brief impressions - and some things that only make marginal sense at this point.
It’s a super slick tool that can render LaTeX documents well, and help manage complex LaTeX projects. I threw in my master’s thesis on supernovae from many years ago and got it to compile. There were missing files, and it was a very old project. It initially threw up some errors, so I got the LLM to fix them; it did, and the project compiled. Nice. It’s fast, it’s well designed. It’s not shit.
Marginal one - where is the work happening?
But - it wants me to work over there, and not over here. It wants me to work in the online tool, and I think most research projects happen in the context of where the researchers have created their workspace. Most researchers who are productive today have spent a lot of time honing their tools. This tool wants them to move their locus of work. That’s very hard. I think it’s going to be easier to bring the LLM to the workspace than to get researchers to change their workspace. This is the first thing that only makes marginal sense at this point.
That said, things like Claude Cowork and OpenAI’s own Codex Desktop app show that these companies have plenty of talent and resources, and the boundary lines between the visual interface we use, the LLM, and where we locate work are fluid right now. If OpenAI gets the traction they are looking for from Prism, then we have no reason to expect the current iteration to remain the only one available.
A signal to look for - will they ship a non-browser-based version of this in the next two months?
Marginal two - why LaTeX?
LaTeX is a markup language for typesetting documents, particularly ones full of mathematics. (I spent a lot of time working with LaTeX, and my knowledge of LaTeX got me my first job in scientific publishing.) But it is not a majority-use tool. I estimate that the share of papers that use LaTeX as their source format is well under 35%, so this tool will not be used by a majority of researchers. My first reaction was: why start with LaTeX if you want to help science? But I think there are a couple of reasons why it may make sense in the context of OpenAI.
Most of the people working at OpenAI will come from a computer science research background, so for them LaTeX is likely the norm.
They were able to buy a company that had built up the tool they are launching - https://crixet.com - which gets them immediate product and user traction. Being able to build on what already exists goes a long way, and it helps to be able to buy talent and companies. (Sometimes it helps, sometimes it doesn’t - e.g. Mendeley and Elsevier.)
Folks not working in LaTeX have MS Word or Google Docs - both of which already have massive market share. Getting people to move away from either of these tools is going to be really hard. LaTeX, by contrast, is relatively underserved online, so you have a chance of getting some traction and adoption.
LaTeX is like a programming language, and if you go to the arXiv you can get both the PDF and the LaTeX source files, so there is a large body of training data for an LLM to work with (a minimal example of what that source looks like is sketched below).
Overall there are plenty of reasons why you might start this kind of thing with LaTeX.
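For anyone who has not worked with it, here is a minimal, entirely made-up example of the kind of LaTeX source that sits behind an arXiv paper - the pairing of markup like this with the PDF it compiles to is exactly the kind of training data referred to above:

```latex
\documentclass{article}
\usepackage{amsmath}   % standard mathematics support

\title{A Minimal Example}
\author{A. Researcher}

\begin{document}
\maketitle

We model the decline in luminosity as
\begin{equation}
  L(t) = L_0 \, e^{-t/\tau},
\end{equation}
where $\tau$ is the characteristic decay time.

\end{document}
```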
Another signal to keep an eye on – will they release a tool that supports Word and Google Docs?
Vibe coding science?
You can’t vibe code science - here are two very good BlueSky threads on this.
The objection here is that by hooking up a tool that can make plausible-looking papers to an LLM, you open the floodgates to shit work that will overwhelm current research publishing infrastructure. But here is the thing: that’s already happening anyway. We see comp sci conferences changing their policies as a result, the arXiv is changing its policies, and projects like https://github.com/Technion-Kishony-lab/data-to-paper have been around for a couple of years. So yes, this tool might accelerate this trend, but it’s not going to make this trend suddenly happen.
I think it’s important to read what OpenAI have said about their motivations for creating the tool - they don’t talk about vibe coding science. What they say is:
We’re still early, but it’s clear that AI will play a meaningful role in how science advances.
At the same time, much of the everyday work of research - drafting papers, revising arguments, managing equations and citations, and coordinating with collaborators - remains fragmented across disconnected tools. Researchers often move between editors, PDFs, LaTeX compilers, reference managers, and separate chat interfaces, losing context and interrupting focus.
Prism is our first step toward addressing this fragmentation.
The everyday work listed here is not the purpose of science - let’s keep that very clearly in mind. That touches a bit on the second reaction I’ve seen, which is that this might be a …
Trojan horse training tool?
Another take that I have seen is that OpenAI is doing this in order to generate training data on what actual collaboration looks like. If they get a sufficient volume of people using the tool, they might be able to train on the interactions that happen when knowledge is created. This would allow them to train their models to do more high-value work. There are a number of other initiatives looking at AI in support of science; here are some:
- https://platform.edisonscientific.com
- https://www.qedscience.com
- https://www.aria.org.uk/ai-scientist/funded-projects/
- https://potato.ai
- https://www.futurehouse.org
A signal to keep an eye on - if this kind of training works, then we should see more announcements from OpenAI about partnering with research-heavy organisations to provide B2B services for their researchers. Anthropic are heading in that direction - https://www.anthropic.com/news/anthropic-partners-with-allen-institute-and-howard-hughes-medical-institute
What’s the problem?
OpenAI should of course be trying to do this. They have an enormous value promise to deliver on; they have one of the most advanced pieces of technology to hand, plus market share and traction; they should be pushing everywhere all at once. It is also possible that they are trying to make science better and the process of collaboration better. There is a benefit to building with a positive vision for the world - it can be a talent magnet. But what exactly is the problem of collaboration in science, and where might the opportunity lie?
The process of doing science is not simple. Two of my favourite books on this are The Unnatural Nature of Science - https://sackett.net/walpert_unnatural-nature-of-science.pdf - and The Knowledge Machine - https://www.amazon.com/Knowledge-Machine-Irrationality-Created-Science/dp/1631491377. They draw out the messy nature of the process of doing science.
We are trying to make claims about the state of the world, and we are trying to make those claims accepted by our peers. The act of writing is the last piece, and many factors flow in - who is an author, what’s cited, where is the work disseminated, how far can we push a claim in this sentence right here, how well supported is this, where is the data that supports this, how might we format this image, in what order are we creating the story, how well maintained is the code, how much does this publication fit in with the lab research program, where is the data from that PhD student from 18 months ago, who cleaned this specific data, where are we with the ethics panel, how does this specific paper fit in with the grant we are writing, which labs are we implicitly writing against in this piece, and on and on and on.
The process of writing is not just about the work itself; it’s a move in the game of making knowledge claims, and those claims are a collective move by the lab or the authors making them.
I work for a scientific publisher, and when you strip down what we do to its essence it should be simple: enable the process of making knowledge claims that the academy has collectively agreed upon, and maintain the integrity of the scholarly record (though there are depths to both of these activities).
A tool that aims to really support the process of writing can be helpful and gain traction if it starts to make some of the above easier and removes extrinsic load from these efforts, but you will see that the actual writing of the paper is in many respects the last component, and the things that are hard are oftentimes wicked problems - ones where an LLM is probably not going to be able to help on its own.
Opportunities
Research Clippy
There are scenarios in which a tool built around an LLM could start to create a lot of value here if LLMs improve, and if how they integrate with our wider knowledge ecosystems improves (I’m thinking easy turnkey auth and IAM-like permission structures for agents, and I think this will come in the next year, along with model improvements).
One should be very cautious about imagining potential futures when there is so much low-hanging fruit available right now - with the current power of LLMs, and with market and commercial dynamics uncertain. I’m going to go ahead and do some prognosticating in any case. I think these potential futures are plausible, and they point in a direction of travel that could be interesting for this class of research-space products.
So, speculating - one could start to create a coordinating agent that attempts to act as an interlocutor between collaborators, keeping track of conversations, meetings, and decisions, and providing paths to resolve the kinds of open questions that arise during the collaborative writing process. A Clippy on steroids that has all of the lab context. This would be ambitious, but it could start to create lab-level value in this process. Generally these kinds of all-encompassing efforts - integrating communication tools, document management, and workflow - have failed, but if we have systems that can take on administrative burden, can we create value?
Signal to watch out for - the product starts to introduce a research companion that uses context from group ChatGPT sessions.
Context machine
Coming back to what OpenAI wrote:
At the same time, much of the everyday work of research - drafting papers, revising arguments, managing equations and citations, and coordinating with collaborators - remains fragmented across disconnected tools. Researchers often move between editors, PDFs, LaTeX compilers, reference managers, and separate chat interfaces, losing context and interrupting focus.
The first thing to say is that the big underused capability of these systems TODAY is their very large context windows. While I might be working on this specific piece of writing right now, how do any claims in any given sentence connect to any prior claims across any papers that I have read, anything I have published, and data that I have generated across all of my experiments? If you give the agents access to that context, and a route to interrogating that context in a useful way, then these tools can expand the contextual capability of the people working with them. I mentioned before what I think the core jobs of publishers are, but I deeply believe that we have an obligation to create high-context environments - because we can - and in a way the jobs to be done are clear, but we should execute on them through the creation of context engines. There is a massive opportunity here. LLMs could really start to be the tools that connect different types of search (classic and RAG) to personal context, to agentic companions, and to wider bodies of literature.
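To make that a little more concrete, here is a deliberately toy sketch of the retrieval step of such a context engine. Everything in it is made up for illustration - the bag-of-words “embedding”, the corpus, the claim - and a real system would use a proper embedding model, a real store of personal context, and an LLM call at the end rather than a print statement:

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: lower-cased word counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical personal context: prior papers, lab notes, dataset descriptions.
corpus = {
    "thesis-ch3": "spectral analysis of type Ia supernovae light curves",
    "lab-note-42": "pipeline for cleaning photometric survey data",
    "grant-draft": "proposal to study dust extinction in supernova host galaxies",
}

def retrieve(claim: str, k: int = 2) -> list[str]:
    # Rank everything the researcher has touched by similarity to the claim.
    q = embed(claim)
    ranked = sorted(corpus, key=lambda doc_id: cosine(q, embed(corpus[doc_id])), reverse=True)
    return ranked[:k]

claim = "this light curve suggests unusual dust extinction in the host galaxy"
context_ids = retrieve(claim)
# In a real context engine the matched documents would be pulled into the
# model's context window alongside the sentence being drafted.
print(context_ids)
```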
A signal to look out for - they build out interactions with MCP services that expose information from subscribed resources, like the Wiley AI Gateway, and connect this into Prism.
Peer reviewer machine
The next big opportunity is around connecting the work being done with the people doing the work. Have you ever asked ChatGPT what it knows about you? It tells me that I think in systems, pipelines, and scale (I’ll take that!). Chatbots are amongst the most personal technologies ever built, because they present to us as humans and we often interact with them as humans. They are general-purpose machines that can create a representational picture of who we are, in a much, much more fine-grained way than any technology we have built before - even beyond the targeted marketing capabilities of Facebook or Google.
One of the great scaling problems with publishing science is getting the time from a peer to review the work, in the way that we have decided is a required step in the game of making knowledge claims. If you are using a tool to write your paper that not only knows about you, but knows about the expertise, workload, and reputational weight of a large number of other researchers working in the same tool, then you could imagine a workflow where you are asked to review work at just the right moment, on just the right question - perhaps in exchange for others reviewing your own work? This kind of technology has the potential to go a long way towards eliminating the peer review bottleneck.
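As a purely illustrative sketch of what that matching step might look like - all names, topics, and the scoring heuristic below are invented, and a real system would derive expertise and workload from actual usage context rather than hand-written lists:

```python
from dataclasses import dataclass

@dataclass
class Researcher:
    name: str
    expertise: set[str]   # topics the tool believes this person knows well
    open_reviews: int     # current reviewing workload

def match_score(paper_topics: set[str], r: Researcher) -> float:
    # Reward topical overlap, penalise people who are already overloaded.
    overlap = len(paper_topics & r.expertise) / max(len(paper_topics), 1)
    return overlap / (1 + r.open_reviews)

paper_topics = {"supernovae", "photometry", "dust", "extinction"}
pool = [
    Researcher("A", {"supernovae", "spectroscopy"}, open_reviews=3),
    Researcher("B", {"photometry", "dust", "extinction"}, open_reviews=0),
    Researcher("C", {"galaxy", "formation"}, open_reviews=1),
]

best = max(pool, key=lambda r: match_score(paper_topics, r))
print(best.name)  # "B": strong topical overlap and no current review load
```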
Signal to watch for - Prism starts to ask for access to your wider ChatGPT history and context to allow it to do its job better.
Peer review machine
Now for the more intriguing idea. This is the one that gets closer to the fears of those who say that science can’t be vibe coded (which I believe). Everything I’ve mentioned under the opportunities above is possible today with the current capabilities of LLMs and the other technologies that we have. They are just engineering problems.
This next idea is not possible today, but there are a lot of people working on trying to create this kind of possibility. What if LLMs gain enough domain knowledge, a sufficient set of tools to work with, and sufficient context, that their peer review assessments - on accuracy, on novelty, really on any dimension - are probably better than human peer review? In that scenario one of the current jobs of peer review becomes untenable for humans to do. We would lose a lot, but pressure on academics and institutions, and commercial pressures, would drive this to be the predominant means of review - it would take the peer out of the equation, but it would do peer review for all intents and purposes. What the public would make of this is unclear, as the public don’t understand what peer review is today, and the technical capability here might not match the cultural expectation - but I think that could be managed. Would that be universally bad? I can think of a lot of bad things about it, but I can think of a lot of good things about it too. What is clear is that it would be a disruptor.
Signal to watch for - this scenario is so far beyond where we are today that I don’t have a good signal to look for from the upcoming Prism roadmap, but I am paying attention to ongoing conversations and experiments around AI in peer review and there is a lot happening. (The best blog on this topic is https://scalene-peer-review.beehiiv.com/).
What might we think, and what might publishers do?
OK OK, let’s not lose ourselves here, it’s just one online tool, effectively rebadged, that helps people write LaTeX online. It’s not the end of things, and of all the things that start, this is just another one. So I don’t think we need to freak out here. But it is super super interesting.
Publishers should be investing in systems that allow our content to flow to where our users are, so even if OpenAI doesn’t come calling for a conversation, we should be looking at the surface area of our content and making sure that it can be served to where our researchers need it.
Why might this all be hard for OpenAI?
Getting research systems to shift is hard. Getting collaborators to shift tooling is hard, and the publishing part of the ecosystem is rife with sub-optimal tools. Getting integrations that work takes effort - schleppy, hard grunt work. Launching a product that focuses on the technology and the content of the text is really the easy part. I’ve seen any number of “GitHub for science” projects, or “article of the future” products. This tool needs to be integrated into research workflows to be useful, and I don’t know how much outreach is happening from OpenAI to other entities in this space. The fragmented nature of research disciplines, incentive systems, publishers, and institutions is why this might be hard for OpenAI.
Signal to watch for - integration with a major publisher or submissions system vendor.
Should Overleaf be worried?
I originally thought I would be writing most of this post about workflow integration, and how the current iteration of Prism has none built in right now. This is one of the two current moats for Overleaf; their two moats are workflow integration with submission systems, and a large ecosystem of paying customers. Free is not a sufficient price point if the tool does not allow the key steps of collaboration to happen - proper collaboration and review, and article submission. Now neither of these is beyond the capability of OpenAI to execute on, but they are, as I said above, schleppy and boring features to build, and so a small product team that is focused more on what AI can do than on the rigorous, boring engineering and integration work within a complex landscape of publishing systems might not get there.
In the short term, the thing that the folks at Overleaf should be looking at is whether they can add meaningful, useful, and fast AI integration through an API layer - can Overleaf become like Cursor, with AI agents and bring-your-own-model built in, faster than Prism can become Overleaf? It’s possible. Overleaf has been around for a while, so there is almost certainly a hill of tech debt to overcome, but AI is making that cheaper to tackle than before.
In the short term my gut tells me that Overleaf is probably going to be OK if they can execute well.
Should we be worried?
In the longer term, if the practice of making knowledge claims changes in the way I’ve outlined above, then Overleaf is in trouble. But you don’t need to worry about that, because if you are a reader from my usual audience, all of our jobs will be going away and we will all be too worried about that to worry about Overleaf!