I don't know if you've seen the 2012 Avengers movie: our heroes are defending the city from an alien onslaught, they are in a desperate situation and need the help of the Hulk, and the Hulk's alter ego Bruce Banner arrives on the scene.
Captain America says to Bruce, "Now might be a really good time for you to get angry." "That's my secret, Cap: I'm always angry," Banner replies, right before transforming into the Hulk.
When it comes to LLMs and hallucination, it's much the same. We often say they hallucinate every now and again. Actually, they are always hallucinating; it's just that most of the time their hallucinations seem perfectly reasonable to us (I suspect this is because we don't think as much as we think we do).
Of course, when working on academic texts this leads to the problem of rigour in text generation (the "making shit up" problem). This came up in conversation with some colleagues this week around citations, and I said that I thought the citation problem was effectively solved. Then this morning I read this: https://retractionwatch.com/2025/06/30/springer-nature-book-on-machine-learning-is-full-of-made-up-citations/ - now I'm not saying that LLMs were used to generate the text here, but it is a bit of a smoking gun.
So what do I mean when I say I think this problem is solved?
Well, the way I see it, there are plenty of services that will let you look up citations - OpenAlex, Semantic Scholar, and others. This means that if you are building an app or a workflow, you don't have to rely on the LLM to create citations for you: you can have it look citations up against one of those services, and you can verify that each citation actually exists. This is what data-to-paper does using the Semantic Scholar API (https://github.com/Technion-Kishony-lab/data-to-paper), and it is also the approach that https://symbyai.com/ is taking for its AI peer-review tool.
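As a rough sketch of what that verification step could look like, here is some Python against the real OpenAlex works endpoint. The function names and the exact-match rule are my own assumptions for illustration, not anything taken from data-to-paper or SymbyAI:

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

OPENALEX_WORKS = "https://api.openalex.org/works"

def build_lookup_url(title: str) -> str:
    """Build an OpenAlex title-search URL for a candidate citation."""
    params = urllib.parse.urlencode(
        {"filter": f"title.search:{title}", "per-page": "5"}
    )
    return f"{OPENALEX_WORKS}?{params}"

def best_match(results: dict, title: str) -> Optional[dict]:
    """Return the first hit whose title matches (case-insensitive), else None."""
    wanted = title.strip().lower()
    for work in results.get("results", []):
        if (work.get("title") or "").strip().lower() == wanted:
            return work
    return None

def verify_citation(title: str) -> Optional[dict]:
    """Fetch candidates from OpenAlex and check the title really exists."""
    with urllib.request.urlopen(build_lookup_url(title)) as resp:
        return best_match(json.load(resp), title)
```

The point is that the existence check is a plain comparison against a real index, not something the LLM asserts; a production system would also want fuzzier title matching and DOI checks rather than my naive exact match.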
You have to realise that LLMs don't think, so their ability to retrieve citations will come down to how good the discovery service they plug into is.
I think there might be some opportunities to improve discovery in this space (indeed, folks may be working on this already). At the moment, if I ask an LLM to use a tool to drive the interface to a discovery service, it plugs some related keywords into the service and we hope that those keywords get reasonable recall from the discovery service.
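Concretely, that tool-use setup usually looks something like the following function-calling schema. The tool name and fields here are hypothetical, but this is the shape of the interface the LLM is driving, and recall depends entirely on the keywords it chooses to fill in:

```python
# A hypothetical tool schema (function-calling style) exposing a discovery
# service to an LLM. The model supplies "keywords"; recall then depends
# entirely on how well the discovery service handles those keywords.
search_tool = {
    "name": "search_literature",
    "description": (
        "Search a scholarly discovery service (e.g. OpenAlex) "
        "for works matching the given keywords."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "keywords": {
                "type": "string",
                "description": "Space-separated search terms related to the claim.",
            },
            "limit": {
                "type": "integer",
                "description": "Maximum number of results to return.",
            },
        },
        "required": ["keywords"],
    },
}
```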
It would be nice if we had a vector representation of all of the literature, with each patch of the vector space connected to a verified set of citations. It would also be nice to find out whether the shape of the vector space for full-text articles maps closely to the representation for titles, and to the representation for abstracts.
Something like that might surface more citations than keyword search can, but for now LLMs, if provided with the right tools, can do a reasonable first pass at finding real, existing literature related to a query.
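As a toy illustration of the title-versus-abstract question, using entirely made-up three-dimensional "embeddings": if a paper's title vector sits close to its abstract vector, title-only indexes might be a cheap proxy for richer ones.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

title_vec = [0.9, 0.1, 0.2]      # hypothetical embedding of a paper's title
abstract_vec = [0.8, 0.15, 0.3]  # hypothetical embedding of its abstract
other_vec = [0.1, 0.9, -0.4]     # hypothetical embedding of an unrelated paper

# The same paper's title and abstract should land far closer together
# than the title and an unrelated paper do.
assert cosine(title_vec, abstract_vec) > cosine(title_vec, other_vec)
```

Whether this holds at scale across real full-text, abstract, and title embeddings is exactly the open question; this snippet only shows the kind of comparison you would run.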
The fact that this is a solved problem will not prevent people from doing the dumb thing, that much I get, but if you are building a serious system you have no excuse not to do the right thing here. In fact, I might try to build this into a system I am working on at the moment.