Ian Mulvany

December 2, 2021

what is going on with all the fake papers?

The following paper - https://app.dimensions.ai/details/publication/pub.1139702911 https://arxiv.org/abs/2107.06751 - points out a way of detecting fake papers. (By the way, as an aside, FU Dimensions for basically hiding the outbound link to arXiv on your webpage: it's grey, with no hyperlink indication, and the thing that looks like a link just links to another Dimensions page. Super shady, my friends.) From the paper's abstract:

"Probabilistic text generators have been used to produce fake scientific papers for more than a decade. Such nonsensical papers are easily detected by both human and machine. Now more complex AI-powered generation techniques produce texts indistinguishable from that of humans and the generation of scientific texts from a few keywords has been documented. Our study introduces the concept of tortured phrases: unexpected weird phrases in lieu of established ones, such as 'counterfeit consciousness' instead of 'artificial intelligence.' We combed the literature for tortured phrases and study one reputable journal where these concentrated en masse. Hypothesising the use of advanced language models we ran a detector on the abstracts of recent articles of this journal and on several control sets"

This has been used to create a problematic paper screener tool - https://dbrech.irit.fr/pls/apex/f?p=9999:1:::::: 

Nature picked this up - https://www.nature.com/articles/d41586-021-01436-7?utm_source=twt_nat&utm_medium=social&utm_campaign=nature 
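
To make the tortured-phrase idea a bit more concrete, here is a minimal sketch of what a naive check could look like. To be clear, this is my own toy illustration and not how the paper's authors or the screener tool actually do it: the function name `find_tortured_phrases` and the lookup table are made up for this example, and the only entry in the table is the one pair quoted in the abstract above.

```python
import re

# Hypothetical lookup table: tortured phrase -> the established term it likely replaced.
# Only this one pair comes from the abstract quoted above; a real list would be far longer.
TORTURED_PHRASES = {
    "counterfeit consciousness": "artificial intelligence",
}


def find_tortured_phrases(text, phrases=TORTURED_PHRASES):
    """Return (tortured phrase, expected term) pairs found in text, ignoring case."""
    hits = []
    for tortured, expected in phrases.items():
        # Allow any run of whitespace between words, so a line break cannot hide a match.
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in tortured.split()) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append((tortured, expected))
    return hits


abstract = "We apply counterfeit\nconsciousness to classify images."
print(find_tortured_phrases(abstract))
# prints [('counterfeit consciousness', 'artificial intelligence')]
```

A plain phrase lookup like this only catches phrases somebody has already spotted and written down, so at best it is a first-pass filter for a human to follow up on.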

There are hundreds of these fake papers. What the actual fuck? 

In a way I'm kind of glad that I get to sit at home on my nice sofa writing about these kinds of things, and don't actually have to read any scientific papers any more. The idea that there is a network of knowledge that weaves together a representation of reality has always comforted me. But now the narratives around us are diverging from reality, and it looks like the underlying fabric of the story we tell ourselves is having random papers injected into it.

This is a concern, and while it may be that we are at the point where machines are the largest readers of the scientific literature, I think we should not just sit back and ignore these kinds of issues.