What questions might we ask of AI systems in scholarly publishing?

Today I chaired a small panel discussion for the Friends of the NLM group on the topic of AI and NLP tooling in the scholarly literature.

You can see the outline of the workshop here: https://www.fnlm.org/product/lessons-from-covid-19-finding-synthesizing-and-communicating-research-that-matters/

On the panel I chaired were Lucy Wang, PostDoc, AllenAI - Semantic Scholar - https://www.linkedin.com/in/lucylw/, George Tsatsaronis, Vice President, Data Science Research Content Operations, Elsevier - https://www.linkedin.com/in/georgetsatsaronis/ and Zubair Afzal, Director of Data Science, Elsevier - https://www.linkedin.com/in/georgetsatsaronis/.

It was a cracking discussion, but we didn't get close to covering all of the topics that we could have. Here are the questions that I'd prepared in advance, and I'm blogging these mostly to make it easier for me to find them again at some point in the future:

Framing:
We have heard from our earlier speakers about the huge spike in papers that came out during the pandemic. Even beyond this pandemic we all recognise that the literature is now so vast that getting a clear view across everything that is coming out is beyond the capabilities of even a dedicated group of individuals. In the future, machines are likely to be the only entities with the capacity to, if not read, at least scan, everything. With that in mind, where are we today with these technologies, and are they living up to expectations?

Questions:
How do you tell if any of these techniques are actually helping with the pain points in the research process? What metrics, if any, do you track to help you gauge whether your efforts are working here?

Are we in an era of experimentation, transformation, or revolution?

When looking at the tooling that you have created for the Covid pandemic, how generalisable are these tools? How ready are they to be applied to future health emergencies, or indeed to other critical research areas?

AllenAI is well funded, and Elsevier has reasonably deep pockets, but what are your thoughts on where the accessibility of these technologies is heading? In terms of developing and using custom AI tools, what are going to be the ongoing barriers to entry, or are those barriers going to erode to the point where use and development of these tools will become commonplace? How do we make these efforts sustainable?

I’m interested in the breadth problem of the literature. Across all domains there are on the order of maybe 100M peer reviewed publications, across 5K publishers, and maybe 35K journals.

Do you need to get to complete coverage of the literature, before ML adds value to search as a capability? How do you select, how do you gather, and how do you filter?

How do you shift researcher behaviour from keyword search to semantic search, now that our language models can do so much better at interpreting the intent of questions? Is there even benefit in trying?

How do you deal with bias in the literature, in terms of how that bias can creep into the models that we train – e.g. we know that women’s work is read more, but cited less, we know that there is a bias towards publications from tier one universities, and we know that to date language models have tended to work better in English than in other languages?

We saw earlier the large effort involved in high quality peer review, with even optimised workflows taking a few months in the normal run of events.

We have seen remarkable improvement in the ability of machines to synthesise text, but it’s still very much a Chinese room approach under the hood, with no underlying understanding. What are the most complex human tasks that you think this approach will be able to augment, or perhaps even disrupt?

Apart from your own efforts, are there efforts in this space that you have been particularly impressed by?

Are there things that could be done in other areas of the publishing process that would make NLP easier, or are our models robust enough, or going to be robust enough, that increasing e.g. structured tagging of content will only create a marginal gain?

We saw a huge rise in misinformation through the pandemic, and the hugely negative health impact that misinformation has had. How do you design these systems to be robust against misinformation?

What are your thoughts on where we are with computational knowledge, and the ability of our ML models to create theory-free knowledge?

What have you learnt, and what might you do differently in the context of ML? If you had $100 to bet on next steps, where would you put your money?