Ian Mulvany

December 17, 2022

Some trends in open science that I'm thinking about.


I was reading through this update on open science news from September (I'm on a kick to try to read through all of my open browser tabs before the end of the year).

It’s a really good update with a lot of activity happening. The general sense I get is of discussions on standards, institutional and national alignment, some funding coming online, continued pondering of preprints, and a little discussion of data publication. Research assessment is also mentioned.

I’ve been around long enough that I remember when open access was slightly subversive. It’s now mainstream, but what the above update tells me is that system-wide change at a global level in a decentralised system takes a long time. To me it’s a no-brainer that we should be publishing open access, but moving entire systems is slow. Add to that that I have a huge bias in favour of open access, while a ton of academics are either indifferent or hostile. I believe that the transformative benefit of open access remains unproven, and until it can meet the needs of those researchers it will only ever grow incrementally.

If I let my thinking wander a bit the things that I am interested in seeing happening over the coming years all have had some interesting movement in the past couple years. 

Data publication - this is the next area after research publications. The most important thing here is the OSTP memo, not because it tells us how to do data publication, but precisely because it doesn’t, while at the same time calling for it. Getting data to be reused, even after it has been published, is seriously hard. Combining it with a live analysis pipeline can help a lot, but it remains hard. The creation of underlying sustainable infrastructure is something I’m close to now as treasurer of Dryad, but I still don’t know what the future of this is going to look like.

Computable environments - back in the day there was this tool called Binder; since then we have notebook environments on demand in Google Cloud Platform and AWS, integration into VS Code, and plumbing available through GitHub Actions. There are also the ever-present activities of the HPC community (of which I know far too little). In short, the idea of connecting a typed artefact of a code notebook to a running environment has gone from a hint of a proof of concept to something where the barrier to doing it is much lower. There are few standards in this area that I know of. Things like Code Ocean don’t seem to have gotten traction, and I’ve not heard much recently about the eLife ERA project. The path to a clean, integrated ecosystem looks not much closer, but the components you would need actually exist now, which they didn’t when I started thinking about this stuff. So I’ll mark that down as progress.

Research assessment - this is the mother lode, given the way funding flows in our ecosystem. I guess we just really have to wait for all the old people (and I am increasingly entering this category) to die off, but in the meantime I’m seeing increased dissatisfaction with institutional rankings - I take that dissatisfaction to be a good thing. 

There are a couple of trends that I think are going to be critical that I don’t see so much discussion on. 

Research integrity - I mean, it’s a given, right? Research integrity is a given. BUT the rise of paper mills and the ability to create papers more cheaply using large language models is going to make this a much, much bigger problem. The solutions could be: publishers coordinating efforts to help scale the identification of bad actors; getting the groups who build LLMs interested in this problem; and removing the incentive for people to buy fake papers (China, I’m looking at you - a big driver of this is the requirement for medical students to publish papers before they can qualify).

New models of research institutes - shifts in capital flows of a few tenths of a percent of US GDP have gifted a small number of individuals with a large amount of wealth. For better or worse, some of those have decided to create their own scientific research institutes, for what is more appealing than placing a stake on the future of knowledge itself? We have a modern cohort of Gettys and Rockefellers. And yet, and yet, the unblinking and cold lady of reality herself cannot be moved or persuaded into giving up her secrets any sooner than she is wont to. So for me there are two useful things these orgs could do: 1 - really be open about things like experimenting with random allocation of funding, and other kinds of structural bets that existing systems can’t take; 2 - invest in boring, unexciting, hard, underfunded areas of research. These are the areas that struggle to compete for funding, but research progress is a game of incremental gains, and we make good increments when we can move forward in all areas. We underserve ourselves by leaving certain pockets languishing.

New powers of inference - I’m blown away by large language models. If we could harness them to all of the literature, make them hallucinate less, have them return answers tied to the existing literature, and finally get instant systematic reviews on all topics, along with guiding perspectives, that would be something. It would be something. I worry a little that my idea here is being delivered from the stands, and maybe I’m just too far removed from the pains of actual researchers, but this is the bet I most want to see come to life. What it would take to get it working in a sustainable and adoptable way, I don’t know. Someone like Elsevier should build it (as much as they get shade, they are the only one at the moment who might be able to build it).

I have other thoughts, but these were the ones that sprang up from looking over that open science update.