eLife Reproducible Articles

I've been meaning to write about eLife's executable research articles (ERAs) for some time. A recent set of Q&As with authors who have used them (https://elifesciences.org/labs/51777514/elife-authors-relay-their-experiences-with-executable-research-articles) is a great prompt to set down some thoughts.

These papers are backed by a compute environment that can be spun up in a container, which allows the reader to run code as they read the paper. You can modify the parameters and see how that affects the figures.
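As a rough illustration of what "modify the parameters and see how that affects the figures" means, here is a minimal sketch in Python. It is not taken from any ERA or from the eLife tooling; the function name figure_series and the window parameter are hypothetical, chosen only to show a figure's data being recomputed when a reader changes one input.

```python
def figure_series(raw_values, window=3):
    """Recompute the series a figure would plot: a moving average of raw_values.

    `window` stands in for a reader-adjustable parameter (hypothetical).
    """
    if window < 1 or window > len(raw_values):
        raise ValueError("window must be between 1 and len(raw_values)")
    return [
        sum(raw_values[i:i + window]) / window
        for i in range(len(raw_values) - window + 1)
    ]


if __name__ == "__main__":
    measurements = [1, 2, 3, 4, 5]
    # The same data, re-plotted under two different parameter choices.
    print(figure_series(measurements, window=3))
    print(figure_series(measurements, window=2))
```

In an executable article the reader would change the parameter in the page itself and the figure would be re-rendered from the recomputed series.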

The eLife team is working on the full stack, from authoring environments all the way through to the infrastructure needed to run, host, and publish these kinds of articles.

The executable document pipeline is open source, and you can get started with it here: https://stenci.la/.

When I first saw this being released, I thought that they had made a terrible mistake. From the early material that I read, I had the impression that they had built a very eLife-specific workflow that would require authors to come to the format in order to use it, but in conversations with Giuliano from eLife I found that I was wrong about this. The pipeline can integrate with git workflows and can use Jupyter or R notebooks as the source. That's important because these are the engines of choice for most academics working in computational domains with a lean towards open science and open infrastructure.

What they have delivered here has been imaginable for some time, essentially since the launch of the Binder project and the growing maturity of containerisation as a tool. I applaud the team for taking this imagined future and bringing it closer to us than ever.

What might we imagine next?

The ERA allows us to unpack a data point presented in a paper, to uncover how it was derived, and to ask how it might change. As we begin to mine larger corpora of papers at scale, and as machine learning tools become as ready to hand as containers are today, we can imagine appealing directions for the next level of innovation. What I want to see is this: when we look at one data point in a paper, we can interrogate not only how it materialised from the underlying data and analysis, but also how it compares to all other assertions about the fact in the world that it relates to. Is this data point in line with the rest of our understanding? Is it telling us something key or critical? Or does it just look like a suspicious outlier?
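To make that last question a little more concrete, here is a minimal sketch in Python of one way to compare a single reported value against other reported values for the same quantity. Everything in it is an assumption made for illustration: the function names, the example numbers, and the choice of a median-based modified z-score with a cutoff of 3.5 are mine, not anything eLife has built or proposed. It also assumes the hard part, gathering comparable values from a corpus of papers, has already been done.

```python
from statistics import median


def modified_z_score(value, other_values):
    """Distance of `value` from the median of `other_values`, scaled by the
    median absolute deviation (MAD). Less sensitive to extreme values than a
    mean and standard deviation would be."""
    centre = median(other_values)
    mad = median(abs(v - centre) for v in other_values)
    if mad == 0:
        raise ValueError("other_values have no spread; cannot scale the score")
    # 0.6745 makes the score comparable to a standard z-score for normal data.
    return 0.6745 * (value - centre) / mad


def classify(value, other_values, cutoff=3.5):
    """Label a reported value as 'consistent' or 'outlier' (illustrative cutoff)."""
    score = modified_z_score(value, other_values)
    return "outlier" if abs(score) > cutoff else "consistent"


if __name__ == "__main__":
    # Made-up values standing in for the same quantity reported across papers.
    reported_elsewhere = [9.8, 10.1, 10.0, 10.3, 9.9]
    print(classify(10.1, reported_elsewhere))
    print(classify(14.0, reported_elsewhere))
```

A flag like this would only be a prompt for a closer look. An outlier may be an error, or it may be the key result that changes our understanding.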

While I am excited about this platform, we also have to recognise that uptake of this format remains very low. The technical sophistication an organisation needs to work with ERAs remains relatively high. In spite of streamlined integration points, the barrier to adoption will remain high. What will be the long-term future of these things? To answer that, I think we have to ask what the open science movement is going to look like. Infrastructure is hard, and we remain at the mercy and tyranny of MS Word in its various formats. And yet open science is a ratchet; it's a percolating activity, and it's a position that will never be possible to argue against. While adoption and movement towards it is slow, it is very hard to see that movement doing anything other than increasing over time. Open access (OA) has taken over a decade to begin to get embedded. The next few decades are going to be about open science, and tooling like ERAs may end up being some of the initial pieces that a revolution is founded upon.