May 17, 2024

Interesting webinar next week on how to increase access to data

A compelling case can be made that in many cases data is more important than any research paper attached to it, and that, certainly at governmental and state level, access to underlying data sets is more important than written reports.

For scholarly publishing, the role of data, and the question of what should constitute a first-class citizen of research, is an evergreen topic.

I’ve had the pleasure of working with Julia Lane on a number of initiatives, and was recently asked to contribute to a special issue of the Harvard Data Science Review relating to a project that she has been deeply involved in - https://democratizingdata.ai/

What is so critical about this project is that it has found buy-in, and participation, from a number of US government agencies - the ERS, NASS, and USDA.

The work involved in this project led to a special issue of the Harvard Data Science Review which presents 19 papers about using AI tools to support evidence building in public policy and science. You can see the full special issue here https://hdsr.mitpress.mit.edu/specialissue4

The Editorial for the special issue - https://hdsr.mitpress.mit.edu/pub/m1o4oblm

You can see my own contribution here https://hdsr.mitpress.mit.edu/pub/k8ci2fwe

I had an interesting experience writing my piece. The reviews that I got back were significantly longer than the piece I submitted. You can see the reviewer comments here - https://github.com/IanMulvany/Ian-mulvany-hsdr-editorial/blob/main/reviewer-comments.txt.

Initially perplexed about how to manage such large, and in parts diverging, reviews, I reached for GPT. I created a number of prompts to run over the reviews to find clusters of similar issues. This produced an overview of clusters like this https://github.com/IanMulvany/Ian-mulvany-hsdr-editorial/blob/main/aggregated_review_points.md.

I then selected clusters I thought reasonable to address and turned them into issues in GitHub, and closed out the issues as I addressed them. https://github.com/IanMulvany/Ian-mulvany-hsdr-editorial/issues
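For anyone wanting to try the same workflow, here is a minimal sketch of that last step, turning a clustered overview into GitHub issues. It is not the script I used. It assumes the overview is a markdown file with one `## ` heading per cluster followed by bullet points (the real file may be laid out differently), and it only builds `gh issue create` commands for the GitHub CLI rather than running them. The function names, the sample text and the `OWNER/REPO` placeholder are all made up for illustration.

```python
import shlex


def parse_clusters(markdown_text):
    """Split markdown into (title, body) pairs, one per '## ' heading.

    Assumption: each cluster starts with a level-2 heading. Lines that
    appear before the first heading are ignored.
    """
    clusters = []
    title, body_lines = None, []
    for line in markdown_text.splitlines():
        if line.startswith("## "):
            if title is not None:
                clusters.append((title, "\n".join(body_lines).strip()))
            title, body_lines = line[3:].strip(), []
        elif title is not None:
            body_lines.append(line)
    if title is not None:
        clusters.append((title, "\n".join(body_lines).strip()))
    return clusters


def issue_command(title, body, repo):
    """Build the argument list for one `gh issue create` call."""
    return ["gh", "issue", "create", "--repo", repo,
            "--title", title, "--body", body]


# Placeholder text only, not taken from the real reviews.
SAMPLE = """## Example cluster A
- placeholder point 1
- placeholder point 2

## Example cluster B
- placeholder point 3
"""

if __name__ == "__main__":
    for title, body in parse_clusters(SAMPLE):
        # Print the command for review instead of executing it.
        print(shlex.join(issue_command(title, body, "OWNER/REPO")))
```

Printing the commands first gives you a chance to drop any clusters you do not think are reasonable to address before anything is created in the repository.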

The process and the reviewer comments led to a very significant improvement in the manuscript, and the exercise was a working case study of how LLMs can help with this kind of task.

