Ian Mulvany

May 17, 2024

Interesting webinar next week on how to increase access to data

A compelling case can be made that data is often more important than any research paper attached to it, and certainly at the governmental and state level, access to the underlying data sets matters more than written reports.

In scholarly publishing, the role of data, and what should count as a first-class citizen of research, is an evergreen topic.

I’ve had the pleasure of working with Julia Lane on a number of initiatives, and was recently asked to contribute to a special issue of the Harvard Data Science Review relating to a project that she has been deeply involved in - https://democratizingdata.ai/

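What is so critical about this project is that it has found buy-in and participation from a number of US government agencies: the Economic Research Service (ERS), the National Agricultural Statistics Service (NASS), and the US Department of Agriculture (USDA).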

The work involved in this project led to a special issue of the Harvard Data Science Review which presents 19 papers about using AI tools to support evidence building in public policy and science. You can see the full special issue here https://hdsr.mitpress.mit.edu/specialissue4

The Editorial for the special issue - https://hdsr.mitpress.mit.edu/pub/m1o4oblm

You can see my own contribution here https://hdsr.mitpress.mit.edu/pub/k8ci2fwe

I had an interesting experience writing my piece. The reviews that I got back were significantly longer than the piece I submitted. You can see the reviewer comments here - https://github.com/IanMulvany/Ian-mulvany-hsdr-editorial/blob/main/reviewer-comments.txt.

Initially perplexed about how to manage such long, and in parts diverging, reviews, I reached for GPT. I wrote a number of prompts to run over the reviews and find clusters of similar issues. This produced an overview of clusters like this: https://github.com/IanMulvany/Ian-mulvany-hsdr-editorial/blob/main/aggregated_review_points.md.
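The clustering here was done with GPT prompts, which are not reproduced in this post. As a rough offline stand-in (a different and much cruder technique, not the method used above), the sketch below groups review points by word overlap (Jaccard similarity) and merges overlapping points with union-find. The stopword list, the 0.3 threshold, and the sample review points are all hypothetical.

```python
import re
from itertools import combinations

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "should", "what", "please", "more", "for", "too", "and", "this"}


def tokens(text: str) -> set[str]:
    """Lower-case content words of a review point (3+ letters, no stopwords)."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if len(w) > 2 and w not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared words divided by all words (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_points(points: list[str], threshold: float = 0.3) -> list[list[str]]:
    """Single-link clustering: any pair with overlap >= threshold shares a cluster."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    toks = [tokens(p) for p in points]
    for i, j in combinations(range(len(points)), 2):
        if jaccard(toks[i], toks[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    groups: dict[int, list[str]] = {}
    for i, point in enumerate(points):
        groups.setdefault(find(i), []).append(point)
    # Largest clusters first, as the most-repeated concerns.
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical review points, not taken from the actual reviews.
points = [
    "The paper should define what counts as a dataset mention.",
    "Please define dataset mention more precisely.",
    "The discussion of incentives for publishers is too brief.",
    "Expand the discussion of publisher incentives.",
    "Typo in the second paragraph.",
]
clusters = cluster_points(points)
for c in clusters:
    print(len(c), c)
```

Word overlap misses paraphrases and even plurals ("publisher" and "publishers" count as different words here), which is exactly where a language model has the edge for this step.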

I then selected the clusters I thought reasonable to address, turned them into GitHub issues, and closed each issue as I addressed it: https://github.com/IanMulvany/Ian-mulvany-hsdr-editorial/issues
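The post doesn't say how the issues were created, so this is only one possible way to script that step: a minimal sketch that builds, without sending, requests for GitHub's REST "create an issue" endpoint (POST /repos/{owner}/{repo}/issues) and for closing an issue (PATCH with "state": "closed"). The owner, repo, token, issue number, and checklist-style issue body are placeholders chosen for illustration.

```python
import json
import urllib.request

API = "https://api.github.com"


def _headers(token: str) -> dict[str, str]:
    # Accept/Authorization as in GitHub's REST docs; the body is sent as JSON.
    return {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }


def create_issue_request(owner: str, repo: str, token: str,
                         title: str, points: list[str]) -> urllib.request.Request:
    """Build (but do not send) a request that opens one issue per cluster."""
    body = "Reviewer points in this cluster:\n\n" + "\n".join(f"- [ ] {p}" for p in points)
    return urllib.request.Request(
        f"{API}/repos/{owner}/{repo}/issues",
        data=json.dumps({"title": title, "body": body}).encode("utf-8"),
        headers=_headers(token),
        method="POST",
    )


def close_issue_request(owner: str, repo: str, token: str,
                        number: int) -> urllib.request.Request:
    """Build a request that marks an existing issue as closed."""
    return urllib.request.Request(
        f"{API}/repos/{owner}/{repo}/issues/{number}",
        data=json.dumps({"state": "closed"}).encode("utf-8"),
        headers=_headers(token),
        method="PATCH",
    )


# OWNER, REPO, YOUR_TOKEN, and issue number 1 are placeholders.
req = create_issue_request(
    "OWNER", "REPO", "YOUR_TOKEN",
    "Define what counts as a dataset mention",
    ["Please define dataset mention more precisely."],
)
close_req = close_issue_request("OWNER", "REPO", "YOUR_TOKEN", 1)
# urllib.request.urlopen(req) would actually send the request.
```

Keeping request building separate from sending makes it easy to review the drafted issues before anything is posted.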

The process, and the reviewer comments, led to a very significant improvement in the manuscript, and it served as a working case study of how LLMs can help with this kind of task.

Tags from OpenAI:

data analysis, government data, scholarly publishing, public policy, artificial intelligence, data science, peer review, research methods, data-driven decision making, automation in publishing, computer science 004.

Chinese Executive Summary:

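In many cases, data is more important than the research papers attached to it, especially at the governmental and state level, where access to the underlying data sets matters more than written reports. In scholarly publishing, attention to data, and treating it as a core component of research, is a frequently discussed topic. A project undertaken with Julia Lane has received support from several US government agencies and was published in the Harvard Data Science Review, which presents 19 papers on using AI to support evidence building in public policy and science. The journal will also host a webinar next week. The author used GPT to process the lengthy reviews, significantly improving the manuscript.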

German Executive Summary:

In many cases, data is more important than the associated research papers, especially at the state level, where access to the underlying data sets is more important than written reports. The question of what role data should play in scholarly publishing remains current. A project in which Julia Lane is involved has received support from several US government agencies and was published in a special issue of the Harvard Data Science Review, which contains 19 articles on using AI tools to support policy and science. The author used GPT to manage lengthy reviews and thereby improved the manuscript considerably.

Spanish Executive Summary:

In many cases, data is more important than any research paper attached to it, and certainly at the governmental and state level, access to the underlying data sets is more important than written reports. In academic publishing, the role of data is a recurring topic. A project in which Julia Lane has participated has received support from several US government agencies and has been published in a special issue of the Harvard Data Science Review, presenting 19 articles on the use of AI tools to support evidence building in public policy and science. The author used GPT to manage lengthy reviews, which significantly improved the manuscript.

About Ian Mulvany

Hi, I'm Ian - I work on academic publishing systems. You can find out more about me at mulvany.net. I'm always interested in engaging with folk on these topics. If you have made your way here, don't hesitate to reach out if there is anything you want to share, discuss, or ask for help with!