As well as the recording above, please find panelists’ slides and responses to attendee questions below.
Date: Wednesday February 1
Time: 4 – 5.15 pm UK/UTC
Other timezones: 8.00 am Pacific Time | 10.00 am Central Time | 11.00 am Eastern Time | 1.00 pm Brasilia Time | 5.00 pm Central European Time | 5.00 pm West Africa Time | 6.00 pm South Africa Standard Time | 9.30 pm India Standard Time | 12 midnight Central Indonesia Time (Time converter tool)
OASPA is pleased to announce our next webinar, which will focus on the ‘what’ and the ‘why’ of data sharing.
The recent OSTP “Nelson memo” served to re-focus attention on data as a first-class research output. But maybe that’s a misrepresentation for those of us who think, ‘hold on, we’ve been focused on data this whole time!’ Well, here’s a chance to learn from and with a group of experts who are thinking carefully about data sharing: what it means from different perspectives, tangible steps to take and policies to make around data, and what we can do next in our communities of practice. Attendees are more than welcome to bring their own perspectives!
The webinar will be chaired by Rachael Lammey (Crossref).
We welcome our panelists: Sarah Lippincott will give a repository perspective with insights into where data is going post Nelson Memo and NIH Policy. Aravind Venkatesan will share the thinking, data science and workflows employed at EuropePMC to support data linking. Shelley Stall will talk about how AGU are leading the line with their data policies, and Kathleen Gregory will conclude by considering researchers’ perspectives regarding sharing and reusing data.
The panelists will each speak for 12 minutes, and then we will open the floor to questions from the audience and discussion.
Please join us live for this free webinar and contribute to the discussion.
Link to registration page: https://bit.ly/February2023_OASPA_Webinar. Please share with your networks.
Please note that views expressed in OASPA webinars are those of the individual speaker and do not represent the views of OASPA.
Responses to attendee questions
Q. How many curators does Dryad have? How much time does it take per dataset to perform curation? Does it involve lots of back and forth with the researchers?
SL: We have a team of 8 curators. It takes an average of 20 minutes to curate a dataset, with a turnaround time within a week. We rarely publish a dataset without some communication with the author, but usually not extensive.
Q. Can one of the low-hanging-fruit solutions out there (OSF, Zenodo, Figshare, Dataverse) act as a data repository, or is an institutional repository preferred?
SS: This question is very complex to answer. Depending on your country, institution, funder, selected journal for research publication, and perhaps other criteria, the repository options change. In general, repositories that provide curation support for your data, usually those known as discipline- or domain-specific preservation repositories, are the highest preference. They require time prior to making data available to prepare it properly. After that, generalist repositories, including institutional repositories, are the next option. There is a comparison chart that I helped to coordinate that assists researchers with determining which to use: https://doi.org/10.5281/zenodo.3946719
Resources such as re3data.org and fairsharing.org can also help. Repositories with CoreTrustSeal certification are very good choices. Selection criteria (the basics): repositories that register datasets with a persistent identifier (e.g., DOI) to make citation possible; that provide support to researchers to include as much documentation as possible; that provide a way to link the dataset to related items such as a publication; that provide a way for all creators to include their ORCID and support attribution and credit; and that provide licensing options that are as open as possible.
Q. Thank you to Sarah and Shelley (so far) for being so positive about the solutions. Shelley mentioned the challenges institutional repositories face, but also their huge importance. Would you recommend that they pursue Data Seal of Approval as a journey toward data stewardship (and pursue institutional investment toward this)?
SS: You can certainly explore this certification, or do your own self-assessment using their tools — they are excellent. One issue that may not have been resolved yet is the requirement for the repository to have a “designated community”. In the past, generalist repositories like yours were dissuaded from seeking official CoreTrustSeal certification — talk to Jon Petters about his experience. However, many agree that the bulk of the elements apply nicely. I’ve not checked in for a few months, and I know they were working on the concern. They may have found a way forward.
Q. Under current models of data sharing, data created by an institution’s researchers are scattered across repositories all over the web. Are any universities using their local institutional repository to index and support discovery of research data stored elsewhere? Is there an automated way to harvest information about data created by an institution’s affiliated researchers (perhaps using the Crossref or DataCite APIs to retrieve metadata associated with particular PIDs)?
SS: This is a really common issue. There are a couple of approaches. As you mention, use institutional resources to help connect the research products. Also, use PIDs to ensure linkage. Also, use a newer PID – RAiD (Research Activity Identifier), a “project ID” – to help connect research outputs over time no matter the grant.
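As a starting point for the automated-harvesting idea raised in the question, here is a minimal sketch against the public DataCite REST API (`https://api.datacite.org/dois`). The general approach — querying by creator affiliation and restricting to datasets — is real, but the exact field names and parameters should be checked against the current DataCite API documentation, and the example DOI below is purely illustrative.

```python
# Sketch: build a DataCite REST API query for datasets whose creators
# list a given affiliation, and pull DOIs out of the JSON:API response.
# Field/parameter names should be verified against current DataCite docs.
from urllib.parse import urlencode

DATACITE_API = "https://api.datacite.org/dois"

def build_affiliation_query(affiliation: str, page_size: int = 100) -> str:
    """Return a /dois query URL for a creator affiliation, datasets only."""
    params = {
        # Elasticsearch-style query over the creator affiliation field
        "query": f'creators.affiliation.name:"{affiliation}"',
        "resource-type-id": "dataset",  # restrict results to datasets
        "page[size]": str(page_size),   # results per page
    }
    return f"{DATACITE_API}?{urlencode(params)}"

def extract_dois(response_json: dict) -> list:
    """Collect the DOI of each record in a DataCite JSON:API response."""
    return [item["id"] for item in response_json.get("data", [])]

# Example: the URL an institution's harvester might fetch on a schedule.
url = build_affiliation_query("University of Vienna")
```

An institutional repository could run a query like this periodically and register the returned DOIs as discovery-only records that point outward, rather than re-hosting the data.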
Q. So we are listening with a publisher’s perspective – what role do you envisage or wish for from the publishers in this?
SS: There is an excellent resource for publishers to follow when building their data sharing policies, and a webinar on this is happening soon: Hrynaszkiewicz, I., Simons, N., Hussain, A., Grant, R., & Goudie, S. (2020). Developing a Research Data Policy Framework for All Journals and Publishers. Data Science Journal, 19(1), 5. DOI: http://doi.org/10.5334/dsj-2020-005. Upcoming webinar (sign up by 9 Feb – limited spaces): https://forms.gle/1JW21rkLoz9ZCpka6
Q. Question for Sarah: how does Dryad, as a generalist repository with curation, see its role alongside specialist repositories? For example if data are submitted without using the appropriate standard or providing a sufficient level of completeness, is Dryad’s role to improve metadata and publish in Dryad itself or to facilitate access to an appropriate specialist repository? (Based partly on the Metadata Game Changers and CEDAR initiative Dryad had a couple of years back).
SL: It depends. Where disciplinary repositories are available, we encourage authors to submit to those. We are also still working on the CEDAR initiative to make Dryad metadata more robust in certain disciplines.
Kathleen Gregory (@gregory_km)
Kathleen Gregory is a postdoctoral researcher at the University of Vienna and the Scholarly Communications Lab at the University of Ottawa. She holds a PhD in Science & Technology Studies, a Master of Science in Library and Information Science, and a Master of Arts in Education. Her research focuses on data practices in scholarly and science communication, particularly practices of data care and curation and what those practices afford. Her past and current projects investigate, e.g., how people discover, make sense of, and reuse research data in academia and in public life.
Sarah Lippincott is a librarian and library consultant with a decade of experience supporting open access, digital scholarship, and scholarly communications through strategic planning, research, service design, facilitation, and communications work. As Head of Community Engagement at Dryad, Sarah works with institutions, funders, and researchers to increase awareness of and engagement with data sharing and data reuse. She received her MLS from University of North Carolina-Chapel Hill and prior to joining Dryad, she worked in a variety of roles within and adjacent to libraries. Sarah started her career as the founding Program Director for the Library Publishing Coalition and went on to coordinate assessment, user experience, and strategic planning activities for a major research library; led strategic consulting services for a digital services agency specializing in open source web development for the cultural heritage sector; and consulted on projects for the Educopia Institute, the Next Generation Library Publishing project (NGLP), the Library Publishing Coalition, Candid, the Preservation of Electronic Government Information (PEGI) Project, the Association of Research Libraries (ARL), and the Digital Public Library of America (DPLA).
Shelley Stall (@ShelleyStall)
Shelley Stall is the Senior Director for the American Geophysical Union’s Data Leadership Program. She works with AGU’s members, their organizations, and the broader research community to improve data and digital object practices with the ultimate goal of elevating how research data is managed and valued.
Aravind Venkatesan (@aravindvenkates)
Aravind Venkatesan is a senior data scientist with the Literature services group, EMBL-EBI. He has a background in molecular biology and holds a doctoral degree in the application of knowledge graphs in biological hypotheses generation. He is experienced in data representation and integration by applying ontologies and knowledge graph related technologies in the life science domain. At EMBL-EBI, he has extensively worked on literature-data integration with a specific focus on FAIRification of text-mined outputs.
Rachael Lammey (@rachaellammey)
Rachael is the Director of Product at Crossref. She has been involved in many projects and initiatives related to data sharing and citation during the 10 years she has spent at Crossref, and currently co-chairs the RDA/WDS Scholarly Link Exchange (Scholix) WG.