NIH Request for Information (RFI): Strategies for NIH Data Management, Sharing, and Citation

Notice Number: NOT-OD-17-015

Key Dates
Release Date: November 14, 2016
Response Date: January 19, 2017 (extended from December 29, 2016, per NOT-OD-17-025)

Related Announcements
NOT-OD-17-025
NOT-OD-16-133

Issued by
National Institutes of Health (NIH)

Purpose

Introduction

This Request for Information (RFI) seeks public comments on data management and sharing strategies and priorities in order to consider: (1) how digital scientific data generated from NIH-funded research should be managed, and to the fullest extent possible, made publicly available; and, (2) how to set standards for citing shared data and software.

Response to this RFI is voluntary. Respondents are free to address any or all of the items in Sections I and II, delineated below, or any other relevant topics they recognize as important for NIH to consider. Respondents should not feel compelled to address all items. Instructions on how to respond to this RFI are provided in “Submitting a Response.”

Section I. Data Sharing Strategy Development
Section II. Inclusion of Data and Software Citation in NIH Research Performance Progress Reports (RPPR) and Grant Applications

Background

NIH has maintained the principle that “data sharing is essential for expedited translation of research results into knowledge, products, and procedures to improve human health.”[1] The agency has a long history of, and continued commitment to, ensuring that, to the fullest extent possible, the results of federally-funded scientific research are made available to and are useful for the general public, industry, and the scientific community (https://grants.nih.gov/policy/sharing.htm). Further, effective data sharing relies upon appropriate identification, adoption, and crediting of good data management and sharing practices; thus, NIH is adopting principles to make data “FAIR” (Findable, Accessible, Interoperable, and Reusable; http://www.nature.com/articles/sdata201618).

On February 22, 2013, the White House Office of Science and Technology Policy (OSTP) released its memorandum entitled Increasing Access to the Results of Federally Funded Scientific Research (http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf). This memorandum directs federal agencies and offices to develop plans to ensure peer-reviewed publications and digital scientific data resulting from federally-funded scientific research are accessible to the public, industry, and the scientific community to the extent feasible and consistent with applicable laws and policies. In coordination with the U.S. Department of Health and Human Services (HHS) (http://www.hhs.gov/open/public-access-guiding-principles/index.html), NIH responded to the memorandum by developing the National Institutes of Health Plan for Increasing Access to Scientific Publications and Digital Scientific Data from NIH Funded Scientific Research (“NIH Plan”), released in February 2015.[1] In order to implement the NIH Plan and move forward with ongoing commitments to the data sharing enterprise, NIH is considering priorities for data management and sharing (e.g., which data types have the greatest value for sharing, the costs and value of sharing different data types, including the long-term resource implications), and how to expand upon its 2003 Data Sharing Policy.[2]

Data and software citation allows important products of scientific research programs to be recognized and may enable more quantitative assessment of both effective sharing approaches and valuable data and software resources. Citation may also provide additional incentives for sharing, as citation metrics for shared data and software could help to quantify these activities. Such metrics would help to identify valuable data or software, to ensure that the researchers who produced them are appropriately credited, and to facilitate re-use of valuable data and software by the broader research community.

Scholarly publications typically cite previously published research articles to provide context for the motivation of the current study and the interpretation of its results. Citations in many research articles, however, are limited to previous publications and the concepts within them, and do not extend to the specific scientific data, software tools, or workflows that underlie them. Expectations of scholarly citation are nonetheless evolving, and there is an apparent groundswell of support for data and software citation among the scientific research community.[3]

Feedback obtained through this RFI is intended to be used to inform the development of NIH policies pertaining to the management and sharing of digital scientific data generated from NIH-supported research, including how these data and software should be cited, and other applicable NIH activities. Additionally, to support the long-term preservation of data and sustainability of repositories holding such data, NIH released the related “Request for Information (RFI): Metrics to Assess Value of Biomedical Digital Repositories” (http://grants.nih.gov/grants/guide/notice-files/NOT-OD-16-133.html).

Information Requested

SECTION I. Data Sharing Strategy Development
NIH recognizes that many factors must be considered when determining what, when, and how data should be managed and shared. These factors include, for example, the purpose for sharing, support for data re-use and reproducibility, the maturity of the science, the infrastructure, the uniqueness of the data, and ethical considerations.

The NIH seeks comment on any or all of the following topics to help formulate strategic approaches to prioritizing its data management and sharing activities:

  • The highest-priority types of data to be shared and value in sharing such data;
  • The length of time these data should be made available for secondary research purposes, the appropriate means for maintaining and sustaining such data, and the long-term resource implications;
  • Barriers (and burdens or costs) to data stewardship and sharing, and mechanisms to overcome these barriers; and
  • Any other topics respondents recognize as important for NIH to consider.

SECTION II. Inclusion of Data and Software Citation in NIH Research Performance Progress Reports (RPPR) and Grant Applications

Currently, NIH grantees are required to report “other products of the research,” including data, databases, and software, in section C5a of their annual RPPR submission (http://grants.nih.gov/grants/rppr/rppr_instruction_guide.pdf). However, limited guidance is available on how data, databases, and software should be reported or cited.

NIH recognizes that data and software citation provides evidence of productivity that translates to publications and patents. More thorough reporting of data and software products in the RPPR and in Competitive Grant Renewal applications may strengthen documentation of productivity and may also identify projects and investigators who most effectively share data and software.

The NIH seeks comment on any or all of the following topics:

  • The impact of increased reporting of data and software sharing in RPPRs and competing grant applications to enrich reporting of productivity of research projects and to incentivize data sharing;
  • Important features of technical guidance for data and software citation in reports to NIH, which may include:
    • Use of a Persistent Unique Identifier within the data/software citation that resolves to the data/software resource, such as a Digital Object Identifier (DOI) (https://www.iso.org/obp/ui/#iso:std:iso:26324:ed-1:v1:en)
    • Inclusion of a link to the data/software resource with the citation in the report
    • Identification of the authors of the data/software products
    • Granularity of data citations: when might citations point to an aggregation of diverse data from a single study and when might each distinct data set underlying a study be cited and reported separately
    • Consideration of unambiguously identifying and citing the digital repository where the data/software resource is stored and can be found and accessed;
  • Additional routes by which NIH might strengthen and incentivize data and software sharing beyond reporting them in RPPRs and Competitive Grant Renewal applications;
  • Any other topics respondents recognize as important for NIH to consider.
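
As an illustration only, the citation features listed above (persistent identifier, link, authors, and repository) could be captured in a small record such as the Python sketch below. The class name, field names, sample values, and citation layout are hypothetical and are not an NIH-specified format; the only outside fact assumed is that a DOI resolves through the https://doi.org/ proxy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """Hypothetical record for one cited data set or software product."""
    authors: List[str]   # authors of the data/software product
    title: str           # name of the data set or software
    repository: str      # digital repository where the resource is stored
    year: int            # year the resource was released
    doi: str             # persistent unique identifier (a DOI in this sketch)

    def link(self) -> str:
        # Build a resolvable link from the DOI via the doi.org proxy.
        return f"https://doi.org/{self.doi}"

    def format(self) -> str:
        # One possible citation layout; not a prescribed style.
        names = "; ".join(self.authors)
        return f"{names} ({self.year}). {self.title}. {self.repository}. {self.link()}"


# All values below are made up for illustration.
citation = DataCitation(
    authors=["Doe, J.", "Roe, R."],
    title="Example Imaging Data Set",
    repository="Example Repository",
    year=2016,
    doi="10.1234/example.dataset",
)
print(citation.format())
```

Under this sketch, the granularity question raised above becomes a choice between one such record for an aggregated study-level deposit and one record per distinct data set.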

Submitting a Response

Comments on the topic areas of interest should be submitted electronically to the following webpage: http://osp.od.nih.gov/content/nih-request-information-strategies-nih-data-management-sharing-and-citation or mailed to: Office of Science Policy (OSP), National Institutes of Health, 6705 Rockledge Drive, Suite 750, Bethesda, MD 20892, or by fax to: 301-496-9839 by January 19, 2017 (extended from December 29, 2016, per NOT-OD-17-025).

This RFI is for planning purposes only and should not be construed as a policy, solicitation for applications, or as an obligation on the part of the Government to provide support for any ideas identified in response to it. Please note that the United States Government will not pay for the preparation of any information submitted or for its use of that information.

Responses will be compiled and shared publicly in an unedited version after the close of the comment period. Please do not include any proprietary, classified, confidential, or sensitive information in your response. The Government reserves the right to use any non-proprietary technical information in summaries of the state of the science, and any resultant solicitation(s). The NIH may use information gathered by this RFI to inform development of future funding opportunity announcements and policy development.

We look forward to your input and hope that you will share this RFI document with your colleagues.

References

[1] National Institutes of Health Plan for Increasing Access to Scientific Publications and Digital Scientific Data from NIH Funded Scientific Research (https://grants.nih.gov/grants/NIH-Public-Access-Plan.pdf). For the purpose of the NIH Plan, “scientific data” is defined as the recorded factual material commonly accepted in the scientific community as necessary to validate research findings, including data sets used to support scholarly publications. Scientific data does not include laboratory notebooks, preliminary analyses, drafts of scientific papers, invention disclosures or patent applications, plans for future research, peer reviews, communications with colleagues, or physical objects, such as laboratory specimens.

[2] 2003 Final NIH Statement on Sharing Research Data (https://grants.nih.gov/grants/guide/notice-files/NOT-OD-03-032.htm).

[3] Ongoing data and software citation activities:

Inquiries

Please direct all inquiries to:

Office of Science Policy
Division of Scientific Data Sharing Policy
Telephone: 301-496-9839
Email: SciencePolicy@mail.nih.gov