Request for Information (RFI): Input on Development of an NIH Data Catalog

Notice Number: NOT-HG-13-011

Key Dates
Release Date: June 6, 2013
Response Date: June 25, 2013

Issued by
National Human Genome Research Institute (NHGRI)


This Request for Information (RFI) is to solicit comments and ideas for the development and implementation of an NIH Data Catalog as part of the overall Big Data to Knowledge (BD2K) Initiative.


Biomedical research is becoming more data-intensive as researchers are generating and using increasingly large, complex, and diverse datasets. This era of “Big Data” in biomedical research taxes the ability of many researchers to release, locate, analyze, and interact with these data and associated software due to the lack of tools, accessibility, and training. In response to these new challenges in biomedical research, and in response to the recommendations of the Data and Informatics Working Group (DIWG) of the Advisory Committee to the NIH Director, NIH has launched the trans-NIH Big Data to Knowledge (BD2K) Initiative.

The long-term goal of the BD2K Initiative is to support advances in data science, other quantitative sciences, policy, and training that are needed for the effective use of Big Data in biomedical research. (The term “biomedical” is used here in the broadest sense to include biological, biomedical, behavioral, social, environmental, and clinical studies that relate to understanding health and disease.) The term “Big Data” refers to datasets that are increasingly large and complex and that exceed the abilities of currently used approaches to manage and analyze them. “Big Data” is also meant to capture the opportunities and address the challenges facing all biomedical researchers in accessing, managing, analyzing, and integrating large datasets of diverse data types. Such data types may include imaging, phenotypic, molecular (including omics), clinical, environmental, behavioral, and many other types of biological and biomedical data. “Big Data” also includes data generated for other purposes (e.g., social media, search histories, cell phone data) when they are repurposed and applied to address health research questions.

Biomedical Big Data primarily emanate from three sources: (1) a small number of groups that produce very large amounts of data, usually as part of projects specifically funded to produce important resources for use by the research community at large, or large collections of electronic health records; (2) individual investigators who produce large datasets for their own projects, but which might be broadly useful to the research community at large; and (3) an even greater number of investigators who each produce small datasets whose value can be amplified by aggregating or integrating them with other data.

One of the DIWG recommendations was to promote data sharing through the establishment of central and federated Data Catalogs. Among the issues raised were how to establish minimal and relevant metadata to facilitate data sharing, how to achieve broad adoption of standards to enhance data retrieval, and how to promote data citation and adoption of the catalog by the broader biomedical community.

BD2K is now considering the development of a biomedical Data Catalog to make biomedical research data findable and citable, as PubMed does for scientific publications. Such a Data Catalog would make it easier for researchers to find, share, and cite data, as well as the publications and grants with which they are associated. A Data Catalog is distinct from a data repository, but would help make data in such repositories more easily findable and citable in a consistent manner. In addition to supplying core, minimal metadata to ensure a valid data reference, it is envisioned that a Data Catalog would include links out to the location of the data, to the NIH RePORTER record of the grant that supported the research, to relevant publications within PubMed or journals, and possibly to associated software or algorithms.
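
For respondents who find a concrete illustration helpful, the elements described above (core metadata plus links out to the data, the supporting grant, publications, and software) might be sketched as a single record. This is only an illustrative sketch and not a proposed NIH schema: the class name, every field name, the choice of which fields count as "core", and the citation format are all assumptions made for the example, and the URL is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CatalogEntry:
    """Hypothetical Data Catalog record; every field name is illustrative."""
    title: str                      # assumed core field
    creators: List[str]             # assumed core field
    data_url: str                   # assumed core field: link out to the data
    grant_ids: List[str] = field(default_factory=list)        # supporting grants
    publication_ids: List[str] = field(default_factory=list)  # related publications
    software_urls: List[str] = field(default_factory=list)    # associated software
    abstract: Optional[str] = None  # optional description of the data

    def missing_core_fields(self) -> List[str]:
        """Return the names of assumed core fields that are empty."""
        core = {"title": self.title, "creators": self.creators,
                "data_url": self.data_url}
        return [name for name, value in core.items() if not value]

    def citation(self) -> str:
        """Build a simple citation string (format is an assumption)."""
        return f"{'; '.join(self.creators)}. {self.title}. {self.data_url}"


# Placeholder values only; none of these refer to a real dataset.
entry = CatalogEntry(
    title="Example imaging dataset",
    creators=["Doe J", "Roe R"],
    data_url="https://repository.example.org/datasets/123",
)
print(entry.missing_core_fields())
print(entry.citation())
```

In this sketch the record stores only references to the data, grants, and publications, which reflects the distinction drawn above between a catalog and a repository.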

An NIH BD2K Working Group charged with exploring the concept of a Data Catalog has determined that it would be important to query a broad mix of Data Catalog designers, stakeholders, and potential users about their experiences and advice to the NIH as it considers development of a Data Catalog. In order to better appreciate the issues that need to be addressed and the possible solutions that could lead to implementation of a Data Catalog, the NIH thus seeks input from the broader research communities.

Establishing such a Data Catalog could also be part of NIH’s response to the White House Office of Science and Technology Policy February 2013 memorandum, “Increasing Access to the Results of Federally Funded Scientific Research.”

Information Requested

To maximize the impact of this potentially valuable community resource and facilitate its use by scientists with a broad range of expertise, we seek input on a proposal to develop a Biomedical Data Catalog. Your comments can include but are not limited to the following categories:

  • Your area of expertise and interest in a Data Catalog. This may include biomedical researcher, informatics professional, library sciences expert, publisher, professional society, or participation in another stakeholder community.
  • The critical barriers, opportunities, or incentives to making data more easily discoverable and citable, and the possible impact of a Data Catalog.
  • Possible Data Catalog linkage to existing data repositories to ensure data within the repository are findable and how to ensure that such linkages remain up to date and accurate.
  • If your research field has no existing repositories to store data, comments can include how a Data Catalog might usefully link out to the data and where such data might be located.
  • How the lack of a data repository might affect data discoverability, usability, and citability.
  • The useful level of granularity for a Data Catalog entry. For instance, a Data Catalog entry may correspond to all the data in a publication, only a particular data type within a given study, or an individual dataset from a single experiment.
  • Any potential requirements for Data Catalog registration of data by NIH-funded or supported investigators.
  • Whether a Data Catalog entry benefits from a scientific abstract that describes the data, including its potential uses and the rationale for its creation.
  • The feasibility of the development of a Data Catalog to potentially support future uses.
  • The appropriate metrics to use to create a successful Data Catalog.

Submitting a Response

All responses must be submitted via email by June 25, 2013. Please include the Notice number NOT-HG-13-011 in the subject line. Response to this RFI is voluntary. Responders are free to address any or all of the categories listed above. The submitted information will be reviewed by the NIH staff.

This request is for information and planning purposes only and should not be construed as a solicitation or as an obligation on the part of the Federal Government. The NIH does not intend to make any awards based on responses to this RFI or to otherwise pay for the preparation of any information submitted or for the Government's use of such information.

The NIH will use the information submitted in response to this RFI at its discretion and will not provide comments to any responder’s submission. However, responses to the RFI may be reflected in future funding opportunity announcements. The information provided will be analyzed and may appear in reports. Respondents are advised that the Government is under no obligation to acknowledge receipt of the information received or provide feedback to respondents with respect to any information submitted. No proprietary, classified, confidential, or sensitive information should be included in your response. The Government reserves the right to use any non-proprietary technical information in any resultant solicitation(s).


Please direct all inquiries to:

Jennie Larkin, Ph.D.
National Heart, Lung, and Blood Institute
National Institutes of Health
6701 Rockledge Dr.
Rockledge II, room 8200
Bethesda, MD 20892-7940
Telephone: (301) 435-0513