Request for Information (RFI) on the NIH Big Data to Knowledge (BD2K) Initiative Resources for Teaching and Learning Biomedical Big Data Management and Data Science

Notice Number:


Key Dates

Release Date: November 4, 2014

Response Date: December 31, 2014

Related Announcements


Issued by

National Library of Medicine (NLM)


The National Institutes of Health (NIH) recognizes the increasing demands placed on biomedical researchers to share data they generate through federally funded research projects. As part of its Big Data to Knowledge (BD2K) Initiative, NIH wishes to help the broader scientific community update its knowledge and skills in the important areas of the science, storage, management, and sharing of biomedical big data. NIH wants to identify the array of timely, high-quality courses and online learning materials already available on data science and data management topics for biomedical big data.


The ability to harvest the wealth of information contained in biomedical big data has the potential to advance our understanding of human health and disease; however, the enormity of big data creates major organizational and analytical impediments to rapid translational impact. As biomedical datasets become increasingly large, diverse, and complex, they tax conventional methods for sharing, managing, and analyzing data. Furthermore, researchers’ abilities to capitalize on biomedical big data science-based approaches are limited by poor data accessibility and interoperability, the lack of appropriate tools, and insufficient training.

In response to the opportunities and challenges presented by the dawning era of "Big Data" in biomedical research, the NIH launched the Big Data to Knowledge (BD2K) initiative as a trans-NIH initiative to cultivate the digital research enterprise within biomedicine, to facilitate discovery and support new knowledge, and to maximize community engagement.

BD2K addresses four major aims that, in combination, are meant to enhance the utility of biomedical big data: 1) to facilitate broad use of biomedical digital assets by making them discoverable, accessible, and citable; 2) to conduct research and develop the methods, software, and tools needed to fully analyze biomedical big data; 3) to enhance training in the development and use of methods and tools necessary for biomedical big data science; and 4) to enable a data ecosystem that accelerates both basic and translational discovery as part of a digital enterprise.

Biomedical big data come from many sources, from massive stand-alone datasets generated by large collaborations to the small datasets produced by individual investigators. The value of all these data can be amplified through aggregation and integration. The BD2K initiative is a community-enabled endeavor towards maximizing the collective value of current and future biomedical digital assets to better inform and protect human health. BD2K is part of a larger ecosystem driven by data policies and shared infrastructure.

In the BD2K initiative, the term "Biomedical Big Data" is inclusive of the diverse digital objects that may have an impact on basic, translational, clinical, social, behavioral, environmental, or informatics research questions. Such data types may include imaging, phenotypic, genotypic, molecular, clinical, behavioral, environmental, and many other types of biological and biomedical data. They may also include data generated for other purposes (e.g., social media, search histories, economic, geographical, or cell phone data). Finally, they also encompass the metadata, data standards, and software tools involved in data processing and analysis.

As universities begin to implement federal policies requiring them to share research data that have been gathered with the support of public funds1, scientists, graduate students, librarians and other professional and administrative staff are learning about or refreshing what they know about data science and the management of biomedical research data. Some organizations and universities are already engaged in providing courses and other instruction for staff, faculty and student researchers to help them master new skills needed for working with biomedical big data. As part of the BD2K program, a shared Biomedical Big Data resource will soon become a reality, providing public access to shared biomedical research data and research tools2. Curriculum and training materials relating to data science and data management topics will also become part of the shared biomedical big data research resource, including those funded by NIH through other BD2K initiatives.

Information Requested

With this Request for Information (RFI) Notice, the NIH invites interested and knowledgeable persons to inform NIH about existing learning resources covering Biomedical Big Data management and data science topics such as, but not limited to:

Data Management

Data capture and storage

Data mapping and integration of disparate types

Data annotation and curation, including metadata

Version tracking and multi-site data management

Pipelines for data processing and analysis


Modeling (including methods for data integration)

Inference (including large p, small N problems)

Prediction (including Machine Learning)

Quantification of uncertainty for Big Data

Computer Science

Algorithms, algorithmic complexity


Distributed storage or processing


Programming languages

NIH is interested in collections (aggregations) of the topics listed above as well as resources focused on individual topics.

Some needed instructional resources for training in data management and data science are already available. Please identify resources and materials of interest with characteristics such as, but not limited to:

  • Graduate-level short courses, tutorials and workshops (online, in-person or hybrid) that are open to all;
  • Graduate-level online tutorials and modules;
  • Massive Open Online Courses (MOOCs);
  • Curriculum plans or resources (including sample datasets or data management plans used in data management training);
  • Evaluation approaches for online data science or data management courses.

NIH is interested in collections (aggregations) of the above as well as individual resources. Materials for self-guided learning must be available online or for download in standard digital formats.

For each class or learning resource, please provide information that will help NIH identify and locate the resources, such as:

  • The name of the course or resource;
  • A URL for the online resource or a site that describes or offers the resource;
  • The sponsor of the resource, such as organization or instructor.

Additional information is also welcome, including comments about the course or resource.

How to Submit a Response

All responses must be submitted electronically by December 31, 2014, in the form of an email to , using the subject 'data management'. Responses to this RFI Notice are voluntary. The submitted information will be reviewed by the NIH staff and later made available to the public. Submitted information will not be considered confidential. Do not attach PPT files or other curriculum materials themselves to your response. Responses are welcome from associations and professional organizations as well as individual stakeholders.

This request is for information and planning purposes and should not be construed as a solicitation or as an obligation of the Federal Government or NIH. No awards will be made based on responses to this Request for Information. The information submitted will be analyzed and may be used in reports or presentations. Respondents are advised that NIH is under no obligation to acknowledge receipt of their comments or to provide comments on their submissions. No proprietary, classified, confidential, or sensitive information should be included in a response. The NIH and the government reserve the right to use any non-proprietary technical information in any future solicitation(s).


Please direct all inquiries to:

Valerie Florance, Ph.D.
National Library of Medicine
Telephone: 301-496-4621