Request for Information (RFI): Making Data Usable--A Framework for Community-Based Data and Metadata Standards Efforts for NIH-relevant Research
Release Date: November 5, 2014
Response Date: December 5, 2014
National Institute of Environmental Health Sciences (NIEHS)
National Cancer Institute (NCI)
National Institute on Aging (NIA)
National Institute of Allergy and Infectious Diseases (NIAID)
National Institute of Biomedical Imaging and Bioengineering (NIBIB)
Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD)
National Institute on Deafness and Other Communication Disorders (NIDCD)
National Institute of Mental Health (NIMH)
National Institute of Neurological Disorders and Stroke (NINDS)
National Institute of Nursing Research (NINR)
National Library of Medicine (NLM)
National Center for Complementary and Alternative Medicine (NCCAM)
National Center for Advancing Translational Sciences (NCATS)
Office of Strategic Coordination (Common Fund)
The mission of the NIH Big Data to Knowledge (BD2K) initiative is to enable biomedical scientists to capitalize more fully on the Big Data being generated by their research communities. BD2K aims to develop new approaches, standards, methods, tools, software, and competencies that will enhance the use of biomedical Big Data (see footnote 1) by supporting research, implementation, and training in data science and other relevant fields. An important aspect of this goal is to make biomedical research data and resources maximally shareable and reusable. For this reason, BD2K is formulating approaches to encourage the development and facilitate the use of data-related (including metadata) standards more broadly across the biomedical research community and is, therefore, interested in the issues involved in developing community-based standards. This Request for Information (RFI) solicits comments and ideas on how community standards activities are initiated, developed, disseminated, and sustained, and on any role that NIH might play in helping to catalyze such efforts.
Community-based data and metadata standards have been generated at many levels across the biomedical research community, from small research consortia to multinational enterprises, to facilitate the comparison and integration of data from different sources, to accelerate collaboration, and to enable the broad sharing and reuse of data. Examples include grass-roots efforts such as the Gene Ontology (GO) and more heavily organized efforts such as Logical Observation Identifiers Names and Codes (LOINC) and the Digital Imaging and Communication in Medicine (DICOM) standard for radiological image transfer. Any such effort must address not only issues specific to its own domain but also a set of common issues, including (but not necessarily limited to) definition of mission and scope, governance, and operational procedures such as processes for creating, publishing, and maintaining the standards so that they are useful and widely accepted. Different groups have employed a range of strategies, with varying degrees of complexity, formality, and documentation, to carry out their activities in support of community-based standards development.
Opportunities for secondary use of data, and the value of such use, are increasing: scientists other than those who originally generated the data are increasingly able to extract new knowledge from them. Researchers may combine existing data sets across studies and integrate different and complex data types to address questions unanticipated by the original investigator(s). The ability to do this depends heavily on the extent and quality of the annotation of the original data sets. The evidence suggests that without appropriate data and metadata standards, meaningful data sharing and the promise of new knowledge created from those data are not possible (see footnote 2). Thus, the widespread use of high-quality data and metadata standards, as part of a larger effort to promote data access and reuse, is essential if NIH is to fully capitalize on the explosion of biomedical Big Data for advancing fundamental knowledge of complex human biology and its translation to human health.
NIH recognizes there are already numerous standards groups, both public and private, across scientific disciplines. Many of these have developed proven processes, infrastructure, and community support methodologies. The NIH is interested in exploring how the BD2K initiative can contribute to the improvement of policies, governance, administrative procedures, and funding to support community-based standards (CBS) efforts to develop and/or extend data and/or metadata standards, and how these activities relate to other ongoing or nascent biomedical research activities. Within this context, ‘community’ encompasses a broad range of stakeholders who may be engaged in the process of data standards development and use, including technical developers, librarians, science domain experts, researchers, information scientists, vendors, funders, publishers, and other end users.
All stakeholders with an interest in CBS are invited to provide information. Your response can include, but is not limited to, whether you are a member of industry, government, or academia. If you choose, you can categorize your area of expertise by including all that apply:
- Standards Efforts
- Data Management
- Clinical Science
- Basic Science Research
- Information Science (e.g., biomedical informatics)
- Library Science
The NIH is seeking information on areas including, but not limited to, the following:
- Effective approaches, processes, and activities that could advance the community-based standards landscape (e.g., creating a collaborative workspace or an advising structure toward standards development, extension, or adoption).
- Gaps in community-based data standards of relevance to biomedical research, including real use-cases (e.g., emerging fields and technologies, or research domains with multiple existing data standards that could benefit from additional work, integration and/or reconciliation).
- Lessons learned from existing CBS efforts, particularly examples with field-tested processes and infrastructure or known examples of failures by CBS efforts.
- Common challenges in CBS development (e.g., methods for community engagement or building interoperability with other related standards).
- Considerations for evaluating progress and milestones to assess data standards development and utility.
- Effective approaches for addressing the need to sustain useful standards, and to update existing standards as a field develops.
Submitting a Response
All responses must be submitted to BD2K_CBS_RFI@niehs.nih.gov by December 5, 2014. Please include the Notice number NOT-ES-15-002 in the subject line. Response to this RFI is voluntary. Responders are free to address any or all of the categories listed above. The submitted information will be reviewed by the NIH staff. Submitted information will be considered confidential.
Please do not include any proprietary, classified, confidential, or sensitive information in your response. The NIH will use the information submitted in response to this RFI at its discretion and will not provide comments on any responder's submission. The collected information will be reviewed by NIH staff, may appear in reports, and may be shared publicly on an NIH website.
The Government reserves the right to use any non-proprietary technical information in summaries of the state of the science, and any resultant solicitation(s). The NIH may use the information gathered by this RFI to inform the development of future funding opportunity announcements.
This RFI is for information and planning purposes only and should not be construed as a solicitation or as an obligation on the part of the Federal Government, the National Institutes of Health (NIH), or individual NIH Institutes and Centers. No basis for claims against the U.S. Government shall arise as a result of a response to this request for information or from the Government’s use of such information.
1 The term 'Big Data' is meant to capture the opportunities and challenges facing all biomedical researchers in accessing, managing, analyzing, and integrating datasets of diverse data types [e.g., imaging, phenotypic, molecular (including various '–omics'), exposure, health, behavioral, and the many other types of biological, biomedical, and behavioral data] that are increasingly large, diverse, and complex, and that exceed the ability of currently used approaches to manage and analyze them effectively. Big Data emanate from three sources: (1) a small number of groups that produce very large amounts of data, usually as part of projects specifically funded to produce important resources for use by the entire research community; (2) individual investigators who produce large datasets, often empowered by the use of readily available new technologies; and (3) an even greater number of sources that each produce small datasets (e.g., research data or clinical data in electronic health records) whose value can be amplified by aggregating or integrating them with other data. See http://bd2k.nih.gov/about_bd2k.html.
2 See the report of the Data and Informatics Working Group of the Advisory Committee to the Director, NIH (ACD), available at: http://acd.od.nih.gov/Data%20and%20Informatics%20Working%20Group%20Report.pdf.
Please direct all inquiries to:
Cindy P. Lawler, Ph.D.
The National Institute of Environmental Health Sciences (NIEHS)