Request for Information (RFI): Data Annotation in Biomedical Core Research Facilities and Related Needs for Community Education and Training

Notice Number: NOT-OD-16-091

Key Dates
Release Date: May 17, 2016
Response Date: June 30, 2016

Related Announcements
None

Issued by
National Institute of General Medical Sciences (NIGMS)
National Library of Medicine (NLM)
Division of Program Coordination, Planning and Strategic Initiatives, Office of Research Infrastructure Programs (ORIP)

Purpose

This Notice solicits feedback from stakeholders who operate or use university core research facilities or research centers that provide specialized services to generate, manage and analyze high-throughput data in biomedical research. The issuing components recognize that there are multiple contributors involved in creating, describing, and handling high-throughput biomedical data and are interested in learning about their practices related to these activities.

Background

Biomedical core facilities provide access to state-of-the-art instrumentation and services in a variety of research areas, such as genomics, transcriptomics, proteomics, metabolomics, cell biology, neuroscience, and bioimaging. These core facilities rely on technologies including sequencing, electron microscopy, NMR, mass spectrometry, flow cytometry, optical imaging, MRI, CT, and others. The core research facilities have a crucial role in ensuring that high-quality, high-throughput research data are generated. They foster sharing and future re-use of the data by supporting data annotation with essential descriptive information: metadata. Metadata are terms that describe qualities of a data set (such as file size or date created) and provide essential information that helps others find and understand the scope of a data set (such as topical keywords or digital identifiers). Metadata are vital to the digital research enterprise as they can enable reproducible research and simultaneously support more effective data sharing. High-quality metadata can also help ensure that research projects meet NIH’s goals of rigor and reproducibility (https://www.nih.gov/research-training/rigor-reproducibility).

In order to more fully capitalize on the wealth of information contained in biomedical high-throughput data, the NIH launched the Big Data to Knowledge (BD2K, https://datascience.nih.gov) initiative. BD2K aims to develop new approaches, standards, methods, tools, software, and competencies that will enhance the use of biomedical Big Data by supporting research, implementation, and training in data science and other relevant fields. In addressing this goal, an important aspect is to make biomedical research data and resources maximally shareable and reusable.

In connection with this initiative, the NIH is interested in learning about the cores’ practices in producing high-quality, high-throughput data and annotating them with high-quality metadata. The NIH is interested in understanding the role(s) that academic researchers, staff scientists at the university core research facilities, and other university staff such as librarians or IT specialists play in creating essential metadata associated with the data that are generated. The NIH is also interested in the roles these different groups have in helping to document features of the data, in the processes used to decide on appropriate metadata, in the techniques used to ensure that the metadata are robust and complete, and in the sources of standards or best practices that guide creation of metadata for datasets generated in a core research facility.

Information Requested

The NIH invites university research communities engaged in and supporting biomedical investigations to comment on practices related to the generation and handling of high-throughput data. Any comments will be helpful and may include but are not limited to the following areas:

1. Core research facility or research center and its services

a) Type of technical and data service(s) the research core/center provides;

b) Current practices for documenting the generation of high-throughput data and describing the output of the data itself.

2. Metadata associated with the data generated by the research core/center

a) The extent to which metadata related to data sets generated by the research core/center are documented and stored in the records of the facility;

b) The nature of metadata developed at the core/center to accompany the data generated, such as the operating system, level of specificity, terminology used, or other factors;

c) The way metadata related to a data set are transmitted to the investigators who requested the data.

3. Institutional partnerships

Currently existing institutional partnerships that provide support and training for effective data generation, annotation, and sharing. Partners may include institutional computer/IT support, library expertise, or others.

4. Approaches to learn skills and concepts

Approaches by which staff scientists and others involved in generating and managing the data set learn skills and concepts relating to data generation, annotation, curation, and sharing.

5. Unmet needs

Any unmet needs relating to the development and retention of metadata for research data sets generated in core research centers, including needs for training, development of standards, implementing best practices or supporting institutional partnerships that assist effective data sharing.

6. Other comments

Additional comments on data annotation in biomedical core research facilities and related needs for community education and training.

Please identify your expertise or area of interest, for example:
a) Manager of or staff at a biomedical core facility;
b) Biomedical researcher;
c) Expert in data management, data evaluation or data curation;
d) Other.

How to Submit a Response

Responses to this RFI must be submitted on the website https://dpcpsi.nih.gov/MetadataRFI by June 30, 2016.

Responses to this RFI are voluntary. This RFI is for planning purposes only and should not be construed as a solicitation for applications or an obligation on the part of the Federal Government, the National Institutes of Health, or individual NIH Institutes or Centers.  The government will not pay for the preparation of any information submitted or for the government’s use of that information.
The information provided will not be considered confidential. The NIH will use the information submitted in response to this RFI at its discretion; the submitted information will be reviewed by the NIH and shared with the NIH Institutes and Centers that have an interest in this matter. The NIH will not acknowledge receipt of information submitted or provide comments to any responder. No proprietary, classified, confidential, or sensitive information should be included in your response. The government reserves the right to use any non-proprietary technical information in any resultant solicitation(s), policies or procedures; responses to the RFI may be reflected in future funding opportunity announcements. The information provided will be analyzed, may appear in reports, and may be shared publicly on an NIH website.

Inquiries

Please direct all inquiries to:

Alena Horska, Ph.D.
Office of Research Infrastructure Programs (ORIP)
Telephone: 301-435-0815
Email: ORIP-RFI-METADATA@nih.gov