February 20, 2024
Office of Data Science Strategy (ODSS)
CDEs are a type of data standard used for collection, comparable analysis, and exchange of data in biomedical research settings. CDEs are standardized, precisely defined questions paired with a set of specific allowable responses, used systematically across different sites, studies, or clinical trials to ensure consistent data collection (https://cde.nlm.nih.gov/home). They provide a common language for systematic and consistent capture of research data and routinely collected real-world data. CDEs can range from single data elements, such as height and weight, to a bundle of questions that evaluate concepts such as depression and quality of life. A glossary of terms relevant to CDEs can be found on the RFI response website (https://datascience.nih.gov/cde-rfi) to provide further background on CDE use within NIH ecosystems.
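As an illustration only, the idea of a question paired with a fixed set of allowable responses can be sketched in Python. The class name, field names, question wording, and permissible values below are hypothetical and are not taken from the NIH CDE Repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonDataElement:
    """A precisely defined question paired with its allowable responses."""

    name: str
    question: str
    permissible_values: frozenset

    def validate(self, response: str) -> str:
        """Return the response unchanged if it is allowed; otherwise raise ValueError."""
        if response not in self.permissible_values:
            raise ValueError(
                f"{response!r} is not a permissible value for {self.name}"
            )
        return response


# Hypothetical element: wording and values are illustrative assumptions.
tobacco_use = CommonDataElement(
    name="tobacco_use_status",
    question="What is your current tobacco use status?",
    permissible_values=frozenset({"Never", "Former", "Current", "Unknown"}),
)

print(tobacco_use.validate("Former"))
```

Because every site would validate against the same set of permissible values, a response that falls outside the set is rejected at collection time rather than discovered later during analysis.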
Data consistency is a key factor contributing to data interoperability, which is one of the FAIR data principles guiding scientific data management and stewardship. Biomedical data are often collected in different ways for various study purposes, using different data models, which presents significant challenges for collaborative research, meta-analysis, and management/sharing of data. Use of CDEs makes health data speak the same language and become interoperable, both structurally and semantically. Since CDEs can be linked across common data models (CDMs) and standard vocabularies/terminologies used in healthcare, such as SNOMED CT, LOINC, RxNorm, and UMLS (among others catalogued in public repositories such as the National Library of Medicine's Value Set Authority Center), they provide a means to align clinical research studies with real-world data from electronic health records, healthcare coverage claims, patient-generated data streams, and patient-reported outcomes. CDEs can be expressed in machine-computable formats (as defined in the Glossary) to enable mapping, transforming, and combining of existing data and, in turn, create big data resources by readily integrating data across disparate sources. Implementation of CDEs has the potential to accelerate knowledge discovery by harnessing the power of innovative data methods such as machine learning and artificial intelligence.
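The mapping and combining step described above can be pictured with a minimal Python sketch. It assumes two hypothetical sites that recorded the same concept with different local codings; the site names, local codes, target values, and field names are all illustrative assumptions, not part of any NIH CDE or terminology standard.

```python
# Hypothetical local codings used by two sites for the same concept.
SITE_MAPPINGS = {
    "site_a": {"0": "Never", "1": "Former", "2": "Current"},
    "site_b": {"never smoker": "Never", "ex-smoker": "Former", "smoker": "Current"},
}


def harmonize(site: str, raw_value: str) -> str:
    """Map a site-specific value onto the shared permissible values.

    Values with no known mapping become "Unknown" instead of being guessed.
    """
    mapping = SITE_MAPPINGS[site]
    return mapping.get(raw_value.strip().lower(), "Unknown")


def combine(records: list) -> list:
    """Transform records from different sites into one comparable dataset."""
    return [
        {
            "participant_id": record["participant_id"],
            "tobacco_use_status": harmonize(record["site"], record["value"]),
        }
        for record in records
    ]


records = [
    {"participant_id": "A-001", "site": "site_a", "value": "2"},
    {"participant_id": "B-017", "site": "site_b", "value": "Ex-Smoker"},
    {"participant_id": "B-018", "site": "site_b", "value": "n/a"},
]

for row in combine(records):
    print(row)
```

In practice the mapping tables would be derived from curated crosswalks rather than hand-written dictionaries; the sketch only shows why a shared target set of values makes data from disparate sources directly comparable.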
Resources established by NIH cross-cutting initiatives, such as the Rapid Acceleration of Diagnostics (RADx) COVID-19 initiative and the NIH CDE Repository, have recently raised general awareness and facilitated use of CDEs in NIH intramural and extramural research communities. The successful adoption of CDEs in NIH institutes' programs has accelerated the pace of new scientific breakthroughs. These resources also highlight the need to standardize a minimum core set of CDEs across NIH Institutes and Centers.
The NIH Scientific Data Council (SDC), an internal NIH committee made up of senior NIH Institute and Center (IC) leaders and data scientists, has established a governance process to designate CDEs that meet criteria (such as human & machine readability, semantically clear definitions of variable, measure prompt and response) as NIH-endorsed and publish them in the NIH CDE Repository, but no minimum core set of CDEs has been established for use across all clinical studies/trials supported or conducted by ICs.
Beyond NIH, a consortium of mental health research funders and journals has launched the Common Measures in Mental Health Science Initiative to identify common measures for mental health conditions that funders and journals can require all researchers to collect, in addition to any other measures they require for their specific study. As another example, mCODE™ (Minimal Common Oncology Data Elements) allows exchange of oncology electronic health records (EHRs) between health systems and enables comparative effectiveness analysis (CEA) of cancer treatments by assembling a core set of structured data elements. While the NCI is participating in this initiative in an attempt to harmonize cancer CDEs in EHRs and cancer research, without an effort to standardize a minimum core set of CDEs for use across the NIH, these and other important data initiatives miss the opportunity for data to be more easily integrated and analyzed.
The 21st Century Cures Act highlights the need for a core set of common data elements and associated value sets. Development of a core set of CDEs will greatly enhance data interoperability. Recently, the NIH SDC has directed a new CDE working group to provide recommendations on a consistent set of minimum core CDEs that could be utilized across NIH clinical research/trials. The minimum core CDEs would not preclude the use of additional CDEs that are specific for clinical studies/trials. Social determinants of health (SDoH) core CDEs have been identified as priorities, because of increased awareness that social, economic, and environmental factors influence health equity. This RFI seeks feedback on the development and implementation of CDEs including a set of minimal core CDEs across the NIH programs.
Despite all the efforts and progress, wide adoption of CDEs across various clinical domains is not without challenges. For example, the presence of numerous duplicative CDE sources in some clinical domains costs researchers extra time and effort in selecting the appropriate CDEs for use, especially when looking to integrate responses with real-world data. Technologies and tools are needed to map CDEs, to transform data, and to align CDEs with controlled vocabularies, terminologies, and existing data management systems. This RFI is also an NIH effort to understand these challenges and opportunities, and to inform appropriate NIH guidance and mechanisms to lower the barriers to CDE use and improve the ability to aggregate and integrate CDE-based data.
Note: Direct use of any Personally Identifiable Information or Protected Health Information will be restricted to those interacting with participants (though aggregate-level measures may be derived for use in study datasets). Participants must consent before any of their data can be used for a study.
Specifically, NIH seeks comments on any or all of the following topics:
1. Recommended CDEs for NIH-funded clinical research/trials, including a set of minimal core CDEs.
Development of CDEs will facilitate data interoperability across NIH programs. Due to the heterogeneous nature of the data collected in various clinical domains, one viable approach to determining a set of recommended CDEs is using CDEs that are important for identifying cohorts for study, e.g., in categories (akin to Classification Schemes as outlined in International Standard 11179 where questions of a similar nature are grouped together). This approach allows more detailed, study-specific data elements to be added in each category as needed.
NIH is seeking comments on a set of minimum core CDEs in the demographics/personal characteristics category. We are also seeking comments on recommended CDEs in the clinical domains including autoimmune diseases and immune-mediated diseases, and high level (potential screening-purpose) CDEs for the SDoH domain as shown below.
Category: Demographics/personal characteristics
a. Please provide comments on the above minimal core CDEs that might be required for all NIH funded or conducted clinical research/trials.
Some examples of the categories, beyond those above, are Allergies, Adverse Events, Biospecimens, Clinical Tests, Informed Consent, Demographics, Diagnosis, Enrollment, Equipment, Health Assessments, Vital Status, Genomics, Imaging, Immunizations, Laboratory Tests, Language, Marital Status, Medical History, Medications, Patient/Person, Outcomes (including patient reported), Procedures, Treatment.
High-level CDEs are an approach for structuring a question to minimize the burden of data collection. This approach still captures vital information about the social and environmental factors recognized as important to assessing SDoH. It bundles a number of factors that might be asked individually into one high-level question and the questions would be asked with a specific set of permitted responses. Some examples of high-level SDoH CDEs are shown below.
1) Would you say that your life has been impacted adversely (negatively, badly) by any of the following (current or past)? Please check all that apply.
2) In the past year, have you – or any family members you live with – been unable to get any of the following when it was really needed? Select all that apply.
a. Please provide comments on the minimal core CDEs above that might be required for all NIH funded or conducted clinical research/trials.
b. Please provide comments on CDEs for autoimmune diseases and immune-mediated diseases, as well as CDEs for clinical and research domains and high-level CDEs for the SDoH domain.
c. Please provide suggestions on alternative approaches to determining a set of minimum core CDEs which might be required for all NIH funded or conducted clinical research/trials, as well as on whether any categories of importance to research across the NIH were missed.
2. Technology standards for using NIH CDEs. NIH seeks broad input on tools and technologies that could enhance the use of NIH CDEs. NIH CDEs are defined as CDEs recommended or required by an NIH body, and/or found in the NIH CDE Repository.
a. In what ways could NIH CDEs in studies best be made discoverable?
b. Provide feedback on how NIH could facilitate the selection of appropriate NIH CDEs, to be used without modification (except those modifications that are allowed that still maintain interoperability), especially in the clinical domains where numerous CDE sources exist.
c. Identify tools and processes to assist in data transformation and mapping between existing data and NIH CDEs.
d. Suggest resources to streamline and design upstream data collection (e.g., new case report forms) that support downstream data harmonization using NIH CDEs.
e. Provide feedback on resources needed to ensure that CDEs are tagged with appropriate terminology codes to unambiguously define their meaning and facilitate mapping across coding systems. Specifically, describe how NIH could facilitate access to authoritative and validated ontologies/crosswalks between the commonly used healthcare and terminology standards such as SNOMED CT, ICD, LOINC, and drug terminologies, and how these may align with efforts to make healthcare data readily integrated with CDE-based research by use of standards such as the Health Level Seven International (HL7®) Fast Healthcare Interoperability Resources (FHIR®) standard.
3. NIH policies and governance on CDEs. NIH seeks input on policies and governance that could facilitate and incentivize broader CDE usage in research and in data sharing and management. Please provide your feedback on:
a. Suggestions on enabling the adoption of the proposed requirement to use minimum core CDEs in all NIH-funded clinical research/trials (example: NIH should provide the necessary mapping tools for data transformation).
b. Potential difficulties or obstacles foreseen in the adoption of the proposed requirement to use minimum core CDEs in all NIH-funded clinical research/trials.
c. Useful policies and governance on establishing new CDEs.
d. Useful policies and governance regarding CDEs for data sharing and management.
Glossary: A glossary of key terms can be found on the response website https://datascience.nih.gov/cde-rfi.
Comments should be submitted electronically via the following webpage, https://datascience.nih.gov/cde-rfi, or as a PDF response by email to [email protected]. To ensure consideration, responses must be submitted by 11:59:59 p.m. (ET) on April 20, 2024. Responses to this RFI are voluntary and may be submitted anonymously. You may voluntarily include your name and contact information with your response. If you choose to provide NIH with this information, NIH will not share your name and contact information outside of NIH unless required by law. Responses from professional organizations are welcome and encouraged.
This RFI is for informational and planning purposes only and is not a solicitation for applications or an obligation on the part of the Government to provide support for any ideas identified in response to it. Please note that the Government will not pay for the preparation of any information submitted or for use of that information.
Responses may be compiled and shared publicly in unedited form and in an anonymous manner after the close of the comment period. Please do not include any proprietary, classified, confidential, or sensitive information in your response. The Government reserves the right to use any non-proprietary technical information on public websites, in reports, in summaries of the state of the science, in any possible resultant solicitation(s), grant(s), or cooperative agreement(s), or in the development of future funding opportunity announcements. The NIH may use information gathered by this RFI to inform development of future guidance and policy directions.
We look forward to your input and hope you will share this RFI with your colleagues.
Please direct all inquiries to:
Belinda Seto, Ph.D.
Office of Data Science Strategy
National Institutes of Health
Email: [email protected]