A110: NIH Response to OMB
TO: John Callahan, Assistant Secretary for Management and Budget
FROM: Director, NIH
Subject: Request for Comments on Clarifying Changes to Proposed Revision on Access to Research Data

The OMB has solicited comments on its clarifying changes to the proposed revision of A110 regarding access to research data. Specifically, OMB asked for comments on the following:

Definition of "Data": NIH concurs with the clarification of the definition of data. Data are now referred to as “research data” and are defined as “the recorded factual material commonly accepted in the scientific community as necessary to validate research findings, but not any of the following: preliminary analyses, drafts of scientific papers, plans for future research, peer reviews, or communications with colleagues.” The exclusion of information that would constitute a clearly unwarranted invasion of personal privacy, including information that could identify a specific participant in a research study, is absolutely essential. We recommend that in section III. A., the term "files" be replaced by "information" in the sentence "Moreover, under the proposed definition, 'research data' would exclude (A)… and (B) personnel and medical files and similar files the disclosure of which would constitute a clearly unwarranted invasion of personal privacy…." This would ensure that the privacy of such information is protected equally well whether it resides in a medical file or record or in some other research format. The changes proposed by OMB mitigate many of our concerns about the agency obtaining such data, redacting them, and holding both the redacted and unredacted data sets.

Scope: The earlier language applied the amendment to Federal policies and rules, which concerned NIH because it opened a vast arena of Federal activities. Such concerns have been addressed in the revised language, which focuses specifically on regulations.
The proposed further limitation to regulations of significant economic impact is also desirable. It would have been especially useful if the application of this amendment also focused on significant scientific findings. When a regulatory agency cites research in the regulatory process, that research may be critically or marginally applicable to that regulation. A brief review of regulations revealed that some cite hundreds of research studies, all of which would be subject to FOIA under this amendment. It would greatly reduce the burden of this legislation if access were afforded to data from only those studies that were critical in the formulation of the regulation.

There are several existing mechanisms for making data available to other researchers, none of which is as burdensome as FOIA. These mechanisms include archives, such as the Inter-University Consortium for Political and Social Research (ICPSR) at the University of Michigan, where data are made available at a modest cost and come with complete documentation and often technical support for the user. Some investigators make data available on the web, building in protections for privacy through the software while allowing analysis of the data. Yet another mechanism involves data repositories. Data repositories maintain control of the data but receive and fulfill requests for analysis. The National Center for Health Statistics serves as a data repository in cases where the risk to privacy is too great to allow the data to leave its site. Nevertheless, the Center wants to allow others to use the data for their own research and thus conducts the analyses “on demand” as specified by outside investigators. In each of these existing models, the goals of the legislation are already being met. We urge that the scope of this amendment be restricted to "data not otherwise already available for reanalysis".
Definition of "Published": OMB's proposed clarification of the definition of published research findings is very valuable. Published findings are now defined as those published in a peer-reviewed scientific or technical journal or those publicly and officially cited by a Federal agency in support of an action. NIH concurs with the clarified definition and finds that it will eliminate many concerns about premature release of data while fulfilling the spirit of the original language.

Cost reimbursement: Determination of costs associated with providing data and mechanisms for reimbursement presents significant challenges. At this time it is not clear how those challenges will be met. The costs associated with providing data under this amendment are likely to be substantially greater than costs incurred to fulfill current FOIA requests. Unlike data that are currently provided through FOIA, the data covered by the proposed amendment are not in the possession of the agency. Thus, in order to administer the request for data, the agency must request the data from the investigator or the grantee organization, import the data set, review and redact the data set, and release it to the requestor.

The process of importing and exporting data sets can be difficult and expensive, especially if the investigator used software that was custom-made for the project. The process of reviewing and redacting data will require the time and skills of individuals with a range of specialized training, including training in the substantive area covered by the research data as well as in epidemiology and biostatistics, to ensure that redaction adequately protects the identity of research subjects. This is a broad set of skills, not typically found in a FOIA office. It is unclear how agencies will identify and make available individuals with the broad array of backgrounds and training needed to process requests for data.
Similarly, the grantee organizations will need to establish a structure and procedures to handle requests for data, as discussed below. Thus, the “administrative” costs associated with the proposed amendment for both agencies and grantee organizations constitute a significant expansion over current FOIA capabilities. Current FOIA processes are already costly for agencies, and the proposed amendment would add significantly to those costs.

It is difficult, if not impossible, to estimate with any accuracy the actual costs of providing the data. What we can safely say is that the range of costs will be very wide. The cost of providing data from a small study that collected information on 20 variables from 50 rats at one point in time would be minimal. However, the cost of providing data from a face-to-face survey of 4,000 adults, with 300 variables and repeated measures over time, would be very much greater. Providing such a data set could include the costs of redacting a large and complex data set and producing a code book for the redacted data set to make it usable.

Similarly, there are uncertainties about the mechanism to be used to recoup these costs. Costs are incurred in three basic ways. First, universities and other nonprofit organizations conducting research will need to put in place a structure and procedures for dealing with FOIA requests for data. Both the administrative and accounting structure and the procedures will need to be established before receiving an actual request in order to be in compliance with A110. This aspect of the costs to universities would likely provoke a request from the grantees for relief from the 26% cap on the administrative component of indirect costs.

Second, in addition to establishing an infrastructure to respond to requests for data, institutions will face costs associated with providing the data for specific requests. These costs would be appropriately paid by the requestor, as noted in the legislation.
There are many difficulties associated with agencies being the conduit for such funds, and we seek to avoid building any new accounting or budget procedure. Therefore, we recommend that the costs of filling a specific request be paid by the requestor directly to the research institution following agency confirmation that the agency has the data ready to send to the requestor. These funds would not be considered program income. This plan would ensure that institutions receive compensation and that the administrative burdens are minimized.

Finally, the amendment acknowledges the costs incurred by the agency but proposes the same compensation practices currently used under FOIA. This fails to recognize that the burdens on the agencies are likely to be far greater under A110 than in the current FOIA system. As the costs associated with A110 requests for data rise, it will be increasingly important that the fees paid be retained by the agency, not the Treasury Department. In the earlier draft of the amendment, it was observed that legislation would be required to solve the problem of agency retention of funds, but this was not discussed further. We are concerned that this amendment will be put into effect before a strategy for reimbursing agency costs has been specified. Thus, we recommend that a trans-agency solution be sought immediately.

Remaining issues: The amendment states that the agency will need to provide data in a "reasonable time period," but there was little discussion of how that would happen. If data are not prepared for release until after they are cited in a regulation and a request is made, it is unlikely that they would be available and reanalyzed during the comment period associated with the development of a regulation. The FOIA process requires a response in ten days, a goal that would be unlikely to be met for data requests. If the agency does not meet that deadline, a requestor can bring legal action.
Our concern is that unreasonable requests would be made of federal agencies and grantees as requestors attempt to obtain and reanalyze data within the time period for comment on a proposed regulation.

Because access to data is based on FOIA, the privacy protections apply only to individuals, not to other entities that participate in research. Research at the NIH includes projects that use clinics, hospitals, schools, and other entities as the unit of study. It is not uncommon for such entities to want their privacy protected. Even when there is no potential for commercial harm (e.g., a public health sexually transmitted disease clinic), there are other legitimate reasons why entities wish to remain anonymous, including embarrassment or other reputational factors. Participation in research may be of great research value but little value to the individual organization; inability to provide protection to organizations will undoubtedly lower their participation in research. We recommend that the definition of research data be amended to exclude "unwarranted invasion of personal or organizational privacy".

The present "clarifying changes" do not address the problem created by the fact that many investigators are supported by funds from multiple sources. We view this as a very difficult issue since some projects involve funding from both Federal and non-Federal sources. In some cases, funding from non-Federal sources is important, providing access to data from pharmaceutical companies, state governments, private sources, or foreign governments. In some cases, these funders would provide their own data to be merged with data collected with Federal support. We are concerned that by forcing uncontrolled access to data funded through other sources we would reduce the willingness of such groups to participate in NIH-funded research.
It would be helpful to have clarification stating that the amendment would not apply to data that were not produced under the Federally supported grant, even if those data were used by the grantee. Such an exemption should also apply to NIH-funded analyses of data from non-Federally supported sources that are used to create new variables.

Despite our support for the constructive effort by the OMB in developing this regulation, serious concerns remain. These are generally rooted in the fact that the strategy for data sharing is based on the FOIA. FOIA was developed to provide public access to government records. FOIA does not provide the kinds of procedures or protections that are required for safe and effective access to research data. For example, FOIA places no restriction on who gets data, how they intend to use the data, or to whom they may give the data. Access to research data typically requires that recipients provide an assurance that they will use the data for research purposes, will not try to identify or contact individual subjects, and will not share or otherwise release the data to others. In the case of the Health and Retirement Study, there is a requirement that the user not merge the components of the HRS files with other files, such as driver's license or Equifax files. This requirement is in place to protect the confidentiality of individuals. Informed consent documents need to be able to tell potential subjects what will be done with the information they provide. These boundaries are important, and yet they cannot be protected when data are shared through the FOIA.

In conclusion, I view the steps taken by OMB as constructive and have proposed several other modifications that we believe would greatly strengthen this rule. However, we remain convinced that basing access to research data on the Freedom of Information Act process is fundamentally flawed.