NIH Response to Notice of Proposed Rule Making (NPRM; Feb. 4, 1999, Vol. 64, No. 23, pp. 5684-5685 of the Federal Register) to amend OMB Circular A-110, extending the Freedom of Information Act (FOIA) to data produced under Federal grants and used for the development of Federal policy or regulation.
There is widespread support for facilitating access to research data, but that access must occur in the context of strong protections for research participants, protection of proprietary interests, freedom from harassment of researchers, and confidence that the process will further research, not harm it. The NIH has taken a strong stand to encourage the sharing of research tools, protecting the scientific community's access to early work in gene identification and other areas of science. The NIH supports research on critically important topics such as genetic risks for disease; illegal behaviors that threaten health, such as drug and alcohol use, abuse, and addiction; child development and family interactions; and other topics that require the public's confidence in investigators' ability to protect individuals as research subjects. The desire to extend access to data must not jeopardize the ability to conduct the most rigorous research on these and other important health topics.
The NPRM took a positive step by limiting access to published data used for Federal policy making or regulation. It is essential that this limitation and the other limitations recommended below be implemented in order to limit the burden upon the research community, Federal research agencies, and ultimately upon the research enterprise and its valuable results. We recognize that the legislation and its implementation are controversial and that there may be a continuing dialogue on the important issues that have been raised. To further that dialogue, the NIH would welcome the opportunity to offer its views on alternative approaches to the provision of access to scientific data.
The NIH comments address four major areas of concern: (1) definitions of terms which require clarification; (2) the increased burden this new law would place on both agencies and grantees; (3) the effect it will have on protection of data and of the privacy of individuals and other entities; and (4) other issues such as timing of implementation, compliance, and data retention.
Recommendations for clarification of terms:
There are some key terms in the NPRM that require clarification, about which the NIH has specific comments.
While the law refers to the release of data, it does not define data. In the NIH Grants Policy Statement "data" is defined as "recorded information, regardless of the form or media on which it may be recorded, and includes writings, films, sound recordings, pictorial reproductions, drawings, designs, or other graphic representations, procedural manuals, forms, diagrams, work flow charts, equipment descriptions, data files, data processing or computer programs (software), statistical records, and other research data." However, this definition is not uniform across all agencies. The new rule would create considerable burden on grantees dealing with such variability across agencies, and raises concerns about the lack of distinction among different types of data.
In some areas of science, data are easily published along with the results. In many areas, the data are the result of experiments and the need is not to make the original data available, but to make available the methods used to obtain the results. If others challenge those results, they would try to replicate the experiment and would then publish their findings. If the rule makes clear that it does not apply in cases where the underlying data could be replicated, it would greatly minimize the administrative burden on agencies and institutions. In other cases it is virtually impossible to suppress the identity of individual human subjects and it should be possible to declare certain types of research data (e.g., videotapes) outside the scope of this rule. There also may be cases in which it is important to make underlying data available for others to review, reanalyze, or analyze for a new purpose; this is typical in situations in which it would be virtually (or literally) impossible for another investigator to replicate the underlying data. In these cases, agencies could negotiate the conditions for placing such data in a formal archive which would ensure their release, minimize the administrative burden, and provide protections for research subjects. However, the present rule makes no distinction for different types of data, does not recognize the different value (or lack of value) some types of data might have for reanalysis, and does not entertain options for data access that can be tailored to different types of data. This lack of understanding regarding different types of data and the different needs relative to data access is, unfortunately, a feature of the FOIA, which was not designed for this purpose.
The restriction to published data reflects an awareness of the havoc that could be wreaked by supporting access to unpublished data. Premature access to data could unblind clinical trials, lead to erroneous conclusions, undermine investigators' investments, and jeopardize their intellectual property rights, especially in regard to non-US patents. However, stating that release is only for data supporting published findings is insufficient and incomplete. It is essential that, in the final rule, OMB be more explicit about what constitutes publication that would trigger release of data under FOIA. There are many ways of disseminating early results, which could be construed as "publications" but which would harm the research process if they resulted in the release of the underlying data.
It is part of the scientific process to engage other scientists in discussion of early results so as to help refine one's thinking and ensure that the ultimate results are robust; the test of robustness is formal publication, typically in a peer reviewed journal. Abstracts submitted to professional meetings, progress reports sent to agencies, and other preliminary assessments of data should not be construed as publications. Clinical alerts which share early results of ongoing clinical trials also should not be construed as publications. In addition, there are times when an investigator will publish - perhaps even in peer reviewed journals - information and analyses of methodological aspects of a study. These would not constitute substantive findings and would be unlikely to lead to agency policies or regulations; therefore, they should be exempt from triggering the release of the underlying data. As part of the new Federal reinvention efforts to streamline the business process of research administration, Federal agencies are seeking to make annual progress reports available electronically. However, if this electronic availability were to trigger premature release of the underlying data, it would guarantee that grantees would be unwilling to provide substantively valuable information in those reports. Eliminating from the scope of this law some early, semi-formal methods of sharing results should not undermine the law's intent, since it is unlikely that an agency would depend on such reports of early findings for policy development or rulemaking.
Another issue underlying release following publication is that it is not unusual for research to result in a series of publications, not all of which make use of exactly the same data. As publications unfold, that might occasion a series of data releases. It would be reasonable that there be some provision for a planned data release to occur after a series of publications. This should not be a vehicle to impede release unduly, but to minimize the potential for harassment of investigators. For example, release could be required within six months of publication. Longitudinal data sets present a special problem, since the release of data early in a long-term study could affect later waves of data collection and could risk identification of subjects. In a complex longitudinal study there are risks to identification of human subjects if the investigator is not able to group data or otherwise suppress detailed information. If research subjects are unsure of the investigator's ability to keep their private information private, great harm will be done to the overall purpose of scientific inquiry and specifically to clinical studies. Because applying FOIA to grantee data may erode confidentiality protections for the participants in research, the NIH recommends that the amendment impose a positive obligation on Federal agencies to protect that confidentiality as they obtain and disclose the research data.
"Regulation" and "Policy"
While the OMB NPRM limits the rights of FOIA access to data produced under grants to those data used to support Federal policies or regulations, there is a need to be more explicit about this linkage. Regulations should be understood as those formal agency regulations that must go through a rulemaking process, and the link to data must be clear and explicit. Agency "policies" should be defined as actions on the part of agencies that are similar to rules: that is, policies that are widely disseminated, that are intended to influence the behavior of entities outside the Federal government, and that are permanent or semi-permanent. This would not include "best practices guidelines" that are developed to provide information and guidance to grantees, but do not bear the weight of regulation. Data used to develop public information, public or professional advice, or internal agency practices also should be excluded.
Regulations may invoke data in several different ways. In some cases data are explicitly linked to the regulation and may even be used to calibrate the ways the regulation is enforced. In other cases, there is a broad body of knowledge that informs the regulation, but without any direct link between a given research project and the regulation. Ambiguity would be lessened if the law were restricted to only those data explicitly and directly linked to a regulation or policy, but not those which constitute general information on the topic of the rule. For example, data that are cited as the basis for a regulation might be covered, but the general scientific knowledge base that informs the development of a policy or regulation would not be covered.
"Regardless of the level of funding"
The scope of this law must not be underestimated. Circular A-110 applies to institutions of higher education, hospitals, and other non-profit organizations, but the extension of FOIA will also apply to institutions which receive funds through A-110 grantees as well. Since the law applies regardless of the level of funding, it also means that many research partners of the NIH - foundations, private health care providers, and industrial partners - are affected as well.
Applying this law to data regardless of the level of Federal funding creates an undue burden on grantees who may have been able to bring other partners into their research endeavors. Such partnerships are typically very beneficial for the Federal agency since other partners may provide unique data, shared funding, or other unusual research opportunities. However, partners may enter into these agreements under special circumstances or with explicit requirements placed on the Federally supported partner. For example, a state agency may elect to share health care data with individual researchers in whom they have confidence, or private partners may be willing to share data only with investigators who have peer reviewed grants; still others want an agreement that the data will be used only for research purposes, or that there will be no attempt to identify individuals. The present law would invalidate those conditions when the data were passed on to those requesting them under FOIA. This would exert a chilling effect on such collaborations, which would be a significant loss to Federal agencies. While it would be very helpful if this law applied only to those data collected with Federal funds and not to any data collected using non-Federal funds, this would create administrative burdens in compliance.
It is recommended that the comment period be extended in order to ensure that partner institutions have an opportunity to comment on the potential adverse effects of the proposed amendment.
Increased burden on agencies and grantees:
This law, while extending the familiar FOIA, is quite different in several ways. The most striking difference is that while the current FOIA applies to documents already in the possession of the Federal government, the new law requires that the Federal government go to the grantee to obtain such data. This makes the administrative burden far greater and extends the burden to the grantee institutions and their investigators. Clearly, the authors of the law were aware of this since they allowed compensation to those who would bear the burden of compliance. It is essential that the final rule make clear the mechanisms through which this can take place, in order to ensure that compensation can be made to those who do indeed bear the burden.
More importantly, the grantee who is burdened by responding should be compensated directly (i.e., not by simply making such costs part of their allowable indirect costs, since this would be part of the administrative costs which are presently capped). Since the requestor must pay for the incremental costs, it is only fair that this law not become an "unfunded mandate" for the grantee institution, but that a mechanism for direct compensation be put in place. For the grantee institution, this could simply be handled within the proposed rule: that is, the amended Circular A-110 could include language indicating that "the amount collected that is attributable to the grantee shall be paid directly to the grantee, and shall not be considered program income." For the agency, the situation would need to be handled under statute. The Freedom of Information Act [5 USC 552 (a)(4)(A)(iv)] could be amended to permit each agency to retain fees that are collected under the Act. Alternatively, language could be included in the appropriations legislation for each agency or in basic authorizing legislation for each agency (which for the NIH would be Title IV of the Public Health Service Act). Amendment of the FOIA appears to be the simplest, most straightforward approach.
For NIH grants, the awardee is the institution, not the individual investigator. Since individuals may leave institutions during the course of their research (and certainly within the three years following the completion of a project), it is quite possible that a request to the NIH to produce data would go to a university that no longer had an employer-employee relationship with the investigator. Institutions are concerned that they would now be liable for producing data that they would not have kept under normal circumstances; this would either require them to track down the individual and get his or her cooperation, or to keep copies of all data that individuals have produced while at these institutions. This would fundamentally change the relationship of the grantee institution to the investigator and require new levels of data management, protection, and archiving at the level of the institution. Anticipation of such requests could greatly increase the burdens on grantee institutions and create another potential risk to confidentiality of data. There is no apparent solution to this problem, but it does raise questions about the effectiveness of the use of FOIA to ensure access to scientific data.
Presently, FOIA does not require agencies to obtain data that they do not have at the time of the request. However, A-110 holds grantees responsible for data produced under consortium agreements; under the proposed rule, grantees would be required to obtain those data if requested under FOIA, even if they would not normally have done so in the course of the research. This would cause grantees to establish new procedures for those participating in a consortium and would add to their administrative burden.
Effect on protection of data and privacy:
The present FOIA provides many important exemptions, and they would continue to apply. However, it is important to realize that the new law is making a fundamental change in the relationship of Federal agencies to their grantees as well as the relationship between researchers and those who agree to be research participants. Federal agencies now make clear that data ownership resides with the grantee and this unambiguous relationship provides grounds for grantees to protect the privacy of individuals involved in research.
If those who obtain data through FOIA were to attempt to recontact individual subjects (perhaps to assess for themselves that the data had been properly collected or to harass the subjects), that would violate the informed consent process under which participants agreed to participate in the study and would constitute an invasion of the subject's privacy. Furthermore, such activity would undermine public trust in research and the protections offered through the review and approval process. While FOIA does enable an agency to remove identifying data before release, if an individual is known to have participated it would be very difficult to keep anyone from identifying that individual within the data set, unless data could be grouped or manipulated to avoid such identification. Thus, research data might be brought into a civil action (divorce or custody case) or otherwise be used to harass a subject. Of course, if the agency determines that there is no level of redaction that protects individual privacy, that agency may withhold all of the data, hardly serving a goal of providing access to scientific data, but protecting the individual research subjects. For example, data collected in a small village might be very vulnerable to the identification of subjects. In such a case, no data would be released. FOIA does not prohibit release. Assertion of the exemptions is discretionary. However, since the FOIA procedures are to be used, the original data must be provided to and retained by the agency in which the redaction or determination of applicability of the exemptions takes place.
While FOIA keeps sensitive information from being divulged, it also results in very personal and private information being provided to the Federal agency and held by that agency after the FOIA request is fulfilled. If any records are denied to a requester, as personal and private information would be, the FOIA office must retain the original data and the redacted data for at least six years following the release and perhaps far longer. Investigators, collaborators and potential research subjects are likely to find this an unwarranted risk that sensitive data will become known. During that time period, the data have become agency records and may be accessed by those for whom the FOIA exemptions do not apply, such as Congressional committees with subpoena power. Data could be requested by the GAO or by other government agencies and while that would not constitute public disclosure, it could be viewed as an inappropriate invasion of privacy by the research subject. The fact that sensitive personal, medical data will be held by the Federal agency may be sufficient to deter participation. The requirement of the FOIA that determinations of exemption be made by the Federal agency, and that the agency obtain data that may be highly sensitive (health conditions, sexual behavior, family information, or information about illegal behaviors), retain them for a lengthy period, and face a risk that they could be requested by those not subject to the FOIA and the FOIA exemptions, is a tremendous risk to the health research that is the NIH's mission. Therefore, it is recommended that in the case of a FOIA-generated release of grantee data, the National Archives and Records Administration be instructed to amend the General Records Schedule 14 requirement that agencies retain denied records, including the unredacted material, for six years. Such data could be returned to the grantee immediately upon determination that the redaction was appropriate.
While FOIA exemptions protect data that could reasonably be used to identify individuals and would thus constitute a violation of privacy, such protection of privacy is not extended to institutions, such as clinics or hospitals, which may have legitimate reasons to protect their anonymity. Such providers are typically willing to participate and make their data available for research with the understanding that their identities will not be divulged. With this law, it appears that the agency would have to release such information unless it were clear that release would cause them financial harm. However, some of these entities are state or local health departments which may not be able to claim financial harm and would not, therefore, have their privacy protected. It is important for the public health that such research be supported; in order for this to occur, the research entities involved must also be afforded some protection.
Under FOIA, Federal agencies cannot place restrictions on who obtains Federal records or on their intended use. This creates serious problems in using FOIA as the vehicle for providing access to research data. A look at data archives - which have as their purpose the sharing of research data - indicates some of the problems. Researchers know that removing names, addresses, and other obviously identifying data from files may be insufficient for the protection of privacy. In such cases, the provider may require a legally binding agreement with the person obtaining data to make no attempt to identify individuals and more generally, to use the data only for research purposes. In fact, language to that effect is frequently included in consent forms. The ability of the provider of research data to make these basic assurances must be continued under this law; however, it is not clear how this is possible. There are other strategies that are used to raise confidence that subjects cannot be identified: insertion of dummy data, release of less-than-complete data sets, grouping of data to blur identification. It is essential that these same protections be afforded to grantees who are forced to release their data through this law; however, the present law uses FOIA as the vehicle and FOIA does not allow such restrictions on use. The NIH concludes, therefore, that FOIA is a seriously flawed tool for providing access to research data and requests that the final rule be delayed until this issue can be resolved, perhaps through an amendment to the FOIA, or through the adoption of other strategies.
There are a number of technical issues that are also very important to have addressed in the final rule:
When research investigators access scientific data through collegial relationships or via data archives, there are established procedures for ensuring appropriate attribution for the original data. It is recommended that any procedure to force the release of data should include specific requirements that those who obtain data identify the grants from which the data came in any public presentations, that they note that the data were accessed via FOIA, and that they accept any liability for the use or interpretation they make of the data. At present, the FOIA has no mechanism to ensure that such protections would accompany the use of data. The scientific community understands that one cannot always control the subsequent research uses of one's data; however, those scientists who expend considerable effort in developing data are within their rights to ask assurance that they will be properly credited and that the circumstances under which the data were released be made clear.
Since the nature of data release could affect consent forms, it is essential that the OMB confirm that the amendment of A-110 will apply only to data collection activities that begin after the final implementation of the amendment by the agency. This will provide agencies the opportunity to build in strategies for access to and protection of scientific data from the outset of a research project. Since this extension of FOIA is effected through amendment of A-110, the NIH understands that FOIA access applies only to those awards made after A-110 has been amended and the agency has taken the steps necessary to implement that amendment.
Circular A-110 requires that data be kept for three years and three months after the final budget report. It is important that the requirements of this law not exceed the reach of A-110 and that FOIA-based access terminate when A-110 would no longer require grantees to retain data. Clearly, grantees may dispose of data after the three years and three months, but that would not be desirable from a scientific standpoint. It would be most unfortunate if, in seeking to make data available, the law created an incentive to dispose of data in order to be free of FOIA requests. However, it would also be unfortunate if grantees faced a never-ending obligation to retain and be prepared to provide data that may far exceed the tenure of the researcher at the institution, creating an administrative morass. Unless this is addressed in the final rule, FOIA would require that data be made available regardless of how long ago they were collected.
This rule is designed to institute changes in the research community to provide greater access to scientific data, but it might be helpful if the regulatory agencies had clear and widely understood policies regarding their use of data in support of regulations. For example, agencies could elect not to use data unless the analyses had appeared in peer reviewed journals. If the regulatory agency needed to return to the original data, then perhaps a plan for sharing relevant data used to support their rules and regulations should be developed as part of the regulatory process.
It is important that, as part of the process of commenting on this proposed rule, there be attention to the underlying purpose of the law. The law appears to have been designed to ensure that when a regulation affected the public, there would be an opportunity to ensure that the underlying data were valid. Such a purpose would require a degree of openness about the underlying data, but it would not require a process such as this rule envisions, that goes far afield of that goal and introduces opportunities to burden researchers, research subjects, and agencies.
While the NIH has the right to obtain research data from grantees "for a Federal purpose," this right typically has not been exercised. Also, it has been understood that a need for data for a Federal purpose, such as for audit, did not mean that these data would be published or made available to others. The current law removes the NIH's discretion not to exercise this right of access and fundamentally changes the relationship between the NIH and the grantee institution.
From the perspective of a research funding agency, it is clear that there are examples of effective policies developed to provide access to research data in many areas of science. For example, in the case of x-ray crystallographers the NIH has a policy that requires the placement of coordinate data into a data bank at the time of publication. In another case, a data archive serves to make available complex, longitudinal data sets that are the result of a multi-agency collaboration. In the latter case, data collected by a Federal agency under contract - but with NIH grant funds used to supplement the data in order to enhance their usefulness for research - are archived, freely available, and constitute a cost-effective research resource that has been used by many investigators. Because the data are released through an archive, the archive can place restrictions on their use where the risk of identifying participants rises. For example, if one obtains the geo-coded data, one must sign an agreement pledging not to use the data for other than research purposes and not to attempt to identify individuals. Because they cannot provide that assurance, Federal agencies cannot obtain the geo-coded data set. Under the new law, because some of the funds are from Federal grants, the archive would no longer be confident that it could apply such a restriction, since someone could use FOIA to access those data through an agency. Ironically, this law could undermine some of the most successful data sharing mechanisms. The scientific community is making increasing use of the internet and of archives to find appropriate strategies for the sharing of research data; extension of those activities would be a much more positive approach than the application of FOIA.
Whatever strategies are used for data access, it is important that research subjects' confidence in the ability of researchers to live up to their promises of confidentiality be maintained, or clinical research will be seriously undermined. While access to data is an important part of the scientific process, it must not be carried out in a way that puts research subjects at risk.