NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.
Institute of Medicine (US) Forum on Drug Discovery, Development, and Translation. Emerging Safety Science: Workshop Summary. Washington (DC): National Academies Press (US); 2008.
Following the discussion of techniques being developed for use in the discovery and preclinical stages to predict and understand drug toxicity, the workshop turned to the postmarket stage and ways to monitor adverse events and identify safety concerns as quickly as possible. As demonstrated by the experience with Vioxx and other drugs that had to be withdrawn from the market, drugs can make it through the development and approval processes without unanticipated serious adverse effects being recognized. In such cases, some people will inevitably take the drug and experience adverse effects; thus the goal must be to identify the problem quickly to minimize the number of people affected. To this end, three speakers described approaches to pharmacovigilance that can be applied to identify safety problems as early as possible after drugs have been put on the market.
PHARMACOVIGILANCE AT GLAXOSMITHKLINE¹
Dr. Almenoff described an aspect of GlaxoSmithKline’s (GSK’s) pharmacovigilance program called online signal management. This program combines a number of technologies into one tool that can help safety evaluators review information on marketed drugs more efficiently and in much greater detail than previously was possible.
Safety Data Mining
GSK receives approximately 90,000–100,000 spontaneous adverse event reports each year. Recognizing that its researchers needed new tools to help them understand and prioritize the most important data, in 2002 GSK began using data mining to evaluate its safety data.
Because postmarket information is reported voluntarily, there is no control group (i.e., it is impossible to know how many people took a drug, how many people experienced an event, and how many people experienced that event after taking the drug). Therefore, it may be difficult to evaluate precisely how rare or common a particular adverse event is. For example, if there are 30 reports of strokes occurring in individuals taking drug X, is that too many? This is a difficult question to answer as it depends on how common strokes are in the normal population, how much exposure there has been to drug X, and other factors. Answering such questions therefore requires an objective, systematic approach.
GSK uses a statistical approach called disproportionality analysis (DPA) to identify rare events that occur at a greater frequency than would be expected by chance. The DPA calculation is derived from a two-by-two table such as that shown in Figure 8-1. If the ratio of A/(A + B) is greater than the ratio of C/(C + D), there is a potential association between the drug and the event of interest.
For example, to determine whether there was an association between drug X and stroke, one would look at the number of stroke cases reported for drug X as a proportion of all the adverse events reported for that drug. One would then compare that result with the number of strokes reported for all drugs as a proportion of all the adverse events reported for all drugs and ask whether the proportion of strokes for drug X was greater than the proportion for all drugs. If it was, drug X might be associated with stroke.
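The comparison of proportions can be written out in a few lines. The following Python sketch assumes the usual cell layout for the two-by-two table (A: reports of the event for drug X; B: reports of other events for drug X; C: reports of the event for the comparison drugs; D: reports of other events for the comparison drugs). The function name and the counts are hypothetical and serve only as an illustration.

```python
def reporting_proportions(a, b, c, d):
    """Return the drug proportion, the comparison proportion, and their ratio."""
    drug_prop = a / (a + b)   # share of drug X reports that mention the event
    comp_prop = c / (c + d)   # share of comparison reports that mention the event
    return drug_prop, comp_prop, drug_prop / comp_prop


# Hypothetical counts, for illustration only.
drug_prop, comp_prop, ratio = reporting_proportions(a=30, b=970, c=2000, d=198000)
print(f"drug X: {drug_prop:.3f}  comparison: {comp_prop:.3f}  ratio: {ratio:.1f}")
```

A ratio above 1 only flags a potential association; it says nothing about causation and would still need medical review.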
The statistical tool GSK uses for such analysis is called MGPS (Multi-item Gamma Poisson Shrinker), and it produces a statistical output called the EBGM (empirical Bayes geometric mean). The EBGM is a measure of association, and it can be thought of as a relative reporting ratio—if the number is greater than 1, there is a statistical association between drug and event.
Almenoff warned that such statistical analysis is not sufficient by itself, as there are always biases in the data. Thus GSK researchers medically verify all signals identified by such data mining. And while such data mining can be an important tool in the armamentarium of postmarket product surveillance, it is intended only to enhance current pharmacovigilance techniques, not replace them.
Online Signal Management
GSK currently uses data mining in conjunction with other pharmacovigilance techniques to enhance and streamline the surveillance process. Online signal management (OSM), a tool GSK developed in collaboration with Lincoln Technologies, integrates safety data mining with case-based screening algorithms and also provides an opportunity for traditional case review.
The system constantly monitors GSK’s database, performing various analyses to look for patterns and changes and posting alerts when such events occur. The goal is to provide a filter that will allow GSK safety evaluators, without having to sort through every event, to focus on three things: new data, important safety signals, and fluctuations in the data.
When safety evaluators log on to the system, they are provided with a primary review that includes
- a listing of all serious adverse event reports for the drug they are responsible for monitoring in a particular time interval;
- all events with a rising trend; and
- all nonserious unlisted reports that have EBGM values above a defined threshold.
OSM combines this filtering capability with a number of other tools that enable safety evaluators to follow up on a signal to determine whether it represents a problem. For example, the system is equipped with data retrieval capabilities so that when an evaluator sees a signal, it is possible to click on the relevant medical issue and retrieve information on the cases that underlie that signal. The system also incorporates trend analysis and visualization tools, such as heat maps on which the EBGM signal scores are shown in red if they are high and in green or black if they are lower. Within the heat map, it is possible to click on a particular spot and view the cases that are represented by that spot. This is useful because by looking directly at the data, reviewers can generally verify whether a particular signal is a false positive.
In addition, sophisticated knowledge management tools help evaluators prioritize their time. With the capability to attach annotations to previous analyses that have been run, evaluators can easily track prior work and thought processes and include these when new analyses are run. Another valuable aspect of OSM is the ability to look at subpopulations. It is possible, for example, to scan the entire database and find for a particular drug what side effects are occurring more frequently in pediatric patients than in adults or the elderly. It is also possible to see what sorts of events are reported more frequently in overdose versus nonoverdose situations. This is done with quantitative algorithms that flag the situations automatically. There are also more qualitative flags, such as for product complaints, that help identify manufacturing problems.
OSM has been a success at GSK, receiving positive user reviews and saving safety evaluators between 30 and 40 percent of their time. Almenoff asserted that this program has dramatically improved the quality and focus of postmarket safety reviews at GSK.
STATISTICAL ISSUES IN ANALYZING SPONTANEOUS REPORT DATABASES²
The U.S. Food and Drug Administration’s (FDA’s) primary method of collecting postmarket data and monitoring for adverse events is passive surveillance. Reports of unexpected outcomes are submitted voluntarily by patients and health care practitioners on the FDA’s Medwatch form 3500, which includes a box by which suspected products can be linked to the outcome. The FDA receives more than 400,000 of these spontaneous reports each year (IOM, 2007). Dr. DuMouchel described some of the issues involved in the statistical analysis of spontaneously reported adverse event databases. In particular, the analytical capability at the heart of the OSM tool described by Almenoff was developed through a cooperative research and development agreement between Lincoln Technologies and the FDA, and DuMouchel offered further detail on this tool. In addition to analyzing the FDA’s Adverse Event Reporting System (AERS), the tool can be used to analyze a variety of other spontaneous report databases, including the Vaccine Adverse Event Reporting System run by the FDA and the Centers for Disease Control and Prevention, the World Health Organization’s VigiBase, databases for medical devices, and others.
Data Cleaning
Before the data in these databases can be analyzed, they must be put in a standardized form, a process called “data cleaning.” For the AERS database, there were two primary data cleaning issues: a lack of standardized nomenclature for drug names and duplicate submissions.
Because the AERS database does not use standardized nomenclature, it contains roughly 300,000 different names for drugs. These include both generic and trade names and many different misspellings, and the dosage is given with the name, so that “25 milligrams” or “25 mg” appears as part of the drug name. Since there are 3 million reports in the database and each entry typically includes several drugs, some 10 million drug names in the database had to be reviewed, one by one, and put in a standardized form. This was primarily a manual process, with little computer assistance, and took years to complete, but eventually Lincoln Technologies was able to reduce the 300,000 different names to about 3,000 ingredients in a standardized generic form. Because the FDA’s Medwatch forms continue to be collected using nonstandardized nomenclature, this process must be repeated every quarter when Lincoln receives new data from the FDA.
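Although the cleanup described here was mostly manual, the kind of rule involved can be illustrated with a short sketch. The Python example below strips dosage text from a raw name and looks the remainder up in a synonym table. The table, the misspelling, and the function name are hypothetical; this is not the procedure Lincoln Technologies used, only an illustration of the idea.

```python
import re

# Hypothetical synonym table: trade names and misspellings -> generic ingredient.
SYNONYMS = {
    "vioxx": "rofecoxib",
    "rofecoxib": "rofecoxib",
    "rofecoxcib": "rofecoxib",   # invented misspelling, for illustration
    "aleve": "naproxen",
    "naproxen": "naproxen",
}

# Matches dosage fragments such as "25 mg" or "25 milligrams".
DOSE = re.compile(r"\b\d+(\.\d+)?\s*(mg|milligrams?|mcg|g|ml)\b", re.IGNORECASE)


def normalize_drug_name(raw):
    """Return the generic ingredient, or None if the name needs manual review."""
    name = DOSE.sub(" ", raw)                    # drop dosage text
    name = re.sub(r"[^a-z ]", " ", name.lower())  # keep letters and spaces only
    name = " ".join(name.split())                 # collapse whitespace
    return SYNONYMS.get(name)


print(normalize_drug_name("Vioxx 25 mg"))
print(normalize_drug_name("ROFECOXIB 25 milligrams"))
```

Names that fall through the table return None, which mirrors the point in the text that a large share of the work still requires human review.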
The second data cleaning issue was detection of duplicate submissions. If a person is taking three drugs from three different manufacturers, for example, an adverse event will often filter back to the FDA in three separate reports from the three manufacturers. In addition, follow-up reports sometimes are not linked properly to the original report, and the process of sorting these reports can be tedious. Sorting is done by means of a computer science discipline called record linkage.
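A very simple form of record linkage can be sketched as follows. The Python example groups reports that agree on a set of patient and event fields while ignoring which manufacturer submitted them. The field names and records are hypothetical, and exact matching is a deliberate simplification: real record linkage usually has to tolerate missing or inconsistent values.

```python
from collections import defaultdict


def find_duplicate_groups(reports, key_fields=("age", "sex", "event_date", "event")):
    """Group report IDs that agree on every key field (exact match only)."""
    groups = defaultdict(list)
    for report in reports:
        key = tuple(report[field] for field in key_fields)
        groups[key].append(report["report_id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical reports: one event reported separately by three manufacturers.
reports = [
    {"report_id": 1, "manufacturer": "M1", "age": 67, "sex": "F",
     "event_date": "2006-03-02", "event": "stroke"},
    {"report_id": 2, "manufacturer": "M2", "age": 67, "sex": "F",
     "event_date": "2006-03-02", "event": "stroke"},
    {"report_id": 3, "manufacturer": "M3", "age": 67, "sex": "F",
     "event_date": "2006-03-02", "event": "stroke"},
    {"report_id": 4, "manufacturer": "M1", "age": 45, "sex": "M",
     "event_date": "2006-05-10", "event": "rash"},
]

duplicate_groups = find_duplicate_groups(reports)
print(duplicate_groups)
```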
Of the 3 million reports in the AERS database, there are probably about 300,000 duplicate reports that need to be removed. While this may not sound like a large number (only about 10 percent of the database), it can make a big difference in the signal that is extracted from the data. For example, a rare drug–event combination that should have a count of one might have four duplicates, giving an apparent count of five and raising an unnecessary safety concern. This example shows the importance of flexible computer tools; if the safety analyst can bring up the five detailed reports with a mouse click, the duplicates are much more likely to be detected.
Statistical Underpinnings
The major problem with performing statistical analysis on a database of spontaneous reports is that there is no denominator (no way of knowing exactly how many people took a particular drug), and there is no reliable numerator (because of underreporting, no way of knowing exactly how many events occurred). Thus there is no way to calculate an adverse event rate directly. The solution is to compare the pattern of adverse event reports for one drug with that for all other drugs.
To this end, the analysis works from a two-by-two table. For every drug of interest, D, and every event of interest, E, one obtains the four entries in the table by counting all the reports that do or do not involve D and that do or do not involve E. The top left entry, for instance, is the number of reports, n, that involved drug D and adverse event E. This is the number one must examine to determine whether it is larger than would be expected, and this is done by using the other three numbers to calculate an expected value, e, which is then compared with the actual value. There are a variety of ways to calculate this expected value, but regardless of which method is used, the final step is to divide the actual number of events associated with a particular drug by the expected number to obtain a disproportionality ratio, n/e. If this ratio is much larger than 1, there may be a problem.
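As a concrete illustration, the sketch below uses one common choice for the expected value: the count that would arise if the drug and the event were reported independently, e = (reports with D) × (reports with E) / (all reports). The text notes that there are several ways to compute e, so this particular formula is an assumption, as are the function name and the counts.

```python
def disproportionality_ratio(n_de, n_d_only, n_e_only, n_neither):
    """Return (n/e, e) for a two-by-two table of report counts.

    n_de      reports mentioning both drug D and event E
    n_d_only  reports mentioning D but not E
    n_e_only  reports mentioning E but not D
    n_neither reports mentioning neither
    """
    total = n_de + n_d_only + n_e_only + n_neither
    reports_with_drug = n_de + n_d_only
    reports_with_event = n_de + n_e_only
    expected = reports_with_drug * reports_with_event / total  # independence assumption
    return n_de / expected, expected


# Hypothetical counts, for illustration only.
ratio, expected = disproportionality_ratio(20, 980, 1980, 997020)
print(f"expected e = {expected}, ratio n/e = {ratio}")
```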
The idea of computing these ratios is a simple one, so one might ask why the use of such calculations has become widespread only in the past decade or so. Part of the answer is that recent computer and database advances have made it easier to perform this sort of analysis, but another part of the answer is that biostatisticians are sometimes hesitant to conduct formal statistical analyses on data collected outside of a controlled clinical trial environment. Only recently did scientists begin applying statistical models to spontaneously reported data.
Inherent limitations make it necessary to analyze these data carefully. For instance, if a particular drug is taken primarily by one age group or one sex, spurious associations can appear in the database. An example is sudden infant death syndrome (SIDS) and childhood vaccines. If one simply performed the calculations naively, one would find a large disproportionality ratio even though there is no causal relationship between SIDS and the vaccines. The Mantel-Haenszel adjustment, whereby the data are stratified by age, sex, and report year and the expected values are computed separately for each stratum, can be applied to deal with this problem.
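The effect of stratification can be seen in a small numerical sketch. The Python example below compares a crude expected count with one computed stratum by stratum and then summed. It stratifies by age only, and every count is invented to mimic the vaccine and SIDS situation; the independence formula for the expected count within each stratum is also an assumption.

```python
def crude_ratio(strata):
    """n/e with the expected count computed on the pooled data."""
    n = sum(s["both"] for s in strata)
    drug = sum(s["drug"] for s in strata)
    event = sum(s["event"] for s in strata)
    total = sum(s["total"] for s in strata)
    return n / (drug * event / total)


def stratified_ratio(strata):
    """n/e with the expected count computed within each stratum, then summed."""
    n = sum(s["both"] for s in strata)
    expected = sum(s["drug"] * s["event"] / s["total"] for s in strata)
    return n / expected


# Hypothetical report counts: vaccines and SIDS both concentrate in infants.
strata = [
    {"name": "infants", "total": 10_000, "drug": 8_000, "event": 500, "both": 400},
    {"name": "older", "total": 990_000, "drug": 2_000, "event": 0, "both": 0},
]

crude = crude_ratio(strata)
stratified = stratified_ratio(strata)
print(f"crude n/e = {crude}, stratified n/e = {stratified}")
```

In this constructed example the pooled analysis produces a large ratio purely because both the vaccine and the event are concentrated in one age group; within strata the observed count matches the expected count.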
A trickier issue is the fact that with thousands of drugs and millions of ratios being calculated, large ratios will inevitably appear. If there is just one event of a particular type, but the expected value is only a small fraction, say 10⁻⁵, the disproportionality ratio will be quite large without necessarily signifying anything other than random chance. For example, if there were 1 million different drug–event combinations, each with an expected frequency of 10⁻⁵, one would expect 10 of them to have an observed count of 1 by chance, with the remaining 999,990 having a count of 0. But the 10 that happened to show up would each have a p value of about 10⁻⁵, which a naïve analysis might deem significant.
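The arithmetic behind this example can be checked directly. The sketch below assumes that each count follows a Poisson distribution with the stated expected value; that distributional assumption is not spelled out in the text, and the variable names are illustrative.

```python
import math

n_combinations = 1_000_000   # drug-event combinations examined
expected_count = 1e-5        # expected count for each combination

# P(count >= 1) under a Poisson model with this mean: 1 - exp(-mean).
p_at_least_one = -math.expm1(-expected_count)

# How many combinations show a count of 1 or more purely by chance.
expected_false_signals = n_combinations * p_at_least_one

# The disproportionality ratio n/e for any combination that shows one event.
ratio_when_observed = 1 / expected_count

print(f"p value for a single observed event: {p_at_least_one:.2e}")
print(f"combinations expected to show up by chance: {expected_false_signals:.2f}")
print(f"n/e for each of them: {ratio_when_observed:.0f}")
```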
The question thus arises of how to take into account simultaneously the disproportionality ratios, the p values, and the multiplicity of counts. When calculating millions of two-by-two tables, there will inevitably be many cases with large ratios, and researchers must determine how they should be sorted to identify those cases most likely to be associated with real problems. Suppose, for example, that there is one case in which there are 2,000 adverse events compared with only 1,000 expected events. Then n = 2,000, e = 1,000, and the disproportionality ratio is only 2, but the p value is minuscule, implying a very clear signal. Researchers must determine how this case should be compared with one in which n = 20, e = 0.2, and n/e = 100, but the p value is much larger.
This problem can be addressed with a statistical tool called a Bayesian shrinkage model, first applied to the FDA database in 1999. Working with statistics from the entire database, this model allows one to combine the disproportionality ratio and the p value into a single value—the EBGM mentioned earlier. This number can be thought of as an a posteriori estimate of the ratio based on looking at the data as a whole.
The practical effect of performing this statistical analysis is to shrink the calculated ratios for cases in which the p value—and thus the uncertainty—is large. In calculating ratios from the database, there are many cases in which n = 1 and e is some very small number, so that n/e is, say, 3,000. In such cases, the model realizes that there is so much variance in the estimate of the ratio that a value of 2 or 3 is a better estimate than 3,000. When n is in the range of 10 to 20, by contrast, there is typically only a slight shrinkage, and for an n of several hundred, there is generally no shrinkage at all.
The bottom line is that this statistical analysis modifies the original calculated disproportionality ratio to take into account how variable that ratio estimate was and provide a better indication of how significant the event or events really are. The analysis is particularly useful because it provides a single number that can be graphed or plugged into other models.
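The shrinkage behavior can be illustrated with a small gamma-Poisson sketch. The text gives no formulas, so the following is an assumption-laden illustration rather than the MGPS implementation: it assumes the observed count n is Poisson with mean λe, places a two-component gamma mixture prior on λ, and reports exp(E[log λ | n]) as the shrunken ratio. The hyperparameter values are placeholders chosen for the example; in an empirical Bayes analysis they would be estimated from the whole database. The helper names are hypothetical.

```python
import math


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def log_marginal(n, e, shape, rate):
    """Log probability of n when n ~ Poisson(lam * e) and lam ~ Gamma(shape, rate)."""
    return (math.lgamma(shape + n) - math.lgamma(shape) - math.lgamma(n + 1)
            + shape * math.log(rate / (rate + e))
            + n * math.log(e / (rate + e)))


def shrunken_ratio(n, e, a1=0.2, b1=0.1, a2=2.0, b2=4.0, p=1 / 3):
    """Geometric-mean posterior estimate of lam under the assumed mixture prior."""
    l1 = math.log(p) + log_marginal(n, e, a1, b1)
    l2 = math.log(1 - p) + log_marginal(n, e, a2, b2)
    m = max(l1, l2)                                   # stabilise the exponentials
    q = math.exp(l1 - m) / (math.exp(l1 - m) + math.exp(l2 - m))
    mean_log = (q * (digamma(a1 + n) - math.log(b1 + e))
                + (1 - q) * (digamma(a2 + n) - math.log(b2 + e)))
    return math.exp(mean_log)


for n, e in [(1, 1 / 3000), (20, 0.2), (2000, 1000)]:
    print(f"n={n}, e={e:.4g}, raw n/e={n / e:.0f}, shrunken={shrunken_ratio(n, e):.2f}")
```

With these placeholder hyperparameters, the single-report case with a raw ratio of 3,000 is pulled down to a small single-digit value, while the case with n = 2,000 is left essentially at its raw ratio of 2. How strongly intermediate counts are shrunk depends on the hyperparameters.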
As an example of the usefulness of having a single number, DuMouchel showed a heat map of adverse events for a single drug (see Figure 8-2). This heat map is divided into different spaces according to MedDRA (the Medical Dictionary for Regulatory Activities) terms. The biggest rectangles are the system–organ classes—blood, cardiovascular, respiratory, renal, gastrointestinal, and so forth—but all of the 10,000 or so MedDRA preferred terms are grouped into very small squares where the grouping respects the hierarchy of MedDRA. One can explore this heat map by moving the computer cursor over these squares; as this happens, information appears concerning where that square falls in the MedDRA grouping.
Drug Interactions
It is also possible to use the above analysis to look for adverse events related to interactions between drugs. The more medications a person takes, the greater is the chance of a drug interaction. Therefore, as more people take more prescription drugs (DuMouchel reported that 12 percent of the elderly take at least 10 drugs a week), interaction effects are becoming more important.
The process of looking for adverse events due to drug interactions is straightforward. A pair of drugs is treated as an additional “pseudo-drug.” If, for example, there is a report of a patient’s taking three drugs and the three drugs are listed in the report, the analysis treats the case as though the patient were taking three drugs—A, B, and C—as well as three pseudo-drugs—A + B, A + C, and B + C. From this point, the analysis is the same, with the observed number of events being compared with the expected number of events, and an EBGM being calculated to express the modified disproportionality ratio.
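The pseudo-drug expansion is easy to express in code. The following Python sketch takes the drugs listed on one report and adds every pair as an extra item, after which each item can be counted exactly like a single drug. The function name and the " + " label format are choices made for this illustration.

```python
from itertools import combinations


def expand_with_pairs(drugs):
    """Return the single drugs on a report plus one pseudo-drug per pair."""
    singles = sorted(set(drugs))
    pairs = [" + ".join(pair) for pair in combinations(singles, 2)]
    return singles + pairs


items = expand_with_pairs(["A", "B", "C"])
print(items)
```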
As an example, DuMouchel used an analysis of the drugs cisapride and erythromycin, alone and in combination, and how often they were associated with torsades de pointes, an uncommon variant of ventricular tachycardia (see Figure 8-3). The EBGM for each drug alone relative to torsades was about 20, but the EBGM for the combination of the two drugs was nearly 230, a huge disproportion.
Complications
A number of complications and subtleties must be taken into account in performing this sort of analysis. For one thing, the analysis is based on the assumption that all reports except those concerning the drug of interest can be considered “background noise.” The problem is that the “control group”—all reports except those concerning the drug of interest—may include other drugs with very high signals for the event of interest, and in this case it is not a very good control group. The denominator will be inflated, and this will partially mask the effect that is the target of the analysis. Improved methods for dealing with this issue are needed.
Another issue that needs to be considered during analysis is confounding due to people taking more than one drug at a time. If a particular drug that causes an adverse event is often coprescribed with another drug, that second drug will inherit the association with the adverse event. This is called signal linkage or the innocent bystander effect, and it is particularly prominent in drugs used for certain chronic conditions, such as diabetes or HIV infection, for which a set of drugs is often prescribed together. If one of those drugs has a serious association with adverse reactions, that association will propagate to the others.
The standard way of dealing with such confounding is multiple regression analysis; however, there are complications that must be addressed. For such a regression analysis, the adverse event is taken as the response, or dependent variable, and the stratification variables and the presence or absence of various drugs are taken as the predictors, or independent variables. With this analysis, the background noise rate can be estimated automatically and can be extended to estimate drug interactions.
This is a time-consuming process as it is necessary to perform a multiple regression analysis for every adverse event; thus if 10,000 MedDRA terms are being considered, 10,000 regressions must be calculated. Furthermore, with the presence or absence of a drug as a predictor and with 3,000 drugs in the database, there is a very large number of predictors for the regression model. In addition, if a large number of coefficients are estimated simultaneously, it becomes necessary to add shrinkage methods to the regression analysis.
One final confounding factor that must be taken into account is that drugs taken for particular diseases can appear to be related to the symptoms of the disease. For example, if a disease causes nausea, nausea may emerge from the analysis as an adverse event related to a drug prescribed for that disease, even if it is an antinausea medication that is being evaluated. In such cases, it is important for medical expertise and judgment to be involved in the analysis to rule out such factors.
Summary
The issues that need to be considered when one is performing DPA on spontaneously reported events can be summarized as follows:
- Extensive data cleaning is necessary to sort and organize millions of records.
- There are many noncausal reasons for associations between drugs and events.
- In comparison with clinical trial or cohort data, where participants can be followed from start to finish, these studies are poorly designed.
- Interpretation of comparator groups is difficult.
- Multiple comparison and post hoc fallacies are endemic.
Despite the need to address these issues, systematic DPA can yield a number of beneficial results, including the following:
- This method is considered the only way to learn about very rare adverse drug events.
- It provides hypothesis generation and a second data source for comparisons.
- The Bayesian approach to multiple comparisons aids in assessment.
- Computer tools have improved productivity.
- The signal management approach enables institutional “memory.”
One weakness of DPA was brought out in the discussion when John Senior, FDA, questioned whether quantitation of DPA can provide numbers in which one can be confident and how well those numbers relate to real risk. DuMouchel agreed that DPA is not as good as incidence rates or relative risks, but stressed that it is useful nonetheless. He explained that if DPA results are viewed as estimates of how overrepresented an event is in the database, there is no problem with comparing two drugs to determine whether the event is more overrepresented for one than for the other. Once risk has been assessed, case reports can be examined and medical judgments made.
ACTIVE SURVEILLANCE FOR ANTICIPATED ADVERSE EVENTS³
Historically, postmarket monitoring for adverse events has been accomplished through passive surveillance. While this system may be capable of detecting rare serious adverse events, it has several limitations, including underreporting, biased reporting, and difficulties in attributing an adverse event to a specific drug. In addition, the data accumulate slowly, and answering important safety questions can take years. With the technological advances that have occurred in recent years, numerous groups and stakeholders have embarked on the establishment of active surveillance systems to monitor for adverse drug events, with the aim of identifying drug safety issues more quickly than is possible with standard passive surveillance. Dr. Platt described what a national active surveillance system might look like and elaborated on the benefits and challenges it would entail.
Using Claims Databases for Surveillance
A large percentage of Americans’ medical records and history of prescription drug use can be accessed by using health care claims, making this an ideal platform for launching a national active surveillance system. The backbone database of such a system would comprise routinely collected administrative health care claims enhanced by supplemental information, such as links to full-text medical records in either electronic or paper form, laboratory results, and pharmacy records. Claims databases have the important features of covering defined groups of individuals and containing information on all reimbursed care. Thus they can provide both numerators (e.g., how many people experienced event X after taking drug Y) and denominators (e.g., the total number of people who took drug Y) for events, avoiding biases in systems based solely on medical records.
To test how well such a claims database might work, Platt and colleagues in the FDA and the Centers for Education and Research in Therapeutics (CERTs) program at the Agency for Healthcare Research and Quality performed a retrospective study to determine how early it might have been possible to uncover the association between Vioxx (rofecoxib) and acute myocardial infarction. They looked at several years’ worth of claims from a group of health plans with an aggregate population of about 7 million and plotted the observed number of myocardial infarctions among rofecoxib users versus the expected number, based on a comparison group composed of naproxen (brand name Aleve) users. They concluded there was a statistically significant signal of excess acute myocardial infarction when 28 heart attacks had been recorded among rofecoxib users, data that took 34 months to accumulate in the group of health plans with which they were working. If the researchers had had data for 100 million people available, the signal might have been evident from only about 3 months’ worth of data. While the data are never available immediately—it takes a while to obtain them and to transfer them into analyzable form—this example illustrates that working with large data sets can make it possible to identify phenomena of interest relatively quickly.
An FDA reviewer questioned Platt’s assertion that an active surveillance system would have been able to detect the safety signal from Vioxx much sooner. She noted that Platt had an advantage in picking the outcome to study and the comparator, whereas in real time, if other outcomes had been monitored or a different comparator had been used, the myocardial infarction events might have been masked. In other words, because Platt’s study was retrospective (it was already known that myocardial infarction was the problem), it was possible to monitor specifically for that event. In a real-life situation, researchers might not know which events to monitor closely for, and therefore it might take longer or be more difficult to identify the unanticipated serious adverse event than was demonstrated with Platt’s example.
Using Claims Databases for Hypothesis Testing
A valuable use for claims databases would be to test hypotheses that have been raised in some other way. For example, a hypothesized connection between the Menactra meningococcal conjugate vaccine and Guillain-Barré syndrome is currently of substantial interest to both the FDA and the Centers for Disease Control and Prevention (CDC). Shortly after the vaccine was approved in 2005, the Advisory Committee on Immunization Practices recommended that it be used for all adolescents; within a year or so, 15 spontaneous reports of Guillain-Barré syndrome occurring within 6 weeks of immunization had been filed. At the time, it was estimated that approximately 6 million people had been immunized. A number of questions were raised, such as whether an excess risk is associated with the vaccine; if so, how great; and whether there is a high-risk subgroup.
The Vaccine Safety Datalink (VSD) project, a CDC-supported program that operates in eight health plans of the HMO Research Network, quickly became involved and analyzed the risk using its database of 7 million health plan members. At the end of a year, approximately 100,000 doses had been administered, and no cases of Guillain-Barré syndrome had appeared among those receiving the vaccine. However, this did not rule out a connection: since the background rate of Guillain-Barré syndrome is only about 1 to 2 cases per 100,000 person-years, it could take several years for the connection to be observable in the VSD database.
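A rough calculation shows why 100,000 doses are too few to settle the question. The Python sketch below estimates how many Guillain-Barré cases would be expected from the background rate alone, assuming one dose per person, a 6-week window after each dose, 52 weeks per year, and Poisson-distributed counts. These assumptions and the function name are illustrative and are not taken from the VSD analysis.

```python
import math

WEEKS_PER_YEAR = 52


def expected_background_cases(people, rate_per_100k_person_years, window_weeks):
    """Cases expected from the background rate during the post-dose window."""
    person_years = people * window_weeks / WEEKS_PER_YEAR
    return person_years * rate_per_100k_person_years / 100_000


# Roughly 6 million people immunized nationally; background rate 1 to 2 per 100,000.
national_low = expected_background_cases(6_000_000, 1, 6)
national_high = expected_background_cases(6_000_000, 2, 6)

# About 100,000 doses administered within the VSD health plans.
vsd_low = expected_background_cases(100_000, 1, 6)
vsd_high = expected_background_cases(100_000, 2, 6)

# Probability of seeing no cases at all in the VSD data, background rate only.
p_zero_if_high = math.exp(-vsd_high)

print(f"national, background only: {national_low:.1f} to {national_high:.1f} cases")
print(f"VSD, background only: {vsd_low:.2f} to {vsd_high:.2f} cases")
print(f"chance of zero VSD cases at the higher rate: {p_zero_if_high:.2f}")
```

Under these assumptions, well under one background case is expected among 100,000 doses, so observing none is the most likely outcome whether or not the vaccine carries a modest excess risk. The national figure of roughly 7 to 14 expected background cases is not directly comparable with the 15 spontaneous reports, since spontaneous reporting is incomplete.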
The FDA currently has postmarket contracts with the same eight HMO Research Network plans that are involved in the VSD project, as well as UnitedHealthcare, two state Medicaid databases, and the Veterans Health Administration system; altogether, these organizations represent about 26 million people. The information they provide includes details about the diagnoses that are assigned, about the procedures people undergo, and about the drugs dispensed through pharmacies, and all of this information can be linked to full-text medical records. These are the systems used most often in the United States for surveillance purposes; however, they are insufficient for ensuring timely identification of new adverse events or timely follow-up on safety signals. Platt asserted that, in addition, linked databases from Medicare Parts A, B, and D, Medicaid in most large states, and private health plans need to be accessible and included in a national surveillance network.
To complement the VSD project’s effort to test whether the vaccine increased the risk of Guillain-Barré syndrome, a one-time collaboration of four health plans with 40 million members was established to conduct a study that would use linked automated resources to identify potential cases by their diagnosis codes, obtain the medical records of potential cases for review and abstraction, and have an expert panel adjudicate all abstracted cases. The study results will be reported frequently to the FDA, CDC, and the vaccine’s manufacturer, and will eventually be made available to the public.
These efforts have helped stimulate the creation of a standing consortium called the Health Plan Consortium for Public Health, which Platt and colleagues are working to develop. The consortium would be run under the auspices of CERTs and would have a target population of 100 million covered people. Its goal would be to improve the safety and use of marketed vaccines and prescription drugs. While there is no guarantee that the consortium will be established, active planning is under way.
Characteristics of an Effective Active Surveillance System
Assuming such a consortium could be established, Platt described a number of characteristics that he would expect the active surveillance system to have. It would function as a distributed network, with the data residing at and belonging to the individual health plans. Typically, the data would be accessed via computer programs that would be distributed to each health plan, which would run the programs on their own data. Results would be returned to a coordinating center, combined with results from other health plans, and then analyzed. Although most of the surveillance analysis could be performed using deidentified information, thus ensuring protection of confidential personal information, it might be necessary for the health plans to provide individual patient-level data for a small fraction of individuals with specific diagnosis codes or other characteristics requiring additional evaluation. However, this information would be provided under full Health Insurance Portability and Accountability Act (HIPAA) protections. Platt commented that the VSD project works this way.
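The distributed design can be sketched in a few lines. In the Python example below, the same small program runs separately on each health plan's records and returns only aggregate counts, which a coordinating center then combines. The record layout, function names, and data are hypothetical, and the sketch leaves out everything a real system would need for security, privacy review, and data standardization.

```python
def plan_program(records, drug, event):
    """Runs inside one health plan; only aggregate counts leave the plan."""
    exposed = [r for r in records if drug in r["drugs"]]
    with_event = sum(1 for r in exposed if event in r["diagnoses"])
    return {"exposed": len(exposed), "events": with_event}


def coordinating_center(plan_results):
    """Combines the aggregate counts returned by the individual plans."""
    return {
        "exposed": sum(r["exposed"] for r in plan_results),
        "events": sum(r["events"] for r in plan_results),
    }


# Hypothetical member records held locally by two plans.
plan_a = [
    {"drugs": {"drug Y"}, "diagnoses": {"event X"}},
    {"drugs": {"drug Y"}, "diagnoses": set()},
]
plan_b = [
    {"drugs": {"drug Y", "drug Z"}, "diagnoses": set()},
    {"drugs": {"drug Z"}, "diagnoses": {"event X"}},
]

results = [plan_program(plan, "drug Y", "event X") for plan in (plan_a, plan_b)]
combined = coordinating_center(results)
print(combined)
```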
In addition to claims data, the system should ideally have access to a variety of other clinical information. For example, many large national health plans now receive laboratory test results for their members, and electronic medical records are becoming increasingly available and should eventually become a critical component of a national surveillance system. Platt stressed that being able to access full-text medical records would be crucial for this system. Because the data involved in the system would belong to individual health plans, plans should have the option to opt in and out of specific uses of the data. Further, during development of the system, transparency to the public would be important: protocols should be offered for public comment before being finalized and available to the public when a study begins, and results should be provided to the public when a study is completed.
Data Ownership and Decision Making
Robert Califf, of Duke University, urged caution in response to Platt’s description of an integrated active surveillance network in which data would belong to individual health plans, companies, regulators, etc., and groups could opt in and out of specific uses of the data. He warned against every stakeholder having its own data sets, completing its own analyses, and making its own decisions about what drugs are dangerous or safe. Platt responded that the signal detection he described represents only the beginning of the decision process. Currently, these systems cannot be used to make assertive decisions, as researchers in the community are in the midst of working to establish best practices and are debating methods and the interpretation of results. Platt asserted further that decision making should fall to regulators and then to the community.
Selection of Outcomes to Monitor
An active surveillance system could be used in two ways: (1) to watch for potential adverse outcomes specified in advance, or (2) to evaluate signals arising from spontaneous reports or other sources. Thoughtful selection is necessary in choosing outcomes of interest to monitor by active surveillance. With spontaneous reports collected through passive surveillance, the report itself indicates which outcome may be problematic. With active surveillance, however, it is more difficult to determine what outcomes should be monitored because data exist for every outcome that has occurred to every person in the database. While it would also be possible to use data mining approaches of the kind described by DuMouchel, Platt suggested that the primary use for the system should be to focus on adverse outcomes for which the FDA already has cause for concern. He contended that the large majority of postmarket safety problems are caused by a relatively small set of candidates, so the first goal would be to conduct prospective surveillance looking for signals related to these candidates.
Judy Racoosin, FDA, seconded Platt’s suggestion that the experience gained during clinical trials should help guide the active surveillance program for a drug. During the FDA’s preapproval safety conference, when the drug review division meets with the Office of Surveillance and Epidemiology (OSE), the participants discuss issues that have arisen during the development process. Racoosin explained that it is not uncommon to encounter a few cases of worrisome events and a greater number of more common events. Some events are difficult to understand because of the limited number of subjects in whom a drug has been tested. Some of the more obscure and confounded premarket data could be useful when selecting outcomes to monitor. Almenoff added that this is exactly what GSK does: every program, beginning in early development, has a risk/benefit management plan. From this plan is created a list of items of special interest that are monitored throughout the drug’s life.
Approaches for Conducting Active Surveillance
Platt described two approaches for conducting active surveillance for prespecified events. The first is to wait until a sufficiently large number of exposures has occurred and then conduct a study. The weaknesses of this approach are that it is difficult to define a sufficiently large number, and it could take a long time to acquire the data. A second approach is to conduct sequential analysis—periodic data accumulation followed by periodic analysis, with each new analysis adding to the existing ones. This approach requires a method that allows for repeated testing on the same data.
There are a variety of ways to look for a signal in accumulating data. One standard method is the sequential probability ratio test (SPRT). A weakness of this method is that a threshold must be specified ahead of time for what constitutes an excess risk, and it is difficult to know what risk to specify in advance. If the correct excess risk threshold is chosen, the test can be highly effective and verify a risk very quickly, but if the wrong risk threshold is chosen, it may mask real risks.
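To make the dependence on the prespecified risk concrete, here is a minimal sketch of Wald's SPRT for Poisson counts, assuming the expected count under the null hypothesis is known at each look. The function names, the error rates, and the cumulative counts are illustrative assumptions, not figures from the workshop. Wald's boundaries are derived for continuous monitoring, so applying them at periodic looks is an approximation.

```python
import math


def sprt_poisson(observed, expected, rr_alt, alpha=0.05, beta=0.2):
    """Wald SPRT for H0: relative risk = 1 versus H1: relative risk = rr_alt.

    observed: cumulative event count; expected: cumulative count expected under H0.
    Returns the log likelihood ratio and a decision string.
    """
    llr = observed * math.log(rr_alt) - expected * (rr_alt - 1.0)
    upper = math.log((1.0 - beta) / alpha)   # cross it: reject H0
    lower = math.log(beta / (1.0 - alpha))   # cross it: accept H0
    if llr >= upper:
        return llr, "signal"
    if llr <= lower:
        return llr, "accept_null"
    return llr, "continue"


def monitor(looks, rr_alt):
    """Apply the test at each periodic look; stop at the first boundary crossing."""
    for i, (observed, expected) in enumerate(looks, start=1):
        _, decision = sprt_poisson(observed, expected, rr_alt)
        if decision != "continue":
            return i, decision
    return len(looks), "continue"


# Made-up cumulative (observed, expected) counts; observed runs about twice expected.
looks = [(2, 1.5), (6, 3.0), (10, 4.5), (14, 6.0)]

well_chosen = monitor(looks, rr_alt=2.0)    # alternative close to the data
badly_chosen = monitor(looks, rr_alt=10.0)  # alternative far too large
print(well_chosen, badly_chosen)
```

With these invented counts, the alternative of 2 leads to a signal at the fourth look, while the alternative of 10 accepts the null hypothesis at the first look even though observed events run at roughly twice the expected count. That is the masking problem described above.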
Martin Kulldorff, a statistician working with the VSD project, developed a variant of the SPRT called the maximized SPRT, which tests the null hypothesis (no excess risk) against the compound alternative hypothesis (a relative risk greater than one). The trade-off, from a statistical point of view, is that while this method can be used to test for any increased risk, it is somewhat less efficient than would be the case if the excess risk were known.
The VSD project is using this technique to perform surveillance on new vaccines, including the meningococcal vaccine. For that vaccine, at the end of 95 weeks, there had been two cases of thrombocytopenia relatively soon after immunization, a number that does not exceed the signal threshold. Platt noted, however, that if the second case had occurred by week 12, it would have represented a statistically significant excess, and the project would have been faced with the question of what those two cases really meant.
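The reason the same two cases can be unremarkable late in surveillance yet significant early on can be sketched with the Poisson form of the maximized SPRT statistic, in which the relative risk is replaced by its maximum likelihood estimate (observed divided by expected) whenever that estimate exceeds one. The expected counts and the critical value below are invented for illustration; they are not the VSD project's figures. In practice the critical value is computed from the chosen error rate and the planned length of surveillance.

```python
import math


def maxsprt_llr(observed, expected):
    """Poisson maximized SPRT log likelihood ratio.

    H0: relative risk = 1; H1: relative risk > 1 (compound alternative).
    The statistic is zero whenever observed events do not exceed expected.
    """
    if expected <= 0:
        raise ValueError("expected count under the null must be positive")
    if observed <= expected:
        return 0.0
    return observed * math.log(observed / expected) - (observed - expected)


# Hypothetical values: a real critical value depends on alpha and the
# planned surveillance length, and is not taken from the source.
CRITICAL_VALUE = 3.0

# Two observed cases, with made-up expected counts under the null.
llr_early = maxsprt_llr(2, 0.05)  # little exposure accrued, so few cases expected
llr_late = maxsprt_llr(2, 0.40)   # more exposure accrued, so more cases expected

for label, llr in (("early", llr_early), ("late", llr_late)):
    print(label, round(llr, 2), "signal" if llr >= CRITICAL_VALUE else "no signal")
```

No alternative relative risk has to be chosen in advance, which is the attraction of the method; the cost is the loss of efficiency noted above when the true excess risk happens to be known.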
A workshop participant expressed his opinion that SPRT is greatly limited for signal detection. Platt agreed that SPRT is not an appropriate method for signal detection and said that at present, no one knows what the best method is. He reiterated that maximized SPRT may prove to be useful, but suggested that researchers will need to explore and debate different methods until they find a better one. DuMouchel added that SPRT is designed to test a predefined hypothesis that is followed over time, and is not intended to be used when one is screening multiple drugs and multiple hypotheses. The next step will be to gain a better understanding of how sequential signal detection methods work under conditions that mimic real-life use.
What Is Needed
Creating an effective active surveillance system to monitor large numbers of therapeutic agents and outcomes will require consideration of several factors. Researchers will need to determine
- how to select the outcomes that will be monitored;
- how many outcomes can realistically be monitored;
- how to define outcomes in the terms in which they exist in the data sets;
- how often to look for those outcomes;
- what the appropriate comparators should be; and
- what the statistical approach should be.
Researchers will also need to develop rapid and effective ways of determining which signals represent real problems that require public health or regulatory action.
The maximized SPRT currently used by the VSD project has many desirable properties, but other sequential analysis methods should also be tested to determine which works best. Before making a decision, it will be necessary to evaluate each method to determine what its performance characteristics are—in particular, how common false positives (signals when there is no real excess risk) and false negatives (failure to detect an excess risk that is actually present) are, and once this information is known, how to trade off between the two. It will also be necessary to decide how much error in each direction can be tolerated.
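The trade-off between the two kinds of error can be illustrated with a deliberately simplified rule: a single look at a Poisson count, with a signal declared when the count reaches a threshold. This is not a sequential method and is not how any system described here operates; the expected count, the relative risk, and the thresholds are assumptions chosen only to show how the error rates move in opposite directions.

```python
import math


def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean), summed term by term."""
    if k < 0:
        return 0.0
    term = math.exp(-mean)
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return total


def error_rates(threshold, expected_null, relative_risk):
    """Error rates for the rule 'signal if count >= threshold' at a single look.

    false positive: a signal although there is no excess risk.
    false negative: no signal although the risk is relative_risk times baseline.
    """
    false_positive = 1.0 - poisson_cdf(threshold - 1, expected_null)
    false_negative = poisson_cdf(threshold - 1, expected_null * relative_risk)
    return false_positive, false_negative


# Hypothetical setting: 5 events expected under the null, true relative risk of 2.
table = {k: error_rates(k, expected_null=5.0, relative_risk=2.0) for k in range(8, 13)}
for k, (fp, fn) in table.items():
    print(f"threshold {k}: false positive {fp:.3f}, false negative {fn:.3f}")
```

Raising the threshold lowers the false positive rate and raises the false negative rate, so choosing a threshold amounts to deciding how much error in each direction can be tolerated.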
Problems to Overcome
Implementation of active surveillance will require that researchers overcome several barriers. For example, they will need to be able to determine whether the outcomes in a data set are real problems and not simply artifacts of the recording or analysis methods employed. Platt described the detection of a signal involving excess gastrointestinal bleeding associated with a new vaccine. After substantial time and effort, the signal was proven to be spurious, resulting from a change over time in the way the health plans’ clinicians used certain diagnostic codes (more common use of codes that suggested gastrointestinal bleeding); the signal was highlighted by the change in documentation practices.
This is one example of many ways in which dynamic data systems developed to support health care delivery and payment can pose major challenges when one attempts to use them for surveillance purposes. If a signal appears not to be an artifact of the data systems, it will usually be necessary to validate the accuracy of the coded diagnoses for the cases by obtaining additional information from the associated medical records. Review of medical records will also be important in those cases to determine whether other factors are present that contributed to the outcome. Furthermore, even after it has been determined that there are more confirmed adverse outcomes than would be expected by chance, it will be necessary to disentangle the contribution of the drug or vaccine in question from other potential contributors, such as the underlying illness that was the indication for treatment or concomitant treatments. Finally, active surveillance will raise the issue of balancing benefits and risks to a new level of visibility. Because active surveillance will reveal risks of a drug that would otherwise have taken longer to detect—or perhaps would not have been detected at all—it will force a decision as to whether the benefits of continuing to use a drug in the way it has been used outweigh the risks uncovered by the surveillance.
Benefit of Maintaining the AERS Once a National Active Surveillance Network Has Been Established
Moderator Paul Seligman, FDA, asked Almenoff, DuMouchel, and Platt to comment on the value of maintaining a record system based on voluntary reports of adverse events when an active surveillance network encompassing 100 million people is available. DuMouchel explained that while he is enthusiastic about the idea of an active surveillance network, he believes choosing the correct outcomes to monitor will be challenging. When data are reported to the AERS, a qualified health care provider has already decided that the event is important and should be explored. DuMouchel cautioned that without spontaneous reports, data could be entered into the system without undergoing such scrutiny, and therefore important outcomes could be missed. Although the AERS has a number of limitations as described earlier, until confidence in the ability of an active surveillance system to match the sensitivity of the AERS is established, spontaneous reports should not be abandoned. Platt agreed that spontaneous reports will be needed for the foreseeable future.
Almenoff suggested that an ideal way to approach this issue would be to include in electronic medical records a box that could be checked to indicate that the health care provider believed the occurrence was an adverse event, thereby flagging the event. Responding to this suggestion, Platt said his group is experimenting with “elicited surveillance,” in which an electronic medical record system includes a field designed to prompt clinicians to indicate when an unexpected event (a diagnosis or laboratory result) has occurred. The method was tested with vaccines by comparing its yield with the baseline reporting of the AERS. A five- to six-fold increase in the number of reported events was seen when clinicians were told that they had entered a diagnosis that would be unexpected for an individual who had recently been immunized, and were asked whether this might be an adverse event for which they wanted to submit an AERS report. Though Platt believes this might be a good way of soliciting such information from clinicians, he expressed concern that many clinicians are hesitant to attribute unexpected outcomes to drugs, and therefore events could be missed.
Footnotes
1. This section is based on the presentation of June Almenoff, Vice President, Safety Evaluation and Risk Management, Global Clinical Safety and Pharmacovigilance, GlaxoSmithKline.
2. This section is based on the presentation of William DuMouchel, Chief Statistical Scientist, Lincoln Technologies.
3. This section is based on the presentation of Richard Platt, Professor and Chair, Harvard Medical School and Harvard Pilgrim Health Care.