      Automated Detection of Infectious Disease Outbreaks in Hospitals: A Retrospective Cohort Study

          Abstract

          Susan Huang and colleagues describe an automated statistical software, WHONET-SaTScan, its application in a hospital, and the potential it has to identify hospital infection clusters that had escaped routine detection.

          Abstract

          Background

          Detection of outbreaks of hospital-acquired infections is often based on simple rules, such as the occurrence of three new cases of a single pathogen in two weeks on the same ward. These rules typically focus on only a few pathogens, and they do not account for the pathogens' underlying prevalence, the normal random variation in rates, and clusters that may occur beyond a single ward, such as those associated with specialty services. Ideally, outbreak detection programs should evaluate many pathogens, using a wide array of data sources.
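          The conventional rule mentioned above (three new cases of a single pathogen within two weeks on the same ward) fits in a few lines of code. Writing it out makes its blind spots visible: it never looks at a pathogen's baseline, at random variation, or at any grouping other than the ward. The sketch below is illustrative only. The `(ward, pathogen, culture_date)` record layout and the function name are hypothetical, and it assumes "two weeks" means any window of 14 consecutive days counted inclusively.

```python
from collections import defaultdict
from datetime import date


def rule_based_alerts(cases, threshold=3, window_days=14):
    """Return (ward, pathogen, date) for every new case that brings the count of
    one pathogen on one ward to at least `threshold` within `window_days` days.

    `cases` is an iterable of (ward, pathogen, culture_date) tuples (hypothetical layout).
    """
    by_ward_pathogen = defaultdict(list)
    for ward, pathogen, day in cases:
        by_ward_pathogen[(ward, pathogen)].append(day)

    alerts = []
    for (ward, pathogen), days in by_ward_pathogen.items():
        days.sort()
        start = 0
        for end, day in enumerate(days):
            # Slide the window start forward until it spans fewer than window_days days.
            while (day - days[start]).days >= window_days:
                start += 1
            if end - start + 1 >= threshold:
                alerts.append((ward, pathogen, day))
    return alerts


cases = [
    ("4E", "MRSA", date(2004, 3, 1)),
    ("4E", "MRSA", date(2004, 3, 6)),
    ("4E", "MRSA", date(2004, 3, 12)),
    ("5W", "MRSA", date(2004, 3, 2)),
]
print(rule_based_alerts(cases))  # [('4E', 'MRSA', datetime.date(2004, 3, 12))]
```

          Every later case that keeps the count at or above the threshold produces another alert, so a persistent cluster is flagged repeatedly. A working program would probably de-duplicate these flags.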

          Methods and Findings

          We applied a space-time permutation scan statistic to microbiology data from patients admitted to a 750-bed academic medical center in 2002–2006, using WHONET-SaTScan laboratory information software from the World Health Organization (WHO) Collaborating Centre for Surveillance of Antimicrobial Resistance. We evaluated each patient's first isolate of each potentially pathogenic species. To focus on hospital-associated infections, we included only pathogens first isolated >2 d after admission. Clusters were sought daily across the entire hospital, within hospital wards and specialty services, and among isolates with similar antimicrobial susceptibility profiles. We assessed clusters whose likelihood of occurring by chance was less than once per year. For methicillin-resistant Staphylococcus aureus (MRSA) and vancomycin-resistant enterococci (VRE), WHONET-SaTScan–generated clusters were compared with those previously identified by the Infection Control program, which used a rule-based criterion of three occurrences in two weeks on the same ward. Two hospital epidemiologists independently classified each cluster's importance. From 2002 to 2006, WHONET-SaTScan found 59 clusters involving 2–27 patients (median 4). Clusters were identified by antimicrobial resistance profile (41%), ward (29%), service (13%), and hospital-wide assessment (17%). WHONET-SaTScan rapidly detected the two previously known gram-negative pathogen clusters. Of the clusters previously designated by the rule-based threshold, WHONET-SaTScan considered only one of 73 MRSA clusters and none of 87 VRE clusters to be statistically unlikely to have occurred by chance. WHONET-SaTScan identified six MRSA and four VRE clusters that were previously unknown. Epidemiologists considered more than 95% of the 59 detected clusters to merit consideration, with 27% warranting active investigation or intervention.
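          The inclusion criteria above can be written as a small filter: take each patient's first isolate of each species, and keep it only if it was collected more than 2 d after admission. This is a sketch under stated assumptions. The dictionary field names are hypothetical, and each patient is assumed to have a single admission. The "first isolate" is determined before the 2-day rule is applied, so a species already isolated in the first 2 days never counts as hospital-onset later in the stay.

```python
from datetime import date


def hospital_onset_first_isolates(isolates, admissions, min_days=2):
    """Keep each patient's first isolate of each species if it was collected more
    than `min_days` days after admission.

    isolates:   list of dicts with hypothetical keys "patient", "species", "date"
    admissions: dict mapping patient -> admission date (one admission assumed)
    """
    first = {}
    for iso in sorted(isolates, key=lambda i: i["date"]):
        first.setdefault((iso["patient"], iso["species"]), iso)
    return [
        iso for iso in first.values()
        if (iso["date"] - admissions[iso["patient"]]).days > min_days
    ]


admissions = {"p1": date(2004, 1, 1), "p2": date(2004, 1, 1)}
isolates = [
    {"patient": "p1", "species": "S. aureus", "date": date(2004, 1, 2)},   # within 2 d of admission
    {"patient": "p1", "species": "S. aureus", "date": date(2004, 1, 10)},  # not a first isolate
    {"patient": "p2", "species": "E. coli", "date": date(2004, 1, 5)},     # kept
    {"patient": "p2", "species": "E. coli", "date": date(2004, 1, 8)},     # not a first isolate
]
print([(i["patient"], i["species"]) for i in hospital_onset_first_isolates(isolates, admissions)])
# [('p2', 'E. coli')]
```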

          Conclusions

          Automated statistical software identified hospital clusters that had escaped routine detection. It also classified many previously identified clusters as events likely to occur because of normal random fluctuations. This automated method has the potential to provide valuable real-time guidance both by identifying otherwise unrecognized outbreaks and by preventing the unnecessary implementation of resource-intensive infection control measures that interfere with regular patient care.

          Please see later in the article for the Editors' Summary

          Editors' Summary

          Background

          Admission to a hospital is often a life-saving necessity—individuals injured in a road accident, for example, may need immediate medical and surgical attention if they are to survive. Unfortunately, many patients acquire infections, some of which are life-threatening, during their stay in a hospital. The World Health Organization has estimated that, globally, 8.7% of hospital patients develop hospital-acquired infections (infections that are identified more than two days after admission to hospital). In the US alone, 2 million people develop a hospital-acquired infection every year, often an infection of a surgical wound, or a urinary tract or lung infection. Infections are common among hospital patients because increasing age or underlying illnesses can reduce immunity to infection and because many medical and surgical procedures bypass the body's natural protective barriers. In addition, poor infection control practices can facilitate the transmission of bacteria—including meticillin-resistant Staphylococcus aureus (MRSA) and vancomycin-resistant enterococci (VRE)—and other infectious agents (pathogens) between patients.

          Why Was This Study Done?

          Sometimes, the number of cases of hospital-acquired infections increases unexpectedly or a new infection emerges. Such clusters account for relatively few health care–associated infections, but, because they may arise from the transmission of a pathogen within a hospital, they need to be rapidly identified and measures implemented (for example, isolation of affected patients) to stop transmission if an outbreak is confirmed. Currently, the detection of clusters of hospital-acquired infections is based on simple rules, such as the occurrence of three new cases of a single pathogen in two weeks on the same ward. This rule-based approach relies on the human eye to detect infection clusters within microbiology data (information collected on the pathogens isolated from patients), it focuses on a few pathogens, and it does not consider the random variation in infection rates or the possibility that clusters might be associated with shared facilities rather than with individual wards. In this study, the researchers test whether an automated statistical system can detect outbreaks of hospital-acquired infections quickly and accurately.

          What Did the Researchers Do and Find?

          The researchers combined two software packages used to track diseases in populations to create the WHONET-SaTScan cluster detection tool. They then compared the clusters of hospital-acquired infection identified by the new tool in microbiology data from a 750-bed US academic medical center with those generated by the hospital's infection control program, which was largely based on the simple rule described above. WHONET-SaTScan found 59 clusters of infection that occurred between 2002 and 2006, about three-quarters of which were identified by characteristics other than a ward-based location. Nearly half the cluster alerts were generated on the basis of shared antibiotic susceptibility patterns. Although WHONET-SaTScan identified all the clusters previously identified by the hospital's infection control program, it classified most of these clusters as likely to be the result of normal random variations in infection rates rather than the result of “true” outbreaks. By contrast, the hospital's infection control department only identified three of the 59 statistically significant clusters identified by WHONET-SaTScan. Furthermore, the new tool identified six previously unknown MRSA outbreaks and four previously unknown VRE outbreaks. Finally, two hospital epidemiologists (scientists who study diseases in populations) classified 95% of the clusters detected by WHONET-SaTScan as worthy of consideration by the hospital infection control team and a quarter of the clusters as warranting active investigation or intervention.

          What Do These Findings Mean?

          These findings suggest that automated statistical software should be able to detect clusters of hospital-acquired infections that would escape detection using routine rule-based systems. Importantly, they also suggest that an automated system would be able to discount a large number of supposed outbreaks identified by rule-based systems. These findings need to be confirmed in other settings and in prospective studies in which the outcomes of clusters detected with WHONET-SaTScan are carefully analyzed. For now, however, these findings suggest that automated statistical tools could provide hospital infection control experts with valuable real-time guidance by identifying outbreaks that would be missed by routine detection methods and by preventing the implementation of intensive and costly infection control measures in situations where they are unnecessary.

          Additional Information

          Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1000238.

          Most cited references (8)

          A Space–Time Permutation Scan Statistic for Disease Outbreak Detection

          Introduction

          The World Trade Center and anthrax terrorist attacks in 2001, as well as the recent West Nile virus and SARS outbreaks, have motivated many public health departments to develop early disease outbreak detection systems using non-diagnostic information, often derived from electronic data collected for other purposes (“syndromic surveillance”) [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17]. These include systems that monitor the number of emergency department visits, primary care visits, ambulance dispatches, nurse hotline calls, pharmaceutical sales, and West Nile–related dead bird reports. The establishment of such systems involves many challenges in data collection, analytical methods, signal interpretation, and response. Important analytical challenges include dealing with the unknown time, place, and size of an outbreak, detecting an outbreak as early as possible, adjusting for natural temporal and geographical variation, and dealing with the lack of suitable population-at-risk data. Most analytical methods in use for the early detection of disease outbreaks are purely temporal in nature [18,19,20,21,22]. These methods are useful for detecting outbreaks that simultaneously affect all parts of the geographical region under surveillance, but may be slow to detect outbreaks that start locally. While purely temporal methods can be used in parallel for overlapping areas of different sizes in order to cover all possible outbreaks, that approach leads to a severe problem of multiple testing, generating many more false signals than the nominal statistical significance level would indicate. First studied by Naus [23], the scan statistic is an elegant way to solve problems of multiple testing when there are closely overlapping spatial areas and/or time intervals being evaluated. 
Temporal, spatial, and space–time scan statistics [24,25,26,27] are now commonly used for disease cluster detection and evaluation, for a wide variety of diseases including cancer [28,29], Creutzfeldt-Jakob disease [30], granulocytic ehrlichiosis [31], sclerosis [32], and diabetes [33]. The basic idea is that there is a scanning window that moves across space and/or time. For each location and size of the window, the number of observed and expected cases is counted. Among these, the most “unusual” excess of observed cases is noted. The statistical significance of this cluster is then evaluated taking into account the multiple testing stemming from the many potential cluster locations and sizes evaluated. To date, all scan statistics require either a uniform population at risk, a control group, or other data that provide information about the geographical and temporal distribution of the underlying population at risk. Census population numbers are useful as a denominator for cancer, birth defects, and other registry data, where the expected number of cases can be accurately estimated based on the underlying population. They are less relevant for surveillance data such as emergency department visits and pharmacy sales, since the catchment area for each hospital/pharmacy is undefined. Even if it were available, the catchment area population would not be a good denominator since there can be significant natural geographical variation in health-care utilization data, due to disparities in disease prevalence, access to health care, and consumer behavior [34]. One option when evaluating data that are affected by utilization behavior is to use total volume as the denominator. For example, one may use total emergency department visits as a denominator when evaluating diarrhea visits [7], or similarly, all pharmacy sales as the denominator when evaluating diarrhea medication sales [4]. This may or may not work depending on the nature of the data. 
For example, changes in total drug sales due to sales promotions or the allergy season could hide a true signal or create a false signal for the drug category of interest. In this paper we present a prospective space–time permutation scan statistic that does not require population-at-risk data and can be used for the early detection of disease outbreaks when only the number of cases is available. The method can be used prospectively to regularly scan a geographical region for outbreaks of any location and any size. For each location and size, it looks at potential one-day as well as multi-day outbreaks. This lets it quickly detect a rapidly rising outbreak while still having power to detect a slowly emerging outbreak by combining information from multiple days. The space–time permutation scan statistic was gradually developed as part of the New York City Department of Health and Mental Hygiene (DOHMH) surveillance initiatives, in parallel with the adaptation of population-at-risk-based scan statistics for dead bird reports (for West Nile virus) [13], emergency department visits [7], ambulance dispatch calls [6], pharmacy sales [4], and student absentee records [3]. In this methodological paper, the space–time permutation scan statistic is presented and illustrated using emergency department visits for diarrhea, respiratory, and fever/flu-like illnesses.

Methods

New York City Emergency Department Syndromic Surveillance System

The New York City Emergency Department syndromic surveillance system is described in detail elsewhere [7]. In brief, participating hospitals transmit electronic files to the DOHMH seven days per week. Files contain data for all emergency department patient visits on the previous day, including the time of visit, patient age, gender, home zip code, and chief complaint, a free-text field that captures the patient's own description of their illness. 
As of November 2002, 38 of New York City's 66 emergency departments were participating in the system, covering an estimated 75% of emergency department visits in the city. Data are verified for completeness and accuracy, concatenated into a single dataset, and appended to a master archive using SAS [35]. To categorize visits into “syndromes” (e.g., “diarrhea syndrome”), a computer algorithm searches the free-text chief complaint field for character strings indicating symptoms consistent with that syndrome. The goal of data analysis, which is carried out seven days per week, is to detect unusual increases in key syndrome categories. To run the space–time permutation scan statistic we have written a SAS program that generates the necessary case and parameter files, invokes the SaTScan software [36], and reads the results back into SAS for reporting and display. Two sets of analyses are performed, one based on assigning each individual to the coordinates of their residential zip code and the other based on their hospital address. With 183 zip codes versus 38 hospitals, the former utilizes more detailed geographical information, while the latter may be able to pick up outbreaks not only related to place of residence but also to place of work or other outside activities (if people go to the nearest hospital when they feel sick). Residential zip code is not recorded by the hospital for about 3% of patients, and for the analyses described here, these individuals are only included in the hospital-based analyses. The performance of the prospective space–time permutation scan statistic was evaluated using both hospital and residential analyses. We used historical diarrhea data to mimic a prospective surveillance system with daily analyses from 15 November 2001 to 14 November 2002. For each of these days, the analysis only used data prior to and including the day in question, ignoring all data from subsequent days. 
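The syndrome coding step described above searches the free-text chief complaint for character strings that indicate a syndrome. It can be sketched with regular expressions. The keyword patterns below are hypothetical placeholders: the text does not list the strings the DOHMH algorithm actually uses, and a production coder would also have to handle misspellings, abbreviations, and negations.

```python
import re

# Hypothetical keyword patterns; not the DOHMH syndrome definitions.
SYNDROME_PATTERNS = {
    "diarrhea": re.compile(r"\b(diarrh\w*|loose stools?)\b", re.IGNORECASE),
    "respiratory": re.compile(r"\b(cough\w*|shortness of breath|sob|wheez\w*)\b", re.IGNORECASE),
    "fever/flu": re.compile(r"\b(fever\w*|flu|chills)\b", re.IGNORECASE),
}


def classify_chief_complaint(text):
    """Return the sorted list of syndromes whose pattern matches the complaint."""
    return sorted(name for name, pattern in SYNDROME_PATTERNS.items() if pattern.search(text))


print(classify_chief_complaint("FEVER AND COUGH X 2 DAYS"))  # ['fever/flu', 'respiratory']
print(classify_chief_complaint("vomiting, diarrhea"))        # ['diarrhea']
```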
This corresponds to the actual data available at the DOHMH 8–12 h after the end of that day, when that analysis would have been conducted had the system been in place at that time. We also present one week of daily prospective analyses conducted in November 2003, where the daily analysis was run about 12 h after the conclusion of each day, as part of the regular syndromic surveillance activities at the DOHMH.

The Space–Time Permutation Scan Statistic

As with the Poisson- and Bernoulli-based prospective space–time scan statistics [27], the space–time permutation scan statistic uses thousands or millions of overlapping cylinders to define the scanning window, each being a possible candidate for an outbreak. The circular base represents the geographical area of the potential outbreak. A typical approach is to first iterate over a finite number of geographical grid points and then gradually increase the circle radius from zero to some maximum value defined by the user, iterating over the zip codes in the order in which they enter the circle. In this way, both small and large circles are considered, all of which overlap with many other circles. The height of the cylinder represents the number of days. The last day is always included, together with a variable number of preceding days, up to some maximum defined by the user. For example, we may consider all cylinders with a height of 1, 2, 3, 4, 5, 6, or 7 d. For each center and radius of the circular cylinder base, the method iterates over all possible temporal cylinder lengths. This means that we will evaluate cylinders that are geographically large and temporally short, forming a flat disk, those that are geographically small and temporally long, forming a pole, and every other combination in between. What is new with the space–time permutation scan statistic is the probability model. Since we do not have population-at-risk data, the expected number of cases must be calculated using only the cases themselves. 
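The cylinder construction described in this section can be sketched as a generator. Circles grow from each center up to a maximum radius, and each resulting zip-code set is combined with every window of 1 to a maximum number of days that ends on the most recent day. Several details are assumptions not specified in the text: coordinates are planar kilometres, distances are Euclidean, zip codes at exactly the same distance enter one at a time in arbitrary order, and a zip-code set reached from several centers is evaluated only once.

```python
import math


def enumerate_cylinders(centers, zip_coords, max_radius_km, last_day, max_days):
    """Yield (zip_set, first_day, last_day) for every candidate cylinder.

    centers:    list of (x, y) grid points in km (hypothetical planar coordinates)
    zip_coords: dict zip -> (x, y) centroid in km
    """
    seen = set()
    for center in centers:
        # Zip codes in the order in which they enter a circle growing from radius 0.
        ordered = sorted(zip_coords, key=lambda z: math.dist(center, zip_coords[z]))
        members = []
        for z in ordered:
            if math.dist(center, zip_coords[z]) > max_radius_km:
                break
            members.append(z)
            zone = frozenset(members)
            if zone in seen:  # the same zip set can be reached from several centers
                continue
            seen.add(zone)
            for height in range(1, max_days + 1):
                # Every cylinder ends on the most recent day.
                yield zone, last_day - height + 1, last_day


zips = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (10.0, 0.0)}
cylinders = list(enumerate_cylinders([(0.0, 0.0)], zips, max_radius_km=5, last_day=30, max_days=7))
print(len(cylinders))  # 14: zip sets {A} and {A, B}, each with heights of 1 to 7 days
```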
Suppose we have daily case counts for zip-code areas, where $c_{zd}$ is the observed number of cases in zip-code area $z$ during day $d$. The total number of observed cases ($C$) is

$$C = \sum_{z}\sum_{d} c_{zd}.$$

For each zip code and day, we calculate the expected number of cases $\mu_{zd}$ conditioning on the observed marginals:

$$\mu_{zd} = \frac{1}{C}\left(\sum_{z} c_{zd}\right)\left(\sum_{d} c_{zd}\right).$$

In words, this is the proportion of all cases that occurred in zip-code area $z$ times the total number of cases during day $d$. The expected number of cases $\mu_A$ in a particular cylinder $A$ is the sum of these expectations over all the zip-code-days within that cylinder:

$$\mu_A = \sum_{(z,d)\in A} \mu_{zd}.$$

The underlying assumption when calculating these expected numbers is that the probability of a case being in zip-code area $z$, given that it was observed on day $d$, is the same for all days $d$. Let $c_A$ be the observed number of cases in the cylinder. Conditioned on the marginals, and when there is no space–time interaction, $c_A$ follows the hypergeometric distribution with mean $\mu_A$ and probability function

$$P(c_A) = \frac{\binom{\sum_{z\in A}\sum_{d} c_{zd}}{c_A}\binom{C-\sum_{z\in A}\sum_{d} c_{zd}}{\sum_{d\in A}\sum_{z} c_{zd}-c_A}}{\binom{C}{\sum_{d\in A}\sum_{z} c_{zd}}},$$

where $\sum_{z\in A}\sum_{d} c_{zd}$ is the number of cases, on any day, in the zip-code areas covered by $A$, and $\sum_{d\in A}\sum_{z} c_{zd}$ is the number of cases, in any zip-code area, on the days covered by $A$. When both of these totals are small compared to $C$, $c_A$ is approximately Poisson distributed with mean $\mu_A$ [37]. Based on this approximation, we use the Poisson generalized likelihood ratio (GLR) as a measure of the evidence that cylinder $A$ contains an outbreak:

$$\mathrm{GLR} = \left(\frac{c_A}{\mu_A}\right)^{c_A}\left(\frac{C-c_A}{C-\mu_A}\right)^{C-c_A}.$$

In words, this is the observed divided by the expected, raised to the power of the observed, inside the cylinder, multiplied by the same quantity computed outside the cylinder. Among the many cylinders evaluated, the one with the maximum GLR constitutes the space–time cluster of cases that is least likely to be a chance occurrence and, hence, is the primary candidate for a true outbreak. One reason for using the Poisson approximation is that this distribution is much easier to work with than the hypergeometric when adjusting for space by day-of-week interaction (see below), as the sum of independent Poisson random variables is still Poisson distributed. 
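The expected counts mu_zd, the cylinder expectation mu_A, and the Poisson GLR defined above translate directly into code. This sketch works on the log scale to avoid overflow. As an assumption consistent with looking only for an excess of cases, it returns a log GLR of 0 when a cylinder has no more cases than expected. The `counts[z][d]` dictionary layout is hypothetical.

```python
import math


def expected_counts(counts):
    """counts[z][d] = c_zd (every zone lists the same days).
    Returns (mu, C) with mu[z][d] = (sum_d c_zd) * (sum_z c_zd) / C."""
    zones = list(counts)
    days = list(counts[zones[0]])
    C = sum(counts[z][d] for z in zones for d in days)
    zone_total = {z: sum(counts[z][d] for d in days) for z in zones}
    day_total = {d: sum(counts[z][d] for z in zones) for d in days}
    mu = {z: {d: zone_total[z] * day_total[d] / C for d in days} for z in zones}
    return mu, C


def log_glr(c_A, mu_A, C):
    """log of (c_A/mu_A)^c_A * ((C-c_A)/(C-mu_A))^(C-c_A); 0 if there is no excess."""
    if c_A <= mu_A:
        return 0.0
    value = c_A * math.log(c_A / mu_A)
    if C > c_A:  # the outside factor is 1 (log 0) when every case lies inside A
        value += (C - c_A) * math.log((C - c_A) / (C - mu_A))
    return value


counts = {"z1": {1: 2, 2: 8}, "z2": {1: 5, 2: 5}}
mu, C = expected_counts(counts)
cylinder = [("z1", 2)]  # one zip code, one day
c_A = sum(counts[z][d] for z, d in cylinder)
mu_A = sum(mu[z][d] for z, d in cylinder)
print(C, c_A, mu_A, log_glr(c_A, mu_A, C))
```

By construction the mu_zd values sum to C, which is a quick sanity check on any implementation.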
Since we are evaluating a huge number of outbreak locations, sizes, and time lengths, there is serious multiple testing that we need to adjust for. Since we do not have population-at-risk data, this cannot be done in any of the usual ways for scan statistics. Instead, it is done by creating a large number of random permutations of the spatial and temporal attributes of each case in the dataset. That is, we shuffle the dates/times and assign them to the original set of case locations, ensuring that both the spatial and temporal marginals are unchanged. After that, the most likely cluster is calculated for each simulated dataset in exactly the same way as for the real data. Statistical significance is evaluated using Monte Carlo hypothesis testing [38]. If, for example, the maximum GLR is calculated from 999 simulated datasets, and the maximum GLR for the real data is higher than the 50th highest, then that cluster is statistically significant at the 0.05 level. In general terms, the p-value is p = R/(S + 1) where R is the rank of the maximum GLR from the real dataset and S is the number of simulated datasets [38]. In addition to p-values, we also report null occurrence rates [8], such as once every 45 d or once every 23 mo. The null occurrence rate is the expected time between seeing an outbreak signal with an equal or higher GLR assuming that the null hypothesis is true. For daily analyses, it is defined as once every 1/p d. For example, under the null hypothesis we would at the 0.05 level on average expect one false alarm every 20 d for each syndrome under surveillance. Because of the Monte Carlo hypothesis testing, the method is computer intensive. To facilitate the use of the methods by local, state, and federal health departments, the space–time permutation scan statistic has been implemented as a feature in the free and public domain SaTScan software [36]. 
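The Monte Carlo procedure described above has four steps. Shuffle the dates among the cases so that both marginals stay fixed, recompute the maximum GLR, rank the real value among the simulated ones, and report p = R/(S + 1) together with a null occurrence rate of once every 1/p days. The sketch below is a toy version for illustration, not the SaTScan implementation. Cases are `(zone, day)` pairs, candidate cylinders are passed in as `(zone_set, day_set)` pairs, and ties with simulated values are counted against the real data (an assumption).

```python
import math
import random
from collections import Counter


def max_log_glr(cases, cylinders):
    """Largest Poisson log GLR over the candidate cylinders (0 if none shows an excess)."""
    C = len(cases)
    zone_total = Counter(z for z, _ in cases)
    day_total = Counter(d for _, d in cases)
    cell = Counter(cases)
    best = 0.0
    for zones, days in cylinders:
        c_A = sum(cell[(z, d)] for z in zones for d in days)
        mu_A = sum(zone_total[z] * day_total[d] for z in zones for d in days) / C
        if c_A > mu_A:
            llr = c_A * math.log(c_A / mu_A)
            if C > c_A:
                llr += (C - c_A) * math.log((C - c_A) / (C - mu_A))
            best = max(best, llr)
    return best


def permutation_test(cases, cylinders, n_sims=999, seed=1):
    """Return (observed max log GLR, p-value, null occurrence interval in analysis days)."""
    rng = random.Random(seed)
    observed = max_log_glr(cases, cylinders)
    zones = [z for z, _ in cases]
    days = [d for _, d in cases]
    rank = 1  # R: rank of the real maximum among the S + 1 values
    for _ in range(n_sims):
        rng.shuffle(days)  # reassign dates to the original locations; both marginals are kept
        if max_log_glr(list(zip(zones, days)), cylinders) >= observed:
            rank += 1
    p = rank / (n_sims + 1)
    return observed, p, 1 / p


# Background: two cases per zone per day for 10 days, plus 40 extra cases in zone "a" on day 10.
cases = [(z, d) for z in "abcde" for d in range(1, 11) for _ in range(2)] + [("a", 10)] * 40
cylinders = [(frozenset({z}), frozenset(range(11 - h, 11))) for z in "abcde" for h in (1, 2, 3)]
observed, p, every_n_days = permutation_test(cases, cylinders, n_sims=199)
print(round(observed, 2), p, every_n_days)
```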
Implementation for New York City Syndromic Surveillance

Depending on the application, the method may be used with different parameter settings. For the syndromic surveillance analyses we set the upper limit on the geographical size of the outbreak to be a circle with a 5-km radius, and the maximum temporal length to be 7 d. This means that we are evaluating outbreaks with a circle radius anywhere between 0 km (one zip code only) and 5 km, and a time length (cylinder height) of 1 to 7 d. The latter restriction reflects the belief that the main purpose of this syndromic surveillance system is early disease outbreak detection. If the outbreak has existed for over 1 wk, it is more likely to be picked up through reporting of specific disease diagnoses by clinicians or laboratories. Another practical choice is the total number of days to include in the analysis. One option is to include all previous days for which data are available. We chose instead to analyze the last 30 d of data, adding one day and removing another for each daily analysis. We believe this time frame provides enough baseline beyond the 1- to 7-d scanning window to establish the usual pattern of visits while avoiding inclusion of data that may no longer be relevant to the current period. To reduce the computational load, we limited the centers of the circular cylinder bases to a collection of 446 zip-code area centroids and hospital locations in New York City and adjacent areas. This ensures, among other things, that each zip-code area may constitute an outbreak on its own. The last parameter that we need to set is the number of Monte Carlo replications used for each analysis. For the daily prospective analyses we chose 999, which meant that the smallest p-value we could get was 0.001, so that the smallest null occurrence rate possible for a signal was once every 2.7 y. In our system, signals of that strength clearly merit investigation. 
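The number of Monte Carlo replications, the smallest attainable p-value, and the longest null occurrence interval quoted above are linked by simple arithmetic, shown here as a worked check. The function names are illustrative, and the 365.25-day year is an assumption.

```python
def smallest_p_value(n_replications):
    # With S replications the real data can at best rank first among S + 1 values.
    return 1 / (n_replications + 1)


def null_occurrence_years(p, days_per_year=365.25):
    # Daily analyses: a signal at least this strong recurs once every 1/p days under the null.
    return (1 / p) / days_per_year


for reps in (999, 9_999):
    p_min = smallest_p_value(reps)
    print(reps, p_min, round(null_occurrence_years(p_min), 1))
# 999 0.001 2.7
# 9999 0.0001 27.4

# A "once per year" threshold for daily analyses corresponds to p of about 1/365.
print(round(1 / 365, 4))  # 0.0027
```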
For the historical evaluation, in order to obtain more precise null occurrence rates, we set the number of replications to 9,999.

Adjusting for Space by Day-of-Week Interaction

The space–time permutation scan statistic automatically adjusts for any purely spatial and purely temporal variation. For many syndromic surveillance data sources, there is also natural space by day-of-week interaction in the data that is due not to a disease outbreak but to consumer behavior, store hours, etc. For example, if a particular pharmacy has an exceptionally high number of sales on Sundays because neighboring pharmacies are closed, we might get a signal for this pharmacy every Sunday unless we adjust for this space by day-of-week interaction. This can be done through a stratified random permutation procedure. The first step is to stratify the data by day of week: Monday, Tuesday, …, Sunday. The space–time permutation randomization step is then done separately for each day of the week. For each zip code and day combination, the expected count is then calculated using only data from that day of the week. For each cylinder, both the observed and expected numbers of cases are summed over all day-of-week strata and all zip-code and day combinations within that cylinder. The same technique can be used to adjust for other types of space–time interaction. The underlying assumption when calculating these expected numbers is now that the probability of a case being in zip-code area z, given that it was observed on a Monday, is the same for all Mondays, etc. All our analyses were adjusted for space by day-of-week interaction.

Missing Data

Daily disease surveillance systems require rapid transmission of data, and it may not be possible to get complete data from each provider every single day. When we first tried the new method in New York City, a number of highly significant outbreak signals were generated that were artifacts of previously unrecognized missing or incomplete data from one or more hospitals. 
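The stratified adjustment for space by day-of-week interaction described above changes two things. Expected counts are computed within each weekday stratum, and dates are permuted only among cases that share a weekday. The sketch below is a minimal reading of the text, using hypothetical `(zone, date)` case records. In its pharmacy-style example, a zone that is always busy on Sundays no longer shows an excess once the Sunday stratum is used.

```python
import random
from collections import Counter, defaultdict
from datetime import date


def stratified_expected(cases):
    """mu[(z, d)] = (cases in z on d's weekday) * (cases on d) / (cases on d's weekday)."""
    weekday_total = Counter(d.weekday() for _, d in cases)
    zone_weekday = Counter((z, d.weekday()) for z, d in cases)
    day_total = Counter(d for _, d in cases)
    zones = {z for z, _ in cases}
    return {
        (z, d): zone_weekday[(z, d.weekday())] * n_d / weekday_total[d.weekday()]
        for d, n_d in day_total.items()
        for z in zones
    }


def stratified_shuffle(cases, rng):
    """Permute dates only among cases that fall on the same day of the week."""
    strata = defaultdict(list)
    for z, d in cases:
        strata[d.weekday()].append((z, d))
    shuffled = []
    for group in strata.values():
        dates = [d for _, d in group]
        rng.shuffle(dates)
        shuffled.extend((z, d) for (z, _), d in zip(group, dates))
    return shuffled


# Zone "p1" is always busy on Sundays (7 and 14 Jan 2024); Mondays are 1 and 8 Jan 2024.
sun1, sun2, mon1, mon2 = date(2024, 1, 7), date(2024, 1, 14), date(2024, 1, 1), date(2024, 1, 8)
cases = ([("p1", sun1)] * 5 + [("p2", sun1)] + [("p1", sun2)] * 5 + [("p2", sun2)]
         + [("p1", mon1)] + [("p2", mon1)] * 3 + [("p1", mon2)] + [("p2", mon2)] * 3)
mu = stratified_expected(cases)
print(mu[("p1", sun1)])  # 5.0, equal to the 5 observed cases, so no Sunday excess
```

Without stratification the same cell would have an expected count of 12 × 6 / 20 = 3.6, so the 5 observed Sunday cases would look like a recurring excess.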
This reflects well on the method, since it should be able to detect abnormalities in the data no matter what their cause, but it also illustrates the importance of accounting for missing data in order to create an early detection system that is useful on a practical level. Depending on the exact nature of the missing data, there are different ways to handle it. We used a combination of three different approaches. (1) If a hospital had missing data for all of the past 7 d (all possible days within the cylinder), we completely removed that hospital from the analysis, including all previous 23 d. (2) If a hospital had no missing data during the last 7 d, but one or more missing days during the previous 23 baseline days, then we completely removed the baseline days with some missing data, for all of the hospitals. (3) If a hospital had missing data for at least one but not all of the last 7 d, then we removed those missing days together with all previous days for the same hospital and the same day of week. That is, if Monday was missing during the last week, then we removed all Mondays for that hospital. This removal introduces artificial space by day-of-week interaction, so this approach only works if it is implemented in conjunction with the stratified adjustment for space by day-of-week interaction. For some analyses, more than one of these approaches was used simultaneously. Note that, since the missing data depend on the hospital, the solution is to remove specific hospitals and days rather than zip codes and days, even when we are doing the zip-code-based residential analyses. If there are many hospitals with missing data, then the second approach could potentially remove all or almost all of the baseline days. To avoid this, one could sometimes go further back in time and add the same number of earlier days to compensate. Another option is to impute into the cells with missing data a Poisson random number of cases generated under the null hypothesis. 
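The three missing-data rules listed above can be expressed as set operations on (hospital, day) cells. This is a sketch of one reading of the text, not the DOHMH code, and the function name and data layout are hypothetical. The text does not say how baseline gaps should be handled for a hospital that also falls under rule (3), so the sketch treats them as in rule (2).

```python
from datetime import date, timedelta


def cells_to_keep(hospitals, days, missing, scan_days=7):
    """Apply the three missing-data rules; return the (hospital, day) cells kept.

    days:    analysis days in chronological order (the last `scan_days` form the scan window)
    missing: set of (hospital, day) cells with missing or incomplete data
    """
    recent, baseline = set(days[-scan_days:]), set(days[:-scan_days])
    keep = {(h, d) for h in hospitals for d in days}
    baseline_days_to_drop = set()
    for h in hospitals:
        missing_recent = {d for d in recent if (h, d) in missing}
        missing_baseline = {d for d in baseline if (h, d) in missing}
        if missing_recent == recent:
            # Rule (1): no data anywhere in the scan window, so drop the hospital entirely.
            keep -= {(h, d) for d in days}
        elif missing_recent:
            # Rule (3): drop every day with the same weekday as a missing recent day,
            # for this hospital only (requires the day-of-week stratified adjustment).
            weekdays = {d.weekday() for d in missing_recent}
            keep -= {(h, d) for d in days if d.weekday() in weekdays}
            # Assumption: remaining baseline gaps for this hospital are handled as in rule (2).
            baseline_days_to_drop |= {d for d in missing_baseline if d.weekday() not in weekdays}
        else:
            # Rule (2): complete scan window, so drop incomplete baseline days for all hospitals.
            baseline_days_to_drop |= missing_baseline
    keep -= {(h, d) for h in hospitals for d in baseline_days_to_drop}
    return keep


days = [date(2002, 11, 1) + timedelta(days=i) for i in range(30)]  # 23 baseline + 7 scan days
missing = {("H1", d) for d in days[-7:]} | {("H2", days[3]), ("H3", days[-1])}
kept = cells_to_keep(["H1", "H2", "H3", "H4"], days, missing)
print(len(kept))  # 82 = 120 cells - 30 (H1) - 5 (H3's matching weekdays) - 3 (baseline day for H2-H4)
```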
Given the completeness of our data, neither of these methods was employed (94% of analyses were conducted with four or fewer baseline days removed).

Results

Evaluation Using Historical Data: Diarrhea Surveillance

We first tested the new method by mimicking daily prospective analyses of hospital emergency department data from 15 Nov 2001 to 14 Nov 2002, looking at diarrhea visits. Signals with p ≤ 0.0027 are listed in Table 1 and depicted on the map in Figure 1. That is, we only list those signals with a null occurrence rate of once every year or less often. For the residential zip-code analyses, there were two such signals. For the hospital analyses, there were six, two of which occurred in the same place on consecutive days. It is worth noting that at the false alarm rate chosen, none of the residential signals correspond to any of the hospital signals. For the residential analysis, the strongest signal was on 9 February 2002, covering 17 zip-code areas in southern Bronx and northern Manhattan. This signal had 63 cases observed over 2 d when 34.7 were expected (relative risk = 1.82). With a null occurrence rate of once every 5.5 y, a spike in cases of this magnitude is unlikely to be due to random variation. The signal immediately preceded a sharp increase in citywide diarrheal visits from 10 February to 20 March (Figure 2). In both the localized 9 February cluster and the citywide outbreak, the increase was most notable among children less than 5 y of age. The weaker 26 February hospital signal and the 7 March residential signal that were centered in northern Manhattan occurred at the peak of this citywide outbreak. Laboratory investigation of the citywide increase in diarrheal activity indicated rotavirus as the most likely causative agent. The two hospital signals on 1 November and 2 November 2002 were at the same three hospitals in southern Bronx and northern Manhattan, with null occurrence rates of once every 1.6 and 3.4 y, respectively. 
These signals immediately preceded another sharp increase in citywide diarrheal activity, this time among individuals of all ages (Figure 2). This citywide outbreak lasted approximately 6 wk and coincided with a number of institutional outbreaks in nursing homes and on cruise ships. Laboratory investigation of several of these outbreaks revealed the norovirus as the most likely causative agent. A similar citywide outbreak of norovirus in 2001 began shortly before the 21 November 2001 hospital signal in northern Bronx, which had a null occurrence rate of once every 3.4 y. For the hospital analyses, the strongest signal was a 1-d cluster at a single hospital in Queens on 11 January 2002, with ten diarrhea cases when only 2.3 were expected, which one would only expect to happen once every 3.9 y. Being very local in both time and space, it is different from the previously described signals preceding citywide outbreaks. While examination of individual-level data revealed a predominance of infants under the age of two, this cluster could not be associated with any known outbreak, and retrospective investigation was not feasible. As shown in Table 1, at the p = 0.0027 threshold there were six and two signals for the hospital and residential analyses, respectively, compared to one expected in each. Figure 3 shows the number of days on which the p-value of the most likely cluster was within a given range. Had the null hypothesis been true on all 365 d analyzed, the p-values would have been uniformly distributed between zero and one. The fact that in our data there were more days with low rather than high p-values is an indication that there may be additional true “outbreaks” that are indistinguishable from random noise. 
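Two figures in the Results can be checked with simple arithmetic. At the p ≤ 0.0027 threshold, 365 daily analyses should produce about one chance signal per analysis type. The reported relative risk for the 9 February 2002 signal also agrees with the ratio of observed to expected cases (63 versus 34.7). The check below assumes, as the text states, that the daily p-values are uniformly distributed under the null hypothesis.

```python
def expected_chance_signals(p_threshold, n_days):
    # Under the null, each day's p-value is uniform, so P(p <= threshold) = threshold.
    return p_threshold * n_days


print(round(expected_chance_signals(0.0027, 365), 2))  # 0.99
print(round(63 / 34.7, 2))                              # 1.82
```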
Such additional signals could reflect very small disease outbreaks, for example, due to spoiled food eaten by only a few people, or they could be artifacts caused by, for example, changes in the hours of operation at an emergency department or coding differences between emergency department triage nurses.

Daily Prospective Surveillance

Since 1 November 2003, the space–time permutation scan statistic has been used daily, in parallel with the population-at-risk-based space–time scan statistic [7], as part of the DOHMH Emergency Department surveillance system. Results for respiratory symptoms, fever/flu, and diarrhea during the last week of November are listed in Tables 2 and 3. For diarrhea and respiratory symptoms there were no strong signals warranting an epidemiological investigation; all signals were of a strength expected by chance more often than once a month. This was a very typical week. For fever/flu there was a strong 7-d hospital signal in the southern Bronx and northern Manhattan on 28 November, with a null occurrence rate of once every 2.7 y. On each of the following 2 d, there were again strong hospital signals in the same general area, as well as residential zip-code signals of lesser magnitude. These signals appeared 12 d into a gradual citywide increase in fever/flu that continued to grow through the end of December, driven by an unusually early influenza season in New York City.

Discussion

In this paper we have presented a new method for prospective infectious disease outbreak surveillance that uses only case data, handles missing data, and makes minimal assumptions about the spatiotemporal characteristics of an outbreak. When we used historical emergency department chief complaint data to mimic a prospective surveillance system with daily analyses, we detected four highly unusual clusters of diarrhea cases, three of which heralded citywide gastrointestinal outbreaks due to rotavirus and norovirus.
Three of the four weaker signals also occurred immediately before or during these citywide outbreaks. If we assume that all of these clusters were associated with the citywide outbreaks, the method generated at most two false alarms at a signal threshold at which we would have expected one by chance alone.

For disease outbreak detection, the public health community has historically relied on the watchful eyes of physicians and other health-care workers. However, the increasing availability of timely electronic surveillance data, both reportable diagnoses and pre-diagnostic syndromic indicators, raises the possibility of earlier outbreak detection and intervention if suitable analytic methods are found. While it is still unclear whether systematic health surveillance using syndromic or reportable disease data will be able to detect a bioterrorism attack quickly [39,40], the methods described here can also be applied to the early detection of outbreaks of other, more common infectious diseases.

There are other ways to calculate expected counts from a series of case data. One naive approach is to use the count observed 7 d earlier in a zip-code area as the expected count for that area today, and then apply the regular Poisson-based space–time scan statistic. When applied to the New York City diarrhea data described above, this approach generated at least one “statistically significant” outbreak signal on each of the 365 d evaluated. The basic problem is that the counts used to calculate the expected values are themselves subject to random variation, which the Poisson-based scan statistic does not account for.
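The variance argument in the preceding paragraph can be illustrated numerically. This toy sketch deliberately drops the spatial dimension and is not the authors' comparison: it assumes a single area with a hypothetical constant rate of 20 visits per day and no outbreaks, estimates each day's "expected" count as the mean of the same weekday in the previous k weeks, and counts how often a one-sided Poisson z-score exceeds 1.96 (roughly 2.5% of days under a normal approximation if the expected count were exact). Because the difference between the observed count and a k-week average has variance λ(1 + 1/k) rather than λ, the false-signal share is inflated most for k = 1 and shrinks, without vanishing for any finite k, as more weeks are averaged.

```python
import math
import random


def poisson_draw(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for moderate rates."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def false_signal_rate(lam: float, baseline_weeks: int, n_days: int = 5000,
                      z_crit: float = 1.96, seed: int = 7) -> float:
    """Share of outbreak-free days flagged when 'expected' is a k-week average.

    Hypothetical single-area toy model; no spatial scanning is performed.
    """
    rng = random.Random(seed)
    flagged = tested = 0
    for _ in range(n_days):
        observed = poisson_draw(lam, rng)
        # Naive expected count: mean of the same weekday in the previous k weeks.
        expected = sum(poisson_draw(lam, rng) for _ in range(baseline_weeks)) / baseline_weeks
        if expected == 0:
            continue  # the Poisson z-score is undefined here
        tested += 1
        # The z-score treats `expected` as an exact mean and ignores its own noise.
        if (observed - expected) / math.sqrt(expected) > z_crit:
            flagged += 1
    return flagged / tested


for weeks in (1, 4, 12):
    print(weeks, round(false_signal_rate(20.0, weeks), 3))
```

With a one-week baseline the flagged share is typically several times the nominal rate; longer baselines reduce, but do not remove, the excess.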
If we based the expected counts on the average of several prior weeks of data, they would vary less and produce fewer false signals, but the problem would persist; and as the baseline period grew beyond a few months, other problems could gradually arise, for example, from seasonal trends or changes in population size.

Computing time depends on the size of the dataset and the analysis parameters chosen. With 999 replications, the hospital analyses with 38 data locations take 7 s to run on a 2.5-GHz Pentium 4 computer, while the residential analyses using 183 zip-code areas take 11 s. With 9,999 replications, the corresponding times are 27 and 57 s.

The proposed method has a number of limitations. It is highly sensitive to missing or incomplete data: our first implementation produced a number of false alarms, highlighting the need for systematic data quality checks and the analytic adjustments described above. When excellent population-at-risk data are available, we expect the Poisson-based space–time scan statistic, which uses this extra information, to perform better than the space–time permutation scan statistic. If, however, population-at-risk data are of poor quality or nonexistent, as is often the case, the space–time permutation scan statistic should be used. Because the space–time permutation scan statistic adjusts for purely temporal clusters, it can detect citywide outbreaks only if they start locally, not if they occur more or less simultaneously across the whole city. Hence, it does not replace purely temporal surveillance methods but complements them. Finally, the geographical boundary of a detected outbreak is not necessarily the same as the boundary of the true outbreak. Because we used circles as the base of the scanning cylinder, all detected outbreaks are approximately circular.
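The permutation model, the circular scanning cylinder, and the Monte Carlo replications mentioned above can be combined into a minimal, self-contained sketch. It assumes the usual formulation of the space–time permutation scan statistic, which this excerpt does not restate: the expected count for zone z on day d is μ_zd = C_z·C_d/C (zone total times day total over the grand total), each cylinder is scored with the Poisson generalized likelihood ratio, and significance comes from replications in which case dates are shuffled while case locations stay fixed. The circle construction (each centroid plus its nearest neighbours, capped by a hypothetical `max_zones`), the `max_days` limit, the `recurrence_years` helper, and the toy data are illustrative assumptions; this is not SaTScan's implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

Grid = List[List[int]]  # counts[zone][day]; the last column is "today"

def expected_counts(counts: Grid) -> List[List[float]]:
    """Assumed formulation: mu[z][d] = (cases in zone z) * (cases on day d) / (all cases)."""
    total = sum(map(sum, counts))
    zone_tot = [sum(row) for row in counts]
    day_tot = [sum(col) for col in zip(*counts)]
    return [[zt * dt / total for dt in day_tot] for zt in zone_tot]

def log_glr(c: int, mu: float, total: int) -> float:
    """Log generalized likelihood ratio for c observed vs mu expected (0 if c <= mu)."""
    if c <= mu:
        return 0.0
    llr = c * math.log(c / mu)
    if total > c:
        llr += (total - c) * math.log((total - c) / (total - mu))
    return llr

def circles(coords: Sequence[Tuple[float, float]], max_zones: int) -> List[List[int]]:
    """Each circle: a zone centroid plus its nearest neighbours (1..max_zones zones)."""
    out: List[List[int]] = []
    for cx, cy in coords:
        order = sorted(range(len(coords)),
                       key=lambda j: (coords[j][0] - cx) ** 2 + (coords[j][1] - cy) ** 2)
        out.extend(order[:size] for size in range(1, max_zones + 1))
    return out

def best_cylinder(counts: Grid, mu, zone_sets, max_days: int):
    """Prospective scan: every cylinder ends on the most recent day."""
    n_days, total = len(counts[0]), sum(map(sum, counts))
    best = (0.0, (), 0)
    for zones in zone_sets:
        c, m = 0, 0.0
        for length in range(1, min(max_days, n_days) + 1):
            d = n_days - length  # extend the cylinder one more day into the past
            c += sum(counts[z][d] for z in zones)
            m += sum(mu[z][d] for z in zones)
            llr = log_glr(c, m, total)
            if llr > best[0]:
                best = (llr, tuple(zones), length)
    return best

def permute_dates(counts: Grid, rng: random.Random) -> Grid:
    """Shuffle case dates while every case keeps its zone."""
    cases = [(z, d) for z, row in enumerate(counts) for d, k in enumerate(row) for _ in range(k)]
    days = [d for _, d in cases]
    rng.shuffle(days)
    new = [[0] * len(counts[0]) for _ in counts]
    for (z, _), d in zip(cases, days):
        new[z][d] += 1
    return new

def scan(counts: Grid, coords, max_zones=3, max_days=3, replications=999, seed=0):
    """Hypothetical helper: most likely cylinder and its Monte Carlo p-value."""
    rng = random.Random(seed)
    mu = expected_counts(counts)  # identical for every permuted dataset
    zone_sets = circles(coords, max_zones)
    llr, zones, length = best_cylinder(counts, mu, zone_sets, max_days)
    rank = 1 + sum(best_cylinder(permute_dates(counts, rng), mu, zone_sets, max_days)[0] >= llr
                   for _ in range(replications))
    return zones, length, llr, rank / (replications + 1)

def recurrence_years(p_value: float, analyses_per_year: int = 365) -> float:
    """Null occurrence rate of a daily signal: once every 1 / (365 * p) years."""
    return 1.0 / (p_value * analyses_per_year)

counts = [[5] * 20 for _ in range(6)]   # 6 zones x 20 days of background cases
counts[0][-1] += 20                     # hypothetical excess today in zones 0 and 1
counts[1][-1] += 20
coords = [(float(i), 0.0) for i in range(6)]
zones, length, llr, p = scan(counts, coords)
print(sorted(zones), length, round(llr, 2), p)
print(round(recurrence_years(0.0027), 2))  # 1.01, i.e. about once a year
```

Because shuffling dates preserves every zone total and every day total, the expected counts need to be computed only once. The same property means that an increase spread evenly across all zones is absorbed into the day totals, which is why this statistic cannot flag outbreaks that rise simultaneously everywhere.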
Other shapes of the scanning window are also available [36], but it has been shown that circular scan statistics can also detect noncircular outbreak areas [41]. The less geographically compact the outbreak, however, the less power (sensitivity) there is to detect it. For example, using circles, we cannot expect to pick up a very long and narrow outbreak, such as one confined to a one-block area on each side of Broadway stretching from southern to northern Manhattan.

The emergency department data used in this study also have some limitations. In addition to the citywide outbreaks, several institutional gastrointestinal outbreaks were reported to DOHMH during the historical 1-y period but were not detected in the emergency department data by the space–time permutation scan statistic. One reported outbreak involved schoolchildren who went to the emergency department of a nonparticipating hospital. Other outbreaks went undetected because medical care was not sought in emergency departments. Most people with diarrhea do not go to a hospital emergency department; instead, they call or visit their primary care physician, buy over-the-counter medication at a pharmacy, or have symptoms so mild that they do not seek medical care. Further studies are needed to evaluate the strengths and weaknesses of different data sources.

The geographic units of analysis were residential zip code and hospital location. It may be hard to detect outbreaks that affect only a small part of a single zip code, especially if the background rate of the syndrome is fairly high. Where available, the exact coordinates of a patient's residence can be used to avoid problems introduced by aggregating data. Furthermore, some outbreaks may not be clustered by place of residence, as when the exposure occurs at a workplace or in a subway.
Using the location of the hospital rather than the residence may provide higher power to detect workplace-related outbreaks, but the only way to fully address this issue may be to conduct workplace surveillance.

Despite these limitations, we have presented a new method for the early detection of disease outbreaks and illustrated its practical use. The primary advantages of the method are that it is easy to use, it requires only case data, it automatically adjusts for naturally occurring purely spatial and purely temporal variation, it allows adjustment for space by day-of-week interaction, and it can handle missing data. While the method was developed and applied in the context of syndromic surveillance, it may also be used for the early detection of diagnosed disease outbreaks, or for detecting changes in the pattern of chronic diseases, when population census information is unavailable, unreliable, or lacks the fine geographical resolution needed. The ability to perform disease surveillance without population-at-risk data is especially important in developing countries, where such data may be hard to obtain. The space–time permutation scan statistic could also be used for similar early detection problems in other fields, such as criminology, ecology, engineering, the social sciences, and veterinary science.

Patient Summary

Background

Detecting disease outbreaks early means that health officials are better able to fight and contain them. Electronic patient records that can be analyzed with statistical methods in computer programs should help with disease surveillance and make it possible to detect outbreaks early without raising too many false alarms.

Why Was This Study Done?

The researchers who did this study have developed and operated real-time disease surveillance systems.
In any such system, there will always be more disease cases in some places and time periods than in others, for example, because more people live there, or because more people of a certain type live there, such as older people or children, who are more prone to getting sick. The researchers were trying to develop a method that can discover outbreaks without needing to know the structure of the population under surveillance.

What Did the Researchers Do?

They modified an existing method to make it work without data on the structure of the population under surveillance. They also found a way to deal with incomplete data, for example, when a hospital did not report any data for a particular day.

What Did They Find?

When they applied the method to emergency room data from New York City, they found that it performs well: it appears able to detect real outbreaks early without producing many false alarms.

What Are the Limitations of the Method?

The method can detect only outbreaks that start locally, not those that occur more or less simultaneously in the whole surveillance area. For some outbreaks, for example those caused by exposure to an infectious agent in the subway, patients will not necessarily live in the same neighborhood or go to the same emergency room. The method will not detect outbreaks with very few cases, such as one case of smallpox, or three cases of anthrax as in the anthrax bioterrorism attacks in the fall of 2001. And the method works only for diseases whose early symptoms are severe enough that people go to the emergency room. Efficient disease surveillance will require the parallel use of different methods, each with its own strengths and weaknesses.

What Next?

The method was developed as part of the New York City Department of Health and Mental Hygiene surveillance initiatives and is now used every day to analyze emergency department records from 38 hospitals in the city.
To facilitate wider use, the method has been integrated into SaTScan, a more general software package that is freely available.

Where Can I Find Out More?

The following websites provide additional information on this and other methods.

Details on SaTScan and software for downloading: http://www.satscan.org/
United States Centers for Disease Control and Prevention Web page on electronic disease surveillance: http://www.cdc.gov/od/hissb/act_int.htm
National Syndromic Surveillance Conference: http://www.syndromic.org/index.html
National Bioterrorism Syndromic Surveillance Demonstration Program: http://btsurveillance.org/
The Real-Time Outbreak and Disease Surveillance Open Source Project: http://openrods.sourceforge.net/

            Evaluating cluster alarms: a space-time scan statistic and brain cancer in Los Alamos, New Mexico.

            This article presents a space-time scan statistic, useful for evaluating space-time cluster alarms, and illustrates the method on a recent brain cancer cluster alarm in Los Alamos, NM. The space-time scan statistic accounts for the preselection bias and multiple testing inherent in a cluster alarm. Confounders and time trends can be adjusted for. The observed excess of brain cancer in Los Alamos was not statistically significant. The space-time scan statistic is useful as a screening tool for evaluating which cluster alarms merit further investigation and which clusters are probably chance occurrences.

              Automated DNA Sequence-Based Early Warning System for the Detection of Methicillin-Resistant Staphylococcus aureus Outbreaks

Introduction

In the United States alone, infections acquired in hospitals affect 2 million patients, account for half of all major hospital complications, and result in annual costs of more than $4.5 billion [1]. Staphylococcus aureus is the leading cause of these nosocomial infections, which include a wide range of diseases such as endocarditis, septicemia, skin infections, soft tissue infections, and bone infections [2]. Methicillin-resistant strains in particular have become a major concern in the hospital environment because of the high mortality rate and the stringent hygienic requirements for patients harboring methicillin-resistant S. aureus (MRSA) [3,4]. Moreover, since the emergence of strains that are insensitive or have reduced sensitivity to glycopeptides, there is a real danger that infections with even greater drug resistance will spread [5]. Outbreaks are usually identified by analyzing laboratory test results and patients' charts. However, manual review of laboratory test results is time-consuming and resource-intensive. Electronic analysis of data can help identify suspicious patterns of disease and antimicrobial resistance [6], but such sentinel methods are rarely used in clinical practice. Typing of MRSA isolates, not only from clinical specimens but also from surveillance cultures, is necessary to elucidate possible transmission routes. Because the procedures are slow and laborious, molecular typing (e.g., pulsed-field gel electrophoresis [PFGE]) is usually applied a posteriori to track the course of nosocomial infections in an already established outbreak. Furthermore, PFGE requires great effort to harmonize protocols and is therefore only partially successful in generating reproducible results [7]. To speed up typing, DNA sequence-based approaches such as multi-locus sequence typing (MLST) are increasingly used [8].
However, MLST is not suitable for routine surveillance of MRSA because of its high cost and its low discriminatory power compared with PFGE. Frenay et al., who were the first to use a single-locus sequence typing method for S. aureus, typed strains using the sequence of the polymorphic region X of the S. aureus protein A gene (spa) [9]. Since then, numerous studies have found this variable-number tandem repeat target well suited to short-term epidemiological applications, e.g., [10–13]. Because of a paucity of software for repeat identification and the lack of consensus in assigning spa type names, widespread use of the method was hampered for years, until the recent introduction of the Ridom StaphType software [14]. With this software, spa sequences are analyzed automatically and linked to a database integrated with epidemiological information. A universal nomenclature is achieved by synchronization with a central server that assigns new spa types for all users (http://www.spaserver.ridom.de). The aim of the study reported here was therefore to analyze the utility of an automatic, spa sequence-based early warning algorithm for detecting MRSA clusters in hospitals and to compare this approach with classical surveillance techniques. We hypothesized that the automated system, once established, could complement and even replace the labor-intensive traditional methods used for cluster identification.

Methods

Setting

Between 1998 and 2003, a total of 557 non-replicate MRSA isolates were collected at the University Hospital Münster (UHM), Germany, a 1,480-bed tertiary-care teaching facility. In 2003, the hospital had approximately 43,000 admissions, with a mean length of stay of 9.8 d. The prevalence of patients with MRSA colonization or infection was calculated as the annual number of persons harboring MRSA, multiplied by 100 and divided by the total number of admissions at UHM [15].
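The prevalence definition above is a rate per 100 admissions, and the relative risk reported in the Results compares each year's prevalence with that of a baseline year. A minimal sketch, assuming the relative risk is simply the ratio of annual prevalences (the excerpt does not spell out the formula); the function names and counts are hypothetical.

```python
def prevalence_per_100(persons_with_mrsa: int, admissions: int) -> float:
    """Annual MRSA prevalence: persons harboring MRSA x 100 / total admissions."""
    return persons_with_mrsa * 100 / admissions


def relative_risk_vs_baseline(prevalence: float, baseline_prevalence: float) -> float:
    """Assumption: relative risk = this year's prevalence / baseline-year prevalence."""
    return prevalence / baseline_prevalence


# Hypothetical counts, for illustration only.
baseline = prevalence_per_100(40, 40_000)  # 0.1 per 100 admissions
later = prevalence_per_100(180, 45_000)    # 0.4 per 100 admissions
print(baseline, later, round(relative_risk_vs_baseline(later, baseline), 2))
```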
The baseline for calculating the relative risk was the year 1998.

Surveillance and Infection Control Measures

All new MRSA cases were monitored prospectively by infection control professionals (ICP) from the day MRSA was first identified until hospital discharge. Information on each patient was obtained by reviewing medical records and laboratory data and by telephone interviews with the attending physician. The ICP then decided whether a transmission event was likely and whether further investigation was necessary. Specifically, the following infection control measures were implemented:

(i) As recommended in the guidelines of the Robert Koch Institute (Berlin, Germany), all patients infected or colonized with MRSA were placed in contact isolation until discharge or until eradication could be documented by three consecutive sets of negative surveillance cultures (taken at least 24 h apart). MRSA surveillance cultures included swabs of several body sites (nose, groin, skin lesions, inguinal, perineal, and axillary swabs). For infected patients, samples were taken from the site of infection.
(ii) All patients known to have been previously colonized or infected with MRSA were isolated on re-admission to UHM, and surveillance swabs were obtained. Negative surveillance cultures were required before contact isolation could be ended.
(iii) Clinical microbiology laboratory results were monitored daily for specimens containing MRSA.
(iv) spa typing of MRSA isolates, performed since 2002, was carried out immediately after detection of each new MRSA isolate.
(v) Colonized patients were treated with nasal mupirocin ointment for 5 d, together with daily chlorhexidine body washes. For patients remaining in hospital after eradication, weekly surveillance cultures were recommended for 4 wk, and monthly thereafter, to detect possible re-colonization.
(vi) To detect MRSA colonization and cross-transmission, surveillance cultures were obtained from roommates as soon as a new MRSA patient was identified.
(vii) Staff were screened when nosocomial transmission was suspected and, on high-risk wards, at regular intervals as a surveillance measure.
(viii) Hospital staff found to be MRSA-positive were suspended from work on the wards until successful eradication of MRSA could be documented.
(ix) Systematic surveillance cultures at admission and weekly thereafter were introduced in 2002 on wards caring for high-risk patients, e.g., intensive care units [15,16].

Colonization and infection were defined according to the Centers for Disease Control and Prevention criteria [17].

Microbiology and Molecular Typing

The strain collection consisted of MRSA from various clinical sources (e.g., blood cultures and wound infections) and included surveillance cultures from patients and staff. In 2003, 6.4% of all clinical S. aureus isolates exhibited methicillin resistance. For species identification, every strain was tested with API ID 32 Staph (bioMérieux, Marcy l'Etoile, France) and for the presence of free coagulase. The presence of the mecA gene, which is responsible for methicillin resistance, was confirmed by PCR [18]. The sequence of the short sequence repeat region of the spa gene, which encodes S. aureus protein A, was determined in 557 strains [14]. The primers spa-1113f (5′-TAA AGA CGA TCC TTC GGT GAG C-3′) and spa-1514r (5′-CAG CAG TAG TGC CGT TTG CT-3′) were used for spa amplification and Taq cycle sequencing. DNA sequences were obtained with an ABI Prism 3100 Avant Genetic Analyzer (Applied Biosystems, Foster City, California, United States) and analyzed with the Ridom StaphType software version 1.5 beta (Ridom GmbH, Würzburg, Germany), which incorporates the newly added automated early warning system (“clonal alerts”) for MRSA cluster detection [14].
Typability, discriminatory index, and the 95% confidence interval (CI) of the discriminatory index were calculated using previously published procedures [19,20].

Retrospective Temporal-Scan Test Statistics

To evaluate the various early warning algorithms, we applied temporal scan statistics, using the epidemiological and typing information from 1998 to 2002 as historical data to determine MRSA clusters in 2003 [21,22]. A temporal scan statistic evaluates whether an apparent cluster of disease is unlikely to have occurred by chance alone. To do so, the test computes a likelihood-based p-value for the number of cases observed in a window of fixed width as the window is moved along the time axis studied (2003). Observed cases were compared with expected cases, the latter calculated from the historical data (1998–2002), under the null hypothesis that cases occur at random, against the alternative hypothesis that cases cluster in certain time periods. A Poisson distribution was assumed because a positive MRSA finding is a rare and irregular event. Clusters of two or more infected or colonized patients or colonized staff members, on the same ward or on wards in close contact (e.g., interdisciplinary intensive care units), occurring within a 2-wk window and harboring MRSA of the same spa type, were tested for significance at the 5% level. The statistically confirmed clusters were then used as the “gold standard” for comparing the various alert mechanisms. Non-significant clusters were considered sporadic occurrences.

Early Warning Algorithms

Every MRSA isolate obtained in 2003 was examined in a prospective analysis by applying descriptive epidemiologic parameters such as time, place, and person.
When two or more MRSA isolates were detected within a 2-wk window on the same ward or on wards in close contact, the resulting alert was classed as a “frequency alert” and allocated to a “frequency cluster.” If the isolates also shared an identical spa type, a “clonal alert” was triggered and the isolates were allocated to a “clonal cluster.” The ICP, a panel of two physicians and four infection control nurses that meets weekly and holds additional meetings when an outbreak occurs, rated its own findings as “ICP alerts” and “ICP clusters,” respectively. When feasible, the area of surveillance was widened and an investigation initiated. The ICP used microbiological data and data from patients' charts to reach its decisions but was blinded to outbreak evidence derived from spa typing results.

Statistical Analysis

Sensitivity, specificity, positive and negative predictive values (PPV, NPV), and pre-test probability were determined as described by Sackett et al. [23]. The pre-test probability is defined as the proportion of MRSA-positive individuals (the population at risk) who belong to an MRSA cluster (the target disorder) during a specific time interval. Two-sided 95% CIs for sensitivity, specificity, PPV, and NPV were calculated using the normal approximation to the binomial distribution. Differences in these parameters were tested for significance using the chi-square distribution with one degree of freedom.

Results

Table 1 summarizes the main epidemiological indicators for MRSA at UHM. The overall prevalence of MRSA cases was 0.17 per 100 admissions, and the relative risk of acquiring MRSA increased 4-fold during the study period. The annual number of patients with MRSA bacteremia peaked in 2003 at six. The average turn-around time for spa typing under routine laboratory conditions was 2.4 d.
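The frequency and clonal alert rules defined under Early Warning Algorithms are mechanical enough to sketch in a few lines. This is a hypothetical illustration, not the Ridom StaphType implementation: it assumes that "within a 2-wk window" means detection dates at most 14 days apart, that an alert is raised by the newer isolate of each qualifying pair, and that close-contact wards are supplied as a lookup table. The ward names, isolate IDs, and dates are invented, and the ICP rating step is not modelled.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Iterable, Set, Tuple

WINDOW = timedelta(days=14)  # assumption: "2-wk window" = at most 14 days apart


@dataclass(frozen=True)
class Isolate:
    isolate_id: str
    ward: str
    detected: date
    spa_type: str


def related(a: str, b: str, contacts: Dict[str, Set[str]]) -> bool:
    """Same ward, or wards declared to be in close contact (either direction)."""
    return a == b or b in contacts.get(a, set()) or a in contacts.get(b, set())


def classify_alerts(isolates: Iterable[Isolate],
                    contacts: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Hypothetical sketch: return (frequency-alert IDs, clonal-alert IDs)."""
    ordered = sorted(isolates, key=lambda iso: iso.detected)
    frequency: Set[str] = set()
    clonal: Set[str] = set()
    for k, new in enumerate(ordered):
        for old in ordered[:k]:
            if new.detected - old.detected <= WINDOW and related(new.ward, old.ward, contacts):
                frequency.add(new.isolate_id)      # time + place: frequency alert
                if new.spa_type == old.spa_type:   # ... plus identical spa type
                    clonal.add(new.isolate_id)
    return frequency, clonal


isolates = [
    Isolate("A", "ICU-1", date(2003, 3, 1), "t003"),
    Isolate("B", "ICU-2", date(2003, 3, 5), "t032"),
    Isolate("C", "ICU-1", date(2003, 3, 10), "t003"),
    Isolate("D", "SURG-4", date(2003, 3, 11), "t003"),
    Isolate("E", "ICU-1", date(2003, 4, 20), "t003"),
]
contacts = {"ICU-1": {"ICU-2"}}  # hypothetical close-contact ward pairing
freq, clon = classify_alerts(isolates, contacts)
print(sorted(freq), sorted(clon))  # ['B', 'C'] ['C']
```

Isolate B raises only a frequency alert because its spa type differs, whereas C matches A in ward, time, and spa type; D shares the spa type but not the ward, so no alert is raised.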
Of the 557 MRSA isolates tested, 549 (98.6%) could be typed by spa sequencing; the eight untypable strains were excluded from the analysis. A total of 79 different spa types were identified in samples collected in 1998–2003. The discriminatory power of spa typing was 91.8%. Table 2 shows how frequently the various spa types were isolated at UHM and nationally. spa types t003, t004, t001, and t032 were the most frequently isolated during the study period and accounted for 52.9% of all cases. Table 2 also places the typing results in a global epidemiological context as defined by PFGE and MLST [11,24–26]. Figure 1 depicts the annual dynamics of the epidemic MRSA clones at UHM and in Germany as a whole. In general, the findings for UHM followed the national trend, i.e., the numbers of “Barnim” and “Rhine Hesse” MRSA clones increased in parallel throughout the study period. As shown by the data for the “Southern German” MRSA clone, fluctuation at the regional level was more marked, probably because of the smaller number of cases.

In 2003, a total of 175 MRSA isolates (154 patients, 21 staff) could be typed, comprising 34 different spa types. The results of the 2003 scan statistic analysis are shown in Table 3. The encoded name of each clinic/ward was derived from the location of the first MRSA isolate in the cluster. In total, 42 MRSA isolates formed 13 significant clusters representing seven different spa types; five of these clusters involved spa type t003, a common spa type in our hospital (29.6% of all MRSA in 2003). The clusters spanned an average of 10 d (range 1–31 d), and the number of isolates per cluster ranged from 2 to 11 (mean 3.2). Six clusters were confined to a single ward, whereas the other seven involved sites distributed throughout a clinic.
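Typability and the discriminatory index reported above can be computed directly from a list of spa types. A minimal sketch, assuming the procedures cited as [19,20] are the widely used Hunter–Gaston (Simpson's) index of diversity, D = 1 − Σ n_j(n_j − 1) / (N(N − 1)), and the approximate 95% confidence interval D ± 2√σ² with σ² = (4/N)[Σπ_j³ − (Σπ_j²)²], where π_j = n_j/N; the excerpt does not restate either formula. The four-isolate input is a toy example.

```python
from collections import Counter
from math import sqrt
from typing import Iterable, Tuple


def discriminatory_index(types: Iterable[str]) -> Tuple[float, float, float]:
    """Assumed Hunter-Gaston index with an approximate 95% CI: (D, lower, upper)."""
    counts = list(Counter(types).values())
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two typed isolates")
    # Probability that two isolates drawn without replacement have different types.
    d = 1.0 - sum(c * (c - 1) for c in counts) / (n * (n - 1))
    freqs = [c / n for c in counts]
    variance = (4.0 / n) * (sum(p**3 for p in freqs) - sum(p**2 for p in freqs) ** 2)
    half_width = 2.0 * sqrt(max(variance, 0.0))
    # Clipping to [0, 1] is a presentational choice, not part of the formula.
    return d, max(0.0, d - half_width), min(1.0, d + half_width)


def typability(typed: int, tested: int) -> float:
    return typed / tested


print(round(100 * typability(549, 557), 1))  # 98.6, as reported above
print(discriminatory_index(["t003", "t003", "t004", "t001"]))  # toy input
```

For small collections the normal-approximation interval is wide and may extend beyond 1 before clipping, which is one reason the discriminatory power of a purely local strain collection should be read with caution.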
In the prospective analysis of all 2003 MRSA isolates using the various alert procedures, there were 106 frequency alerts assignable to 31 frequency clusters (Table 4). The early warning system triggered a total of 36 clonal alerts, comprising 20 clonal clusters. The ICP raised 22 ICP alerts corresponding to nine ICP clusters, but it correctly recognized an existing outbreak requiring further investigation in only five of these clusters (including the two largest, clusters five and seven); the other four clusters arose from false ICP alerts. In Table 4, the alerts triggered by each method are classified as true or false with reference to the 13 significant “true” clusters. The sensitivity, specificity, PPV, and NPV of the various alert methods, with 95% CIs where appropriate, are displayed in Table 5. Because of the high number of false-positive frequency alerts (n = 77), the specificities of the frequency and clonal methods (47.2% and 95.2%, respectively) differed considerably. The ICP alerts had the highest specificity, but the number of false-negative alerts (eight of the 13 confirmed clusters were missed) resulted in the lowest sensitivity (62.1%). Given a pre-test probability of 24%, the PPV (equivalent to the post-test probability) of both the ICP and clonal alerts was above 80%, whereas the PPV of the frequency alerts was only 27.4%. There were no significant differences in specificity or PPV between clonal and ICP alerts. Frequency alerts were significantly less specific (p < 0.001) and less accurate in making positive predictions than clonal and ICP alerts.

Discussion

In this paper we have presented a new method for prospective MRSA outbreak surveillance in hospitals that uses case and molecular typing data. Historically, MRSA outbreak detection in hospitals has relied on the watchful eyes of physicians and other health-care workers.
However, the increasing availability of timely electronic surveillance and molecular typing data raises the possibility of earlier outbreak detection and intervention if suitable analytic methods are found. Germany belongs to a group of Western European countries with an intermediate level of MRSA (approximately 20% of all S. aureus isolates diagnosed in laboratories are MRSA), and the isolation rate has increased significantly in recent years [27]. Although the MRSA laboratory isolation rate at UHM (6.4% in 2003) is still rather low compared with other German hospitals, the relative risk of acquiring MRSA in this hospital rose significantly during the study period (Table 1). Furthermore, the absolute risk is also likely to rise because of epidemiological pressure and the rising prevalence of MRSA in Germany as a whole (Table 2 and Figure 1). Control of MRSA is clearly a pressing concern that calls for new concepts; we therefore studied spa typing combined with an automatic early warning algorithm for detecting MRSA clusters at UHM. We found the feasibility and speed of spa typing to be highly satisfactory. The discriminatory power, however, was lower than previously reported, probably because only a local strain collection was analyzed [12,13]. Although we did not examine them, the high intra- and inter-laboratory reproducibility (100%) and the robustness of the method have recently been documented (Aires-de-Sousa et al., unpublished data). Moreover, spa typing results are highly concordant with those of PFGE, microarray analysis, and MLST [11,12]. The practicability of using spa typing in short-term epidemiological studies has been questioned because differences in PCR amplicon size among related strains were thought to imply instability of the target gene [28].
Since then, however, numerous publications have demonstrated the value of spa typing in investigating MRSA outbreaks, e.g., [10,14]. Moreover, it has recently been shown that spa data contain information not only on short-term but also on long-term evolutionary events, as reflected in whole-repeat duplications and deletions [12,29]. Because of the steady fall in the cost of DNA sequencing and an average hands-on time of only 20 min per sample (sequencing both DNA strands and processing ten samples in parallel), this technique is within the capability of even small laboratories [30]. The present study compared three early warning algorithms for detecting nosocomial MRSA outbreaks before limited clusters of preventable MRSA transmission develop into larger outbreaks. Evaluating an early warning system is difficult, however, because there is no accepted “gold standard,” and no system is likely to be completely reliable [31]. We therefore combined epidemiological and molecular typing data with statistical analysis to provide an objective measure of performance across the different approaches. In this approach, by definition, the sensitivity and NPV of the frequency and clonal alert methods will always be 100%. Consequently, the ICP method can at best equal, and will otherwise fall short of, the sensitivity of the automated methods. Infection and colonization with MRSA were given equal status, since both can lead to further transmission. The typing data accumulated since 1998 make it possible to calculate the temporal significance of spa-type clustering whenever an outbreak is suspected. By excluding all non-significant clusters, we reduced the likelihood that two or more MRSA isolates with the same spa type, coincidentally isolated on the same or related wards within the 2-wk window, would be counted as a true cluster.
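The temporal significance calculation described above can be sketched for a single spa type on a single ward. This is a minimal sketch under stated assumptions, since the excerpt does not specify these details: the expected count for a window is a constant historical daily rate times the window width (as if the 1998–2002 rate applied uniformly), the statistic is the unconditional Poisson log likelihood ratio, and the p-value comes from Monte Carlo simulation of whole years under that baseline. The daily rate and case dates below are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple


def poisson_draw(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; fine for the small daily rates used here."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def window_llr(n: int, mu: float) -> float:
    """Poisson log likelihood ratio for n observed vs mu expected (0 if no excess)."""
    return n * math.log(n / mu) - (n - mu) if n > mu else 0.0


def max_window(counts: Sequence[int], daily_rate: float, width: int) -> Tuple[float, int]:
    """Slide a fixed-width window along the year; return (max LLR, first start day)."""
    mu = daily_rate * width  # assumption: constant historical rate per day
    prefix = [0]
    for c in counts:
        prefix.append(prefix[-1] + c)
    best = (0.0, -1)
    for start in range(len(counts) - width + 1):
        llr = window_llr(prefix[start + width] - prefix[start], mu)
        if llr > best[0]:
            best = (llr, start)
    return best


def temporal_scan(counts: Sequence[int], daily_rate: float, width: int = 14,
                  replications: int = 999, seed: int = 0) -> Tuple[int, float, float]:
    """Return (window start, max LLR, Monte Carlo p-value) under the assumed baseline."""
    rng = random.Random(seed)
    observed, start = max_window(counts, daily_rate, width)
    exceed = 0
    for _ in range(replications):
        simulated = [poisson_draw(daily_rate, rng) for _ in counts]
        if max_window(simulated, daily_rate, width)[0] >= observed:
            exceed += 1
    return start, observed, (exceed + 1) / (replications + 1)


year = [0] * 365  # hypothetical 2003 counts for one spa type on one ward
for day in (100, 102, 103, 105, 108, 109):
    year[day] += 1
year[250] += 1  # an isolated sporadic case
start, llr, p = temporal_scan(year, daily_rate=0.05)
print(start, round(llr, 2), p)
```

Because the maximum is taken over every window position in the year, the Monte Carlo p-value already accounts for the multiple testing that a single fixed window would ignore.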
A 2-wk window is approximately 1.5 times as long as the mean length of hospital stay at UHM. A 4-wk window yielded similar results (unpublished data). An outbreak can be defined as (i) two or more cases of infection by a common agent that are linked epidemiologically. In practice, however, this definition is usually of limited use for identifying outbreaks, because it presupposes that detailed epidemiological and typing data are available as soon as the outbreak occurs. In more practical terms, therefore, an outbreak is often defined operationally as (ii) an increase in the number of cases above expected levels [32]. Historical data are used to calculate a baseline, and an alert is given when the number of cases exceeds a certain threshold. National early warning systems are based on this definition of an outbreak and have already implemented the approach successfully [33,34]. A similar approach has also been used in hospitals, e.g., the 2-fold standard deviation and monthly increase algorithms for detecting clusters of nosocomial infections [31]. In the 2-fold standard deviation algorithm, the threshold for a suspected outbreak is the mean number of cases per unit time over all previous periods plus two standard deviations. The monthly increase algorithm triggers an alert if there is either a 100% increase in the number of cases observed in the current month compared with the monthly totals for the two previous months, or a 50% increase over a three-month period. In the case of MRSA, however, infections and colonizations occur infrequently and irregularly. Applying the 2-fold standard deviation and monthly increase algorithms to our data (including typing data) resulted in delayed cluster alerts, indicating that these algorithms were insufficiently sensitive (unpublished data).
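The two baseline algorithms described above can be sketched as follows. The 2-fold standard deviation rule follows the text directly; using the sample standard deviation is an assumption. The text does not spell out exactly how the monthly increase rule compares months, so this sketch reads it as (a) the current month being at least double each of the two previous monthly totals, or (b) the current month being at least 50% above the month three months earlier. Both readings, and all function names, are illustrative assumptions rather than the published implementation.

```python
from statistics import mean, stdev


def two_sd_alert(history: list[int], current: int) -> bool:
    """2-fold standard deviation rule: alert when the current count exceeds
    the mean of all previous counts plus two standard deviations.
    Uses the sample standard deviation (an assumption)."""
    if len(history) < 2:
        return False  # stdev needs at least two data points
    threshold = mean(history) + 2 * stdev(history)
    return current > threshold


def _rose_by(current: int, baseline: int, factor: float) -> bool:
    # An increase of (factor - 1) * 100 percent; requiring current > baseline
    # stops a month with zero cases from "doubling" a zero baseline.
    return current > baseline and current >= factor * baseline


def monthly_increase_alert(monthly: list[int]) -> bool:
    """Monthly increase rule under one reading of the text (assumption).

    `monthly` holds monthly case counts in time order, current month last.
    (a) 100% increase: current month at least double both previous months.
    (b) 50% increase over three months: current month at least 1.5 times
        the count three months earlier.
    """
    if len(monthly) < 4:
        return False  # not enough history for either comparison
    current, prev1, prev2, prev3 = monthly[-1], monthly[-2], monthly[-3], monthly[-4]
    doubled = _rose_by(current, max(prev1, prev2), 2.0)
    rose_half = _rose_by(current, prev3, 1.5)
    return doubled or rose_half
```

For example, with hypothetical weekly counts `[2, 1, 3, 2, 2, 1]` the 2-fold standard deviation threshold is about 3.34 cases, so a week with four cases alerts and a week with three does not.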
To improve the detection of MRSA clusters and avoid delay, we applied the first of the two definitions of an outbreak given above, i.e., two or more cases of infection/colonization that are linked epidemiologically, because discriminatory typing details of cases were rapidly available. The ability of current hospital procedures to prevent nosocomial infections and recognize nosocomial outbreaks often depends on manual review of laboratory results and surveillance by the ICP. This review process is resource-intensive, however, because duplicate isolates must be eliminated, results must be correlated with patients' charts, patient locations within the hospital must be tracked, and related events must be correlated and monitored [6]. Not surprisingly, many minor MRSA transmissions amenable to intervention go undiscovered. In this study, the ICP's visual screening of laboratory reports detected only five of the 13 “true” clusters (Tables 3 and 4). Nevertheless, ICP alerts had the highest PPV (81.8%) because of their high specificity. Frequency alerts, by contrast, detected every cluster (sensitivity 100%), but their many false-positive alerts gave them the lowest PPV (27.4%), which clearly shows that this method is unsuitable (as do the significant differences in specificity and PPV compared with clonal and ICP alerts). Clonal alerts combine the best of both methods, i.e., high specificity and high sensitivity, with a PPV comparable to that of ICP alerts (no significant difference). Compared with ICP, clonal alerts triggered only a few more false-positive alerts and detected more clusters, especially smaller ones (Table 5). The data also indicate that laboratory-based surveillance has the advantage that clusters occurring throughout the hospital can be identified at a single, central data collection point.
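For reference, the four performance measures used in this comparison can be computed from a 2×2 table of alerts against the reference clusters. The counts below are hypothetical and are not taken from Tables 3, 4, or 5; what each count refers to (alerts, clusters, or ward-periods) is likewise an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    # Hypothetical 2x2 table of alerts versus reference ("true") clusters
    tp: int  # alerts that matched a true cluster
    fp: int  # alerts with no true cluster behind them
    fn: int  # true clusters that raised no alert
    tn: int  # units with neither an alert nor a true cluster

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


# Hypothetical counts: no missed clusters and few false alarms...
precise = Confusion(tp=6, fp=2, fn=0, tn=40)
# ...versus the same clusters found at the cost of many false alarms
noisy = Confusion(tp=6, fp=16, fn=0, tn=26)

for name, c in [("precise", precise), ("noisy", noisy)]:
    print(name, round(c.sensitivity(), 3), round(c.specificity(), 3),
          round(c.ppv(), 3), round(c.npv(), 3))
```

With no false negatives, sensitivity and NPV are both 100% in either case; PPV then depends entirely on the number of false-positive alerts, which is why a method that alerts on every co-located pair can reach full sensitivity and still have a low PPV.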
Further advantages of laboratory-based surveillance are its speed (<3 d after detection of MRSA) and the portability of spa typing data, which allow outbreaks to be distinguished from pseudo-outbreaks and a suitable response to be coordinated centrally in real time [14]. Cost-benefit analyses have demonstrated that the cost of MRSA infections far exceeds the cost of active surveillance and isolation procedures in a hospital [35,36]. Whether the expenditure required for spa typing is lower than that of labor-intensive manual review of patients' charts and laboratory results remains to be determined. Since the second half of 2004, when the study was completed and the data analyzed, the Ridom StaphType software v. 1.5 beta, which features automated early warning based on clonal alerts, has been in routine use in our laboratory. A data-driven “re-admission alert” triggered by the hospital information system, which identifies the re-admission of any patient previously colonized or infected with MRSA, could enhance the accuracy of such a system [37]. All methods described above are based on underlying rules or seek predefined patterns. The advantage of these hypothesis-based methods is the high sensitivity and specificity they can achieve; attaining such high specificity, however, requires a rapid, highly discriminatory genotyping method. Because the rules are predefined, unusual outbreak patterns (e.g., slowly developing epidemics) might go undetected. Data mining, i.e., knowledge discovery in databases, takes a different approach [6]. It uses techniques from computer science and statistics to search large event spaces (data warehouses) for interesting patterns that would otherwise escape traditional analysis. These “discovery” models are independent of an underlying hypothesis but are usually less sensitive and specific [38]. The proposed method has a number of limitations.
No method can definitively prove that an MRSA transmission event has occurred. Furthermore, the “gold standard” used incorporates elements of the diagnostic test under study. Finally, if the epidemiological pressure of a particular clone changes rapidly, temporal clustering could fail and false-positive clusters (as may occur with the “Rhine Hesse” MRSA clone) or false-negative clusters could be recorded. Examining the occurrence of MRSA clusters in the way described here not only provides a useful early warning system but can also be used to model infection dynamics and estimate important epidemiological parameters, e.g., cross-transmission rates [39,40]. Grundmann et al. used time series and typing data with scan test statistics and risk factor analysis to show that the incidence of infection can be related to staffing levels [41]. In addition, molecular typing data can be incorporated into geographical information systems combined with space-time scan statistics at regional and national levels [42,43]. In conclusion, a surveillance method based on spa typing and automated alerts is useful as an early warning system in a hospital and is at least comparable to classical epidemiological approaches. We have shown that the combined use of medical informatics and molecular laboratory techniques makes intervention possible before limited clusters of preventable MRSA transmissions expand into outbreaks.

Patient Summary

Background

Everyone carries many types of bacteria on or in their bodies; Staphylococcus aureus is one that many people normally carry. About 25% to 30% of people have it, usually in the nose. It is usually harmless; however, this bacterium can also cause infections, especially in people who are otherwise unwell or who have had surgery. These infections need to be treated with antibiotics. Methicillin-resistant S. aureus (MRSA) is an increasing problem in much of the developed world because, unlike other types of these bacteria, MRSA cannot be killed by most of the usual antibiotics, such as methicillin. Without treatment, staphylococcal infection can become very severe.

Why Was This Study Done?

MRSA is a particular problem in hospitals, where infected and colonized people need to be identified quickly, isolated, and treated. These researchers wanted to find the best way of identifying clusters of MRSA cases early; such clusters are more serious than single cases and indicate deficiencies in hygiene.

What Did the Researchers Do and Find?

Between 1998 and 2003 the researchers analysed 557 MRSA strains from patients admitted to, and staff of, one German university hospital. They collected spatial and temporal information about these people, and genetically typed each of the strains. They then looked for the most efficient way to identify an outbreak, including assessment of the risk by specially trained hospital staff, with and without genetic analysis. They also assessed a specially designed computer programme (developed by some of the authors) that combined the genetic type of the MRSA with details about the outbreak, such as the characteristics of the patients infected. They found that the most efficient and reliable method of identifying outbreaks was to combine the genetic type of the MRSA with details about the outbreak, using the computer programme tested.

What Do These Findings Mean?

The computer programme seems to be more efficient than the other methods tested here at identifying when an outbreak is likely to occur. However, this is the first test of this method, and before it is adopted more widely, further testing is needed in different settings and by other researchers.

Where Can I Get More Information Online?

Medline Plus has many links to pages of information on different staphylococcal infections: http://www.nlm.nih.gov/medlineplus/staphylococcalinfections.html
The Centers for Disease Control in the United States has a patient information sheet on MRSA: http://www.cdc.gov/ncidod/hip/aresist/ca_mrsa_public.htm
The Health Protection Agency in the United Kingdom has a leaflet on MRSA aimed at patients: http://www.hpa.org.uk/infections/topics_az/staphylo/mrsa_leaflet.htm

                Author and article information

                Contributors
                Role: Academic Editor
                Journal: PLoS Medicine (PLoS Med), Public Library of Science (San Francisco, USA)
                ISSN: 1549-1277, 1549-1676
                Published: 23 February 2010
                Volume 7, Issue 2, e1000238
                Affiliations
                [1 ]Division of Infectious Diseases and Health Policy Research Institute, University of California Irvine School of Medicine, Irvine, California, United States of America
                [2 ]Channing Laboratory, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
                [3 ]Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
                [4 ]Department of Clinical and Population Health Research, University of Massachusetts Medical School - Worcester, Worcester, Massachusetts, United States of America
                [5 ]Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, Massachusetts, United States of America
                [6 ]Department of Epidemiology, Rollins School of Public Health, Atlanta, Georgia, United States of America
                Free University of Brussels, Belgium
                Author notes

                ICMJE criteria for authorship read and met: SSH DSY JS HP MK KK TFO MSC JV JD RP. Agree with the manuscript's results and conclusions: SSH DSY JS HP MK KK TFO MSC JV JD RP. Designed the experiments/the study: SSH DSY JS MK RP. Analyzed the data: SSH DSY JS HP MK JD. Collected data/did experiments for the study: SSH DSY JS HP TFO MSC JD. Wrote the first draft of the paper: SSH. Contributed to the writing of the paper: DSY JS HP MK KK TFO MSC RP. Interpreted the results: MK. Co-developed the data collection system that made the study possible: TFO. Developed tool for data collection: JV.

                Article
                Manuscript ID: 09-PLME-RA-2384R2
                DOI: 10.1371/journal.pmed.1000238
                PMCID: PMC2826381
                PMID: 20186274
                bcd23e53-92f3-4f0b-aec2-6a4b6f7ed8d3
                Huang et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                History
                21 August 2009
                21 January 2010
                Page count
                Pages: 10
                Categories
                Research Article
                Infectious Diseases/Epidemiology and Control of Infectious Diseases
                Infectious Diseases/Nosocomial and Healthcare-Associated Infections
                Mathematics/Statistics

                Medicine
