      Is Open Access

      Detection of Temporal Clusters of Healthcare-Associated Infections or Colonizations with Pseudomonas aeruginosa in Two Hospitals: Comparison of SaTScan and WHONET Software Packages

      research-article


          Abstract

          The identification of temporal clusters of healthcare-associated colonizations or infections is a challenge in infection control. WHONET software is available to achieve these objectives using laboratory databases of hospitals but it has never been compared with SaTScan regarding its detection performance. This study provided the opportunity to evaluate the performance of WHONET software in comparison with SaTScan software as a reference to detect clusters of Pseudomonas aeruginosa. A retrospective study was conducted in two French university hospitals. Cases of P. aeruginosa colonizations or infections occurring between 1st January 2005 and 30th April 2014 in the first hospital were analyzed overall and by medical ward/care unit. Poisson temporal and space-time permutation models were used. Analyses were repeated for the second hospital on data from 1st July 2007 to 31st December 2013 to validate WHONET software (in comparison with SaTScan) in another setting. During the study period, 3,946 isolates of P. aeruginosa were recovered from 2,996 patients in the first hospital. The incidence rate was 89.8 per 100,000 patient-days (95% CI [87.0; 92.6]). Several clusters were observed overall and at the unit level and some of these were detected whatever the method used. WHONET results were consistent with the analyses that took patient-days and temporal trends into account in both hospitals. Because it is more flexible and easier to use than SaTScan, WHONET software seems to be a useful tool for the prospective surveillance of hospital data although it does not take populations at risk into account.


          Most cited references (3)


          A Space–Time Permutation Scan Statistic for Disease Outbreak Detection

          Introduction

          The World Trade Center and anthrax terrorist attacks in 2001, as well as the recent West Nile virus and SARS outbreaks, have motivated many public health departments to develop early disease outbreak detection systems using non-diagnostic information, often derived from electronic data collected for other purposes (“syndromic surveillance”) [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17]. These include systems that monitor the number of emergency department visits, primary care visits, ambulance dispatches, nurse hot line calls, pharmaceutical sales, and West Nile–related dead bird reports. The establishment of such systems involves many challenges in data collection, analytical methods, signal interpretation, and response. Important analytical challenges include dealing with the unknown time, place, and size of an outbreak, detecting an outbreak as early as possible, adjusting for natural temporal and geographical variation, and dealing with the lack of suitable population-at-risk data.

          Most analytical methods in use for the early detection of disease outbreaks are purely temporal in nature [18,19,20,21,22]. These methods are useful for detecting outbreaks that simultaneously affect all parts of the geographical region under surveillance, but may be late at detecting outbreaks that start locally. While purely temporal methods can be used in parallel for overlapping areas of different sizes in order to cover all possible outbreaks, that approach leads to a severe problem of multiple testing, generating many more false signals than the nominal statistical significance level would indicate. First studied by Naus [23], the scan statistic is an elegant way to solve problems of multiple testing when there are closely overlapping spatial areas and/or time intervals being evaluated.
Temporal, spatial, and space–time scan statistics [24,25,26,27] are now commonly used for disease cluster detection and evaluation, for a wide variety of diseases including cancer [28,29], Creutzfeldt-Jakob disease [30], granulocytic ehrlichiosis [31], sclerosis [32], and diabetes [33]. The basic idea is that there is a scanning window that moves across space and/or time. For each location and size of the window, the number of observed and expected cases is counted. Among these, the most “unusual” excess of observed cases is noted. The statistical significance of this cluster is then evaluated taking into account the multiple testing stemming from the many potential cluster locations and sizes evaluated. To date, all scan statistics require either a uniform population at risk, a control group, or other data that provide information about the geographical and temporal distribution of the underlying population at risk. Census population numbers are useful as a denominator for cancer, birth defects, and other registry data, where the expected number of cases can be accurately estimated based on the underlying population. They are less relevant for surveillance data such as emergency department visits and pharmacy sales, since the catchment area for each hospital/pharmacy is undefined. Even if it were available, the catchment area population would not be a good denominator since there can be significant natural geographical variation in health-care utilization data, due to disparities in disease prevalence, access to health care, and consumer behavior [34]. One option when evaluating data that are affected by utilization behavior is to use total volume as the denominator. For example, one may use total emergency department visits as a denominator when evaluating diarrhea visits [7], or similarly, all pharmacy sales as the denominator when evaluating diarrhea medication sales [4]. This may or may not work depending on the nature of the data. 
For example, changes in total drug sales due to sales promotions or the allergy season could hide a true signal or create a false signal for the drug category of interest.

In this paper we present a prospective space–time permutation scan statistic that does not require population-at-risk data, and which can be used for the early detection of disease outbreaks when only the number of cases is available. The method can be used prospectively to regularly scan a geographical region for outbreaks of any location and any size. For each location and size, it looks at potential one-day as well as multi-day outbreaks, in order to quickly detect a rapidly rising outbreak and still have power to detect a slowly emerging outbreak by combining information from multiple days. The space–time permutation scan statistic was gradually developed as part of the New York City Department of Health and Mental Hygiene (DOHMH) surveillance initiatives, in parallel with the adaptation of population-at-risk-based scan statistics for dead bird reports (for West Nile virus) [13], emergency department visits [7], ambulance dispatch calls [6], pharmacy sales [4], and student absentee records [3]. In this methodological paper, the space–time permutation scan statistic is presented and illustrated using emergency department visits for diarrhea, respiratory, and fever/flu-like illnesses.

Methods

New York City Emergency Department Syndromic Surveillance System

The New York City Emergency Department syndromic surveillance system is described in detail elsewhere [7]. In brief, participating hospitals transmit electronic files to the DOHMH seven days per week. Files contain data for all emergency department patient visits on the previous day, including the time of visit, patient age, gender, home zip code, and chief complaint, a free-text field that captures the patient's own description of their illness.
As of November 2002, 38 of New York City's 66 emergency departments were participating in the system, covering an estimated 75% of emergency department visits in the city. Data are verified for completeness and accuracy, concatenated into a single dataset, and appended to a master archive using SAS [35]. To categorize visits into “syndromes” (e.g., “diarrhea syndrome”), a computer algorithm searches the free-text chief complaint field for character strings indicating symptoms consistent with that syndrome. The goal of data analysis, which is carried out seven days per week, is to detect unusual increases in key syndrome categories. To run the space–time permutation scan statistic we have written a SAS program that generates the necessary case and parameter files, invokes the SaTScan software [36], and reads the results back into SAS for reporting and display. Two sets of analyses are performed, one based on assigning each individual to the coordinates of their residential zip code and the other based on their hospital address. With 183 zip codes versus 38 hospitals, the former utilizes more detailed geographical information, while the latter may be able to pick up outbreaks not only related to place of residence but also to place of work or other outside activities (if people go to the nearest hospital when they feel sick). Residential zip code is not recorded by the hospital for about 3% of patients, and for the analyses described here, these individuals are only included in the hospital-based analyses. The performance of the prospective space–time permutation scan statistic was evaluated using both hospital and residential analyses. We used historical diarrhea data to mimic a prospective surveillance system with daily analyses from 15 November 2001 to 14 November 2002. For each of these days, the analysis only used data prior to and including the day in question, ignoring all data from subsequent days. 
This corresponds to the actual data available at the DOHMH 8–12 h after the end of that day, when that analysis would have been conducted if the system had been in place at that time. We also present one week of daily prospective analyses conducted in November 2003, where the daily analysis was run about 12 h after the conclusion of each day, as part of the regular syndromic surveillance activities at the DOHMH.

The Space–Time Permutation Scan Statistic

As with the Poisson- and Bernoulli-based prospective space–time scan statistics [27], the space–time permutation scan statistic utilizes thousands or millions of overlapping cylinders to define the scanning window, each being a possible candidate for an outbreak. The circular base represents the geographical area of the potential outbreak. A typical approach is to first iterate over a finite number of geographical grid points and then gradually increase the circle radius from zero to some maximum value defined by the user, iterating over the zip codes in the order in which they enter the circle. In this way, both small and large circles are considered, all of which overlap with many other circles. The height of the cylinder represents the number of days, with the requirement that the last day is always included together with a variable number of preceding days, up to some maximum defined by the user. For example, we may consider all cylinders with a height of 1, 2, 3, 4, 5, 6, or 7 d. For each center and radius of the circular cylinder base, the method iterates over all possible temporal cylinder lengths. This means that we will evaluate cylinders that are geographically large and temporally short, forming a flat disk, those that are geographically small and temporally long, forming a pole, and every other combination in between.

What is new with the space–time permutation scan statistic is the probability model. Since we do not have population-at-risk data, the expected counts must be calculated using only the cases.
Suppose we have daily case counts for zip-code areas, where $c_{zd}$ is the observed number of cases in zip-code area $z$ during day $d$. The total number of observed cases is

$$C = \sum_{z} \sum_{d} c_{zd}.$$

For each zip code and day, we calculate the expected number of cases $\mu_{zd}$ conditioning on the observed marginals:

$$\mu_{zd} = \frac{1}{C} \left( \sum_{z} c_{zd} \right) \left( \sum_{d} c_{zd} \right).$$

In words, this is the proportion of all cases that occurred in zip-code area $z$ times the total number of cases during day $d$. The expected number of cases $\mu_A$ in a particular cylinder $A$ is the summation of these expectations over all the zip-code-days within that cylinder:

$$\mu_A = \sum_{(z,d) \in A} \mu_{zd}.$$

The underlying assumption when calculating these expected numbers is that the probability of a case being in zip-code area $z$, given that it was observed on day $d$, is the same for all days $d$.

Let $c_A$ be the observed number of cases in the cylinder. Conditioned on the marginals, and when there is no space–time interaction, $c_A$ is distributed according to the hypergeometric distribution with mean $\mu_A$ and probability function

$$P(c_A) = \frac{\binom{\sum_{z \in A} c_{zd}}{c_A} \binom{C - \sum_{z \in A} c_{zd}}{\sum_{d \in A} c_{zd} - c_A}}{\binom{C}{\sum_{d \in A} c_{zd}}},$$

where $\sum_{z \in A} c_{zd}$ is the total number of cases, over all days, in the zip-code areas of $A$, and $\sum_{d \in A} c_{zd}$ is the total number of cases, over all zip-code areas, on the days of $A$. When both $\sum_{z \in A} c_{zd}$ and $\sum_{d \in A} c_{zd}$ are small compared to $C$, $c_A$ is approximately Poisson distributed with mean $\mu_A$ [37]. Based on this approximation, we use the Poisson generalized likelihood ratio (GLR) as a measure of the evidence that cylinder $A$ contains an outbreak:

$$\mathrm{GLR} = \left( \frac{c_A}{\mu_A} \right)^{c_A} \left( \frac{C - c_A}{C - \mu_A} \right)^{C - c_A}.$$

In words, this is the observed divided by the expected to the power of the observed inside the cylinder, multiplied by the observed divided by the expected to the power of the observed outside the cylinder. Among the many cylinders evaluated, the one with the maximum GLR constitutes the space–time cluster of cases that is least likely to be a chance occurrence and, hence, is the primary candidate for a true outbreak. One reason for using the Poisson approximation is that it is much easier to work with this distribution than the hypergeometric when adjusting for space by day-of-week interaction (see below), as the sum of Poisson distributions is still a Poisson distribution.
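The expected counts and the likelihood ratio described above can be illustrated with a short Python sketch. This is not the SaTScan implementation: the function names `expected_counts` and `log_glr`, the toy count matrix, the use of the log scale, and the convention of returning zero when a cylinder shows no excess are all assumptions made for this example.

```python
import numpy as np

def expected_counts(counts):
    """counts[z, d] = observed cases in zip-code area z on day d.
    Returns mu[z, d] = (zip-code total) * (day total) / C."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()                              # C
    zip_totals = counts.sum(axis=1, keepdims=True)    # cases per zip code, all days
    day_totals = counts.sum(axis=0, keepdims=True)    # cases per day, all zip codes
    return zip_totals * day_totals / total

def log_glr(c_in, mu_in, total):
    """Log of the Poisson GLR for one cylinder.
    Assumption: only excesses are of interest, so 0.0 is returned otherwise."""
    if c_in <= mu_in:
        return 0.0
    c_out = total - c_in
    outside = c_out * np.log(c_out / (total - mu_in)) if c_out > 0 else 0.0
    return float(c_in * np.log(c_in / mu_in) + outside)

# Hypothetical toy data: 3 zip-code areas (rows) by 4 days (columns).
counts = np.array([[2, 1, 3, 9],
                   [4, 2, 3, 3],
                   [1, 2, 2, 2]])
mu = expected_counts(counts)
total = counts.sum()

# Cylinder A: zip-code area 0 on the last day only.
c_A = counts[0, 3]
mu_A = mu[0, 3]
score = log_glr(c_A, mu_A, total)
```

Because the expected counts are built from the marginals alone, their row and column sums reproduce the observed zip-code and day totals, which is what makes the method independent of population-at-risk data.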
Since we are evaluating a huge number of outbreak locations, sizes, and time lengths, there is serious multiple testing that we need to adjust for. Since we do not have population-at-risk data, this cannot be done in any of the usual ways for scan statistics. Instead, it is done by creating a large number of random permutations of the spatial and temporal attributes of each case in the dataset. That is, we shuffle the dates/times and assign them to the original set of case locations, ensuring that both the spatial and temporal marginals are unchanged. After that, the most likely cluster is calculated for each simulated dataset in exactly the same way as for the real data. Statistical significance is evaluated using Monte Carlo hypothesis testing [38]. If, for example, the maximum GLR is calculated from 999 simulated datasets, and the maximum GLR for the real data is higher than the 50th highest, then that cluster is statistically significant at the 0.05 level. In general terms, the p-value is p = R/(S + 1) where R is the rank of the maximum GLR from the real dataset and S is the number of simulated datasets [38]. In addition to p-values, we also report null occurrence rates [8], such as once every 45 d or once every 23 mo. The null occurrence rate is the expected time between seeing an outbreak signal with an equal or higher GLR assuming that the null hypothesis is true. For daily analyses, it is defined as once every 1/p d. For example, under the null hypothesis we would at the 0.05 level on average expect one false alarm every 20 d for each syndrome under surveillance. Because of the Monte Carlo hypothesis testing, the method is computer intensive. To facilitate the use of the methods by local, state, and federal health departments, the space–time permutation scan statistic has been implemented as a feature in the free and public domain SaTScan software [36]. 
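The permutation step and the Monte Carlo p-value can be sketched as follows. This is a simplified illustration rather than the SaTScan algorithm: it scans only single zip-code areas (radius zero) with windows ending on the last day, and the function names, the synthetic data, and the conservative handling of ties in the rank are assumptions made here.

```python
import numpy as np

def log_glr(c_in, mu_in, total):
    """Log Poisson GLR; 0.0 when there is no excess (assumed convention)."""
    if c_in <= mu_in:
        return 0.0
    c_out = total - c_in
    outside = c_out * np.log(c_out / (total - mu_in)) if c_out > 0 else 0.0
    return float(c_in * np.log(c_in / mu_in) + outside)

def max_log_glr(zips, days, n_zips, n_days, max_len):
    """Largest log GLR over single-zip cylinders that include the last day."""
    counts = np.zeros((n_zips, n_days))
    np.add.at(counts, (zips, days), 1)        # one case per (zip, day) record
    total = counts.sum()
    mu = counts.sum(axis=1, keepdims=True) * counts.sum(axis=0, keepdims=True) / total
    best = 0.0
    for z in range(n_zips):
        for length in range(1, max_len + 1):
            window = slice(n_days - length, n_days)
            best = max(best, log_glr(counts[z, window].sum(),
                                     mu[z, window].sum(), total))
    return best

def monte_carlo_p_value(zips, days, n_zips, n_days, max_len=7, n_sim=999, seed=0):
    """p = R / (S + 1), with R the rank of the real maximum among S + 1 datasets."""
    rng = np.random.default_rng(seed)
    observed = max_log_glr(zips, days, n_zips, n_days, max_len)
    simulated = np.array([
        # Shuffling the day labels keeps both spatial and temporal marginals fixed.
        max_log_glr(zips, rng.permutation(days), n_zips, n_days, max_len)
        for _ in range(n_sim)
    ])
    rank = 1 + int(np.sum(simulated >= observed))
    return rank / (n_sim + 1)

# Hypothetical data: 200 background cases plus 25 extra cases in zip 2 on the last day.
rng = np.random.default_rng(1)
zips = np.concatenate([rng.integers(0, 5, 200), np.full(25, 2)])
days = np.concatenate([rng.integers(0, 30, 200), np.full(25, 29)])
p = monte_carlo_p_value(zips, days, n_zips=5, n_days=30, n_sim=199)
```

Recomputing the maximum over all cylinders for every permuted dataset is what adjusts for the multiple testing: the real maximum is compared with the distribution of maxima, not with the distribution for a single cylinder.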
Implementation for New York City Syndromic Surveillance

Depending on the application, the method may be used with different parameter settings. For the syndromic surveillance analyses we set the upper limit on the geographical size of the outbreak to be a circle with a 5-km radius, and the maximum temporal length to be 7 d. This means that we are evaluating outbreaks with a circle radius size anywhere between 0 km (one zip code only) and 5 km, and a time length (cylinder height) of 1 to 7 d. The latter restriction is a reflection of the belief that the main purpose of this syndromic surveillance system is early disease outbreak detection, and if the outbreak has existed for over 1 wk, it is more likely to be picked up by reporting of specific disease diagnoses by clinicians or laboratories.

Another practical choice is the total number of days to include in the analysis. One option is to include all previous days for which data are available. We chose instead to analyze the last 30 d of data, adding one day and removing another for each daily analysis. We believe this time frame provides enough baseline beyond the 1- to 7-d scanning window to establish the usual pattern of visits while avoiding inclusion of data that may no longer be relevant to the current period.

To reduce the computational load, we limited the centers of the circular cylinder bases to be a collection of 446 zip-code area centroids and hospital locations in New York City and adjacent areas. This ensures, among other things, that each zip-code area may constitute an outbreak on its own.

The last parameter that we need to set is the number of Monte Carlo replications used for each analysis. For the daily prospective analyses we chose 999, which meant that the smallest p-value we could get was 0.001, so that the smallest null occurrence rate possible for a signal was once every 2.7 y. In our system, signals of that strength clearly merit investigation.
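The link between the number of replications, the smallest attainable p-value, and the null occurrence rate can be checked with a few lines of Python. The helper names below are hypothetical, and the conversion to years assumes 365.25 days per year and one analysis per day.

```python
def smallest_p_value(n_sim):
    """Smallest p-value attainable with n_sim Monte Carlo replications: 1 / (S + 1)."""
    return 1.0 / (n_sim + 1)

def null_occurrence_days(p_value):
    """Expected days between signals of this strength under the null (daily analyses)."""
    return 1.0 / p_value

p_min = smallest_p_value(999)                 # smallest p-value with 999 replications
days_between = null_occurrence_days(p_min)    # expected days between such signals
years_between = days_between / 365.25         # same interval expressed in years
```

With 999 replications this gives a p-value of 0.001 and an interval of 1,000 d, which matches the stated once every 2.7 y.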
For the historical evaluation, in order to obtain more precise null occurrence rates, we set the number of replications to 9,999.

Adjusting for Space by Day-of-Week Interaction

The space–time permutation scan statistic automatically adjusts for any purely spatial and purely temporal variation. For many syndromic surveillance data sources, there is also natural space by day-of-week interaction in the data that is not due to a disease outbreak but to consumer behavior, store hours, etc. For example, if a particular pharmacy has an exceptionally high number of sales on Sundays because neighboring pharmacies are closed, we might get a signal for this pharmacy every Sunday unless we adjust for this space by day-of-week interaction.

This can be done through a stratified random permutation procedure. The first step is to stratify the data by day of week: Monday, Tuesday,…, Sunday. The space–time permutation randomization step is then done separately for each day of the week. For each zip code and day combination, the expected count is then calculated using only data from that day of the week. For each cylinder, both the observed and expected numbers of cases are summed over all day-of-week strata, zip code, and day combinations within that cylinder. The same technique can be used to adjust for other types of space–time interaction. The underlying assumption when calculating these expected numbers is now that the probability of a case being in zip-code area z, given that it was observed on a Monday, is the same for all Mondays, etc. All our analyses were adjusted for space by day-of-week interaction.

Missing Data

Daily disease surveillance systems require rapid transmission of data, and it may not be possible to get complete data from each provider every single day. When we first tried the new method in New York City, a number of highly significant outbreak signals were generated that were artifacts of previously unrecognized missing or incomplete data from one or more hospitals.
This is a good reflection on the method, since it should be able to detect abnormalities in the data no matter what their cause, but it also illustrates the importance of accounting for missing data in order to create an early detection system that is useful on a practical level. Depending on the exact nature of the missing data, there are different ways to handle it. We used a combination of three different approaches. (1) If a hospital had missing data for all of the past 7 d (all possible days within the cylinder), we completely removed that hospital from the analysis, including all previous 23 d. (2) If a hospital had no missing data during the last 7 d, but one or more missing days during the previous 23 baseline days, then we completely removed the baseline days with some missing data, for all of the hospitals. (3) If a hospital had missing data for at least one but not all of the last 7 d, then we removed those missing days together with all previous days for the same hospital and the same day of week. That is, if Monday was missing during the last week, then we removed all Mondays for that hospital. This removal introduces artificial space by day-of-week interaction, so this approach only works if it is implemented in conjunction with the stratified adjustment for space by day-of-week interaction. For some analyses, more than one of these approaches were used simultaneously. Note that, since the missing data depend on the hospital, the solution is to remove specific hospitals and days rather than zip codes and days, even when we are doing the zip-code-based residential analyses. If there are many hospitals with missing data, then the second approach could potentially remove all or almost all of the baseline days. To avoid this, one could sometimes go further back in time and add the same number of earlier days to compensate. Another option is to impute into the cells with missing data a Poisson random number of cases generated under the null hypothesis. 
Given the completeness of our data, neither of these methods was employed (94% of analyses were conducted with four or fewer baseline days removed).

Results

Evaluation Using Historical Data: Diarrhea Surveillance

We first tested the new method by mimicking daily prospective analyses of hospital emergency department data from 15 Nov 2001 to 14 Nov 2002, looking at diarrhea visits. Signals with p ≤ 0.0027 are listed in Table 1 and depicted on the map in Figure 1. That is, we only list those signals with a null occurrence rate of once every year or less often. For the residential zip-code analyses, there were two such signals. For the hospital analyses, there were six, two of which occurred in the same place on consecutive days. It is worth noting that at the false alarm rate chosen, none of the residential signals correspond to any of the hospital signals.

For the residential analysis, the strongest signal was on 9 February 2002, covering 17 zip-code areas in southern Bronx and northern Manhattan. This signal had 63 cases observed over 2 d when 34.7 were expected (relative risk = 1.82). With a null occurrence rate of once every 5.5 y, a spike in cases of this magnitude is unlikely to be due to random variation. The signal immediately preceded a sharp increase in citywide diarrheal visits from 10 February to 20 March (Figure 2). In both the localized 9 February cluster and the citywide outbreak, the increase was most notable among children less than 5 y of age. The weaker 26 February hospital signal and the 7 March residential signal that were centered in northern Manhattan occurred at the peak of this citywide outbreak. Laboratory investigation of the citywide increase in diarrheal activity indicated rotavirus as the most likely causative agent.

The two hospital signals on 1 November and 2 November 2002 were at the same three hospitals in southern Bronx and northern Manhattan, with null occurrence rates of 1.6 and 3.4 y, respectively.
These signals immediately preceded another sharp increase in citywide diarrheal activity, this time among individuals of all ages (Figure 2). This citywide outbreak lasted approximately 6 wk and coincided with a number of institutional outbreaks in nursing homes and on cruise ships. Laboratory investigation of several of these outbreaks revealed the norovirus as the most likely causative agent. A similar citywide outbreak of norovirus in 2001 began shortly before the 21 November 2001 hospital signal in northern Bronx, which had a null occurrence rate of once every 3.4 y. For the hospital analyses, the strongest signal was a 1-d cluster at a single hospital in Queens on 11 January 2002, with ten diarrhea cases when only 2.3 were expected, which one would only expect to happen once every 3.9 y. Being very local in both time and space, it is different from the previously described signals preceding citywide outbreaks. While examination of individual-level data revealed a predominance of infants under the age of two, this cluster could not be associated with any known outbreak, and retrospective investigation was not feasible. As shown in Table 1, at the p = 0.0027 threshold there were six and two signals for the hospital and residential analyses, respectively, compared to one expected in each. Figure 3 shows the number of days on which the p-value of the most likely cluster was within a given range. Had the null hypothesis been true on all 365 d analyzed, the p-values would have been uniformly distributed between zero and one. The fact that in our data there were more days with low rather than high p-values is an indication that there may be additional true “outbreaks” that are indistinguishable from random noise. 
These could be very small disease outbreaks, for example, due to spoiled food eaten by only a few people, or they could be artifacts caused by, for example, changes in the hours of operation at an emergency department or coding differences between the emergency department triage nurses.

Daily Prospective Surveillance

Since 1 November 2003, the space–time permutation scan statistic has been used daily in parallel with the population-at-risk-based space–time scan statistics [7] as part of the DOHMH Emergency Department surveillance system. For respiratory symptoms, fever/flu, and diarrhea, the results for the last week of November are listed in Tables 2 and 3. For diarrhea or respiratory symptoms there were no strong signals warranting an epidemiological investigation, and all had null occurrence rates of more often than once every month. This reflects a very typical week. For fever/flu there was a strong 7-d hospital signal in southern Bronx and northern Manhattan on 28 November with a null occurrence rate of once every 2.7 y. On each of the following 2 d, there were again strong hospital signals in the same general area as well as residential zip-code signals of lesser magnitude. These signals started 12 d into a gradual citywide increase in fever/flu that continued to grow through the end of December, driven by an unusually early influenza season in New York City.

Discussion

In this paper we have presented a new method for prospective infectious disease outbreak surveillance that uses only case data, handles missing data, and makes minimal assumptions about the spatiotemporal characteristics of an outbreak. When using historical emergency department chief complaint data to mimic a prospective surveillance system with daily analyses, we detected four highly unusual clusters of diarrhea cases, three of which heralded citywide gastrointestinal outbreaks due to rotavirus and norovirus.
Three of four weaker signals also occurred immediately preceding or concurrent with these citywide outbreaks. If we assume that all of these clusters were associated with the citywide disease outbreaks, then the method generated at most two false alarms at a signal threshold where we would have expected one by chance alone. For disease outbreak detection, the public-health community has historically relied on the watchful eyes of physicians and other health-care workers. However, the increasing availability of timely electronic surveillance data, both reportable diagnoses and pre-diagnostic syndromic indicators, raises the possibility of earlier outbreak detection and intervention if suitable analytic methods are found. While it is still unclear whether systematic health surveillance using syndromic or reportable disease data will be able to quickly detect a bioterrorism attack [39,40], the methods described here can also be applied to early detection of outbreaks of other, more common infectious diseases. There are other alternative ways to calculate expected counts from a series of case data. One naive approach is to use the observed count 7 d ago in a zip-code area as the expected count for that same area today, and then apply the regular Poisson-based space–time scan statistic. When applied to the New York City diarrhea data described above, such an approach generated at least one “statistically significant” outbreak signal on each of the 365 d evaluated. The basic problem with this is that there is random variation in the observed counts that are used to calculate the expected, which is not accounted for in the Poisson-based scan statistic. 
If we based the expected counts on the average of multiple prior weeks of data, we would get less variability in the expected counts and fewer false signals, but the problem would still persist, and as the number of weeks increases beyond a few months other problems may gradually arise due to, for example, seasonal trends or population size changes.

Computing time depends on the size of the dataset and the analysis parameters chosen. With 999 replications, the hospital analyses with 38 data locations take 7 s to run on a 2.5-GHz Pentium 4 computer, while the residential analyses using 183 zip-code area locations take 11 s. The same numbers for 9,999 replications are 27 and 57 s, respectively.

There are a number of limitations with the proposed method. The method is highly sensitive to missing or incomplete data. Our first implementation of the method resulted in a number of false alarms, which highlights the need for systematic data quality checks and the analytic adjustments described above.

When excellent population-at-risk data are available, we expect the Poisson-based space–time scan statistic that utilizes this extra information to perform better than the space–time permutation scan statistic. If, however, the population-at-risk data are of poor quality or nonexistent, which is often the case, then the space–time permutation scan statistic should be used.

Since the space–time permutation scan statistic adjusts for purely temporal clusters, it can only detect citywide outbreaks if they start locally, but not if they occur more or less simultaneously in the whole city. Hence, it does not replace purely temporal surveillance methods, but rather complements them.

Finally, it is important to note that the geographical boundary of the detected outbreak is not necessarily the same as the boundary of the true outbreak. Since we used circles as the base for the scanning cylinder, all detected outbreaks are approximately circular.
Other shapes of the scanning window are also available [36], but it has been shown that circular scan statistics are also able to detect noncircular outbreak areas [41]. The less geographically compact the outbreak is, though, the less power (sensitivity) there is to detect it. For example, using circles we cannot expect to pick up an outbreak that is very long and narrow such as a one-block area on each side of Broadway, stretching from southern to northern Manhattan. The emergency department data used in this study also have some limitations. In addition to the citywide outbreaks, there were several institutional gastrointestinal outbreaks reported to DOHMH during the historical 1-y period but not detected in emergency department data using the space–time permutation scan statistic. One reported outbreak involved school children that went to the emergency department of a nonparticipating hospital. Other outbreaks went undetected because medical care was not sought in emergency departments. Most people with diarrhea do not go to the hospital emergency department. Rather, they call or go to their primary care physician, they visit the pharmacy to buy over-the-counter medication, or they may have symptoms that are so mild that they do not seek medical care. Further studies are needed to evaluate the strengths and weaknesses of different data sources. The geographic units of analysis used were residential zip code and hospital location. It may be hard to detect outbreaks that affect only a small part of a single zip code, especially if the background rate of the syndrome is fairly high. Where available, the exact coordinates of a patient's residence can be used to avoid problems introduced when aggregating data. Furthermore, some outbreaks may not be clustered by place of residence, as in the case of an exposure occurring at the place of work or in a subway. 
Using the location of the hospital rather than residence may provide higher power to detect workplace-related outbreaks, but the only way to fully address this issue may be to conduct workplace surveillance. In spite of these limitations, we have presented a new method for the early detection of disease outbreaks and illustrated its practical use. The primary advantages of the method are that it is easy to use, it only requires case data, it automatically adjusts for naturally occurring purely spatial and purely temporal variation, it allows adjustment for space by day-of-week interaction, and it is capable of handling missing data. While the method was developed and applied in the context of syndromic surveillance, it may also be used for the early detection of diagnosed disease outbreaks, or for detecting changes in the pattern of chronic diseases, when population census information is unavailable, unreliable, or not available at the fine geographical resolution needed. The ability to perform disease surveillance without population-at-risk data is especially important in developing countries, where these data may be hard to obtain. The space–time permutation scan statistic could also be used for similar early detection problems in other fields, such as criminology, ecology, engineering, social sciences, and veterinary sciences. Patient Summary Background Detecting disease outbreaks early means that health officials are better able to fight and contain them. Electronic patient records that can be analyzed with statistical methods in computer programs should help with disease surveillance and make it possible to detect outbreaks early without raising too many false alarms. Why Was This Study Done? The researchers who did this study have developed and operated real-time disease surveillance systems. 
In any such system, there will always be more disease cases in some places and time periods than in others, for example, because there are more people living there, or because there are more people of a certain type living there, like older people or children, who are more prone to get sick. The researchers were trying to develop a method that can discover outbreaks without the need to know about the structure of the population under surveillance. What Did the Researchers Do? They modified an existing method to make it work without data on the structure of the population under surveillance. They also found a way to deal with incomplete data, when, for example, one hospital did not report any data for a particular day. What Did They Find? When they applied the method to emergency room data from New York City, they found that it performs well: it seems to be able to detect real outbreaks early and not result in many false alarms. What Are the Limitations of the Method? The method can detect only outbreaks that start locally, not those that occur more or less simultaneously in the whole surveillance area. For some outbreaks—for example, those caused by exposure to an infectious agent in the subway—patients will not necessarily live in the same neighborhood or go to the same emergency room. The method will not detect outbreaks with very few cases, such as one case of small pox or three cases of anthrax, such as the anthrax bioterrorism attacks in the fall of 2001. And the method only works for diseases with early symptoms severe enough that people go to the emergency room. Efficient disease surveillance will need the parallel use of different methods, each with their own strengths and weaknesses. What Next? The method was developed as part of the New York City Department of Health and Mental Hygiene surveillance initiatives and is now being used every day to analyze emergency department records from 38 hospitals in the city. 
To facilitate wider use, the method has been integrated into a more diverse software called SaTScan that is freely available. Where Can I Find Out More? The following websites provide additional information on this and other methods. Details on SaTScan and software for downloading: http://www.satscan.org/ United States Centers of Disease Control and Prevention Web page on electronic disease surveillance: http://www.cdc.gov/od/hissb/act_int.htm National Syndromic Surveillance Conference: http://www.syndromic.org/index.html National Bioterrorism Syndromic Surveillance Demonstration Program: http://btsurveillance.org/ The Real-Time Outbreak and Disease Surveillance Open Source Project: http://openrods.sourceforge.net/

            A model-adjusted space-time scan statistic with an application to syndromic surveillance.

            The space-time scan statistic is often used to identify incident disease clusters. We introduce a method to adjust for naturally occurring temporal trends or geographical patterns in illness. The space-time scan statistic was applied to reports of lower respiratory complaints in a large group practice. We compared its performance with unadjusted populations from: (1) the census, (2) group-practice membership counts, and on adjustments incorporating (3) day of week, month, and holidays; and (4) additionally, local history of illness. Using a nominal false detection rate of 5%, incident clusters during 1 year were identified on 26, 22, 4 and 2% of days for the four populations respectively. We show that it is important to account for naturally occurring temporal and geographic trends when using the space-time scan statistic for surveillance. The large number of days with clusters renders the census and membership approaches impractical for public health surveillance. The proposed adjustment allows practical surveillance.

              Automated Detection of Infectious Disease Outbreaks in Hospitals: A Retrospective Cohort Study

              Introduction Although hospital-associated outbreaks of infection account for a small proportion of health care–associated infections [1]–[4], the fact that they typically result from transmission within health care facilities means that timely identification is essential for investigation and effective response. Current detection methods rely heavily on temporal or spatial clustering of specific pathogens. Such monitoring usually involves case counting and subjective judgment to adjudicate whether a cluster is occurring. For multidrug-resistant organisms (MDROs), such as methicillin-resistant Staphylococcus aureus (MRSA), rule-based criteria (e.g., three cases within 2 wk in the same ward) are often used to define a cluster. For example, Mellmann et al. used a definition of two cases in 2 wk with identical spa types to define a MRSA outbreak [5]. Ad hoc and rule-based criteria are subject to error—both in defining random variation as a cluster and in failing to identify clusters owing to hospital transmission that do not meet specified rules. Reliance on the human eye to filter daily microbiology data and detect clusters among hundreds of pathogens can lead to a high failure rate. In addition, reliance on subjective judgment by infection control professionals for cluster detection can lead to interhospital variability and incorrect identification. Because clusters (perceived or real) engender intensive investigation and possible intervention, identification of false clusters can waste valuable resources and dilute attention to real problems. Microbiology-based cluster detection systems should use automated statistical methods to optimize cluster identification, lessen surveillance burden, and expand cluster detection to all pathogens across all hospital locations and services. It should automatically assess whether pathogens in a cluster had similar antimicrobial susceptibility patterns that would suggest clonality and a common source. 
Requirements for a useful system include (1) automatic and timely generation of alerts of clusters, (2) sufficient sensitivity to detect clinically significant clusters identified through routine surveillance methods, and (3) sufficient positive predictive value to avoid an excessive number of false alerts that could generate unnecessary investigation and intervention.

Methods

Ethics Statement

This study was approved by the Brigham & Women's Hospital (BWH) Institutional Review Board.

Study Population and Datasets

BWH is a 750-bed academic medical center. It provides neonatal and adult medical care with intensive care and oncology patient populations. Its electronic data repository contains finalized microbiology data from 1987 to present. The microbiology data repository includes patient identifiers, ward, and clinical service at the time of specimen collection, collection date and specimen source, and hospital admission date. Antimicrobial susceptibility testing is based on Clinical and Laboratory Standards Institute (CLSI) standards [6]. The entire microbiology data repository was used to identify the first positive result per patient for a specific bacterial or fungal species since 1987. The dataset was further restricted to isolates representing hospital-associated acquisition (All Organism Nosocomial Dataset) by limiting isolates to those obtained >2 d after hospital admission. In addition, a second dataset was created that limited pathogens to organism species associated with hospital transmission on the basis of published literature (Priority Pathogen Nosocomial Dataset) (Table 1). Because of national surveillance related to multidrug-resistant bacteria, we additionally assessed MRSA and VRE.

10.1371/journal.pmed.1000238.t001
Table 1 Priority pathogens previously described in hospital-associated clusters.

Pathogen
Acinetobacter sp.
Alcaligenes sp.
Aspergillus sp.
Bacteroides sp.
Burkholderia sp.
Candida sp.
Chromobacterium sp.
Chryseobacterium sp.
Citrobacter sp.
Enterobacter sp.
Enterococcus sp. (each species regardless of resistance profile; VRE)
Escherichia sp.
Fusarium sp.
Group A Streptococcus
Haemophilus sp.
Klebsiella sp.
Legionella sp.
Malassezia sp.
Mycobacterium sp.
Oligella sp.
Pantoea sp.
Proteus sp.
Pseudomonas sp.
Rhizopus sp.
Salmonella sp.
Serratia sp.
S. aureus (all isolates regardless of resistance profile; MRSA)
Stenotrophomonas sp.
Torulopsis sp.

All species individually assessed within genus.

Automated Cluster Detection Tool

We integrated two freely available software packages used for public health epidemiology. WHONET/BacLink software is available from the World Health Organization (WHO) Collaborating Centre for Surveillance of Antimicrobial Resistance for management and descriptive analysis of microbiology data [7]. BacLink is a data-conversion utility that standardizes data from existing microbiology systems into WHONET formats. WHONET/BacLink is used by >1,000 laboratories worldwide. SaTScan was originally developed for geographical disease surveillance to assess the statistical significance of community cancer clusters [8]–[10]. The software was subsequently enhanced and applied to early detection of infectious disease outbreaks [11],[12]. We integrated the space-time permutation scan statistic in SaTScan into the WHONET analysis module to create the WHONET-SaTScan cluster detection tool, which has been freely available as part of WHONET/BacLink since June 2009 [7]. For hospital surveillance, “spatial” locations consisted of individual wards and services (e.g., medicine, oncology). In addition, we evaluated groups of wards or services sharing in patient care (e.g., cardiology and cardiac surgery services), regardless of physical proximity. Antimicrobial resistance profile was also used as a spatial location to detect clusters of specific pathogens that had identical patterns of nonsusceptibility to routinely tested antibiotics.
Using only case data, the space-time permutation scan statistic looks for space-time interaction clusters, adjusting for purely temporal and purely spatial variation [12]. The space-time cluster with the maximum likelihood is the cluster least likely due to chance. For each pathogen in the Priority Pathogen Nosocomial Dataset, a separate set of analyses were done for wards, services, and antimicrobial resistance pattern. It is important to note that this method will be subject to human-influenced variation, such that if one ward expanded in volume because of increasing bed size, then this increase may trigger a cluster alert in the ward-based analysis. Surveillance for hospital-wide clusters was performed by replacing “space” in the space-time permutation scan statistic with “pathogen.” This assessment was applied to the All Organism Nosocomial Dataset, to detect clusters that were not explained by a general simultaneous increase in all pathogens, as might occur with new diagnostics that enhance overall pathogen detection by culture systems or increased culturing because of changes in physician practice. Similarly, the WHONET-SaTScan tool adjusts for (i.e., would not detect) weekly or seasonal increases that occurred simultaneously across all “spatial” locations, such as all wards in the ward-based analyses, or all nosocomial pathogens in the hospital-wide analyses. However, nosocomial increases in specific wards would be detected in the ward-based analyses and increases in specific pathogens would be detected in the hospital-wide analyses. Each pathogen-specific set of analyses was performed “daily” from 2002 to 2006, mimicking real-time prospective surveillance among all patients admitted to BWH during this time period. Within each set, the method adjusts for multiple testing inherent in the many combinations of wards, services, pathogens, and resistance patterns considered, and for the large number of days evaluated. 
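The adjustment described above, in which only case data are used and purely "spatial" and purely temporal variation are factored out, can be illustrated with a small sketch. It follows the published formulation of the space-time permutation scan statistic [12] as we understand it: the expected count for a candidate cylinder is the product of the location and time marginals divided by the total, and the cylinder is scored with a Poisson generalized likelihood ratio. The function names, the ward labels, and the toy counts are hypothetical and are not taken from WHONET-SaTScan; the full method also scans all candidate cylinders and assesses significance by Monte Carlo permutation, which is not shown here.

```python
import math

def expected_count(cases, zones, days):
    """Expected cases in the cylinder (zones x days) if there is no
    space-time interaction: (zone marginal * day marginal) / total."""
    total = sum(cases.values())
    zone_total = sum(n for (z, _), n in cases.items() if z in zones)
    day_total = sum(n for (_, d), n in cases.items() if d in days)
    return zone_total * day_total / total

def observed_count(cases, zones, days):
    """Observed cases falling inside the cylinder."""
    return sum(n for (z, d), n in cases.items() if z in zones and d in days)

def log_glr(observed, expected, total):
    """Log of the Poisson generalized likelihood ratio; only an excess
    of cases (observed > expected) counts as evidence of a cluster."""
    if observed <= expected:
        return 0.0
    inside = observed * math.log(observed / expected)
    rest = total - observed
    outside = rest * math.log(rest / (total - expected)) if rest > 0 else 0.0
    return inside + outside

# Hypothetical toy data: isolate counts keyed by (ward, day).
cases = {("A", 1): 1, ("B", 1): 1, ("A", 2): 1,
         ("B", 2): 1, ("B", 3): 2, ("A", 4): 4}
total = sum(cases.values())

mu = expected_count(cases, {"A"}, {4})   # ward A holds 6 of 10 cases, day 4 holds 4
c = observed_count(cases, {"A"}, {4})
print(c, round(mu, 2), round(log_glr(c, mu, total), 3))
```

Because the expected count is built from the marginals alone, an increase that affects every ward on the same days raises the day marginal and therefore the expected count everywhere, which is why such increases do not produce an alert.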
Selecting WHONET-SaTScan Parameters Datasets from 2001 were used to select software parameters. The maximum number of days over which isolates could contribute to the initial determination that a cluster had occurred was set to 60 d. This parameter setting was based principally upon biologic plausibility of ongoing transmission due to a common source, as well as the practical ability to respond and intervene. For example, if a cluster alert is signaled in December based upon two cultures—one in the preceding January and one in December—one might conclude that notification was unhelpful since the prolonged time lapse since the January event makes it unlikely that current investigation or intervention would be meaningful. A maximum span of 60 d was chosen after the 2001 assessment of 30, 60, and 90 d revealed increased cluster detection with 60 d, but minimal improvement with 90 d. Because an ongoing cluster can span many months, we did not restrict the time that a cluster could persist (continue to generate alerts). If new cases continued to occur, they would generate alerts as long as the statistical threshold was met. For presentation purposes, alerts from the same cluster were combined into a summary report that included the number of observed versus expected cases across the duration of the cluster, the time from the first culture of the cluster until the first alert, and the total duration of the cluster. Thus, a cluster could be represented by a single alert or a set of overlapping alerts that would signal a potential outbreak. WHONET-SaTScan scanned for clusters on a daily basis by comparing the number of cases in a specific time window to the expected number based on the 365 d prior to the day of analysis. We selected statistical thresholds for detecting clusters on the basis of recurrence intervals [13]. The recurrence interval is the expected frequency of falsely identifying a cluster by chance alone. 
A recurrence interval of 100 d means that a cluster as unusual as the identified cluster would occur by chance approximately once every 100 d. We evaluated recurrence intervals of 200, 365, and 1,000 d using the 2001 test dataset and compared the results to 2001 clusters previously identified by routine infection control surveillance and confirmed by genetic typing of isolates. A recurrence interval threshold of ≥365 d was selected, because recurrence intervals […]

[…] 10 9 (15.3)
a Two clusters were identified by two different types of alerts.

10.1371/journal.pmed.1000238.t003
Table 3 Potential hospital-associated clusters detected using WHONET-SaTScan automated system, 2002–2006.

Organism | Signal Type | Observed Cases | Expected Cases | Days to First Signal (a) | Span of Signals (b) | Cluster Year | Recurrence Interval (c) | Previously Identified by Infection Control
Gram-positive bacteria
E. faecalis | Antibiotic profile | 4 | 0.6 | 18 | 25 | 2004 | 667 | N
E. faecalis | Service | 4 | 0.6 | 10 | 17 | 2005 | 1,429 | N
E. faecium | Antibiotic profile | 3 | 0.3 | 1 | 20 | 2006 | 1,429 | N
E. faecium (VRE) | Antibiotic profile | 5 | 1.0 | 13 | 57 | 2002 | 625 | N
E. faecium (VRE) | Antibiotic profile | 6 | 1.3 | 31 | 29 | 2002 | 769 | N
E. faecium (VRE) | Antibiotic profile | 4 | 0.6 | 42 | 18 | 2003 | 1,429 | N
E. faecium (VRE) | Antibiotic profile | 2 | 0.14 | 29 | 17 | 2004 | 500 | N
Propionibacterium acnes | Hospital-wide | 10 | 2.7 | 11 | 7 | 2006 | 1,429 | N
S. aureus | Antibiotic profile | 2 | 0.0 | 0 | 5 | 2002 | 2,000 | N
S. aureus | Ward | 3 | 0.1 | 0 | 2 | 2003 | 833 | N
S. aureus | Ward | 3 | 0.1 | 1 | 1 | 2003 | 833 | N
S. aureus | Ward | 7 | 1.1 | 6 | 16 | 2004 | 667 | N
S. aureus | Antibiotic profile | 4 | 0.3 | 2 | 4 | 2006 | 385 | N
S. aureus (MRSA) | Antibiotic profile | 14 | 2.8 | 1 | 67 | 2002 | 10,000 | N
S. aureus (MRSA) | Ward | 3 | 0.1 | 0 | 6 | 2005 | 5,000 | N
S. aureus (MRSA) | Ward | 8 | 1.4 | 6 | 54 | 2004 | 10,000 | Y
S. aureus (MRSA) (d) | Ward | 6 | 0.91 | 33 | 15 | 2005 | 833 | N
S. aureus (MRSA) (d) | Service | 4 | 0.44 | 8 | 5 | 2005 | 625 | N
S. aureus (MRSA) | Antibiotic profile | 2 | 0.04 | 6 | 4 | 2005 | 667 | N
S. aureus (MRSA) | Service | 6 | 1.05 | 8 | 9 | 2006 | 2,500 | N
S. aureus (MRSA) | Antibiotic profile | 2 | 0.09 | 4 | 3 | 2006 | 435 | N
Streptococcus, Group A | Hospital-wide | 3 | 0.2 | 0 | 15 | 2005 | 3,333 | N
Gram-negative bacteria
A. baumannii | Multi Service | 4 | 0.8 | 2 | 24 | 2002 | 5,000 | N
A. baumannii | Hospital-wide | 5 | 0.5 | 1 | 6 | 2002 | 588 | N
A. baumannii (e) | Antibiotic profile | 15 | 7.5 | 18 | 52 | 2004 | 10,000 | Y
A. baumannii (e) | Hospital-wide | 20 | 8.3 | 3 | 57 | 2004 | 625 | Y
A. baumannii | Ward | 4 | 0.6 | 3 | 9 | 2006 | 2,000 | N
Bacteroides fragilis | Service | 2 | 0.2 | 4 | 1 | 2006 | 500 | N
B. cepacia | Hospital-wide | 15 | 3.8 | 6 | 60 | 2005 | 10,000 | Y
C. freundii | Antibiotic profile | 2 | 0.1 | 4 | 27 | 2006 | 10,000 | N
E. aerogenes | Antibiotic profile | 3 | 1.8 | 2 | 26 | 2006 | 909 | N
E. cloacae | Antibiotic profile | 3 | 0.0 | 1 | 28 | 2002 | 10,000 | N
E. cloacae | Hospital-wide | 11 | 2.7 | 2 | 6 | 2002 | 1,250 | N
E. cloacae | Antibiotic profile | 4 | 0.5 | 4 | 2 | 2005 | 476 | N
E. cloacae | Service | 11 | 3.6 | 14 | 46 | 2005 | 370 | N
E. cloacae | Antibiotic profile | 4 | 0.3 | 6 | 33 | 2006 | 769 | N
E. cloacae | Multiward | 5 | 0.8 | 20 | 36 | 2006 | 2,500 | N
E. cloacae | Antibiotic profile | 27 | 4.3 | 42 | 163 | 2006 | 10,000 | N
E. coli | Antibiotic profile | 4 | 0.5 | 3 | 34 | 2002 | 476 | N
E. coli | Antibiotic profile | 6 | 1.1 | 6 | 9 | 2005 | 2,500 | N
H. influenzae | Hospital-wide | 13 | 4.2 | 18 | 14 | 2004 | 455 | N
H. influenzae | Antibiotic profile | 6 | 1.0 | 8 | 52 | 2006 | 5,000 | N
K. oxytoca | Antibiotic profile | 2 | 0.2 | 24 | 12 | 2004 | 1,111 | N
K. oxytoca | Antibiotic profile | 2 | 0.2 | 0 | 30 | 2006 | 10,000 | N
K. pneumoniae | Ward | 3 | 0.2 | 3 | 16 | 2003 | 909 | N
P. (Entero.) agglomerans | Hospital-wide | 4 | 0.2 | 4 | 2 | 2002 | 400 | N
P. aeruginosa | Multi Service | 5 | 0.6 | 4 | 7 | 2002 | 833 | N
P. aeruginosa | Antibiotic profile | 3 | 0.2 | 2 | 7 | 2004 | 476 | N
P. aeruginosa | Ward | 2 | 0.0 | 1 | 3 | 2005 | 476 | N
S. marcescens | Antibiotic profile | 3 | 0.4 | 34 | 10 | 2002 | 435 | N
S. marcescens | Multi Service | 4 | 0.5 | 12 | 4 | 2003 | 556 | N
S. marcescens | Hospital-wide | 10 | 2.8 | 10 | 3 | 2004 | 2,500 | N
S. marcescens | Antibiotic profile | 11 | 1.4 | 21 | 118 | 2006 | 10,000 | N
S. maltophilia | Ward | 3 | 0.3 | 6 | 9 | 2006 | 2,000 | N
Fungi
A. fumigatus | Hospital-wide | 7 | 1.4 | 20 | 57 | 2004 | 417 | N
C. albicans | Ward | 7 | 1.1 | 12 | 9 | 2003 | 667 | N
C. albicans | Ward | 2 | 0.0 | 0 | 2 | 2005 | 588 | N
C. albicans | Multiward | 14 | 2.6 | 51 | 36 | 2005 | 10,000 | N
C. krusei | Ward | 2 | 0.3 | 7 | 11 | 2002 | 10,000 | N
C. lusitaniae | Hospital-wide | 2 | 0.0 | 0 | 1 | 2002 | 370 | N
T. (Candida) glabrata | Ward | 4 | 0.4 | 24 | 1 | 2003 | 1,250 | N

a Number of days between the first culture associated with the cluster and the date of the first alert.
b Number of days between the first and the last alert for a cluster.
c Reflects the frequency (d) with which such a cluster is expected to occur by chance alone. Only clusters meeting a threshold recurrence interval of ≥365 d are provided.
d–e Indicates same cluster identified by more than one signal type.
N, no; Y, yes.

Half of the detected clusters were gram-negative organisms not routinely tracked by Infection Control. In addition, 71% of clusters were identified by spatial characteristics other than traditional ward-based location, including groups of wards and services that shared patients and antimicrobial susceptibility patterns. The most common alerts (41%) were triggered by antibiotic resistance profiles. VRE clusters (n = 4) comprised 57% of enterococcal clusters and none were identified by ward-level spatial analyses (all were geographically dispersed, but shared antibiotic susceptibility profile). MRSA clusters comprised 58% of S. aureus alerts, and only three of seven clusters were based upon ward analyses (Table 3).

Comparison with Clusters Previously Detected by Routine Infection Control Methods

Clusters identified using WHONET-SaTScan were compared to clusters previously identified through routine infection control surveillance. Other than pathogens identified by rule-based criteria that were evaluated separately (see below), all clusters previously identified and confirmed by the BWH Infection Control Department were also identified by WHONET-SaTScan. During the study period, the BWH Infection Control department identified two major clusters involving multidrug-resistant Acinetobacter (2004) and Burkholderia cepacia (2005), both of which were confirmed as clonal by pulsed-field gel electrophoresis (PFGE).
These clusters were identified by WHONET-SaTScan within 3 and 6 d, respectively, of the initial isolate collection date. The clonal cluster of multidrug-resistant Acinetobacter baumannii involved patients in several intensive care units. WHONET-SaTScan identified this cluster through hospital-wide clustering of A. baumannii isolates (Figure 1A) as well as through clustering of a specific antimicrobial susceptibility pattern (Figure 1B).

10.1371/journal.pmed.1000238.g001
Figure 1 Display of monthly nosocomial A. baumannii isolates. (A) Hospital-wide. (B) Restricted to isolates with an identical antibiotic susceptibility profile. Shaded area in gray indicates time period of cluster detection by WHONET-SaTScan.

In contrast, the BWH Infection Control department only identified three of the 59 clusters deemed to be statistically significant events on the basis of WHONET-SaTScan (Table 3, last column). Two coincided with the two clusters described above, and one involved MRSA.

Comparison with Clusters Based on Numerical Thresholds

We compared the results of the WHONET-SaTScan statistical clusters to the rule-based criteria (i.e., ≥3 new nosocomial cases on a single ward within 2 wk) that were used by the Infection Control Department for MRSA and VRE during the study period. Many more MRSA alerts were triggered by the rule-based criteria (n = 73) than by WHONET-SaTScan statistical thresholds (n = 7), and only one of them was in common. Of interest, the one in common was a fairly large cluster of eight nosocomial isolates in an intensive care unit. No isolates were sent for genetic typing. Over half of the WHONET-SaTScan alerts were triggered by spatial analyses other than a single ward. Four alerts had a recurrence interval >1,000, and two reached the highest possible recurrence interval allowed by our parameter settings (10,000). Similarly, many more VRE alerts were triggered by rule-based criteria (n = 87) than by WHONET-SaTScan statistical thresholds (n = 4).
None of the alerts overlapped when methods were compared. Details of MRSA and VRE clusters detected by both methods are provided in Table 4. No additional overlap in MRSA or VRE clusters was identified when the recurrence interval was lowered to 200.

10.1371/journal.pmed.1000238.t004
Table 4 Characteristics of MRSA and VRE clusters detected by routine infection control surveillance compared to WHONET-SaTScan.

Columns 3 to 6: Infection Control Detection. Columns 7 to 10: WHONET-SaTScan Detection. Column 11: Dual Detection.

Cluster | Time Period | n Clusters | Cases (Mean) | Duration (Mean Days) | Cluster Type (a) | n Clusters | Cases (Mean) | Duration (Mean Days) | Cluster Type | n Clusters
MRSA | 2002 | 14 | 10.8 | 96.5 | Ward | 1 | 14 | 67.0 | Antibiotic profile | 0
MRSA | 2003 | 11 | 11.1 | 100.3 | Ward | 0 | — | — | — | 0
MRSA | 2004 | 18 | 6.9 | 65.3 | Ward | 1 | 8 | 54.0 | Ward | 1
MRSA | 2005 | 18 | 5.9 | 52.4 | Ward | 3 | 3.7 | 8.3 | Ward, ward/service, antibiotic profile | 0
MRSA | 2006 | 12 | 4.9 | 48.0 | Ward | 2 | 4 | 6.0 | Service, antibiotic profile | 0
MRSA | 5-y total | 73 | — | — | — | 7 | — | — | — | 1
MRSA | Annual mean | 14.6 | 7.9 | 72.5 | — | 1.4 | 5.9 | 27.1 | — | 0.2
MRSA | Annual median | 14 | 6.9 | 65.3 | — | 1.0 | 4.0 | 8.3 | — | 0
VRE | 2002 | 15 | 7.6 | 71.2 | Ward | 2 | 5.5 | 43.0 | Antibiotic profile | 0
VRE | 2003 | 12 | 6.4 | 62.8 | Ward | 1 | 4.0 | 18.0 | Antibiotic profile | 0
VRE | 2004 | 20 | 8.2 | 74.1 | Ward | 1 | 2.0 | 17.0 | Antibiotic profile | 0
VRE | 2005 | 18 | 7.2 | 69.1 | Ward | 0 | — | — | — | 0
VRE | 2006 | 22 | 6.0 | 58.3 | Ward | 0 | — | — | — | 0
VRE | 5-y total | 87 | — | — | — | 4 | — | — | — | 0
VRE | Annual mean | 17.4 | 7.1 | 67.1 | — | 0.8 | 2.3 | 15.6 | — | 0
VRE | Annual median | 18 | 7.2 | 69.1 | — | 1 | 2 | 17 | — | 0

a Infection Control identification of clusters was limited to wards only.

Only two rule-based clusters were deemed sufficiently large or persistent by the BWH Infection Control Department to warrant sending isolates for typing. Both involved MRSA. One occurred in the 2001 dataset that was used for parameterization (thus, not provided in Table 5). This cluster was rapidly detected by WHONET-SaTScan. The other cluster was an intensive care unit cluster in 2004 that was not detected by WHONET-SaTScan.
This cluster involved nine nosocomial cases, but genetic typing revealed six different strain types, and the Infection Control Department ultimately ruled that this was not an outbreak.

10.1371/journal.pmed.1000238.t005
Table 5 Correlation of two hospital epidemiologists independently assessing WHONET-SaTScan clusters.

 | Ignore | Watch | Investigate | Actively Intervene | Total
Ignore | 25 | 11 | 1 | 0 | 37 (63%)
Watch | 2 | 5 | 1 | 2 | 10 (17%)
Investigate | 0 | 0 | 0 | 0 | 0 (0%)
Actively Intervene | 1 | 0 | 1 | 10 | 12 (20%)
Total | 28 (47%) | 16 (27%) | 3 (5%) | 12 (20%) | 59 (100%)

Assessing Utility and Response to Daily Alerts

The hospital epidemiologists classified 95% of the 59 cluster alerts as useful information. Sixteen (27%) of the clusters were classified as warranting either investigation or active intervention by at least one epidemiologist and 11 (19%) by both (kappa = 0.76, confidence interval 0.5–0.8). The remaining 43 (73%) clusters were classified as warranting either no action or watchful waiting by both epidemiologists (Table 5). There were four clusters where the two epidemiologists disagreed about initiating active intervention. The discrepancies arose from a low number of events, which led one epidemiologist to await further cases before acting while the other initiated intervention because of the significance of the pathogens (aspergillus, pseudomonas) or the source of the isolates (bacteremias). Certain cluster characteristics were associated with the likelihood of initiating active intervention (Figure 2).

10.1371/journal.pmed.1000238.g002
Figure 2 Graph showing survey-based Infection Control response by type of WHONET-SaTScan cluster. Significant differences among organism type and cluster size were noted when assessing the likelihood of triggering an intervention (Fisher exact tests). A trend toward a significant difference was found among cluster types.
Among organism type, the likelihood of a cluster triggering an intervention was: gram-positive (43%), gram-negative (13%), fungal (14%). Among cluster size, the likelihood of a cluster triggering an intervention was: 2–5 (13%), 6–10 (45%), 10+ (44%). Among recurrence interval, the likelihood of a cluster triggering an intervention was: 365–999 (20%), 1,000–5,000 (20%), >5,000 (36%). Among cluster type, the likelihood of a cluster triggering an intervention was: hospital (27%), antibiotic profile (12%), ward (38%), and service (50%).

Discussion

The automated WHONET-SaTScan cluster detection tool rapidly detected epidemiologically confirmed hospital outbreaks in a large academic medical center and demonstrated that the common use of rule-based criteria (i.e., ≥3 new nosocomial cases on a single ward within 2 wk) for identifying clusters of MDROs often led to the identification of events likely to occur because of normal random fluctuations. Using a statistical method for cluster detection can focus hospital epidemiology efforts and conserve resources for events likely to represent actual outbreaks. Current methods for cluster detection in hospitals are labor-intensive, narrow in focus, and subject to both over- and under-ascertainment of clusters. We linked two publicly available software systems to screen microbiology data for statistically significant clusters among all pathogens, across all wards and services. In a single center study, we introduced the WHONET-SaTScan cluster detection tool and showed that it outperforms current infection control surveillance systems in several ways. First, it is more comprehensive. It is able to evaluate all pathogens with the potential to produce hospital-associated clusters. Current infection control surveillance is heavily focused on a small number of highly antibiotic-resistant bacteria, most of which are gram-positive pathogens.
We found that two-thirds of identified clusters were due to gram-negative or fungal pathogens not under routine surveillance. Second, the automated nature of WHONET-SaTScan makes it labor-sparing compared to usual surveillance, which identifies clusters from daily microbiologic feeds using the trained human eye. This software can be run daily within seconds and can provide a prospective tool for real-time cluster detection. Furthermore, the use of routinely available microbiologic data makes it adaptable by all hospitals using conventional microbiologic data systems. More importantly, it has the potential to spare the labor of unnecessary investigation of perceived clusters that are merely chance aggregations. These perceived clusters often result in substantial intervention costs and efforts on behalf of infection control and involved hospital wards. Third, WHONET-SaTScan provides a statistical basis for cluster identification, thus improving the likelihood that the clusters represent health care–associated transmission events. When compared to conventional surveillance that uses numerical thresholds (rule-based criteria such as three cases in 2 wk in a single ward), we found that there was no significant statistical basis for nearly all of the clusters identified by routine infection control surveillance. This finding is not surprising given the rise in prevalence of MRSA and VRE—pathogens to which these rules are applied. In the example of MRSA, not only did rule-based criteria identify a large number of clusters (∼14/y) that may not have been real, but it failed to identify the once-a-year occurrence of a highly statistically significant cluster. Findings were even more striking for VRE. 
Although we recognize that statistical significance should not be the sole driver of cluster detection and response, the large discrepancy between statistically identified clusters and those found by infection control rule-based criteria suggests that statistical alerts (and lack of alerts) may provide a critical piece of information to guide action. These results suggest that much of current infection control surveillance for nosocomial clusters may be ineffective, failing to find true clusters that may indicate unusual nosocomial transmission and identifying numerous events that likely represent random variation from a baseline rate as clusters that warrant resource-intensive investigation and response. The reduction in the number of MRSA and VRE clusters more than offset the increased number of clusters that resulted from identifying clusters caused by all pathogens. If this is a typical result, then statistically based surveillance could provide a major redirection of scarce infection control efforts. Notably, WHONET-SaTScan was able to identify the major pathogen clusters known to infection control that had clear epidemiologic links and evidence of genetic clonality. Finally, the predictive value of alerts based on this scanning technique was acceptably high. Nearly all reported clusters were deemed of interest by the two hospital epidemiologists, and >25% generated sufficient concern to initiate an active investigation or full-scale intervention. There are several limitations to this evaluation. First, it is a single center study providing subjective evaluation by two hospital epidemiologists, both of whom have been affected by prior experience at that hospital. Additional assessment in other centers is needed for validation. Specifically, prospective validation is needed to evaluate whether statistical clusters are sufficiently important to warrant action, and whether ignoring rule-based clusters leads to no harm. 
In this study, we placed a subjective value on the WHONET-SaTScan clusters and assumed that all infection control clusters were deemed of high value. It was not possible to similarly assess the infection control clusters since action was taken once rule-based criteria were met. In addition, the recurrence interval was part of the assessment of WHONET-SaTScan clusters and this was not available for infection control clusters. The discrepancy between the WHONET-SaTScan results and the rule-based clusters can only be known in a prospective fashion when knowledge of statistical alerts can be integrated with clinical judgment to determine if action will be taken, and if a large cluster ensues because of inaction.
If the value of statistical alerts continues to be demonstrated, then WHONET-SaTScan may provide a valuable tool for standardizing outbreak detection and evaluating the impact of various interventions to reduce nosocomial transmission. Beyond further validation, this work requires replication and assessment of generalizability in other hospitals. Nevertheless, because it bases cluster determination on expected numbers of cases from recent history, it is adaptable to the varying conditions across institutions and the changing rates of pathogen colonization and infection.
In this analysis, we identified clusters by comparing cluster case counts to the “spatial” and temporal locations of all other cases occurring during a 365-d period. Although this identification allows the analyses to be robust to secular trends in the prevalence of pathogens arising from different wards and services, other baseline periods could have been selected.
Secondly, clustering does not prove that there is an important biologic connection between cases. No matter what recurrence interval is selected, some clusters with a lower recurrence interval will reflect hospital transmission and some that exceed the value will be chance events.
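The idea of judging a cluster against "expected numbers of cases from recent history" can be sketched as follows. This is a simplified illustration that assumes the standard space-time permutation expectation, in which a ward's expected count in a time window is its share of all baseline cases multiplied by the number of cases, in any ward, that fall in that window. The function name `expected_count` and the input layout are hypothetical, and the sketch omits the likelihood ratio and Monte Carlo testing that SaTScan performs.

```python
def expected_count(cases, ward, window_days):
    """Expected number of cases in `ward` during `window_days`.

    `cases` is a list of (ward, day) pairs covering the baseline period
    (for example, the preceding 365 days); `day` can be any hashable value.
    Assumes ward and time are independent: E = ward_total * window_total / total.
    """
    total = len(cases)
    if total == 0:
        return 0.0
    window = set(window_days)
    ward_total = sum(1 for w, _ in cases if w == ward)
    window_total = sum(1 for _, d in cases if d in window)
    return ward_total * window_total / total


# Toy baseline: 10 cases across two wards, days numbered 1 to 9.
baseline = [
    ("A", 1), ("A", 2), ("A", 2), ("A", 9),
    ("B", 1), ("B", 2), ("B", 5), ("B", 6), ("B", 7), ("B", 8),
]
window = [1, 2]
observed = sum(1 for w, d in baseline if w == "A" and d in set(window))
expected = expected_count(baseline, "A", window)
```

Because the expectation is built only from the case data themselves, a ward with a persistently high number of isolates gets a proportionally higher expected count, which is why this kind of comparison adapts to differing baseline rates without requiring patient-day denominators.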
We do not have a precise estimate of this frequency because we performed a large number of scans across all pathogens and spatial dimensions. Further evaluation is needed to ensure that the threshold does not yield an unacceptable number of signals that are deemed of no interest. In this instance, the average of 12 alerts per year was far fewer than the number of clusters currently being identified by the Infection Control department. Notably, in this study, lowering the statistical threshold did not increase the overlap between WHONET-SaTScan clusters and those found by infection control.
In conclusion, we demonstrate the usefulness of automated cluster detection that uses readily available microbiology data to identify clusters of clinically relevant nosocomial pathogens. This approach to cluster detection has the potential to be more comprehensive than current surveillance systems and save substantial amounts of infection control resources [14]. Most importantly and provocatively, these findings suggest that many of the events that trigger outbreak control protocols probably represent random variation rather than true outbreaks. Additionally, current infection control methods fail to identify a majority of events that are statistically unusual and may represent opportunities for intervention. This statistically based cluster detection tool could be readily implemented to improve and streamline the daily practice of infection control professionals.

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS One
                Public Library of Science (San Francisco, CA, USA)
                1932-6203
                8 October 2015
                2015
                Volume: 10
                Issue: 10
                Article number: e0139920
                Affiliations
                [1] Hospital Epidemiology and Infection Control Department, Dijon University Hospital, Dijon, France
                [2] Laboratory of Environmental Microbiology and Health Risks, University of Burgundy, Dijon, France
                [3] Infection Control Department, CHU Besançon, Besançon, France
                [4] Chrono-environment Laboratory, UMR CNRS 6249, University of Franche-Comté, Besançon, France
                [5] Infection Control, Epidemiology and Prevention Department, Hospital Group Edouard Herriot, Lyon, France
                [6] Epidemiology and Public Health Team, Claude Bernard University, Lyon, France
                [7] UHLIN, Hospital Group Bichat—Claude Bernard, HUPNVS, AP-HP, Paris, France
                [8] Paris Diderot University, Paris 7, Paris, France
                [9] Infectious Diseases Department, Dijon University Hospital, Dijon, France
                [10] Biostatistics and Medical Information Department, Dijon University Hospital, Dijon, France
                [11] Epidemiology Department—EA 4184, University of Burgundy, Dijon, France
                National Institutes of Health, UNITED STATES
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                Conceived and designed the experiments: AL XB PV JCL PC KA CQ LSAG. Performed the experiments: AL KA MT. Analyzed the data: AL LSAG. Contributed reagents/materials/analysis tools: AL KA LSAG. Wrote the paper: AL XB CQ LSAG. Critical revision of manuscript: AL XB PV JCL PC KA MT CQ LSAG.

                Article
                PONE-D-15-16210
                10.1371/journal.pone.0139920
                4598114
                26448036
                7c7a92fc-7a71-4c1b-875f-9f3dbe32d631
                Copyright © 2015

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 24 April 2015
                Accepted: 18 September 2015
                Page count
                Figures: 2, Tables: 4, Pages: 14
                Funding
                The authors have no support or funding to report.
                Categories
                Research Article
                Custom metadata
                Due to ethical and legal restrictions imposed by the French National Commission for Data Protection and Liberties (CNIL) related to protecting patient confidentiality, all relevant data are available to researchers who meet the criteria for access to confidential data upon request to the corresponding author after acceptance by the CNIL. Minimal datasets are available.
