      Is Open Access

      How and under what circumstances do quality improvement collaboratives lead to better outcomes? A systematic review

      review-article


          Abstract

          Background

          Quality improvement collaboratives are widely used to improve health care in both high-income and low- and middle-income settings. Teams from multiple health facilities share learning on a given topic and apply a structured cycle of change testing. Previous systematic reviews reported positive effects on target outcomes, but the role of context and mechanism of change is underexplored. This realist-inspired systematic review aims to analyse contextual factors influencing intended outcomes and to identify how quality improvement collaboratives may result in improved adherence to evidence-based practices.

          Methods

          We built an initial conceptual framework to drive our enquiry, focusing on three context domains: health facility setting; project-specific factors; wider organisational and external factors; and two further domains pertaining to mechanisms: intra-organisational and inter-organisational changes. We systematically searched five databases and grey literature for publications relating to quality improvement collaboratives in a healthcare setting and containing data on context or mechanisms. We analysed and reported findings thematically and refined the programme theory.

          Results

          We screened 962 abstracts of which 88 met the inclusion criteria, and we retained 32 for analysis. Adequacy and appropriateness of external support, functionality of quality improvement teams, leadership characteristics and alignment with national systems and priorities may influence outcomes of quality improvement collaboratives, but the strength and quality of the evidence is weak. Participation in quality improvement collaborative activities may improve health professionals’ knowledge, problem-solving skills and attitude; teamwork; shared leadership and habits for improvement. Interaction across quality improvement teams may generate normative pressure and opportunities for capacity building and peer recognition.

          Conclusion

          Our review offers a novel programme theory to unpack the complexity of quality improvement collaboratives by exploring the relationship between context, mechanisms and outcomes. There remains a need for greater use of behaviour change and organisational psychology theory to improve design, adaptation and evaluation of the collaborative quality improvement approach and to test its effectiveness. Further research is needed to determine whether certain contextual factors related to capacity should be a precondition to the quality improvement collaborative approach and to test the emerging programme theory using rigorous research designs.

          Most cited references (78)

          The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies.

          Much biomedical research is observational. The reporting of such research is often inadequate, which hampers the assessment of its strengths and weaknesses and of a study's generalisability. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) initiative developed recommendations on what should be included in an accurate and complete report of an observational study. We defined the scope of the recommendations to cover three main study designs: cohort, case-control, and cross-sectional studies. We convened a 2-day workshop in September, 2004, with methodologists, researchers, and journal editors to draft a checklist of items. This list was subsequently revised during several meetings of the coordinating group and in e-mail discussions with the larger group of STROBE contributors, taking into account empirical evidence and methodological considerations. The workshop and the subsequent iterative process of consultation and revision resulted in a checklist of 22 items (the STROBE statement) that relate to the title, abstract, introduction, methods, results, and discussion sections of articles. 18 items are common to all three study designs and four are specific for cohort, case-control, or cross-sectional studies. A detailed explanation and elaboration document is published separately and is freely available on the websites of PLoS Medicine, Annals of Internal Medicine, and Epidemiology. We hope that the STROBE statement will contribute to improving the quality of reporting of observational studies.
            Is Open Access

            The PRISMA Statement for Reporting Systematic Reviews and Meta-Analyses of Studies That Evaluate Health Care Interventions: Explanation and Elaboration

            Introduction Systematic reviews and meta-analyses are essential tools for summarizing evidence accurately and reliably. They help clinicians keep up-to-date; provide evidence for policy makers to judge risks, benefits, and harms of health care behaviors and interventions; gather together and summarize related research for patients and their carers; provide a starting point for clinical practice guideline developers; provide summaries of previous research for funders wishing to support new research [1]; and help editors judge the merits of publishing reports of new studies [2]. Recent data suggest that at least 2,500 new systematic reviews reported in English are indexed in MEDLINE annually [3]. Unfortunately, there is considerable evidence that key information is often poorly reported in systematic reviews, thus diminishing their potential usefulness [3],[4],[5],[6]. As is true for all research, systematic reviews should be reported fully and transparently to allow readers to assess the strengths and weaknesses of the investigation [7]. That rationale led to the development of the QUOROM (QUality Of Reporting Of Meta-analyses) Statement; those detailed reporting recommendations were published in 1999 [8]. In this paper we describe the updating of that guidance. Our aim is to ensure clear presentation of what was planned, done, and found in a systematic review. Terminology used to describe systematic reviews and meta-analyses has evolved over time and varies across different groups of researchers and authors (see Box 1). In this document we adopt the definitions used by the Cochrane Collaboration [9]. A systematic review attempts to collate all empirical evidence that fits pre-specified eligibility criteria to answer a specific research question. It uses explicit, systematic methods that are selected to minimize bias, thus providing reliable findings from which conclusions can be drawn and decisions made. 
Meta-analysis is the use of statistical methods to summarize and combine the results of independent studies. Many systematic reviews contain meta-analyses, but not all. Box 1. Terminology The terminology used to describe systematic reviews and meta-analyses has evolved over time and varies between fields. Different terms have been used by different groups, such as educators and psychologists. The conduct of a systematic review comprises several explicit and reproducible steps, such as identifying all likely relevant records, selecting eligible studies, assessing the risk of bias, extracting data, qualitative synthesis of the included studies, and possibly meta-analyses. Initially this entire process was termed a meta-analysis and was so defined in the QUOROM Statement [8]. More recently, especially in health care research, there has been a trend towards preferring the term systematic review. If quantitative synthesis is performed, this last stage alone is referred to as a meta-analysis. The Cochrane Collaboration uses this terminology [9], under which a meta-analysis, if performed, is a component of a systematic review. Regardless of the question addressed and the complexities involved, it is always possible to complete a systematic review of existing data, but not always possible, or desirable, to quantitatively synthesize results, due to clinical, methodological, or statistical differences across the included studies. Conversely, with prospective accumulation of studies and datasets where the plan is eventually to combine them, the term “(prospective) meta-analysis” may make more sense than “systematic review.” For retrospective efforts, one possibility is to use the term systematic review for the whole process up to the point when one decides whether to perform a quantitative synthesis. If a quantitative synthesis is performed, some researchers refer to this as a meta-analysis. 
This definition is similar to that found in the current edition of the Dictionary of Epidemiology [183]. While we recognize that the use of these terms is inconsistent and there is residual disagreement among the members of the panel working on PRISMA, we have adopted the definitions used by the Cochrane Collaboration [9]. Systematic review: A systematic review attempts to collate all empirical evidence that fits pre-specified eligibility criteria to answer a specific research question. It uses explicit, systematic methods that are selected with a view to minimizing bias, thus providing reliable findings from which conclusions can be drawn and decisions made [184],[185]. The key characteristics of a systematic review are: (a) a clearly stated set of objectives with an explicit, reproducible methodology; (b) a systematic search that attempts to identify all studies that would meet the eligibility criteria; (c) an assessment of the validity of the findings of the included studies, for example through the assessment of risk of bias; and (d) systematic presentation, and synthesis, of the characteristics and findings of the included studies. Meta-analysis: Meta-analysis is the use of statistical techniques to integrate and summarize the results of included studies. Many systematic reviews contain meta-analyses, but not all. By combining information from all relevant studies, meta-analyses can provide more precise estimates of the effects of health care than those derived from the individual studies included within a review. The QUOROM Statement and Its Evolution into PRISMA The QUOROM Statement, developed in 1996 and published in 1999 [8], was conceived as a reporting guidance for authors reporting a meta-analysis of randomized trials. Since then, much has happened. First, knowledge about the conduct and reporting of systematic reviews has expanded considerably. 
For example, The Cochrane Library's Methodology Register (which includes reports of studies relevant to the methods for systematic reviews) now contains more than 11,000 entries (March 2009). Second, there have been many conceptual advances, such as “outcome-level” assessments of the risk of bias [10],[11], that apply to systematic reviews. Third, authors have increasingly used systematic reviews to summarize evidence other than that provided by randomized trials. However, despite advances, the quality of the conduct and reporting of systematic reviews remains well short of ideal [3],[4],[5],[6]. All of these issues prompted the need for an update and expansion of the QUOROM Statement. Of note, recognizing that the updated statement now addresses the above conceptual and methodological issues and may also have broader applicability than the original QUOROM Statement, we changed the name of the reporting guidance to PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses). Development of PRISMA The PRISMA Statement was developed by a group of 29 review authors, methodologists, clinicians, medical editors, and consumers [12]. They attended a three-day meeting in 2005 and participated in extensive post-meeting electronic correspondence. A consensus process that was informed by evidence, whenever possible, was used to develop a 27-item checklist (Table 1; see also Text S1 for a downloadable template checklist for researchers to re-use) and a four-phase flow diagram (Figure 1; see Figure S1 for a downloadable template document for researchers to re-use). Items deemed essential for transparent reporting of a systematic review were included in the checklist. The flow diagram originally proposed by QUOROM was also modified to show numbers of identified records, excluded articles, and included studies. After 11 revisions the group approved the checklist, flow diagram, and this explanatory paper. 
Figure 1 (10.1371/journal.pmed.1000100.g001). Flow of information through the different phases of a systematic review.

The PRISMA Statement itself provides further details regarding its background and development [12]. This accompanying Explanation and Elaboration document explains the meaning and rationale for each checklist item. A few PRISMA Group participants volunteered to help draft specific items for this document, and four of these (DGA, AL, DM, and JT) met on several occasions to further refine the document, which was circulated and ultimately approved by the larger PRISMA Group.

Table 1 (10.1371/journal.pmed.1000100.t001). Checklist of items to include when reporting a systematic review (with or without meta-analysis). Each row below gives the item number (#), the Section/Topic, and the Checklist Item; the Reported on Page # column has no entries here.

TITLE
1. Title: Identify the report as a systematic review, meta-analysis, or both.

ABSTRACT
2. Structured summary: Provide a structured summary including, as applicable: background; objectives; data sources; study eligibility criteria, participants, and interventions; study appraisal and synthesis methods; results; limitations; conclusions and implications of key findings; systematic review registration number.

INTRODUCTION
3. Rationale: Describe the rationale for the review in the context of what is already known.
4. Objectives: Provide an explicit statement of questions being addressed with reference to participants, interventions, comparisons, outcomes, and study design (PICOS).

METHODS
5. Protocol and registration: Indicate if a review protocol exists, if and where it can be accessed (e.g., Web address), and, if available, provide registration information including registration number.
6. Eligibility criteria: Specify study characteristics (e.g., PICOS, length of follow-up) and report characteristics (e.g., years considered, language, publication status) used as criteria for eligibility, giving rationale.
7. Information sources: Describe all information sources (e.g., databases with dates of coverage, contact with study authors to identify additional studies) in the search and date last searched.
8. Search: Present full electronic search strategy for at least one database, including any limits used, such that it could be repeated.
9. Study selection: State the process for selecting studies (i.e., screening, eligibility, included in systematic review, and, if applicable, included in the meta-analysis).
10. Data collection process: Describe method of data extraction from reports (e.g., piloted forms, independently, in duplicate) and any processes for obtaining and confirming data from investigators.
11. Data items: List and define all variables for which data were sought (e.g., PICOS, funding sources) and any assumptions and simplifications made.
12. Risk of bias in individual studies: Describe methods used for assessing risk of bias of individual studies (including specification of whether this was done at the study or outcome level), and how this information is to be used in any data synthesis.
13. Summary measures: State the principal summary measures (e.g., risk ratio, difference in means).
14. Synthesis of results: Describe the methods of handling data and combining results of studies, if done, including measures of consistency (e.g., I²) for each meta-analysis.
15. Risk of bias across studies: Specify any assessment of risk of bias that may affect the cumulative evidence (e.g., publication bias, selective reporting within studies).
16. Additional analyses: Describe methods of additional analyses (e.g., sensitivity or subgroup analyses, meta-regression), if done, indicating which were pre-specified.

RESULTS
17. Study selection: Give numbers of studies screened, assessed for eligibility, and included in the review, with reasons for exclusions at each stage, ideally with a flow diagram.
18. Study characteristics: For each study, present characteristics for which data were extracted (e.g., study size, PICOS, follow-up period) and provide the citations.
19. Risk of bias within studies: Present data on risk of bias of each study and, if available, any outcome-level assessment (see Item 12).
20. Results of individual studies: For all outcomes considered (benefits or harms), present, for each study: (a) simple summary data for each intervention group and (b) effect estimates and confidence intervals, ideally with a forest plot.
21. Synthesis of results: Present results of each meta-analysis done, including confidence intervals and measures of consistency.
22. Risk of bias across studies: Present results of any assessment of risk of bias across studies (see Item 15).
23. Additional analysis: Give results of additional analyses, if done (e.g., sensitivity or subgroup analyses, meta-regression [see Item 16]).

DISCUSSION
24. Summary of evidence: Summarize the main findings including the strength of evidence for each main outcome; consider their relevance to key groups (e.g., health care providers, users, and policy makers).
25. Limitations: Discuss limitations at study and outcome level (e.g., risk of bias), and at review level (e.g., incomplete retrieval of identified research, reporting bias).
26. Conclusions: Provide a general interpretation of the results in the context of other evidence, and implications for future research.

FUNDING
27. Funding: Describe sources of funding for the systematic review and other support (e.g., supply of data); role of funders for the systematic review.

Scope of PRISMA
PRISMA focuses on ways in which authors can ensure the transparent and complete reporting of systematic reviews and meta-analyses. It does not address directly or in a detailed manner the conduct of systematic reviews, for which other guides are available [13],[14],[15],[16].
We developed the PRISMA Statement and this explanatory document to help authors report a wide array of systematic reviews to assess the benefits and harms of a health care intervention. We consider most of the checklist items relevant when reporting systematic reviews of non-randomized studies assessing the benefits and harms of interventions. However, we recognize that authors who address questions relating to etiology, diagnosis, or prognosis, for example, and who review epidemiological or diagnostic accuracy studies may need to modify or incorporate additional items for their systematic reviews. How To Use This Paper We modeled this Explanation and Elaboration document after those prepared for other reporting guidelines [17],[18],[19]. To maximize the benefit of this document, we encourage people to read it in conjunction with the PRISMA Statement [11]. We present each checklist item and follow it with a published exemplar of good reporting for that item. (We edited some examples by removing citations or Web addresses, or by spelling out abbreviations.) We then explain the pertinent issue, the rationale for including the item, and relevant evidence from the literature, whenever possible. No systematic search was carried out to identify exemplars and evidence. We also include seven Boxes that provide a more comprehensive explanation of certain thematic aspects of the methodology and conduct of systematic reviews. Although we focus on a minimal list of items to consider when reporting a systematic review, we indicate places where additional information is desirable to improve transparency of the review process. We present the items numerically from 1 to 27; however, authors need not address items in this particular order in their reports. Rather, what is important is that the information for each item is given somewhere within the report. The PRISMA Checklist TITLE and ABSTRACT Item 1: TITLE Identify the report as a systematic review, meta-analysis, or both. 
Examples. “Recurrence rates of video-assisted thoracoscopic versus open surgery in the prevention of recurrent pneumothoraces: a systematic review of randomised and non-randomised trials” [20] “Mortality in randomized trials of antioxidant supplements for primary and secondary prevention: systematic review and meta-analysis” [21] Explanation Authors should identify their report as a systematic review or meta-analysis. Terms such as “review” or “overview” do not describe for readers whether the review was systematic or whether a meta-analysis was performed. A recent survey found that 50% of 300 authors did not mention the terms “systematic review” or “meta-analysis” in the title or abstract of their systematic review [3]. Although sensitive search strategies have been developed to identify systematic reviews [22], inclusion of the terms systematic review or meta-analysis in the title may improve indexing and identification. We advise authors to use informative titles that make key information easily accessible to readers. Ideally, a title reflecting the PICOS approach (participants, interventions, comparators, outcomes, and study design) (see Item 11 and Box 2) may help readers as it provides key information about the scope of the review. Specifying the design(s) of the studies included, as shown in the examples, may also help some readers and those searching databases. Box 2. Helping To Develop the Research Question(s): The PICOS Approach Formulating relevant and precise questions that can be answered in a systematic review can be complex and time consuming. A structured approach for framing questions that uses five components may help facilitate the process. This approach is commonly known by the acronym “PICOS” where each letter refers to a component: the patient population or the disease being addressed (P), the interventions or exposure (I), the comparator group (C), the outcome or endpoint (O), and the study design chosen (S) [186]. 
Issues relating to PICOS impact several PRISMA items (i.e., Items 6, 8, 9, 10, 11, and 18). Providing information about the population requires a precise definition of a group of participants (often patients), such as men over the age of 65 years, their defining characteristics of interest (often disease), and possibly the setting of care considered, such as an acute care hospital. The interventions (exposures) under consideration in the systematic review need to be transparently reported. For example, if the reviewers answer a question regarding the association between a woman's prenatal exposure to folic acid and subsequent offspring's neural tube defects, reporting the dose, frequency, and duration of folic acid used in different studies is likely to be important for readers to interpret the review's results and conclusions. Other interventions (exposures) might include diagnostic, preventative, or therapeutic treatments, arrangements of specific processes of care, lifestyle changes, psychosocial or educational interventions, or risk factors. Clearly reporting the comparator (control) group intervention(s), such as usual care, drug, or placebo, is essential for readers to fully understand the selection criteria of primary studies included in systematic reviews, and might be a source of heterogeneity investigators have to deal with. Comparators are often very poorly described. Clearly reporting what the intervention is compared with is very important and may sometimes have implications for the inclusion of studies in a review—many reviews compare with “standard care,” which is otherwise undefined; this should be properly addressed by authors. The outcomes of the intervention being assessed, such as mortality, morbidity, symptoms, or quality of life improvements, should be clearly specified as they are required to interpret the validity and generalizability of the systematic review's results. 
Finally, the type of study design(s) included in the review should be reported. Some reviews only include reports of randomized trials whereas others have broader design criteria and include randomized trials and certain types of observational studies. Still other reviews, such as those specifically answering questions related to harms, may include a wide variety of designs ranging from cohort studies to case reports. Whatever study designs are included in the review, these should be reported. Independently from how difficult it is to identify the components of the research question, the important point is that a structured approach is preferable, and this extends beyond systematic reviews of effectiveness. Ideally the PICOS criteria should be formulated a priori, in the systematic review's protocol, although some revisions might be required due to the iterative nature of the review process. Authors are encouraged to report their PICOS criteria and whether any modifications were made during the review process. A useful example in this realm is the Appendix of the “Systematic Reviews of Water Fluoridation” undertaken by the Centre for Reviews and Dissemination [187]. Some journals recommend “indicative titles” that indicate the topic matter of the review, while others require declarative titles that give the review's main conclusion. Busy practitioners may prefer to see the conclusion of the review in the title, but declarative titles can oversimplify or exaggerate findings. Thus, many journals and methodologists prefer indicative titles as used in the examples above. Item 2: STRUCTURED SUMMARY Provide a structured summary including, as applicable: background; objectives; data sources; study eligibility criteria, participants, and interventions; study appraisal and synthesis methods; results; limitations; conclusions and implications of key findings; funding for the systematic review; and systematic review registration number. Example. 
“Context: The role and dose of oral vitamin D supplementation in nonvertebral fracture prevention have not been well established. Objective: To estimate the effectiveness of vitamin D supplementation in preventing hip and nonvertebral fractures in older persons. Data Sources: A systematic review of English and non-English articles using MEDLINE and the Cochrane Controlled Trials Register (1960–2005), and EMBASE (1991–2005). Additional studies were identified by contacting clinical experts and searching bibliographies and abstracts presented at the American Society for Bone and Mineral Research (1995–2004). Search terms included randomized controlled trial (RCT), controlled clinical trial, random allocation, double-blind method, cholecalciferol, ergocalciferol, 25-hydroxyvitamin D, fractures, humans, elderly, falls, and bone density. Study Selection: Only double-blind RCTs of oral vitamin D supplementation (cholecalciferol, ergocalciferol) with or without calcium supplementation vs calcium supplementation or placebo in older persons (>60 years) that examined hip or nonvertebral fractures were included. Data Extraction: Independent extraction of articles by 2 authors using predefined data fields, including study quality indicators. Data Synthesis: All pooled analyses were based on random-effects models. Five RCTs for hip fracture (n = 9294) and 7 RCTs for nonvertebral fracture risk (n = 9820) met our inclusion criteria. All trials used cholecalciferol. Heterogeneity among studies for both hip and nonvertebral fracture prevention was observed, which disappeared after pooling RCTs with low-dose (400 IU/d) and higher-dose vitamin D (700–800 IU/d), separately. A vitamin D dose of 700 to 800 IU/d reduced the relative risk (RR) of hip fracture by 26% (3 RCTs with 5572 persons; pooled RR, 0.74; 95% confidence interval [CI], 0.61–0.88) and any nonvertebral fracture by 23% (5 RCTs with 6098 persons; pooled RR, 0.77; 95% CI, 0.68–0.87) vs calcium or placebo. 
No significant benefit was observed for RCTs with 400 IU/d vitamin D (2 RCTs with 3722 persons; pooled RR for hip fracture, 1.15; 95% CI, 0.88–1.50; and pooled RR for any nonvertebral fracture, 1.03; 95% CI, 0.86–1.24). Conclusions: Oral vitamin D supplementation between 700 to 800 IU/d appears to reduce the risk of hip and any nonvertebral fractures in ambulatory or institutionalized elderly persons. An oral vitamin D dose of 400 IU/d is not sufficient for fracture prevention.” [23] Explanation Abstracts provide key information that enables readers to understand the scope, processes, and findings of a review and to decide whether to read the full report. The abstract may be all that is readily available to a reader, for example, in a bibliographic database. The abstract should present a balanced and realistic assessment of the review's findings that mirrors, albeit briefly, the main text of the report. We agree with others that the quality of reporting in abstracts presented at conferences and in journal publications needs improvement [24],[25]. While we do not uniformly favor a specific format over another, we generally recommend structured abstracts. Structured abstracts provide readers with a series of headings pertaining to the purpose, conduct, findings, and conclusions of the systematic review being reported [26],[27]. They give readers more complete information and facilitate finding information more easily than unstructured abstracts [28],[29],[30],[31],[32]. A highly structured abstract of a systematic review could include the following headings: Context (or Background); Objective (or Purpose); Data Sources; Study Selection (or Eligibility Criteria); Study Appraisal and Synthesis Methods (or Data Extraction and Data Synthesis); Results; Limitations; and Conclusions (or Implications). 
Alternatively, a simpler structure could cover but collapse some of the above headings (e.g., label Study Selection and Study Appraisal as Review Methods) or omit some headings such as Background and Limitations. In the highly structured abstract mentioned above, authors use the Background heading to set the context for readers and explain the importance of the review question. Under the Objectives heading, they ideally use elements of PICOS (see Box 2) to state the primary objective of the review. Under a Data Sources heading, they summarize sources that were searched, any language or publication type restrictions, and the start and end dates of searches. Study Selection statements then ideally describe who selected studies using what inclusion criteria. Data Extraction Methods statements describe appraisal methods during data abstraction and the methods used to integrate or summarize the data. The Data Synthesis section is where the main results of the review are reported. If the review includes meta-analyses, authors should provide numerical results with confidence intervals for the most important outcomes. Ideally, they should specify the amount of evidence in these analyses (numbers of studies and numbers of participants). Under a Limitations heading, authors might describe the most important weaknesses of included studies as well as limitations of the review process. Then authors should provide clear and balanced Conclusions that are closely linked to the objective and findings of the review. Additionally, it would be helpful if authors included some information about funding for the review. Finally, although protocol registration for systematic reviews is still not common practice, if authors have registered their review or received a registration number, we recommend providing the registration information at the end of the abstract. 
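The percentages quoted in the vitamin D example abstract follow from its pooled relative risks: a relative risk (RR) of 0.74 corresponds to a relative risk reduction of 1 − 0.74 = 0.26, i.e. 26%. The short Python sketch below reproduces that arithmetic for the four pooled estimates in the example. The function and variable names are illustrative assumptions, not part of PRISMA or of the cited review, and the check on whether the 95% confidence interval excludes 1 is only the conventional reading of such intervals, not a re-analysis of the underlying trials.

```python
from typing import Dict, List, Tuple


def relative_risk_reduction(rr: float) -> float:
    """Return the relative risk reduction (as a fraction) implied by a relative risk."""
    return 1.0 - rr


def ci_excludes_no_effect(lower: float, upper: float) -> bool:
    """True when a confidence interval for a relative risk lies entirely on one side of 1."""
    return upper < 1.0 or lower > 1.0


# Pooled estimates copied from the example abstract: (RR, 95% CI lower, 95% CI upper).
# The dictionary keys are hypothetical labels chosen for this sketch.
POOLED_RESULTS: Dict[str, Tuple[float, float, float]] = {
    "hip fracture, 700-800 IU/d": (0.74, 0.61, 0.88),
    "nonvertebral fracture, 700-800 IU/d": (0.77, 0.68, 0.87),
    "hip fracture, 400 IU/d": (1.15, 0.88, 1.50),
    "nonvertebral fracture, 400 IU/d": (1.03, 0.86, 1.24),
}


def summarize(results: Dict[str, Tuple[float, float, float]]) -> List[str]:
    """Build one summary line per pooled estimate."""
    lines = []
    for label, (rr, lower, upper) in results.items():
        # A negative percentage means the point estimate is above 1 (no reduction).
        percent = round(relative_risk_reduction(rr) * 100)
        verdict = "excludes 1" if ci_excludes_no_effect(lower, upper) else "includes 1"
        lines.append(f"{label}: RR {rr:.2f}, reduction {percent}%, 95% CI {verdict}")
    return lines


if __name__ == "__main__":
    for line in summarize(POOLED_RESULTS):
        print(line)
```

Reporting the number of trials and participants next to each estimate, as the example abstract does, lets readers judge how much evidence stands behind each percentage.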
Taking all the above considerations into account, the intrinsic tension between the goal of a complete abstract and the need to keep it within the space limit often set by journal editors is recognized as a major challenge. INTRODUCTION Item 3: RATIONALE Describe the rationale for the review in the context of what is already known. Example. “Reversing the trend of increasing weight for height in children has proven difficult. It is widely accepted that increasing energy expenditure and reducing energy intake form the theoretical basis for management. Therefore, interventions aiming to increase physical activity and improve diet are the foundation of efforts to prevent and treat childhood obesity. Such lifestyle interventions have been supported by recent systematic reviews, as well as by the Canadian Paediatric Society, the Royal College of Paediatrics and Child Health, and the American Academy of Pediatrics. However, these interventions are fraught with poor adherence. Thus, school-based interventions are theoretically appealing because adherence with interventions can be improved. Consequently, many local governments have enacted or are considering policies that mandate increased physical activity in schools, although the effect of such interventions on body composition has not been assessed.” [33] Explanation Readers need to understand the rationale behind the study and what the systematic review may add to what is already known. Authors should tell readers whether their report is a new systematic review or an update of an existing one. If the review is an update, authors should state reasons for the update, including what has been added to the evidence base since the previous version of the review. An ideal background or introduction that sets context for readers might include the following. First, authors might define the importance of the review question from different perspectives (e.g., public health, individual patient, or health policy).
Second, authors might briefly mention the current state of knowledge and its limitations. As in the above example, information about the effects of several different interventions may be available that helps readers understand why potential relative benefits or harms of particular interventions need review. Third, authors might whet readers' appetites by clearly stating what the review aims to add. They also could discuss the extent to which the limitations of the existing evidence base may be overcome by the review. Item 4: OBJECTIVES Provide an explicit statement of questions being addressed with reference to participants, interventions, comparisons, outcomes, and study design (PICOS). Example. “To examine whether topical or intraluminal antibiotics reduce catheter-related bloodstream infection, we reviewed randomized, controlled trials that assessed the efficacy of these antibiotics for primary prophylaxis against catheter-related bloodstream infection and mortality compared with no antibiotic therapy in adults undergoing hemodialysis.” [34] Explanation The questions being addressed, and the rationale for them, are one of the most critical parts of a systematic review. They should be stated precisely and explicitly so that readers can understand quickly the review's scope and the potential applicability of the review to their interests [35]. Framing questions so that they include the following five “PICOS” components may improve the explicitness of review questions: (1) the patient population or disease being addressed (P), (2) the interventions or exposure of interest (I), (3) the comparators (C), (4) the main outcome or endpoint of interest (O), and (5) the study designs chosen (S). For more detail regarding PICOS, see Box 2. Good review questions may be narrowly focused or broad, depending on the overall objectives of the review. 
Sometimes broad questions might increase the applicability of the results and facilitate detection of bias, exploratory analyses, and sensitivity analyses [35],[36]. Whether narrowly focused or broad, precisely stated review objectives are critical as they help define other components of the review process such as the eligibility criteria (Item 6) and the search for relevant literature (Items 7 and 8). METHODS Item 5: PROTOCOL AND REGISTRATION Indicate if a review protocol exists, if and where it can be accessed (e.g., Web address) and, if available, provide registration information including the registration number. Example. “Methods of the analysis and inclusion criteria were specified in advance and documented in a protocol.” [37] Explanation A protocol is important because it pre-specifies the objectives and methods of the systematic review. For instance, a protocol specifies outcomes of primary interest, how reviewers will extract information about those outcomes, and methods that reviewers might use to quantitatively summarize the outcome data (see Item 13). Having a protocol can help restrict the likelihood of biased post hoc decisions in review methods, such as selective outcome reporting. Several sources provide guidance about elements to include in the protocol for a systematic review [16],[38],[39]. For meta-analyses of individual patient-level data, we advise authors to describe whether a protocol was explicitly designed and whether, when, and how participating collaborators endorsed it [40],[41]. Authors may modify protocols during the research, and readers should not automatically consider such modifications inappropriate. For example, legitimate modifications may extend the period of searches to include older or newer studies, broaden eligibility criteria that proved too narrow, or add analyses if the primary analyses suggest that additional ones are warranted. Authors should, however, describe the modifications and explain their rationale. 
Although worthwhile protocol amendments are common, one must consider the effects that protocol modifications may have on the results of a systematic review, especially if the primary outcome is changed. Bias from selective outcome reporting in randomized trials has been well documented [42],[43]. An examination of 47 Cochrane reviews revealed indirect evidence for possible selective reporting bias for systematic reviews. Almost all (n = 43) contained a major change, such as the addition or deletion of outcomes, between the protocol and the full publication [44]. Whether (or to what extent) the changes reflected bias, however, was not clear. For example, it has been rather common not to describe outcomes that were not presented in any of the included studies. Registration of a systematic review, typically with a protocol and registration number, is not yet common, but some opportunities exist [45],[46]. Registration may possibly reduce the risk of multiple reviews addressing the same question [45],[46],[47],[48], reduce publication bias, and provide greater transparency when updating systematic reviews. Of note, a survey of systematic reviews indexed in MEDLINE in November 2004 found that reports of protocol use had increased to about 46% [3] from 8% noted in previous surveys [49]. The improvement was due mostly to Cochrane reviews, which, by requirement, have a published protocol [3]. Item 6: ELIGIBILITY CRITERIA Specify study characteristics (e.g., PICOS, length of follow-up) and report characteristics (e.g., years considered, language, publication status) used as criteria for eligibility, giving rationale. Examples. Types of studies: “Randomised clinical trials studying the administration of hepatitis B vaccine to CRF [chronic renal failure] patients, with or without dialysis. 
No language, publication date, or publication status restrictions were imposed…” Types of participants: “Participants of any age with CRF or receiving dialysis (haemodialysis or peritoneal dialysis) were considered. CRF was defined as serum creatinine greater than 200 µmol/L for a period of more than six months or individuals receiving dialysis (haemodialysis or peritoneal dialysis)…Renal transplant patients were excluded from this review as these individuals are immunosuppressed and are receiving immunosuppressant agents to prevent rejection of their transplanted organs, and they have essentially normal renal function…” Types of intervention: “Trials comparing the beneficial and harmful effects of hepatitis B vaccines with adjuvant or cytokine co-interventions [and] trials comparing the beneficial and harmful effects of immunoglobulin prophylaxis. This review was limited to studies looking at active immunization. Hepatitis B vaccines (plasma or recombinant (yeast) derived) of all types, dose, and regimens versus placebo, control vaccine, or no vaccine…” Types of outcome measures: “Primary outcome measures: Seroconversion, ie, proportion of patients with adequate anti-HBs response (>10 IU/L or Sample Ratio Units). Hepatitis B infections (as measured by hepatitis B core antigen (HBcAg) positivity or persistent HBsAg positivity), both acute and chronic. Acute (primary) HBV [hepatitis B virus] infections were defined as seroconversion to HBsAg positivity or development of IgM anti-HBc. Chronic HBV infections were defined as the persistence of HBsAg for more than six months or HBsAg positivity and liver biopsy compatible with a diagnosis or chronic hepatitis B. Secondary outcome measures: Adverse events of hepatitis B vaccinations…[and]…mortality.” [50] Explanation Knowledge of the eligibility criteria is essential in appraising the validity, applicability, and comprehensiveness of a review. 
Thus, authors should unambiguously specify eligibility criteria used in the review. Carefully defined eligibility criteria inform various steps of the review methodology. They influence the development of the search strategy and serve to ensure that studies are selected in a systematic and unbiased manner. A study may be described in multiple reports, and one report may describe multiple studies. Therefore, we separate eligibility criteria into the following two components: study characteristics and report characteristics. Both need to be reported. Study eligibility criteria are likely to include the populations, interventions, comparators, outcomes, and study designs of interest (PICOS; see Box 2), as well as other study-specific elements, such as specifying a minimum length of follow-up. Authors should state whether studies will be excluded because they do not include (or report) specific outcomes to help readers ascertain whether the systematic review may be biased as a consequence of selective reporting [42],[43]. Report eligibility criteria are likely to include language of publication, publication status (e.g., inclusion of unpublished material and abstracts), and year of publication. Inclusion or not of non-English language literature [51],[52],[53],[54],[55], unpublished data, or older data can influence the effect estimates in meta-analyses [56],[57],[58],[59]. Caution may need to be exercised in including all identified studies due to potential differences in the risk of bias such as, for example, selective reporting in abstracts [60],[61],[62]. Item 7: INFORMATION SOURCES Describe all information sources in the search (e.g., databases with dates of coverage, contact with study authors to identify additional studies) and date last searched. Example. 
“Studies were identified by searching electronic databases, scanning reference lists of articles and consultation with experts in the field and drug companies…No limits were applied for language and foreign papers were translated. This search was applied to Medline (1966–Present), CancerLit (1975–Present), and adapted for Embase (1980–Present), Science Citation Index Expanded (1981–Present) and Pre-Medline electronic databases. Cochrane and DARE (Database of Abstracts of Reviews of Effectiveness) databases were reviewed…The last search was run on 19 June 2001. In addition, we handsearched contents pages of Journal of Clinical Oncology 2001, European Journal of Cancer 2001 and Bone 2001, together with abstracts printed in these journals 1999–2001. A limited update literature search was performed from 19 June 2001 to 31 December 2003.” [63] Explanation The National Library of Medicine's MEDLINE database is one of the most comprehensive sources of health care information in the world. Like any database, however, its coverage is not complete and varies according to the field. Retrieval from any single database, even by an experienced searcher, may be imperfect, which is why detailed reporting is important within the systematic review. At a minimum, for each database searched, authors should report the database, platform, or provider (e.g., Ovid, Dialog, PubMed) and the start and end dates for the search of each database. This information lets readers assess the currency of the review, which is important because the publication time-lag outdates the results of some reviews [64]. This information should also make updating more efficient [65]. Authors should also report who developed and conducted the search [66]. 
In addition to searching databases, authors should report the use of supplementary approaches to identify studies, such as hand searching of journals, checking reference lists, searching trials registries or regulatory agency Web sites [67], contacting manufacturers, or contacting authors. Authors should also report if they attempted to acquire any missing information (e.g., on study methods or results) from investigators or sponsors; it is useful to describe briefly who was contacted and what unpublished information was obtained. Item 8: SEARCH Present the full electronic search strategy for at least one major database, including any limits used, such that it could be repeated. Examples. In text: “We used the following search terms to search all trials registers and databases: immunoglobulin*; IVIG; sepsis; septic shock; septicaemia; and septicemia…” [68] In appendix: “Search strategy: MEDLINE (OVID)
01. immunoglobulins/
02. immunoglobulin$.tw.
03. ivig.tw.
04. 1 or 2 or 3
05. sepsis/
06. sepsis.tw.
07. septic shock/
08. septic shock.tw.
09. septicemia/
10. septicaemia.tw.
11. septicemia.tw.
12. 5 or 6 or 7 or 8 or 9 or 10 or 11
13. 4 and 12
14. randomized controlled trials/
15. randomized-controlled-trial.pt.
16. controlled-clinical-trial.pt.
17. random allocation/
18. double-blind method/
19. single-blind method/
20. 14 or 15 or 16 or 17 or 18 or 19
21. exp clinical trials/
22. clinical-trial.pt.
23. (clin$ adj trial$).ti,ab.
24. ((singl$ or doubl$ or trebl$ or tripl$) adj (blind$)).ti,ab.
25. placebos/
26. placebo$.ti,ab.
27. random$.ti,ab.
28. 21 or 22 or 23 or 24 or 25 or 26 or 27
29. research design/
30. comparative study/
31. exp evaluation studies/
32. follow-up studies/
33. prospective studies/
34. (control$ or prospective$ or volunteer$).ti,ab.
35. 30 or 31 or 32 or 33 or 34
36. 20 or 28 or 29 or 35
37. 13 and 36” [68]
Explanation The search strategy is an essential part of the report of any systematic review.
Searches may be complicated and iterative, particularly when reviewers search unfamiliar databases or their review is addressing a broad or new topic. Perusing the search strategy allows interested readers to assess the comprehensiveness and completeness of the search, and to replicate it. Thus, we advise authors to report their full electronic search strategy for at least one major database. As an alternative to presenting search strategies for all databases, authors could indicate how the search took into account other databases searched, as index terms vary across databases. If different searches are used for different parts of a wider question (e.g., questions relating to benefits and questions relating to harms), we recommend authors provide at least one example of a strategy for each part of the objective [69]. We also encourage authors to state whether search strategies were peer reviewed as part of the systematic review process [70]. We realize that journal restrictions vary and that having the search strategy in the text of the report is not always feasible. We strongly encourage all journals, however, to find ways, such as a “Web extra,” appendix, or electronic link to an archive, to make search strategies accessible to readers. We also advise all authors to archive their searches so that (1) others may access and review them (e.g., replicate them or understand why their review of a similar topic did not identify the same reports), and (2) future updates of their review are facilitated. Several sources provide guidance on developing search strategies [71],[72],[73]. Most searches have constraints, for example relating to limited time or financial resources, inaccessible or inadequately indexed reports and databases, unavailability of experts with particular language or database searching skills, or review questions for which pertinent evidence is not easy to find. Authors should be straightforward in describing their search constraints. 
Apart from the keywords used to identify or exclude records, they should report any additional limitations relevant to the search, such as language and date restrictions (see also eligibility criteria, Item 6) [51]. Item 9: STUDY SELECTION State the process for selecting studies (i.e., for screening, for determining eligibility, for inclusion in the systematic review, and, if applicable, for inclusion in the meta-analysis). Example. “Eligibility assessment…[was] performed independently in an unblinded standardized manner by 2 reviewers…Disagreements between reviewers were resolved by consensus.” [74] Explanation There is no standard process for selecting studies to include in a systematic review. Authors usually start with a large number of identified records from their search and sequentially exclude records according to eligibility criteria. We advise authors to report how they screened the retrieved records (typically a title and abstract), how often it was necessary to review the full text publication, and if any types of record (e.g., letters to the editor) were excluded. We also advise using the PRISMA flow diagram to summarize study selection processes (see Item 17; Box 3). Box 3. Identification of Study Reports and Data Extraction Comprehensive searches usually result in a large number of identified records, a much smaller number of studies included in the systematic review, and even fewer of these studies included in any meta-analyses. Reports of systematic reviews often provide little detail as to the methods used by the review team in this process. Readers are often left with what can be described as the “X-files” phenomenon, as it is unclear what occurs between the initial set of identified records and those finally included in the review. Sometimes, review authors simply report the number of included studies; more often they report the initial number of identified records and the number of included studies. 
Rarely, although this is optimal for readers, do review authors report the number of identified records, the smaller number of potentially relevant studies, and the even smaller number of included studies, by outcome. Review authors also need to differentiate between the number of reports and studies. Often there will not be a 1∶1 ratio of reports to studies and this information needs to be described in the systematic review report. Ideally, the identification of study reports should be reported as text in combination with use of the PRISMA flow diagram. While we recommend use of the flow diagram, a small number of reviews might be particularly simple and can be sufficiently described with a few brief sentences of text. More generally, review authors will need to report the process used for each step: screening the identified records; examining the full text of potentially relevant studies (and reporting the number that could not be obtained); and applying eligibility criteria to select the included studies. Such descriptions should also detail how potentially eligible records were promoted to the next stage of the review (e.g., full text screening) and to the final stage of this process, the included studies. Often review teams have three response options for excluding records or promoting them to the next stage of the winnowing process: “yes,” “no,” and “maybe.” Similarly, some detail should be reported on who participated and how such processes were completed. For example, a single person may screen the identified records while a second person independently examines a small sample of them. The entire winnowing process is one of “good book keeping” whereby interested readers should be able to work backwards from the included studies to come up with the same numbers of identified records. There is often a paucity of information describing the data extraction processes in reports of systematic reviews. 
Authors may simply report that “relevant” data were extracted from each included study with little information about the processes used for data extraction. It may be useful for readers to know whether a systematic review's authors developed, a priori or not, a data extraction form, whether multiple forms were used, the number of questions, whether the form was pilot tested, and who completed the extraction. For example, it is important for readers to know whether one or more people extracted data, and if so, whether this was completed independently, whether “consensus” data were used in the analyses, and if the review team completed an informal training exercise or a more formal reliability exercise. Efforts to enhance objectivity and avoid mistakes in study selection are important. Thus authors should report whether each stage was carried out by one or several people, who these people were, and, whenever multiple independent investigators performed the selection, what the process was for resolving disagreements. The use of at least two investigators may reduce the possibility of rejecting relevant reports [75]. The benefit may be greatest for topics where selection or rejection of an article requires difficult judgments [76]. For these topics, authors should ideally tell readers the level of inter-rater agreement, how commonly arbitration about selection was required, and what efforts were made to resolve disagreements (e.g., by contact with the authors of the original studies). Item 10: DATA COLLECTION PROCESS Describe the method of data extraction from reports (e.g., piloted forms, independently by two reviewers) and any processes for obtaining and confirming data from investigators. Example. “We developed a data extraction sheet (based on the Cochrane Consumers and Communication Review Group's data extraction template), pilot-tested it on ten randomly-selected included studies, and refined it accordingly. 
One review author extracted the following data from included studies and the second author checked the extracted data…Disagreements were resolved by discussion between the two review authors; if no agreement could be reached, it was planned a third author would decide. We contacted five authors for further information. All responded and one provided numerical data that had only been presented graphically in the published paper.” [77] Explanation Reviewers extract information from each included study so that they can critique, present, and summarize evidence in a systematic review. They might also contact authors of included studies for information that has not been, or is unclearly, reported. In meta-analysis of individual patient data, this phase involves collection and scrutiny of detailed raw databases. The authors should describe these methods, including any steps taken to reduce bias and mistakes during data collection and data extraction [78] (Box 3). Some systematic reviewers use a data extraction form that could be reported as an appendix or “Web extra” to their report. These forms could show the reader what information reviewers sought (see Item 11) and how they extracted it. Authors could tell readers if the form was piloted. Regardless, we advise authors to tell readers who extracted what data, whether any extractions were completed in duplicate, and, if so, whether duplicate abstraction was done independently and how disagreements were resolved. Published reports of the included studies may not provide all the information required for the review. Reviewers should describe any actions they took to seek additional information from the original researchers (see Item 7). The description might include how they attempted to contact researchers, what they asked for, and their success in obtaining the necessary information. 
Authors should also tell readers when individual patient data were sought from the original researchers [41] (see Item 11) and indicate the studies for which such data were used in the analyses. The reviewers ideally should also state whether they confirmed the accuracy of the information included in their review with the original researchers, for example, by sending them a copy of the draft review [79]. Some studies are published more than once. Duplicate publications may be difficult to ascertain, and their inclusion may introduce bias [80],[81]. We advise authors to describe any steps they used to avoid double counting and piece together data from multiple reports of the same study (e.g., juxtaposing author names, treatment comparisons, sample sizes, or outcomes). We also advise authors to indicate whether all reports on a study were considered, as inconsistencies may reveal important limitations. For example, a review of multiple publications of drug trials showed that reported study characteristics may differ from report to report, including the description of the design, number of patients analyzed, chosen significance level, and outcomes [82]. Authors ideally should present any algorithm that they used to select data from overlapping reports and any efforts they used to solve logical inconsistencies across reports. Item 11: DATA ITEMS List and define all variables for which data were sought (e.g., PICOS, funding sources), and any assumptions and simplifications made. Examples. 
“Information was extracted from each included trial on: (1) characteristics of trial participants (including age, stage and severity of disease, and method of diagnosis), and the trial's inclusion and exclusion criteria; (2) type of intervention (including type, dose, duration and frequency of the NSAID [non-steroidal anti-inflammatory drug]; versus placebo or versus the type, dose, duration and frequency of another NSAID; or versus another pain management drug; or versus no treatment); (3) type of outcome measure (including the level of pain reduction, improvement in quality of life score (using a validated scale), effect on daily activities, absence from work or school, length of follow up, unintended effects of treatment, number of women requiring more invasive treatment).” [83] Explanation It is important for readers to know what information review authors sought, even if some of this information was not available [84]. If the review is limited to reporting only those variables that were obtained, rather than those that were deemed important but could not be obtained, bias might be introduced and the reader might be misled. It is therefore helpful if authors can refer readers to the protocol (see Item 5), and archive their extraction forms (see Item 10), including definitions of variables. The published systematic review should include a description of the processes used with, if relevant, specification of how readers can get access to additional materials. We encourage authors to report whether some variables were added after the review started. Such variables might include those found in the studies that the reviewers identified (e.g., important outcome measures that the reviewers initially overlooked). Authors should describe the reasons for adding any variables to those already pre-specified in the protocol so that readers can understand the review process. 
We advise authors to report any assumptions they made about missing or unclear information and to explain those processes. For example, in studies of women aged 50 or older it is reasonable to assume that none were pregnant, even if this is not reported. Likewise, review authors might make assumptions about the route of administration of drugs assessed. However, special care should be taken in making assumptions about qualitative information. For example, the upper age limit for “children” can vary from 15 years to 21 years, “intense” physiotherapy might mean very different things to different researchers at different times and for different patients, and the volume of blood associated with “heavy” blood loss might vary widely depending on the setting. Item 12: RISK OF BIAS IN INDIVIDUAL STUDIES Describe methods used for assessing risk of bias in individual studies (including specification of whether this was done at the study or outcome level, or both), and how this information is to be used in any data synthesis. Example. “To ascertain the validity of eligible randomized trials, pairs of reviewers working independently and with adequate reliability determined the adequacy of randomization and concealment of allocation, blinding of patients, health care providers, data collectors, and outcome assessors; and extent of loss to follow-up (i.e. proportion of patients in whom the investigators were not able to ascertain outcomes).” [85] “To explore variability in study results (heterogeneity) we specified the following hypotheses before conducting the analysis. We hypothesised that effect size may differ according to the methodological quality of the studies.” [86] Explanation The likelihood that the treatment effect reported in a systematic review approximates the truth depends on the validity of the included studies, as certain methodological characteristics may be associated with effect sizes [87],[88]. 
For example, trials without reported adequate allocation concealment exaggerate treatment effects on average compared to those with adequate concealment [88]. Therefore, it is important for authors to describe any methods that they used to gauge the risk of bias in the included studies and how that information was used [89]. Additionally, authors should provide a rationale if no assessment of risk of bias was undertaken. The most popular term to describe the issues relevant to this item is “quality,” but for the reasons that are elaborated in Box 4 we prefer to name this item as “assessment of risk of bias.” Box 4. Study Quality and Risk of Bias In this paper, and elsewhere [11], we sought to use a new term for many readers, namely, risk of bias, for evaluating each included study in a systematic review. Previous papers [89],[188] tended to use the term “quality”. When carrying out a systematic review we believe it is important to distinguish between quality and risk of bias and to focus on evaluating and reporting the latter. Quality is often the best the authors have been able to do. For example, authors may report the results of surgical trials in which blinding of the outcome assessors was not part of the trial's conduct. Even though this may have been the best methodology the researchers were able to do, there are still theoretical grounds for believing that the study was susceptible to (risk of) bias. Assessing the risk of bias should be part of the conduct and reporting of any systematic review. In all situations, we encourage systematic reviewers to think ahead carefully about what risks of bias (methodological and clinical) may have a bearing on the results of their systematic reviews. For systematic reviewers, understanding the risk of bias on the results of studies is often difficult, because the report is only a surrogate of the actual conduct of the study. 
There is some suggestion [189],[190] that the report may not be a reasonable facsimile of the study, although this view is not shared by all [88],[191]. There are three main ways to assess risk of bias: individual components, checklists, and scales. There are a great many scales available [192], although we caution against their use on theoretical grounds [193] and on the basis of emerging empirical evidence [194]. Checklists are less frequently used and potentially run into the same problems as scales. We advocate using a component approach and one that is based on domains for which there is good empirical evidence and perhaps strong clinical grounds. The new Cochrane risk of bias tool [11] is one such component approach. The Cochrane risk of bias tool consists of five items for which there is empirical evidence for their biasing influence on the estimates of an intervention's effectiveness in randomized trials (sequence generation, allocation concealment, blinding, incomplete outcome data, and selective outcome reporting) and a catch-all item called “other sources of bias” [11]. There is also some consensus that these items can be applied for evaluation of studies across very diverse clinical areas [93]. Other risk of bias items may be topic or even study specific, i.e., they may stem from some peculiarity of the research topic or some special feature of the design of a specific study. These peculiarities need to be investigated on a case-by-case basis, based on clinical and methodological acumen, and there can be no general recipe. In all situations, systematic reviewers need to think ahead carefully about what aspects of study quality may have a bearing on the results. Many methods exist to assess the overall risk of bias in included studies, including scales, checklists, and individual components [90],[91]. As discussed in Box 4, scales that numerically summarize multiple components into a single number are misleading and unhelpful [92],[93].
Rather, authors should specify the methodological components that they assessed. Common markers of validity for randomized trials include the following: appropriate generation of random allocation sequence [94]; concealment of the allocation sequence [93]; blinding of participants, health care providers, data collectors, and outcome adjudicators [95],[96],[97],[98]; proportion of patients lost to follow-up [99],[100]; stopping of trials early for benefit [101]; and whether the analysis followed the intention-to-treat principle [100],[102]. The ultimate decision regarding which methodological features to evaluate requires consideration of the strength of the empiric data, theoretical rationale, and the unique circumstances of the included studies. Authors should report how they assessed risk of bias; whether it was in a blind manner; and if assessments were completed by more than one person, and if so, whether they were completed independently [103],[104]. Similarly, we encourage authors to report any calibration exercises among review team members that were done. Finally, authors need to report how their assessments of risk of bias are used subsequently in the data synthesis (see Item 16). Despite the often difficult task of assessing the risk of bias in included studies, authors are sometimes silent on what they did with the resultant assessments [89]. If authors exclude studies from the review or any subsequent analyses on the basis of the risk of bias, they should tell readers which studies they excluded and explain the reasons for those exclusions (see Item 6). Authors should also describe any planned sensitivity or subgroup analyses related to bias assessments (see Item 16). Item 13: SUMMARY MEASURES State the principal summary measures (e.g., risk ratio, difference in means). Examples. 
“Relative risk of mortality reduction was the primary measure of treatment effect.” [105] “The meta-analyses were performed by computing relative risks (RRs) using random-effects model. Quantitative analyses were performed on an intention-to-treat basis and were confined to data derived from the period of follow-up. RR and 95% confidence intervals for each side effect (and all side effects) were calculated.” [106] “The primary outcome measure was the mean difference in log10 HIV-1 viral load comparing zinc supplementation to placebo…” [107] Explanation When planning a systematic review, it is generally desirable that authors pre-specify the outcomes of primary interest (see Item 5) as well as the intended summary effect measure for each outcome. The chosen summary effect measure may differ from that used in some of the included studies. If possible the choice of effect measures should be explained, though it is not always easy to judge in advance which measure is the most appropriate. For binary outcomes, the most common summary measures are the risk ratio, odds ratio, and risk difference [108]. Relative effects are more consistent across studies than absolute effects [109],[110], although absolute differences are important when interpreting findings (see Item 24). For continuous outcomes, the natural effect measure is the difference in means [108]. Its use is appropriate when outcome measurements in all studies are made on the same scale. The standardized difference in means is used when the studies do not yield directly comparable data. Usually this occurs when all studies assess the same outcome but measure it in a variety of ways (e.g., different scales to measure depression). For time-to-event outcomes, the hazard ratio is the most common summary measure. Reviewers need the log hazard ratio and its standard error for a study to be included in a meta-analysis [111]. 
This information may not be given for all studies, but methods are available for estimating the desired quantities from other reported information [111]. Risk ratio and odds ratio (in relation to events occurring by a fixed time) are not equivalent to the hazard ratio, and median survival times are not a reliable basis for meta-analysis [112]. If authors have used these measures they should describe their methods in the report.

Item 14: PLANNED METHODS OF ANALYSIS

Describe the methods of handling data and combining results of studies, if done, including measures of consistency (e.g., I²) for each meta-analysis.

Examples.

“We tested for heterogeneity with the Breslow-Day test, and used the method proposed by Higgins et al. to measure inconsistency (the percentage of total variation across studies due to heterogeneity) of effects across lipid-lowering interventions. The advantages of this measure of inconsistency (termed I²) are that it does not inherently depend on the number of studies and is accompanied by an uncertainty interval.” [113]

“In very few instances, estimates of baseline mean or mean QOL [Quality of life] responses were obtained without corresponding estimates of variance (standard deviation [SD] or standard error). In these instances, an SD was imputed from the mean of the known SDs. In a number of cases, the response data available were the mean and variance in a pre study condition and after therapy. The within-patient variance in these cases could not be calculated directly and was approximated by assuming independence.” [114]

Explanation

The data extracted from the studies in the review may need some transformation (processing) before they are suitable for analysis or for presentation in an evidence table. Although such data handling may facilitate meta-analyses, it is sometimes needed even when meta-analyses are not done.
For example, in trials with more than two intervention groups it may be necessary to combine results for two or more groups (e.g., receiving similar but non-identical interventions), or it may be desirable to include only a subset of the data to match the review's inclusion criteria. When several different scales (e.g., for depression) are used across studies, the sign of some scores may need to be reversed to ensure that all scales are aligned (e.g., so low values represent good health on all scales). Standard deviations may have to be reconstructed from other statistics such as p-values and t statistics [115],[116], or occasionally they may be imputed from the standard deviations observed in other studies [117]. Time-to-event data also usually need careful conversions to a consistent format [111]. Authors should report details of any such data processing. Statistical combination of data from two or more separate studies in a meta-analysis may be neither necessary nor desirable (see Box 5 and Item 21). Regardless of the decision to combine individual study results, authors should report how they planned to evaluate between-study variability (heterogeneity or inconsistency) (Box 6). The consistency of results across trials may influence the decision of whether to combine trial results in a meta-analysis. Box 5. Whether or Not To Combine Data Deciding whether or not to combine data involves statistical, clinical, and methodological considerations. The statistical decisions are perhaps the most technical and evidence-based. These are more thoroughly discussed in Box 6. The clinical and methodological decisions are generally based on discussions within the review team and may be more subjective. Clinical considerations will be influenced by the question the review is attempting to address. 
Broad questions might provide more “license” to combine more disparate studies, such as whether “Ritalin is effective in increasing focused attention in people diagnosed with attention deficit hyperactivity disorder (ADHD).” Here authors might elect to combine reports of studies involving children and adults. If the clinical question is more focused, such as whether “Ritalin is effective in increasing classroom attention in previously undiagnosed ADHD children who have no comorbid conditions,” it is likely that different decisions regarding synthesis of studies are taken by authors. In any case authors should describe their clinical decisions in the systematic review report. Deciding whether or not to combine data also has a methodological component. Reviewers may decide not to combine studies of low risk of bias with those of high risk of bias (see Items 12 and 19). For example, for subjective outcomes, systematic review authors may not wish to combine assessments that were completed under blind conditions with those that were not. For any particular question there may not be a “right” or “wrong” choice concerning synthesis, as such decisions are likely complex. However, as the choice may be subjective, authors should be transparent as to their key decisions and describe them for readers. Box 6. Meta-Analysis and Assessment of Consistency (Heterogeneity) Meta-Analysis: Statistical Combination of the Results of Multiple Studies If it is felt that studies should have their results combined statistically, other issues must be considered because there are many ways to conduct a meta-analysis. Different effect measures can be used for both binary and continuous outcomes (see Item 13). Also, there are two commonly used statistical models for combining data in a meta-analysis [195]. 
The fixed-effect model assumes that there is a common treatment effect for all included studies [196]; it is assumed that the observed differences in results across studies reflect random variation [196]. The random-effects model assumes that there is no common treatment effect for all included studies but rather that the variation of the effects across studies follows a particular distribution [197]. In a random-effects model it is believed that the included studies represent a random sample from a larger population of studies addressing the question of interest [198]. There is no consensus about whether to use fixed- or random-effects models, and both are in wide use. The following differences have influenced some researchers regarding their choice between them. The random-effects model gives more weight to the results of smaller trials than does the fixed-effect analysis, which may be undesirable as small trials may be inferior and most prone to publication bias. The fixed-effect model considers only within-study variability whereas the random-effects model considers both within- and between-study variability. This is why a fixed-effect analysis tends to give narrower confidence intervals (i.e., provide greater precision) than a random-effects analysis [110],[196],[199]. In the absence of any between-study heterogeneity, the fixed- and random-effects estimates will coincide. In addition, there are different methods for performing both types of meta-analysis [200]. Common fixed-effect approaches are Mantel-Haenszel and inverse variance, whereas random-effects analyses usually use the DerSimonian and Laird approach, although other methods exist, including Bayesian meta-analysis [201]. In the presence of demonstrable between-study heterogeneity (see below), some consider that the use of a fixed-effect analysis is counterintuitive because its main assumption is violated.
Others argue that it is inappropriate to conduct any meta-analysis when there is unexplained variability across trial results. If the reviewers decide not to combine the data quantitatively, a danger is that eventually they may end up using quasi-quantitative rules of poor validity (e.g., vote counting of how many studies have nominally significant results) for interpreting the evidence. Statistical methods to combine data exist for almost any complex situation that may arise in a systematic review, but one has to be aware of their assumptions and limitations to avoid misapplying or misinterpreting these methods.

Assessment of Consistency (Heterogeneity)

We expect some variation (inconsistency) in the results of different studies due to chance alone. Variability in excess of that due to chance reflects true differences in the results of the trials, and is called “heterogeneity.” The conventional statistical approach to evaluating heterogeneity is a chi-squared test (Cochran's Q), but it has low power when there are few studies and excessive power when there are many studies [202]. By contrast, the I² statistic quantifies the amount of variation in results across studies beyond that expected by chance and so is preferable to Q [202],[203]. I² represents the percentage of the total variation in estimated effects across studies that is due to heterogeneity rather than to chance; some authors consider an I² value less than 25% as low [202]. However, I² also suffers from large uncertainty in the common situation where only a few studies are available [204], and reporting the uncertainty in I² (e.g., as the 95% confidence interval) may be helpful [145]. When there are few studies, inferences about heterogeneity should be cautious. When considerable heterogeneity is observed, it is advisable to consider possible reasons [205]. In particular, the heterogeneity may be due to differences between subgroups of studies (see Item 16).
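As a rough illustration of the quantities described in this box, the following Python sketch pools study estimates by inverse variance (fixed effect) and by the DerSimonian and Laird method (random effects), and computes Cochran's Q and I². The function name `pool_estimates` and the input values are hypothetical and chosen only for illustration; they are not data from any review cited here, and the sketch omits refinements (such as confidence intervals for I²) that a full analysis would report.

```python
import math


def pool_estimates(effects, std_errors):
    """Pool study effects (e.g., log risk ratios) and summarise heterogeneity.

    Illustrative sketch only: inverse-variance fixed-effect pooling,
    DerSimonian-Laird random-effects pooling, Cochran's Q and I-squared.
    """
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need at least two studies with matching inputs")

    # Fixed effect: weight each study by the inverse of its variance.
    w = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1

    # I-squared: share of total variation beyond chance, floored at zero.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird moment estimate of the between-study variance.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random effects: add tau^2 to each study's variance before weighting.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    random_eff = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)

    return {
        "fixed": fixed, "se_fixed": math.sqrt(1.0 / sum(w)),
        "random": random_eff, "se_random": math.sqrt(1.0 / sum(w_re)),
        "Q": q, "df": df, "I2": i2, "tau2": tau2,
    }


# Hypothetical log risk ratios and standard errors for four studies.
result = pool_estimates([0.1, 0.5, -0.2, 0.8], [0.1, 0.1, 0.1, 0.1])
print(result)
```

In this hypothetical input the studies are equally precise, so the two pooled estimates coincide, but the random-effects standard error is larger because it also carries the between-study variance.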
Also, data extraction errors are a common cause of substantial heterogeneity in results with continuous outcomes [139]. When meta-analysis is done, authors should specify the effect measure (e.g., relative risk or mean difference) (see Item 13), the statistical method (e.g., inverse variance), and whether a fixed- or random-effects approach, or some other method (e.g., Bayesian) was used (see Box 6). If possible, authors should explain the reasons for those choices. Item 15: RISK OF BIAS ACROSS STUDIES Specify any assessment of risk of bias that may affect the cumulative evidence (e.g., publication bias, selective reporting within studies). Examples. “For each trial we plotted the effect by the inverse of its standard error. The symmetry of such ‘funnel plots’ was assessed both visually, and formally with Egger's test, to see if the effect decreased with increasing sample size.” [118] “We assessed the possibility of publication bias by evaluating a funnel plot of the trial mean differences for asymmetry, which can result from the non publication of small trials with negative results…Because graphical evaluation can be subjective, we also conducted an adjusted rank correlation test and a regression asymmetry test as formal statistical tests for publication bias…We acknowledge that other factors, such as differences in trial quality or true study heterogeneity, could produce asymmetry in funnel plots.” [119] Explanation Reviewers should explore the possibility that the available data are biased. They may examine results from the available studies for clues that suggest there may be missing studies (publication bias) or missing data from the included studies (selective reporting bias) (see Box 7). Authors should report in detail any methods used to investigate possible bias across studies. Box 7. Bias Caused by Selective Publication of Studies or Results within Studies Systematic reviews aim to incorporate information from all relevant studies. 
The absence of information from some studies may pose a serious threat to the validity of a review. Data may be incomplete because some studies were not published, or because of incomplete or inadequate reporting within a published article. These problems are often summarized as “publication bias” although in fact the bias arises from non-publication of full studies and selective publication of results in relation to their findings. Non-publication of research findings dependent on the actual results is an important risk of bias to a systematic review and meta-analysis.

Missing Studies

Several empirical investigations have shown that the findings from clinical trials are more likely to be published if the results are statistically significant (p < 0.05) than if they are not [208]. Also, among published studies, those with statistically significant results are published sooner than those with non-significant findings [209]. When some studies are missing for these reasons, the available results will be biased towards exaggerating the effect of an intervention.

Missing Outcomes

In many systematic reviews only some of the eligible studies (often a minority) can be included in a meta-analysis for a specific outcome. For some studies, the outcome may not be measured or may be measured but not reported. The former will not lead to bias, but the latter could. Evidence is accumulating that selective reporting bias is widespread and of considerable importance [42],[43]. In addition, data for a given outcome may be analyzed in multiple ways and the choice of presentation influenced by the results obtained. In a study of 102 randomized trials, comparison of published reports with trial protocols showed that a median of 38% efficacy and 50% safety outcomes per trial, respectively, were not available for meta-analysis.
Statistically significant outcomes had a higher odds of being fully reported in publications when compared with non-significant outcomes for both efficacy (pooled odds ratio 2.4; 95% confidence interval 1.4 to 4.0) and safety (4.7, 1.8 to 12) data. Several other studies have had similar findings [210],[211]. Detection of Missing Information Missing studies may increasingly be identified from trials registries. Evidence of missing outcomes may come from comparison with the study protocol, if available, or by careful examination of published articles [11]. Study publication bias and selective outcome reporting are difficult to exclude or verify from the available results, especially when few studies are available. If the available data are affected by either (or both) of the above biases, smaller studies would tend to show larger estimates of the effects of the intervention. Thus one possibility is to investigate the relation between effect size and sample size (or more specifically, precision of the effect estimate). Graphical methods, especially the funnel plot [212], and analytic methods (e.g., Egger's test) are often used [213],[214],[215], although their interpretation can be problematic [216],[217]. Strictly speaking, such analyses investigate “small study bias”; there may be many reasons why smaller studies have systematically different effect sizes than larger studies, of which reporting bias is just one [218]. Several alternative tests for bias have also been proposed, beyond the ones testing small study bias [215],[219],[220], but none can be considered a gold standard. Although evidence that smaller studies had larger estimated effects than large ones may suggest the possibility that the available evidence is biased, misinterpretation of such data is common [123]. It is difficult to assess whether within-study selective reporting is present in a systematic review. 
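To make the idea of relating effect size to precision concrete, the sketch below fits an Egger-style regression of the standardised effect (effect divided by its standard error) on precision (one over the standard error). The function name `egger_regression` and any inputs passed to it are hypothetical. The sketch returns the intercept and its standard error only; a formal test would compare their ratio against a t distribution with the returned degrees of freedom, which is left out here because it needs a statistics library. As the text notes, a non-zero intercept points to small study effects in general, not to reporting bias specifically.

```python
import math


def egger_regression(effects, std_errors):
    """Egger-style regression of standardised effect on precision.

    Returns the intercept, slope, the intercept's standard error and the
    residual degrees of freedom. An intercept far from zero hints at
    funnel-plot asymmetry; it does not by itself prove publication bias.
    """
    n = len(effects)
    if n != len(std_errors) or n < 3:
        raise ValueError("need at least three studies with matching inputs")

    x = [1.0 / se for se in std_errors]                  # precision
    y = [e / se for e, se in zip(effects, std_errors)]   # standardised effect

    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Ordinary least squares fit: y = intercept + slope * x.
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Standard error of the intercept from the residual variance.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    resid_var = sse / (n - 2)
    se_intercept = math.sqrt(resid_var * (1.0 / n + x_bar ** 2 / sxx))

    return {"intercept": intercept, "slope": slope,
            "se_intercept": se_intercept, "df": n - 2}
```

In this parameterisation the slope estimates the underlying effect and the intercept captures the tendency of less precise studies to report systematically different effects.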
If a protocol of an individual study is available, the outcomes in the protocol and the published report can be compared. Even in the absence of a protocol, outcomes listed in the methods section of the published report can be compared with those for which results are presented [120]. In only half of 196 trial reports describing comparisons of two drugs in arthritis were all the effect variables in the methods and results sections the same [82]. In other cases, knowledge of the clinical area may suggest that it is likely that the outcome was measured even if it was not reported. For example, in a particular disease, if one of two linked outcomes is reported but the other is not, then one should question whether the latter has been selectively omitted [121],[122]. Only 36% (76 of 212) of therapeutic systematic reviews published in November 2004 reported that study publication bias was considered, and only a quarter of those intended to carry out a formal assessment for that bias [3]. Of 60 meta-analyses in 24 articles published in 2005 in which formal assessments were reported, most were based on fewer than ten studies; most displayed statistically significant heterogeneity; and many reviewers misinterpreted the results of the tests employed [123]. A review of trials of antidepressants found that meta-analysis of only the published trials gave effect estimates 32% larger on average than when all trials sent to the drug agency were analyzed [67]. Item 16: ADDITIONAL ANALYSES Describe methods of additional analyses (e.g., sensitivity or subgroup analyses, meta-regression), if done, indicating which were pre-specified. Example. “Sensitivity analyses were pre-specified. The treatment effects were examined according to quality components (concealed treatment allocation, blinding of patients and caregivers, blinded outcome assessment), time to initiation of statins, and the type of statin. 
One post-hoc sensitivity analysis was conducted including unpublished data from a trial using cerivastatin.” [124] Explanation Authors may perform additional analyses to help understand whether the results of their review are robust, all of which should be reported. Such analyses include sensitivity analysis, subgroup analysis, and meta-regression [125]. Sensitivity analyses are used to explore the degree to which the main findings of a systematic review are affected by changes in its methods or in the data used from individual studies (e.g., study inclusion criteria, results of risk of bias assessment). Subgroup analyses address whether the summary effects vary in relation to specific (usually clinical) characteristics of the included studies or their participants. Meta-regression extends the idea of subgroup analysis to the examination of the quantitative influence of study characteristics on the effect size [126]. Meta-regression also allows authors to examine the contribution of different variables to the heterogeneity in study findings. Readers of systematic reviews should be aware that meta-regression has many limitations, including a danger of over-interpretation of findings [127],[128]. Even with limited data, many additional analyses can be undertaken. The choice of which analysis to undertake will depend on the aims of the review. None of these analyses, however, are exempt from producing potentially misleading results. It is important to inform readers whether these analyses were performed, their rationale, and which were pre-specified. RESULTS Item 17: STUDY SELECTION Give numbers of studies screened, assessed for eligibility, and included in the review, with reasons for exclusions at each stage, ideally with a flow diagram. Examples. In text: “A total of 10 studies involving 13 trials were identified for inclusion in the review. The search of Medline, PsycInfo and Cinahl databases provided a total of 584 citations. 
After adjusting for duplicates 509 remained. Of these, 479 studies were discarded because after reviewing the abstracts it appeared that these papers clearly did not meet the criteria. Three additional studies…were discarded because full text of the study was not available or the paper could not be feasibly translated into English. The full text of the remaining 27 citations was examined in more detail. It appeared that 22 studies did not meet the inclusion criteria as described. Five studies…met the inclusion criteria and were included in the systematic review. An additional five studies…that met the criteria for inclusion were identified by checking the references of located, relevant papers and searching for studies that have cited these papers. No unpublished relevant studies were obtained.” [129]

See flow diagram Figure 2.

Figure 2. Example Figure: Example flow diagram of study selection. DDW, Digestive Disease Week; UEGW, United European Gastroenterology Week. Reproduced with permission from [130].

Explanation

Authors should report, ideally with a flow diagram, the total number of records identified from electronic bibliographic sources (including specialized database or registry searches), hand searches of various sources, reference lists, citation indices, and experts. It is useful if authors delineate for readers the number of selected articles that were identified from the different sources so that they can see, for example, whether most articles were identified through electronic bibliographic sources or from references or experts. Literature identified primarily from references or experts may be prone to citation or publication bias [131],[132]. The flow diagram and text should describe clearly the process of report selection throughout the review.
Authors should report: unique records identified in searches; records excluded after preliminary screening (e.g., screening of titles and abstracts); reports retrieved for detailed evaluation; potentially eligible reports that were not retrievable; retrieved reports that did not meet inclusion criteria and the primary reasons for exclusion; and the studies included in the review. Indeed, the most appropriate layout may vary for different reviews. Authors should also note the presence of duplicate or supplementary reports so that readers understand the number of individual studies compared to the number of reports that were included in the review. Authors should be consistent in their use of terms, such as whether they are reporting on counts of citations, records, publications, or studies. We believe that reporting the number of studies is the most important. A flow diagram can be very useful; it should depict all the studies included based upon fulfilling the eligibility criteria, whether or not data have been combined for statistical analysis. A recent review of 87 systematic reviews found that about half included a QUOROM flow diagram [133]. The authors of this research recommended some important ways that reviewers can improve the use of a flow diagram when describing the flow of information throughout the review process, including a separate flow diagram for each important outcome reported [133]. Item 18: STUDY CHARACTERISTICS For each study, present characteristics for which data were extracted (e.g., study size, PICOS, follow-up period) and provide the citation. Examples. In text: “Characteristics of included studies Methods All four studies finally selected for the review were randomised controlled trials published in English. The duration of the intervention was 24 months for the RIO-North America and 12 months for the RIO-Diabetes, RIO-Lipids and RIO-Europe study. 
Although the last two described a period of 24 months during which they were conducted, only the first 12-months results are provided. All trials had a run-in, as a single blind period before the randomisation. Participants The included studies involved 6625 participants. The main inclusion criteria entailed adults (18 years or older), with a body mass index greater than 27 kg/m2 and less than 5 kg variation in body weight within the three months before study entry. Intervention All trials were multicentric. The RIO-North America was conducted in the USA and Canada, RIO-Europe in Europe and the USA, RIO-Diabetes in the USA and 10 other different countries not specified, and RIO-Lipids in eight unspecified different countries. The intervention received was placebo, 5 mg of rimonabant or 20 mg of rimonabant once daily in addition to a mild hypocaloric diet (600 kcal/day deficit). Outcomes Primary In all studies the primary outcome assessed was weight change from baseline after one year of treatment and the RIO-North America study also evaluated the prevention of weight regain between the first and second year. All studies evaluated adverse effects, including those of any kind and serious events. Quality of life was measured in only one study, but the results were not described (RIO-Europe). Secondary and additional outcomes These included prevalence of metabolic syndrome after one year and change in cardiometabolic risk factors such as blood pressure, lipid profile, etc. No study included mortality and costs as outcome. The timing of outcome measures was variable and could include monthly investigations, evaluations every three months or a single final evaluation after one year.” [134]

In table: See Table 2.

Table 2. Example Table: Summary of included studies evaluating the efficacy of antiemetic agents in acute gastroenteritis.

Source | Setting | No. of Patients | Age Range | Inclusion Criteria | Antiemetic Agent | Route | Follow-Up
Freedman et al., 2006 | ED | 214 | 6 months–10 years | GE with mild to moderate dehydration and vomiting in the preceding 4 hours | Ondansetron | PO | 1–2 weeks
Reeves et al., 2002 | ED | 107 | 1 month–22 years | GE and vomiting requiring IV rehydration | Ondansetron | IV | 5–7 days
Roslund et al., 2007 | ED | 106 | 1–10 years | GE with failed oral rehydration attempt in ED | Ondansetron | PO | 1 week
Stork et al., 2006 | ED | 137 | 6 months–12 years | GE, recurrent emesis, mild to moderate dehydration, and failed oral hydration | Ondansetron and dexamethasone | IV | 1 and 2 days

ED, emergency department; GE, gastroenteritis; IV, intravenous; PO, by mouth. Adapted from [135].

Explanation

For readers to gauge the validity and applicability of a systematic review's results, they need to know something about the included studies. Such information includes PICOS (Box 2) and specific information relevant to the review question. For example, if the review is examining the long-term effects of antidepressants for moderate depressive disorder, authors should report the follow-up periods of the included studies. For each included study, authors should provide a citation for the source of their information regardless of whether or not the study is published. This information makes it easier for interested readers to retrieve the relevant publications or documents. Reporting study-level data also allows the comparison of the main characteristics of the studies included in the review. Authors should present enough detail to allow readers to make their own judgments about the relevance of included studies. Such information also makes it possible for readers to conduct their own subgroup analyses and interpret subgroups, based on study characteristics. Authors should avoid, whenever possible, assuming information when it is missing from a study report (e.g., sample size, method of randomization).
Reviewers may contact the original investigators to try to obtain missing information or confirm the data extracted for the systematic review. If this information is not obtained, this should be noted in the report. If information is imputed, the reader should be told how this was done and for which items. Presenting study-level data makes it possible to clearly identify unpublished information obtained from the original researchers and make it available for the public record. Typically, study-level characteristics are presented as a table as in the example in Table 2. Such presentation ensures that all pertinent items are addressed and that missing or unclear information is clearly indicated. Although paper-based journals do not generally allow for the quantity of information available in electronic journals or Cochrane reviews, this should not be accepted as an excuse for omission of important aspects of the methods or results of included studies, since these can, if necessary, be shown on a Web site. Following the presentation and description of each included study, as discussed above, reviewers usually provide a narrative summary of the studies. Such a summary provides readers with an overview of the included studies. It may for example address the languages of the published papers, years of publication, and geographic origins of the included studies. The PICOS framework is often helpful in reporting the narrative summary indicating, for example, the clinical characteristics and disease severity of the participants and the main features of the intervention and of the comparison group. For non-pharmacological interventions, it may be helpful to specify for each study the key elements of the intervention received by each group. Full details of the interventions in included studies were reported in only three of 25 systematic reviews relevant to general practice [84]. 
Item 19: RISK OF BIAS WITHIN STUDIES Present data on risk of bias of each study and, if available, any outcome-level assessment (see Item 12). Example. See Table 3.

10.1371/journal.pmed.1000100.t003 Table 3 Example Table: Quality measures of the randomized controlled trials that failed to fulfill any one of six markers of validity.

Trials | Concealment of Randomisation | RCT Stopped Early | Patients Blinded | Health Care Providers Blinded | Data Collectors Blinded | Outcome Assessors Blinded
Liu | No | No | Yes | Yes | Yes | Yes
Stone | Yes | No | No | Yes | Yes | Yes
Polderman | Yes | Yes | No | No | No | Yes
Zaugg | Yes | No | No | No | Yes | Yes
Urban | Yes | Yes | No | No, except anesthesiologists | Yes | Yes

RCT, randomized controlled trial. Adapted from [96].

Explanation We recommend that reviewers assess the risk of bias in the included studies using a standard approach with defined criteria (see Item 12). They should report the results of any such assessments [89]. Reporting only summary data (e.g., “two of eight trials adequately concealed allocation”) is inadequate because it fails to inform readers which studies had the particular methodological shortcoming. A more informative approach is to explicitly report the methodological features evaluated for each study. The Cochrane Collaboration's new tool for assessing the risk of bias also requests that authors substantiate these assessments with any relevant text from the original studies [11]. It is often easiest to provide these data in a tabular format, as in the example. However, a narrative summary describing the tabular data can also be helpful for readers. Item 20: RESULTS OF INDIVIDUAL STUDIES For all outcomes considered (benefits and harms), present, for each study: (a) simple summary data for each intervention group and (b) effect estimates and confidence intervals, ideally with a forest plot. Examples. See Table 4 and Figure 3.
10.1371/journal.pmed.1000100.g003 Figure 3 Example Figure: Overall failure (defined as failure of assigned regimen or relapse) with tetracycline-rifampicin versus tetracycline-streptomycin. CI, confidence interval. Reproduced with permission from [137].

10.1371/journal.pmed.1000100.t004 Table 4 Example Table: Heterotopic ossification in trials comparing radiotherapy to non-steroidal anti-inflammatory drugs after major hip procedures and fractures.

Author (Year) | Radiotherapy | NSAID
Kienapfel (1999) | 12/49 24.5% | 20/55 36.4%
Sell (1998) | 2/77 2.6% | 18/77 23.4%
Kolbl (1997) | 39/188 20.7% | 18/113 15.9%
Kolbl (1998) | 22/46 47.8% | 6/54 11.1%
Moore (1998) | 9/33 27.3% | 18/39 46.2%
Bremen-Kuhne (1997) | 9/19 47.4% | 11/31 35.5%
Knelles (1997) | 5/101 5.0% | 46/183 25.4%

NSAID, non-steroidal anti-inflammatory drug. Adapted from [136].

Explanation Publication of summary data from individual studies allows the analyses to be reproduced and other analyses and graphical displays to be investigated. Others may wish to assess the impact of excluding particular studies or consider subgroup analyses not reported by the review authors. Displaying the results of each treatment group in included studies also enables inspection of individual study features. For example, if only odds ratios are provided, readers cannot assess the variation in event rates across the studies, making the odds ratio impossible to interpret [138]. Additionally, because data extraction errors in meta-analyses are common and can be large [139], the presentation of the results from individual studies makes it easier to identify errors. For continuous outcomes, readers may wish to examine the consistency of standard deviations across studies, for example, to be reassured that standard deviation and standard error have not been confused [138]. For each study, the summary data for each intervention group are generally given for binary outcomes as frequencies with and without the event (or as proportions such as 12/45).
It is not sufficient to report event rates per intervention group as percentages. The required summary data for continuous outcomes are the mean, standard deviation, and sample size for each group. In reviews that examine time-to-event data, the authors should report the log hazard ratio and its standard error (or confidence interval) for each included study. Sometimes, essential data are missing from the reports of the included studies and cannot be calculated from other data but may need to be imputed by the reviewers. For example, the standard deviation may be imputed using the typical standard deviations in the other trials [116],[117] (see Item 14). Whenever relevant, authors should indicate which results were not reported directly and had to be estimated from other information (see Item 13). In addition, the inclusion of unpublished data should be noted. For all included studies it is important to present the estimated effect with a confidence interval. This information may be incorporated in a table showing study characteristics or may be shown in a forest plot [140]. The key elements of the forest plot are the effect estimates and confidence intervals for each study shown graphically, but it is preferable also to include, for each study, the numerical group-specific summary data, the effect size and confidence interval, and the percentage weight (see second example [Figure 3]). For discussion of the results of meta-analysis, see Item 21. In principle, all the above information should be provided for every outcome considered in the review, including both benefits and harms. When there are too many outcomes for full information to be included, results for the most important outcomes should be included in the main report with other information provided as a Web appendix. The choice of the information to present should be justified in light of what was originally stated in the protocol. 
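As an illustration of why group-level frequencies matter, the following minimal Python sketch recomputes a risk ratio and an approximate 95% confidence interval for one row of Table 4 (Kienapfel 1999: 12/49 with radiotherapy, 20/55 with NSAID). The function name `risk_ratio_ci` is hypothetical, and the Wald interval on the log scale is an assumption chosen for simplicity; it is not necessarily the method used in the cited review, and it does not handle groups with zero events.

```python
import math


def risk_ratio_ci(events_a, total_a, events_b, total_b, z=1.96):
    """Risk ratio of group A versus group B with a Wald CI on the log scale.

    Assumes at least one event in each group (no zero-cell correction).
    """
    risk_a = events_a / total_a
    risk_b = events_b / total_b
    rr = risk_a / risk_b
    # Standard error of log(RR) from the four reported counts.
    se_log_rr = math.sqrt(
        1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Kienapfel (1999) from Table 4: radiotherapy 12/49, NSAID 20/55.
rr, lower, upper = risk_ratio_ci(12, 49, 20, 55)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

A calculation like this is only possible when the numerators and denominators are reported; percentages alone do not give the counts needed for the standard error.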
Authors should explicitly mention if the planned main outcomes cannot be presented due to lack of information. There is some evidence that information on harms is only rarely reported in systematic reviews, even when it is available in the original studies [141]. Selective omission of harms results biases a systematic review and decreases its ability to contribute to informed decision making. Item 21: SYNTHESES OF RESULTS Present the main results of the review. If meta-analyses are done, include for each, confidence intervals and measures of consistency. Examples. “Mortality data were available for all six trials, randomizing 311 patients and reporting data for 305 patients. There were no deaths reported in the three respiratory syncytial virus/severe bronchiolitis trials; thus our estimate is based on three trials randomizing 232 patients, 64 of whom died. In the pooled analysis, surfactant was associated with significantly lower mortality (relative risk = 0.7, 95% confidence interval = 0.4–0.97, P = 0.04). There was no evidence of heterogeneity (I² = 0%)”. [142] “Because the study designs, participants, interventions, and reported outcome measures varied markedly, we focused on describing the studies, their results, their applicability, and their limitations and on qualitative synthesis rather than meta-analysis.” [143] “We detected significant heterogeneity within this comparison (I² = 46.6%; χ² = 13.11, df = 7; P = 0.07). Retrospective exploration of the heterogeneity identified one trial that seemed to differ from the others. It included only small ulcers (wound area less than 5 cm²). Exclusion of this trial removed the statistical heterogeneity and did not affect the finding of no evidence of a difference in healing rate between hydrocolloids and simple low adherent dressings (relative risk = 0.98, [95% confidence interval] 0.85 to 1.12; I² = 0%).” [144] Explanation Results of systematic reviews should be presented in an orderly manner.
Initial narrative descriptions of the evidence covered in the review (see Item 18) may tell readers important things about the study populations and the design and conduct of studies. These descriptions can facilitate the examination of patterns across studies. They may also provide important information about applicability of evidence, suggest the likely effects of any major biases, and allow consideration, in a systematic manner, of multiple explanations for possible differences of findings across studies. If authors have conducted one or more meta-analyses, they should present the results as an estimated effect across studies with a confidence interval. It is often simplest to show each meta-analysis summary with the actual results of included studies in a forest plot (see Item 20) [140]. It should always be clear which of the included studies contributed to each meta-analysis. Authors should also provide, for each meta-analysis, a measure of the consistency of the results from the included studies such as I2 (heterogeneity; see Box 6); a confidence interval may also be given for this measure [145]. If no meta-analysis was performed, the qualitative inferences should be presented as systematically as possible with an explanation of why meta-analysis was not done, as in the second example above [143]. Readers may find a forest plot, without a summary estimate, helpful in such cases. Authors should in general report syntheses for all the outcome measures they set out to investigate (i.e., those described in the protocol; see Item 4) to allow readers to draw their own conclusions about the implications of the results. Readers should be made aware of any deviations from the planned analysis. Authors should tell readers if the planned meta-analysis was not thought appropriate or possible for some of the outcomes and the reasons for that decision. It may not always be sensible to give meta-analysis results and forest plots for each outcome. 
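The consistency measure I² mentioned above is conventionally derived from Cochran's Q statistic and its degrees of freedom as I² = max(0, (Q − df)/Q), expressed as a percentage. The short Python sketch below applies that formula to the values quoted in the third example under Item 21 (χ² = 13.11, df = 7, reported I² = 46.6%). The function name `i_squared` is illustrative only.

```python
def i_squared(q, df):
    """Return I² as a percentage from Cochran's Q and its degrees of freedom.

    Negative values are truncated to zero, following the usual convention.
    """
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Values quoted in the hydrocolloid dressing example: Q = 13.11, df = 7.
print(round(i_squared(13.11, 7), 1))  # 46.6

# When Q is smaller than df, I² is reported as 0%.
print(i_squared(5.0, 7))  # 0.0
```

Reporting Q and df alongside I², as the quoted example does, lets readers verify the figure in this way.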
If the review addresses a broad question, there may be a very large number of outcomes. Also, some outcomes may have been reported in only one or two studies, in which case forest plots are of little value and may be seriously biased. Of 300 systematic reviews indexed in MEDLINE in 2004, a little more than half (54%) included meta-analyses, of which the majority (91%) reported assessing for inconsistency in results. Item 22: RISK OF BIAS ACROSS STUDIES Present results of any assessment of risk of bias across studies (see Item 15). Examples. “Strong evidence of heterogeneity (I2 = 79%, P<0.001) was observed. To explore this heterogeneity, a funnel plot was drawn. The funnel plot in Figure 4 shows evidence of considerable asymmetry.” [146] “Specifically, four sertraline trials involving 486 participants and one citalopram trial involving 274 participants were reported as having failed to achieve a statistically significant drug effect, without reporting mean HRSD [Hamilton Rating Scale for Depression] scores. We were unable to find data from these trials on pharmaceutical company Web sites or through our search of the published literature. These omissions represent 38% of patients in sertraline trials and 23% of patients in citalopram trials. Analyses with and without inclusion of these trials found no differences in the patterns of results; similarly, the revealed patterns do not interact with drug type. The purpose of using the data obtained from the FDA was to avoid publication bias, by including unpublished as well as published trials. Inclusion of only those sertraline and citalopram trials for which means were reported to the FDA would constitute a form of reporting bias similar to publication bias and would lead to overestimation of drug–placebo differences for these drug types. 
Therefore, we present analyses only on data for medications for which complete clinical trials' change was reported.” [147] 10.1371/journal.pmed.1000100.g004 Figure 4 Example Figure: Example of a funnel plot showing evidence of considerable asymmetry. SE, standard error. Adapted from [146], with permission. Explanation Authors should present the results of any assessments of risk of bias across studies. If a funnel plot is reported, authors should specify the effect estimate and measure of precision used, presented typically on the x-axis and y-axis, respectively. Authors should describe if and how they have tested the statistical significance of any possible asymmetry (see Item 15). Results of any investigations of selective reporting of outcomes within studies (as discussed in Item 15) should also be reported. Also, we advise authors to tell readers if any pre-specified analyses for assessing risk of bias across studies were not completed and the reasons (e.g., too few included studies). Item 23: ADDITIONAL ANALYSES Give results of additional analyses, if done (e.g., sensitivity or subgroup analyses, meta-regression [see Item 16]). Examples. “…benefits of chondroitin were smaller in trials with adequate concealment of allocation compared with trials with unclear concealment (P for interaction = 0.050), in trials with an intention-to-treat analysis compared with those that had excluded patients from the analysis (P for interaction = 0.017), and in large compared with small trials (P for interaction = 0.022).” [148] “Subgroup analyses according to antibody status, antiviral medications, organ transplanted, treatment duration, use of antilymphocyte therapy, time to outcome assessment, study quality and other aspects of study design did not demonstrate any differences in treatment effects. 
Multivariate meta-regression showed no significant difference in CMV [cytomegalovirus] disease after allowing for potential confounding or effect-modification by prophylactic drug used, organ transplanted or recipient serostatus in CMV positive recipients and CMV negative recipients of CMV positive donors.” [149] Explanation Authors should report any subgroup or sensitivity analyses and whether or not they were pre-specified (see Items 5 and 16). For analyses comparing subgroups of studies (e.g., separating studies of low- and high-dose aspirin), the authors should report any tests for interactions, as well as estimates and confidence intervals from meta-analyses within each subgroup. Similarly, meta-regression results (see Item 16) should not be limited to p-values, but should include effect sizes and confidence intervals [150], as the first example reported above does in a table. The amount of data included in each additional analysis should be specified if different from that considered in the main analyses. This information is especially relevant for sensitivity analyses that exclude some studies; for example, those with high risk of bias. Importantly, all additional analyses conducted should be reported, not just those that were statistically significant. This information will help avoid selective outcome reporting bias within the review as has been demonstrated in reports of randomized controlled trials [42],[44],[121],[151],[152]. Results from exploratory subgroup or sensitivity analyses should be interpreted cautiously, bearing in mind the potential for multiple analyses to mislead. DISCUSSION Item 24: SUMMARY OF EVIDENCE Summarize the main findings, including the strength of evidence for each main outcome; consider their relevance to key groups (e.g., health care providers, users, and policy makers). Example. 
“Overall, the evidence is not sufficiently robust to determine the comparative effectiveness of angioplasty (with or without stenting) and medical treatment alone. Only 2 randomized trials with long-term outcomes and a third randomized trial that allowed substantial crossover of treatment after 3 months directly compared angioplasty and medical treatment…the randomized trials did not evaluate enough patients or did not follow patients for a sufficient duration to allow definitive conclusions to be made about clinical outcomes, such as mortality and cardiovascular or kidney failure events. Some acceptable evidence from comparison of medical treatment and angioplasty suggested no difference in long-term kidney function but possibly better blood pressure control after angioplasty, an effect that may be limited to patients with bilateral atherosclerotic renal artery stenosis. The evidence regarding other outcomes is weak. Because the reviewed studies did not explicitly address patients with rapid clinical deterioration who may need acute intervention, our conclusions do not apply to this important subset of patients.” [143] Explanation Authors should give a brief and balanced summary of the nature and findings of the review. Sometimes, outcomes for which little or no data were found should be noted due to potential relevance for policy decisions and future research. Applicability of the review's findings, to different patients, settings, or target audiences, for example, should be mentioned. Although there is no standard way to assess applicability simultaneously to different audiences, some systems do exist [153]. Sometimes, authors formally rate or assess the overall body of evidence addressed in the review and can present the strength of their summary recommendations tied to their assessments of the quality of evidence (e.g., the GRADE system) [10]. 
Authors need to keep in mind that statistical significance of the effects does not always suggest clinical or policy relevance. Likewise, a non-significant result does not demonstrate that a treatment is ineffective. Authors should ideally clarify trade-offs and how the values attached to the main outcomes would lead different people to make different decisions. In addition, adroit authors consider factors that are important in translating the evidence to different settings and that may modify the estimates of effects reported in the review [153]. Patients and health care providers may be primarily interested in which intervention is most likely to provide a benefit with acceptable harms, while policy makers and administrators may value data on organizational impact and resource utilization. Item 25: LIMITATIONS Discuss limitations at study and outcome level (e.g., risk of bias), and at review level (e.g., incomplete retrieval of identified research, reporting bias). Examples. Outcome level: “The meta-analysis reported here combines data across studies in order to estimate treatment effects with more precision than is possible in a single study. The main limitation of this meta-analysis, as with any overview, is that the patient population, the antibiotic regimen and the outcome definitions are not the same across studies.” [154] Study and review level: “Our study has several limitations. The quality of the studies varied. Randomization was adequate in all trials; however, 7 of the articles did not explicitly state that analysis of data adhered to the intention-to-treat principle, which could lead to overestimation of treatment effect in these trials, and we could not assess the quality of 4 of the 5 trials reported as abstracts. Analyses did not identify an association between components of quality and re-bleeding risk, and the effect size in favour of combination therapy remained statistically significant when we excluded trials that were reported as abstracts. 
Publication bias might account for some of the effect we observed. Smaller trials are, in general, analyzed with less methodological rigor than larger studies, and an asymmetrical funnel plot suggests that selective reporting may have led to an overestimation of effect sizes in small trials.” [155] Explanation A discussion of limitations should address the validity (i.e., risk of bias) and reporting (informativeness) of the included studies, limitations of the review process, and generalizability (applicability) of the review. Readers may find it helpful if authors discuss whether studies were threatened by serious risks of bias, whether the estimates of the effect of the intervention are too imprecise, or if there were missing data for many participants or important outcomes. Limitations of the review process might include limitations of the search (e.g., restricting to English-language publications), and any difficulties in the study selection, appraisal, and meta-analysis processes. For example, poor or incomplete reporting of study designs, patient populations, and interventions may hamper interpretation and synthesis of the included studies [84]. Applicability of the review may be affected if there are limited data for certain populations or subgroups where the intervention might perform differently or few studies assessing the most important outcomes of interest; or if there is a substantial amount of data relating to an outdated intervention or comparator or heavy reliance on imputation of missing values for summary estimates (Item 14). Item 26: CONCLUSIONS Provide a general interpretation of the results in the context of other evidence, and implications for future research. Example. Implications for practice: “Between 1995 and 1997 five different meta-analyses of the effect of antibiotic prophylaxis on infection and mortality were published. 
All confirmed a significant reduction in infections, though the magnitude of the effect varied from one review to another. The estimated impact on overall mortality was less evident and has generated considerable controversy on the cost effectiveness of the treatment. Only one among the five available reviews, however, suggested that a weak association between respiratory tract infections and mortality exists and lack of sufficient statistical power may have accounted for the limited effect on mortality.” Implications for research: “A logical next step for future trials would thus be the comparison of this protocol against a regimen of a systemic antibiotic agent only to see whether the topical component can be dropped. We have already identified six such trials but the total number of patients so far enrolled (n = 1056) is too small for us to be confident that the two treatments are really equally effective. If the hypothesis is therefore considered worth testing more and larger randomised controlled trials are warranted. Trials of this kind, however, would not resolve the relevant issue of treatment induced resistance. To produce a satisfactory answer to this, studies with a different design would be necessary. Though a detailed discussion goes beyond the scope of this paper, studies in which the intensive care unit rather than the individual patient is the unit of randomisation and in which the occurrence of antibiotic resistance is monitored over a long period of time should be undertaken.” [156] Explanation Systematic reviewers sometimes draw conclusions that are too optimistic [157] or do not consider the harms equally as carefully as the benefits, although some evidence suggests these problems are decreasing [158]. If conclusions cannot be drawn because there are too few reliable studies, or too much uncertainty, this should be stated. Such a finding can be as important as finding consistent effects from several large studies. 
Authors should try to relate the results of the review to other evidence, as this helps readers to better interpret the results. For example, there may be other systematic reviews about the same general topic that have used different methods or have addressed related but slightly different questions [159],[160]. Similarly, there may be additional information relevant to decision makers, such as the cost-effectiveness of the intervention (e.g., health technology assessment). Authors may discuss the results of their review in the context of existing evidence regarding other interventions. We advise authors also to make explicit recommendations for future research. In a sample of 2,535 Cochrane reviews, 82% included recommendations for research with specific interventions, 30% suggested the appropriate type of participants, and 52% suggested outcome measures for future research [161]. There is no corresponding assessment about systematic reviews published in medical journals, but we believe that such recommendations are much less common in those reviews. Clinical research should not be planned without a thorough knowledge of similar, existing research [162]. There is evidence that this still does not occur as it should and that authors of primary studies do not consider a systematic review when they design their studies [163]. We believe systematic reviews have great potential for guiding future clinical research. FUNDING Item 27: FUNDING Describe sources of funding or other support (e.g., supply of data) for the systematic review; role of funders for the systematic review. Examples: “The evidence synthesis upon which this article was based was funded by the Centers for Disease Control and Prevention for the Agency for Healthcare Research and Quality and the U.S. 
Prevention Services Task Force.” [164] “Role of funding source: the funders played no role in study design, collection, analysis, interpretation of data, writing of the report, or in the decision to submit the paper for publication. They accept no responsibility for the contents.” [165] Explanation Authors of systematic reviews, like those of any other research study, should disclose any funding they received to carry out the review, or state if the review was not funded. Lexchin and colleagues [166] observed that outcomes of reports of randomized trials and meta-analyses of clinical trials funded by the pharmaceutical industry are more likely to favor the sponsor's product compared to studies with other sources of funding. Similar results have been reported elsewhere [167],[168]. Analogous data suggest that similar biases may affect the conclusions of systematic reviews [169]. Given the potential role of systematic reviews in decision making, we believe authors should be transparent about the funding and the role of funders, if any. Sometimes the funders will provide services, such as those of a librarian to complete the searches for relevant literature or access to commercial databases not available to the reviewers. Any level of funding or services provided to the systematic review team should be reported. Authors should also report whether the funder had any role in the conduct or report of the review. Beyond funding issues, authors should report any real or perceived conflicts of interest related to their role or the role of the funder in the reporting of the systematic review [170]. In a survey of 300 systematic reviews published in November 2004, funding sources were not reported in 41% of the reviews [3]. Only a minority of reviews (2%) reported being funded by for-profit sources, but the true proportion may be higher [171]. 
Additional Considerations for Systematic Reviews of Non-Randomized Intervention Studies or for Other Types of Systematic Reviews The PRISMA Statement and this document have focused on systematic reviews of reports of randomized trials. Other study designs, including non-randomized studies, quasi-experimental studies, and interrupted time series, are included in some systematic reviews that evaluate the effects of health care interventions [172],[173]. The methods of these reviews may differ to varying degrees from the typical intervention review, for example regarding the literature search, data abstraction, assessment of risk of bias, and analysis methods. As such, their reporting demands might also differ from what we have described here. A useful principle is for systematic review authors to ensure that their methods are reported with adequate clarity and transparency to enable readers to critically judge the available evidence and replicate or update the research. In some systematic reviews, the authors will seek the raw data from the original researchers to calculate the summary statistics. These systematic reviews are called individual patient (or participant) data reviews [40],[41]. Individual patient data meta-analyses may also be conducted with prospective accumulation of data rather than retrospective accumulation of existing data. Here too, extra information about the methods will need to be reported. Other types of systematic reviews exist. Realist reviews aim to determine how complex programs work in specific contexts and settings [174]. Meta-narrative reviews aim to explain complex bodies of evidence through mapping and comparing different over-arching storylines [175]. Network meta-analyses, also known as multiple treatments meta-analyses, can be used to analyze data from comparisons of many different treatments [176],[177]. They use both direct and indirect comparisons, and can be used to compare interventions that have not been directly compared. 
We believe that the issues we have highlighted in this paper are relevant to ensure transparency and understanding of the processes adopted and the limitations of the information presented in systematic reviews of different types. We hope that PRISMA can be the basis for more detailed guidance on systematic reviews of other types of research, including diagnostic accuracy and epidemiological studies. Discussion We developed the PRISMA Statement using an approach for developing reporting guidelines that has evolved over several years [178]. The overall aim of PRISMA is to help ensure the clarity and transparency of reporting of systematic reviews, and recent data indicate that this reporting guidance is much needed [3]. PRISMA is not intended to be a quality assessment tool and it should not be used as such. This PRISMA Explanation and Elaboration document was developed to facilitate the understanding, uptake, and dissemination of the PRISMA Statement and hopefully provide a pedagogical framework for those interested in conducting and reporting systematic reviews. It follows a format similar to that used in other explanatory documents [17],[18],[19]. Following the recommendations in the PRISMA checklist may increase the word count of a systematic review report. We believe, however, that the benefit of readers being able to critically appraise a clear, complete, and transparent systematic review report outweighs the possible slight increase in the length of the report. While the aims of PRISMA are to reduce the risk of flawed reporting of systematic reviews and improve the clarity and transparency in how reviews are conducted, we have little data to state more definitively whether this “intervention” will achieve its intended goal. A previous effort to evaluate QUOROM was not successfully completed [178]. 
Publication of the QUOROM Statement was delayed for two years while a research team attempted to evaluate its effectiveness by conducting a randomized controlled trial with the participation of eight major medical journals. Unfortunately that trial was not completed due to accrual problems (David Moher, personal communication). Other evaluation methods might be easier to conduct. At least one survey of 139 published systematic reviews in the critical care literature [179] suggests that their quality improved after the publication of QUOROM. If the PRISMA Statement is endorsed by and adhered to in journals, as other reporting guidelines have been [17],[18],[19],[180], there should be evidence of improved reporting of systematic reviews. For example, there have been several evaluations of whether the use of CONSORT improves reports of randomized controlled trials. A systematic review of these studies [181] indicates that use of CONSORT is associated with improved reporting of certain items, such as allocation concealment. We aim to evaluate the benefits (i.e., improved reporting) and possible adverse effects (e.g., increased word length) of PRISMA and we encourage others to consider doing likewise. Even though we did not carry out a systematic literature search to produce our checklist, and this is indeed a limitation of our effort, PRISMA was nevertheless developed using an evidence-based approach, whenever possible. Checklist items were included if there was evidence that not reporting the item was associated with increased risk of bias, or where it was clear that information was necessary to appraise the reliability of a review. To keep PRISMA up-to-date and as evidence-based as possible requires regular vigilance of the literature, which is growing rapidly. Currently the Cochrane Methodology Register has more than 11,000 records pertaining to the conduct and reporting of systematic reviews and other evaluations of health and social care. 
For some checklist items, such as reporting the abstract (Item 2), we have used evidence from elsewhere in the belief that the issue applies equally well to reporting of systematic reviews. Yet for other items, evidence does not exist; for example, whether a training exercise improves the accuracy and reliability of data extraction. We hope PRISMA will act as a catalyst to help generate further evidence that can be considered when further revising the checklist in the future. More than ten years have passed between the development of the QUOROM Statement and its update, the PRISMA Statement. We aim to update PRISMA more frequently. We hope that the implementation of PRISMA will be better than it has been for QUOROM. There are at least two reasons to be optimistic. First, systematic reviews are increasingly used by health care providers to inform “best practice” patient care. Policy analysts and managers are using systematic reviews to inform health care decision making, and to better target future research. Second, we anticipate benefits from the development of the EQUATOR Network, described below. Developing any reporting guideline requires considerable effort, experience, and expertise. While reporting guidelines have been successful for some individual efforts [17],[18],[19], there are likely others who want to develop reporting guidelines who possess little time, experience, or knowledge as to how to do so appropriately. The EQUATOR Network (Enhancing the QUAlity and Transparency Of health Research) aims to help such individuals and groups by serving as a global resource for anybody interested in developing reporting guidelines, regardless of the focus [7],[180],[182]. The overall goal of EQUATOR is to improve the quality of reporting of all health science research through the development and translation of reporting guidelines. 
Beyond this aim, the network plans to develop a large Web presence by developing and maintaining a resource center of reporting tools, and other information for reporting research (http://www.equator-network.org/). We encourage health care journals and editorial groups, such as the World Association of Medical Editors and the International Committee of Medical Journal Editors, to endorse PRISMA in much the same way as they have endorsed other reporting guidelines, such as CONSORT. We also encourage editors of health care journals to support PRISMA by updating their “Instructions to Authors” and including the PRISMA Web address, and by raising awareness through specific editorial actions.

Supporting Information

Figure S1. Flow of information through the different phases of a systematic review (downloadable template document for researchers to re-use). (0.08 MB DOC)

Text S1. Checklist of items to include when reporting a systematic review or meta-analysis (downloadable template document for researchers to re-use). (0.04 MB DOC)

              High-quality health systems in the Sustainable Development Goals era: time for a revolution

Executive summary

Although health outcomes have improved in low-income and middle-income countries (LMICs) in the past several decades, a new reality is at hand. Changing health needs, growing public expectations, and ambitious new health goals are raising the bar for health systems to produce better health outcomes and greater social value. But staying on current trajectory will not suffice to meet these demands. What is needed are high-quality health systems that optimise health care in each given context by consistently delivering care that improves or maintains health, by being valued and trusted by all people, and by responding to changing population needs. Quality should not be the purview of the elite or an aspiration for some distant future; it should be the DNA of all health systems. Furthermore, the human right to health is meaningless without good quality care because health systems cannot improve health without it. We propose that health systems be judged primarily on their impacts, including better health and its equitable distribution; on the confidence of people in their health system; and on their economic benefit, and processes of care, consisting of competent care and positive user experience. The foundations of high-quality health systems include the population and their health needs and expectations, governance of the health sector and partnerships across sectors, platforms for care delivery, workforce numbers and skills, and tools and resources, from medicines to data. In addition to strong foundations, health systems need to develop the capacity to measure and use data to learn. High-quality health systems should be informed by four values: they are for people, and they are equitable, resilient, and efficient.
For this Commission, we examined the literature, analysed surveys, and did qualitative and quantitative research to evaluate the quality of care available to people in LMICs across a range of health needs included in the Sustainable Development Goals (SDGs). We explored the ethical dimensions of high-quality care in resource-constrained settings and reviewed available measures and improvement approaches. We reached five conclusions:

The care that people receive is often inadequate, and poor-quality care is common across conditions and countries, with the most vulnerable populations faring the worst

Data from a range of countries and conditions show systematic deficits in quality of care. In LMICs, mothers and children receive less than half of recommended clinical actions in a typical preventive or curative visit, less than half of suspected cases of tuberculosis are correctly managed, and fewer than one in ten people diagnosed with major depressive disorder receive minimally adequate treatment. Diagnoses are frequently incorrect for serious conditions, such as pneumonia, myocardial infarction, and newborn asphyxia. Care can be too slow for conditions that require timely action, reducing chances of survival. At the system level, we found major gaps in safety, prevention, integration, and continuity, reflected by poor patient retention and insufficient coordination across platforms of care. One in three people across LMICs cited negative experiences with their health system in the areas of attention, respect, communication, and length of visit (visits of 5 min are common); on the extreme end of these experiences were disrespectful treatment and abuse. Quality of care is worst for vulnerable groups, including the poor, the less educated, adolescents, those with stigmatised conditions, and those at the edges of health systems, such as people in prisons.

Universal health coverage (UHC) can be a starting point for improving the quality of health systems.
Improving quality should be a core component of UHC initiatives, alongside expanding coverage and financial protection. Governments should start by establishing a national quality guarantee for health services, specifying the level of competence and user experience that people can expect. To ensure that all people will benefit from improved services, expansion should prioritise the poor and their health needs from the start. Progress on UHC should be measured through effective (quality-corrected) coverage.

High-quality health systems could save over 8 million lives each year in LMICs

More than 8 million people per year in LMICs die from conditions that should be treatable by the health system. In 2015 alone, these deaths resulted in US$6 trillion in economic losses. Poor-quality care is now a bigger barrier to reducing mortality than insufficient access. 60% of deaths from conditions amenable to health care are due to poor-quality care, whereas the remaining deaths result from non-utilisation of the health system. High-quality health systems could prevent 2·5 million deaths from cardiovascular disease, 1 million newborn deaths, 900 000 deaths from tuberculosis, and half of all maternal deaths each year. Quality of care will become an even larger driver of population health as utilisation of health systems increases and as the burden of disease shifts to more complex conditions. The high mortality rates in LMICs for treatable causes, such as injuries and surgical conditions, maternal and newborn complications, cardiovascular disease, and vaccine preventable diseases, illustrate the breadth and depth of the healthcare quality challenge. Poor-quality care can lead to other adverse outcomes, including unnecessary health-related suffering, persistent symptoms, loss of function, and a lack of trust and confidence in health systems. Waste of resources and catastrophic expenditures are economic side effects of poor-quality health systems.
As a result of this, only one-quarter of people in LMICs believe that their health systems work well.

Health systems should measure and report what matters most to people, such as competent care, user experience, health outcomes, and confidence in the system

Measurement is key to accountability and improvement, but available measures do not capture many of the processes and outcomes that matter most to people. At the same time, data systems generate many metrics that produce inadequate insight at a substantial cost in funds and health workers’ time. For example, although inputs such as medicines and equipment are commonly counted in surveys, these are weakly related to the quality of care that people receive. Indicators such as proportion of births with skilled attendants do not reflect quality of childbirth care and might lead to false complacency about progress in maternal and newborn health. This Commission calls for fewer, but better, measures of health system quality to be generated and used at national and subnational levels. Countries should report health system performance to the public annually by use of a dashboard of key metrics (eg, health outcomes, people’s confidence in the system, system competence, and user experience) along with measures of financial protection and equity. Robust vital registries and trustworthy routine health information systems are prerequisites for good performance assessment. Countries need agile new surveys and real-time measures of health facilities and populations that reflect the health systems of today and not those of the past. To generate and interpret data, countries need to invest in national institutions and professionals with strong quantitative and analytical skills.
Global development partners can support the generation and testing of public goods for health system measurement (civil and vital registries, routine data systems, and routine health system surveys) and promote national and regional institutions and the training and mentoring of scientists.

New research is crucial for the transformation of low-quality health systems to high-quality ones

Data on care quality in LMICs do not reflect the current disease burden. In many of these countries, we know little about quality of care for respiratory diseases, cancer, mental health, injuries, and surgery, as well as the care of adolescents and elderly people. There are vast blind spots in areas such as user experience, system competence, confidence in the system, and the wellbeing of people, including patient-reported outcomes. Measuring the quality of the health system as a whole and across the care continuum is essential, but not done. Filling in these gaps will require not only better routine health information systems for monitoring, but also new research, as proposed in the research agenda of this Commission. For example, research will be needed to rigorously evaluate the effects and costs of recommended improvement approaches on health, patient experience, and financial protection. Implementation science studies can help discern the contextual factors that promote or hinder reform. New data collection and research should be explicitly designed to build national and regional research capacity.

Improving quality of care will require system-wide action

To address the scale and range of quality deficits we documented in this Commission, reforming the foundations of the health system is required. Because health systems are complex adaptive systems that function at multiple interconnected levels, fixes at the micro-level (ie, health-care provider or clinic) alone are unlikely to alter the underlying performance of the whole system.
However, we found that interventions aimed at changing provider behaviour dominate the improvement field, even though many of these interventions have a modest effect on provider performance and are difficult to scale and sustain over time. Achieving high-quality health systems requires expanding the space for improvement to structural reforms that act on the foundations of the system. This Commission endorses four universal actions to raise quality across the health system. First, health system leaders need to govern for quality by adopting a shared vision of quality care, a clear quality strategy, strong regulation, and continuous learning. Ministries of health cannot accomplish this alone and need to partner with the private sector, civil society, and sectors outside of health care, such as education, infrastructure, communication, and transport. Second, countries should redesign service delivery to maximise health outcomes rather than geographical access to services alone. Primary care could tackle a greater range of low-acuity conditions, whereas hospitals or specialised health centres should provide care for conditions, such as births, that need advanced clinical expertise or have the risk of unexpected complications. Third, countries should transform the health workforce by adopting competency-based clinical education, introducing training in ethics and respectful care, and better supporting and respecting all workers to deliver the best care possible. Fourth, governments and civil society should ignite demand for quality in the population to empower people to hold systems accountable and actively seek high-quality care. Additional targeted actions in areas such as health financing, management, district-level learning, and others can complement these efforts. What works in one setting might not work elsewhere, and improvement efforts should be adapted for local context and monitored. 
Funders should align their support with system-wide strategies rather than contribute to the proliferation of micro-level efforts. In this Commission, we assert that providing health services without guaranteeing a minimum level of quality is ineffective, wasteful, and unethical. Moving to a high-quality health system, one that improves health and generates confidence and economic benefits, is primarily a political, not technical, decision. National governments need to invest in high-quality health systems for their own people and make such systems accountable to people through legislation, education about rights, regulation, transparency, and greater public participation. Countries will know that they are on the way towards a high-quality, accountable health system when health workers and policymakers choose to receive health care in their own public institutions.

Introduction

The past 20 years have been called a golden age for global health. 1 Fuelled by a major increase in domestic health spending and donor funding, LMICs have vastly expanded access to health determinants (eg, clean water and sanitation) and health services alike (eg, vaccination, antenatal care, and HIV treatment). 2–4 These expansions have saved the lives of millions of children, men, and women, largely by averting deaths from infectious diseases. 5 However, these past decades were not as favourable for preventing deaths from non-communicable diseases and acute conditions, such as ischaemic heart disease, stroke, diabetes, neonatal mortality, and injuries, for which mortality stagnated or increased. 6 The lowest-income countries and the poorest people within countries generally had the worst outcomes, despite considerable efforts to increase use of health care. 7 The strategy that brought big wins for child health and infectious diseases will not suffice to reach the health-related SDGs.
The newly ascendant health conditions, including chronic and complex conditions, require more than a single visit or standardised pill pack; they require highly skilled, longitudinal, and integrated care. Such care is also needed to address the substantial residual mortality from maternal and child conditions and infectious diseases. In short, it is becoming clear that access to health care is not enough, and that good quality of care is needed to improve outcomes. India learned this with Janani Suraksha Yojana, a cash incentive programme for facility births, which massively increased facility delivery but did not measurably reduce maternal or newborn mortality. 8 High-quality care involves thorough assessment, detection of asymptomatic and co-existing conditions, accurate diagnosis, appropriate and timely treatment, referral when needed for hospital care and surgery, and the ability to follow the patient and adjust the treatment course as needed. Health systems should also take into account the needs, experiences, and preferences of people and their right to be treated with respect. 9 Although many consumer services make user experience a central mission, health systems—like other public sector systems—are often difficult to use, indifferent to the time and preferences of people, and reluctant to share decision-making processes. 10 Indeed, some providers are rude and even abusive—a fundamental abrogation of human rights and health system obligations. 9 At the same time, health workers might not receive the support and respect required to have a fulfilling professional life. Finally, systems can be inefficient, wasting scarce resources on unnecessary care and on low-quality clinics that people bypass, while imposing high costs on users. 11 The SDG era demands new ways of thinking about health systems. 
Although they are only one contributor to good health—other major contributors being social determinants of health such as education, wealth, employment, and social protections, and cross-sectoral public health actions such as tobacco taxation and improved food, water, and road and occupational safety regulations 12 —access to high-quality health care is a human right and moral imperative for every country. 13 Moreover, health systems are a powerful engine for improving survival and wellbeing and they are the focus of our report. 14,15 We endorse WHO’s definition of a health system as consisting of “all organisations, people, and actions whose primary intent is to promote, restore, or maintain health”, and we focus this Commission on the organised health sector, public and private, including community health workers. 16 Although informal providers (those with little or no formal clinical training) also provide care in some countries, there are—with a few notable exceptions—insufficient data on the quality of care offered by these providers, and we do not cover them in this Commission. Addressing quality of care is particularly pertinent as countries begin to implement UHC. 17 UHC represents a substantial new investment of national resources—one that embodies new concrete commitments about the type of care that people have a right to expect. Newly transparent benefit packages can, in turn, create public expectations that governments will be under pressure to fulfil. Furthermore, new investments in health care will face scrutiny from finance ministers, who will demand efficient use of resources and better results measured in longer lifespans, restored physical and mental functions, user satisfaction, and economic productivity. What should a high-quality health system look like in countries with resource constraints and competing health priorities that aspire to reach the SDGs? 
The Lancet Global Health Commission on High-Quality Health Systems in the SDG Era, comprised of 30 academics, policy makers, and health system experts from 18 countries, seeks to answer this question. 18 In this Commission, we propose new ways to define, measure, and improve the performance of health systems. We review evidence of past approaches and look for strategies that can change the trajectory of health systems in LMICs. Our work is informed by several principles. First, the principle that health systems are for people. Health systems need to work with people not only to improve health outcomes, but also to generate non-health-related value, such as trust and economic benefit for all people, including the poor and vulnerable. Second, the principle that people should be able to receive good quality, respectful care for any health concern that can be tackled within their country’s resource capacity. Third, the principle that high-quality care should be the raison d’être of the health system, rather than a peripheral activity in ministries of health. Finally, the principle that fundamental change should be prioritised over piecemeal approaches. We recognise that health systems are complex adaptive systems that resist change and can be impervious to isolated interventions; indeed, multiple small-scale efforts can be deleterious. Quality of care is an emergent property that requires shared aims among all health system actors, favourable health system foundations, and is honed through iterative efforts to improve and learn from successes and failures. These considerations guided our analysis. We are also aware of other major efforts on quality of care at the time of the writing of this Commission. WHO convened the Quality of Care Network to facilitate joint learning, accelerate scale-up of quality maternal, newborn, and child services, and strengthen the evidence for cost-effective approaches. 
WHO, the World Bank, and the Organisation for Economic Co-operation and Development (OECD) published a global report on quality of health care earlier in 2018. 19 The US National Academy of Medicine has begun a study on improving the quality of health care across the globe. There is also new interest in stronger primary care that can promote health, prevent illness, identify the sick from the healthy, and efficiently manage the needs of those with chronic disease. 20 The Primary Health Care Performance Initiative, a multistakeholder effort, is focusing on measuring and comparing the functioning of primary health-care systems and identifying pathways for improvement. 21 Primary care has been a main platform for provision of health care in low-income countries, but there—as elsewhere—the changing disease burden, urbanisation, and rising demand for advanced services and excellent user experience are challenging this current model of care. Our work was substantially strengthened with input from nine National High-Quality Health Systems Commissions that were formed to explore quality of care in their local contexts alongside the global Commission. To ensure that our work reflects the needs of people and communities, we have sought input from a people’s voice advisory board and we obtained advice and policy perspectives from an external advisory council. Our intended audiences for the report are people, national leaders, health and finance ministers, policy makers, managers, providers, global partners (bilateral and multilateral institutions and foundations), advocates, civil society, and academics. 
This report is arranged in the following manner: in section 1, we propose a new definition for high-quality health systems; in section 2, we describe the state of health system quality in LMICs, bringing together multiple national and cross-national data on quality of care for the first time; in section 3, we tackle the ethics of good quality of care and propose mechanisms for ensuring that the poor and vulnerable benefit from improvement; in section 4, we review the current status of quality measurements and propose how to measure better and more efficiently; in section 5, we reassess the available options for improvement and recommend new structural solutions; in section 6, we conclude with a summary of our key messages, our recommendations, and a research agenda. We recognise that the level of ambition implied in our recommendations might be daunting to low-income countries that are struggling to put in place the basics of health care. In this Commission, we are describing a new aspiration for health systems that can guide policies and investments now. Regardless of starting point, every country has opportunities to get started on the path to high-quality health systems.

Section 1: Redefining high-quality health systems

The systematic examination of health-care quality began with the work of Avedis Donabedian, whose 1966 article 22 proposed a framework for quality of care assessment that described quality along the dimensions of structure, process, and outcomes of care. At the turn of the 21st century, the Committee on Quality of Health Care in America of the Institute of Medicine (IOM) produced two influential quality reports 23,24 that galvanised the examination of quality in the US health system and prompted similar investigations in other industrialised countries.
The IOM Committee defined quality of care as “the degree to which health services for individuals and populations increase the likelihood of desired health outcomes and are consistent with current professional knowledge”. 23 The committee noted that 21st century health systems should seek to improve performance on six dimensions of quality of care: safety, effectiveness, patient-centredness, timeliness, efficiency, and equity. The committee also observed that “the current care systems cannot do the job. Trying harder will not work. Changing systems of care will.” 23 In 2010, Michael Porter proposed 25 that health systems be fundamentally accountable for producing value, which should be defined around the user. International organisations, such as WHO, and many low-income and high-income countries have relied on the IOM definition of quality and its core dimensions. WHO has also separately defined integrated people-centred health systems as systems where “all people have equal access to quality health services that are coproduced in a way that meets their life course needs”. 26 Building on this and other work, this section sets out our rationale for an updated definition of high-quality health systems and a conceptual framework ready for the health challenges, patient expectations, and rising ambitions of today. 27,28 The improvement of health outcomes is the sine qua non of health systems; these outcomes include longer lives, better quality of life, and improved capacity to function. In addition to better health, people derive security and confidence from having a trusted source of care when illness renders them most vulnerable. In this way, health systems also function as key social institutions, both deriving from and shaping social norms and able to promote or corrode public trust. 29,30 Finally, health systems cannot be static and must adapt to changing societal needs. 
This Commission defines a high-quality health system as the following: A high-quality health system is one that optimises health care in a given context by consistently delivering care that improves or maintains health outcomes, by being valued and trusted by all people, and by responding to changing population needs. Context is paramount in this definition; health systems have been shaped by different histories and, as a result, function differently across LMICs. High-quality health systems are underpinned by four values: high-quality health systems are for people and are equitable, resilient, and efficient. A focus on people begins with the self-evident observation that health systems must reach people—access is a prerequisite for benefiting from health care. However, this focus also signifies that people are not just beneficiaries of health services, but have a right to health care and have agency over their health and health-care decisions. Therefore, people become accountability agents, able to hold health system actors to account. The emphasis on people-centredness is especially crucial in health care because of the asymmetry of power and information between provider and patient. The focus on people works not only as a moral imperative to protect against the adverse effects of this power imbalance, but also as a corrective action that reduces the imbalance through patient empowerment and better accountability. Health systems must also treat well the people that work within them, who deserve a supportive work environment (safe working conditions, efficient and supportive management, and appropriate role assignment) and are themselves health-care users. Demotivated providers cannot contribute to a high-quality health system. A focus on equity means that high-quality health care needs to be available and affordable for all people, regardless of underlying social disadvantages. 
Measures of quality need to be disaggregated by stratifiers of social power (such as wealth, gender, or ethnicity) and quality improvements should explicitly include poor and vulnerable people to redress existing inequities. Health systems in LMICs have been slow to change from their legacy functions focused on infectious diseases and maternal and child health, but health needs and expectations are shifting, sometimes quickly. Health crises, such as the Ebola epidemic, acutely illustrate the need for resilient systems, defined as systems that can prepare for and effectively respond to crises while maintaining core functions and reorganising if needed. 31 High-quality health systems also need everyday resilience to respond to routine challenges, and this requires accountable leaders who respect and motivate their front-line staff. 32 Lastly, health systems must be efficient: although spending on health systems is tightly associated with income and therefore varies greatly across LMICs, all health systems should aim to avoid waste and achieve the maximum possible improvement in health outcomes with the investment received. We propose a new conceptual framework for high-quality health systems with three key domains: foundations, processes of care, and quality impacts (figure 1). This framework stems from our definition of high-quality health systems and is informed by past frameworks in the fields of health systems and quality improvement, including Donabedian’s framework, 22 WHO’s building blocks 16 and maternal quality of care 27 frameworks, Judith Bruce’s family planning quality framework, 28 Getting Health Reform Right, 33 the Juran trilogy, and the Deming quality cycle. 34

Figure 1. High-quality health system framework

Our high-quality health system framework focuses on health system function, user experience, and how people benefit from health care.
This Commission believes that the quality of health systems should be primarily measured by these processes and impacts rather than by inputs. Facilities staffed by health workers and equipped with running water, electricity, and medicines are essential for good quality care, but the presence of these inputs is not itself a measure of high-quality care. Empirical work shows that the quantity of such inputs does not predict the care that people receive and whether their health will improve—poor care often happens in the presence of adequate tools. 35 Table 1 summarises the components of the three framework domains (quality impacts, processes of care, and foundations). The quality impacts begin with better health, including reduced mortality and morbidity, and positive health markers such as quality of life, function and wellbeing, and absence of serious health-related suffering. 36 These health outcomes should also encompass patient-reported measures. Another impact of high-quality health systems is confidence in the system, including trust in health workers and appropriate care uptake. Confidence goes beyond the more traditional measure of satisfaction with care; it is the extent to which people trust and are willing to use health care. Trust is essential for maximising outcomes because it can motivate active participation in care—ie, adherence to recommendations and uptake of services, including in emergencies. 37 Trust is also essential for the success of UHC, because financing for UHC will be primarily domestic and people are unlikely to agree to contribute taxes or pay premiums for health services that they do not value. Finally, although good quality of care might require additional investment in many health systems of LMICs, high-quality health systems have the potential to generate economic benefits. 
First, by reducing premature mortality and improving people’s health, ability to work, and ability to attend school, high-quality health systems can foster economic productivity. Second, high-quality health systems can reduce waste from unnecessary, ineffective, and harmful care and prevent inappropriate hospital admissions and the bypassing of cost-effective options, such as primary care. Additionally, high-quality health systems with appropriate financing mechanisms, particularly mandatory insurance, can reduce the incidence of catastrophic or impoverishing health expenditures. Therefore, financing that provides people with financial protection and promotes high-quality, efficient care is an integral foundation of a high-quality health system.
Table 1 High-quality health system framework components
Components by domain
Quality impacts
  Better health
    Level and distribution of patient-reported outcomes: function, symptoms, pain, wellbeing, quality of life, and avoiding serious health-related suffering
  Confidence in system
    Satisfaction, recommendation, trust, and care uptake and retention
  Economic benefit
    Ability to work or attend school, economic growth, reduction in health system waste, and financial risk protection
Processes of care
  Competent care and systems
    Evidence-based, effective care: systematic assessment, correct diagnosis, appropriate treatment, counselling, and referral; capable systems: safety, prevention and detection, continuity and integration, timely action, and population health management
  Positive user experience
    Respect: dignity, privacy, non-discrimination, autonomy, confidentiality, and clear communication; user focus: choice of provider, short wait times, patient voice and values, affordability, and ease of use
Foundations
  Population
    Individuals, families, and communities as citizens, producers of better health outcomes, and system users: health needs, knowledge, health literacy, preferences, and cultural norms
  Governance
    Leadership: political commitment, change management; policies: regulations, standards, norms, and policies for the public and private sector, institutions for accountability, supportive behavioural architecture, and public health functions; financing: funding, fund pooling, insurance and purchasing, provider contracting and payment; learning and improvement: institutions for evaluation, measurement, and improvement, learning communities, and trustworthy data; intersectoral: roads, transport, water and sanitation, electric grid, and higher education
  Platforms
    Assets: number and distribution of facilities, public and private mix, service mix, and geographic access to facilities; care organisation: roles and organisation of community care, primary care, secondary and tertiary care, and engagement of private providers; connective systems: emergency medical services, referral systems, and facility community outreach
  Workforce
    Health workers, laboratory workers, planners, managers: number and distribution, skills and skill mix, training in ethics and people-centred care, supportive environment, education, team work, and retention
  Tools
    Hardware: equipment, supplies, medicines, and information systems; software: culture of quality, use of data, supervision, and feedback
The processes of care include competent care and user experience, which we consider to be complementary elements of quality. These elements must be present in both the health system as a whole and in individual care visits. Competent systems provide people and communities with health promotion and prevention when healthy and effective and timely care when sick. People should be able to count on their conditions being detected and managed in an integrated manner. Systems should also be user-focused: easy to navigate, with short wait times and attention to people’s values and preferences—this is the definition of people-centredness.
When people visit providers, they should expect to receive evidence-based care, including careful assessment, correct diagnosis, and appropriate treatment and counselling. And providers should treat all people with dignity, communicate clearly, and provide autonomy and confidentiality. Disrespectful and discriminatory behaviours are crucial quality failures, as are work environments that demean or disempower providers. The foundations of high-quality health systems begin with the populations that they serve: individuals, families, and communities. People are necessary partners in providing health care and improving health outcomes; they are not only the core beneficiaries of the health system, but also the agents who can hold these systems to account. The health needs, knowledge, and preferences of people should shape the health system response. High-quality health systems require strong governance, and financing, to promote the desired outcomes and policies to regulate providers, organise care, and institutionalise accountability to citizens. However, regulation will not be enough; health system leaders will need to inspire and sustain the values of professionalism and excellence that underpin high-quality health care. In most countries, health care is provided by three platforms: community health, primary care, and hospital care. An appropriate facility and provider mix, quality-centred service delivery models, and functioning connections between levels of care (eg, referral, prehospital transport) will be required to ensure that the whole system maximises outcomes and the efficient use of resources. Providers, from health workers to managers, are fundamental for health systems, and require adequate numbers, preparation, professionalism, and motivation. Providers need high-quality, competency-focused clinical education, with training in ethics, and a supportive environment for achieving the desired performance. 
Finally, health systems require not only physical tools, such as equipment, medicines, and supplies, but also new attitudes, skills, and behaviours, including quality mindsets, supervision and feedback, and the ability and willingness to learn from data. The foundations alone will not create good care, and the system will not be able to adapt to new challenges without built-in mechanisms for learning and improvement, including having timely information on performance, assessment of new ideas, and the means to retire ineffective approaches. This framework can be used to measure health systems over time on elements that matter to people (through processes and impacts) and to guide opportunities for improvement (through shoring up or rethinking foundations).
Section 2: What quality of care are people receiving in LMICs today?
In this section, we describe the current state of healthcare quality in LMICs. We compiled data from multiple sources to present the most comprehensive and detailed picture of health system quality. We analysed data from health facility, household, telephone, and internet surveys collected in the past 10 years, and summarised findings from global estimates, systematic reviews, and individual studies (data sources are listed in appendix 1 and a comparison of methods used to collect the data can be found in appendix 2). Within the constraints of the available data, we describe quality across all health conditions addressed by the SDGs (list of conditions in appendix 1) and across health system platforms (community outreach, primary and hospital care, and the linkages between them: referral systems and emergency medical services). Following the Commission’s framework, we describe the current situation with regard to provision of evidence-based care, competent health systems, and user experience and we present available evidence on the links between quality and health, confidence, and economic benefits.
Our focus is on describing the processes of care and their impacts. Foundations—the facilities, people, and tools required for care—are crucial to high-quality health systems, but their availability does not guarantee quality care. Lastly, we explore why some population groups are more vulnerable to poor-quality care. Where multicountry medians are presented throughout the section, country-specific data are included in appendix 2. Key findings are shown in panel 1.
Panel 1: Section 2 key findings
• Poor-quality health systems result in more than 8 million deaths per year in LMICs, leading to economic welfare losses of $6 trillion.
• Health providers in low-income and middle-income countries (LMICs) often do less than half of recommended evidence-based care actions. For example, only two in five women who delivered in a facility were examined within 1 h after birth.
• Approximately one third of patients experience disrespectful care, short consultations, poor communication, or long wait times.
• Inadequate integration across platforms and weak referral systems undermine the ability of health systems to care for complex and emerging conditions.
• Less than one quarter of people in LMICs believe that their health system works well, compared with half of people in high-income countries.
• Clinics and providers with good performance can be found in every country and studying them can inform country-wide efforts for improvement.
• High-quality health care is inequitably distributed in many countries, with poor and vulnerable groups having worse quality care—both in terms of competent care and user experience.
• People can be especially vulnerable to poor-quality care on the basis of particular settings of care, health conditions, and demographic factors.
Processes of care
Evidence-based care
Evidence-based care includes systematic patient assessments, accurate diagnoses, provision of appropriate treatments, and proper patient counselling.
In this section, we assess how these aspects are being followed, across selected SDG conditions. Data from direct observations of clinical consultations allowed us to measure the quality of reproductive, maternal, and child health services. Using guidelines from WHO, we identified essential elements of reproductive, maternal, and child health care and built quality indices (appendix 1). On the basis of these indices, data from observations of 81856 consultations in 18 countries showed that adherence to evidence-based guidelines is low (figure 2A). On average, providers fulfilled only 47% of recommended care—with median performance ranging from 44% for family planning consultations to 64% for labour and delivery care (appendix 2). However, median figures can mask important variations within countries (appendix 2). These large variations in performance across providers suggest that better quality of care is possible in these countries. Identifying and replicating local best practices might be valuable to inform improvement strategies. 38
Figure 2 Adherence to evidence-based guidelines and diagnostic accuracy
Dots represent country-specific means, vertical bars indicate median performance across countries, and boxes delineate the IQR. Indicator definitions are shown in appendix 1, and country specific means are shown in appendix 2. (A) Data are from Service Provision Assessment (SPA) surveys done in ten countries (Ethiopia 2014, Haiti 2013, Kenya 2010, Malawi 2013, Namibia 2009, Nepal 2015, Rwanda 2007, Senegal 2015–16, Tanzania 2015, and Uganda 2007) and baseline facility surveys of Results-based Financing impact evaluations (RBF) in eight countries (Burkina Faso 2013, Central African Republic 2012, Cameroon 2011, Republic of the Congo 2014, Democratic Republic of the Congo 2015, Kyrgyzstan 2012–13, Nigeria 2013, and Tajikistan 2014–15). (B) Data are from clinical vignettes from the Service Delivery Indicators surveys done by the World Bank, in cooperation with the African Economic Research Consortium and the African Development Bank in Kenya (2012), Nigeria (2013), Tanzania (2014), Togo (2013), and Uganda (2013) and from the Service Provision Assessment survey in Ethiopia (2014).
Other studies have also shown that providers often fail to adhere to clinical guidelines. In Uttar Pradesh, India, facility-based birth attendants did only 40% of items on the WHO safe childbirth checklist in a typical birth. 39 Across 12 countries, only 50% of diarrhoea cases were correctly managed in health-care facilities according to WHO and UNICEF recommendations. 40 In standardised patient studies in China 41 and Kenya, 42 only 13–45% of suspected tuberculosis cases were correctly managed by primary care providers according to the International Standards for Tuberculosis Care guidelines.
A systematic patient assessment involves gathering clinically relevant information by asking appropriate medical history questions and doing recommended examinations and tests. Data from LMICs showed that systematic patient assessments are not always done. For example, after giving birth, women should be assessed for abnormal bleeding, perineal tears, signs of infections, and high blood pressure. 43 However, in many countries, few women reported receiving any postpartum check-up after giving birth in a health-care facility, including only 27% of women in Swaziland and 44% in Ethiopia, Burundi, and Rwanda (appendix 2). Similarly, during antenatal care, monitoring of blood pressure and urine and blood sample analyses are crucial to detect pre-eclampsia, nutritional deficiencies, infections, and other pregnancy risks.
44 Across 91 countries, only 73% of women attending antenatal care with a skilled provider reported receiving these elements of care—ranging from an average of 54% in 30 low-income countries to 94% in 27 upper-middle-income countries (appendix 2). 45 Poor availability of laboratory facilities and diagnostic equipment are also barriers to patient assessment and diagnosis, even when providers are aware of the necessary tests. For example, pathology service coverage in sub-Saharan Africa is approximately one-tenth of that in high-income countries. 46 Even simple tests are often unavailable: studies showed that blood glucose meters and urine strips were available in only 18–61% of facilities across Mali, Mozambique, and Zambia. 47 A study of ten countries found that only 2% of health-care facilities had the eight diagnostic tests defined as essential for basic service readiness by WHO. 48 Incorrect diagnoses have deleterious consequences on health and contribute to treatment delays and antimicrobial resistance. For example, diagnostic uncertainty about undifferentiated fever often leads to overprescription of antimicrobial therapy. 49 Our analyses of data from clinical vignettes done in LMICs revealed wide variations in diagnostic accuracy. In six sub-Saharan African countries, correct diagnoses ranged from 0 providers in Togo identifying malaria with anaemia to 94% of providers in Kenya diagnosing post-partum haemorrhage (figure 2B, appendix 2). Other work has shown that, across six eastern European and central Asian countries, acute myocardial infarctions were correctly diagnosed by only 33% of providers. 50 Performance in practice is also likely to be worse than on vignettes: diagnostic accuracy ranging from only 8% to 20% has been reported for childhood pneumonia in Malawi 51 and for a range of primary care conditions in India. 52 Poor quality of laboratory testing and a heavy reliance on outdated diagnostic technologies can also contribute to misdiagnoses. 
For example, an external quality assessment 53 in the Democratic Republic of the Congo found that only 4% of laboratories correctly identified the parasites that cause malaria and human African trypanosomiasis on all slides analysed. Similarly, studies 54 in Latin America have reported Pap smear sensitivity as low as 20–25% and lower than expected rates of HER2 (human epidermal growth factor receptor 2) positivity in women with early breast cancer. For tuberculosis, uptake of newer diagnostics has been slow and many countries continue to rely on often inaccurate smear microscopy. 55 In high-burden countries, nine sputum smears are done for every gold standard test (Xpert MTB/RIF) used. 55 Poor-quality care also includes the underuse 56 of effective care and the overuse 11 of unnecessary care. Our analyses of survey data revealed that individuals in LMICs often do not receive appropriate treatments during consultations, including preventive interventions during skilled antenatal care, oral rehydration therapy for children with diarrhoea, or antibiotics for those with symptoms of pneumonia (figure 3, appendix 2). Similarly, another study 57 in Malawi reported that only 38·7% of patients with non-severe pneumonia confirmed on re-examination were correctly prescribed first-line antibiotics during consultation. Additionally, despite being diagnosed, many patients are untreated or undertreated for conditions such as HIV, tuberculosis, hypertension, diabetes, and depression. 58–63 In LMICs where data are available, only 68% of people aware of their HIV status are on antiretroviral therapy, and only 5% of people with a diagnosis of major depressive disorder receive minimally adequate treatment (figure 3, appendix 2). Individuals in severe pain are also systematically undertreated in LMICs. 
36 Of the 298·5 metric tonnes of morphine-equivalent opioids distributed in the world every year, only 0·03% of that is distributed in low-income countries, leading to a 98% unmet need for morphine. 36 A study 64 showed that, among patients with ST-segment elevation myocardial infarctions admitted to Chinese hospitals, only half of ideal candidates for reperfusion therapy received the treatment. Other treatments that reduce mortality in patients were also underused, with only 58% of eligible patients receiving β blockers and 66% receiving angiotensin-converting-enzyme inhibitors. 64 All these reports represent major missed opportunities to improve outcomes among people already using the health system.
Figure 3 Proportion of individuals receiving appropriate treatments among those who seek care in 112 low-income and middle-income countries
Dots represent country-specific means, vertical bars indicate median performance across countries, and boxes delineate the IQR. Data sources for tetanus injections and iron during antenatal care were Demographic and Health surveys (DHS) and Multiple Indicator Cluster surveys in 75 countries; for oral rehydration therapy (ORT) were DHS in 54 countries; for antibiotics for pneumonia were DHS and Multiple Indicator Cluster surveys in 63 countries; for antiretroviral therapy among those aware of their HIV status were UNAIDS estimates in 78 countries; and for minimally adequate depression treatment were World Mental Health Surveys in 8 countries. Indicators are defined in appendix 1; country specific means are shown in appendix 2.
Overuse of unnecessary or ineffective care has also been documented in LMICs. In the previously mentioned study 64 in China, almost a third of patients received magnesium sulphate—a treatment that is ineffective—on admission and more than half of patients were given traditional Chinese medicine, despite little evidence of its efficacy and safety.
64 Other instances of inappropriate care in LMICs include unnecessary use of antibiotics for diarrhoea, inappropriate cardiac interventions, overuse of steroids, and unnecessary hysterectomies. 11,65,66 Although many women still do not have access to needed caesarean sections, rates of unnecessary caesarean sections have been increasing in LMICs. 11,67 Inappropriate use and overprescription of antimicrobials, combined with poor sanitation, inadequate access to diagnostic tools, and low diagnostic accuracy, have fuelled antimicrobial resistance throughout LMICs. 68 A 2018 study 69 assessed the quality of antimicrobial prescribing for hospital inpatients in 53 countries, including 25 LMICs. Inappropriate antibiotic prescribing practices included prescriptions for unknown diagnoses, prescriptions without stop or review dates (to avoid unnecessarily long antibiotic courses), and prolonged surgical prophylaxis. Proper counselling and health education are essential elements of evidence-based care. We found that during antenatal care, many skilled providers do not advise women on the signs of pregnancy complications or how to prevent HIV infections, and, when prescribing contraceptives, many providers fail to discuss their potential side-effects (appendix 2). Similarly, providers often do not state their diagnosis during the consultation. 52 In observations of sick child consultations in 17 countries, only 43% of providers informed caregivers about the diagnosis of their child (appendix 2). Counselling is particularly important for chronic disease management. Tobacco use, excess weight, unhealthy diets, and physical inactivity are the leading risk factors for non-communicable diseases. Data from the WHO STEPS survey in seven LMICs showed that providers did not counsel many patients diagnosed with cardiometabolic diseases: only 16% of patients were counselled on tobacco, 29% on exercise, and 55% on dietary changes (appendix 2). 
In six Latin American and Caribbean countries, only 56% of patients diagnosed with at least one chronic condition reported receiving advice on diet and exercise from primary care providers (appendix 2). 70
Competent systems
Beyond the content of the health-care visit, competent care requires the whole health system to function for the patient. Here, we describe current evidence on four elements of competent health systems: safety, prevention and detection, continuity and integration, and timely care.
The literature documents a range of safety problems in health care, including adverse drug events, adverse events and injuries due to medical devices, injuries due to surgical and anaesthesia errors (including wrong-site surgery), health-care-associated infections, improper transfusion and injection practices, falls, burns, and pressure ulcers. 71 Despite lower health-care use rates, LMICs bear the majority of the global burden of adverse events from unsafe care. 72 Surgical site infections, the most common type of health-care-associated infection, are markedly higher in LMICs than in high-income countries. 73 Patient safety literature has been largely focused on inpatient care, but adverse events also occur to outpatients, including medication errors, infections resulting from poor hand hygiene, unsafe injections, blood samples, or reusable equipment. LMICs are estimated to have rates of medication-related adverse events similar to those of high-income countries, but they result in twice as many years of healthy life lost because more younger patients are affected in LMICs. 72 One study found that, across 54 LMICs, 35% of healthcare facilities do not have water and soap for handwashing and 19% do not have improved sanitation. 74 This absence of services compromises efforts to improve hygiene behaviours and reduce health-care-associated infections.
However, although water and sanitation are necessary, handwashing does not necessarily follow from their presence: low adherence to hand hygiene was found even in facilities with available supplies. 75 Beyond their costs to human lives and disability, adverse events from unsafe care are also costly in terms of loss of trust in the health system.
The prevention and early detection of diseases, including through recommended screenings, is an important function of high-quality health systems. Across six Latin American and Caribbean countries, less than half of adults reported having had their blood pressure checked in the past year and their cholesterol checked in the past 5 years. 76 Rates of cervical and breast cancer screening also vary widely. 54 Across six LMICs surveyed by the WHO study on global ageing and adult health (SAGE), mammogram coverage averaged 20% of all women of screening age and was as low as 1% in India and 2% in Ghana (appendix 2). 63 Across nine countries in the Americas, average Pap smear coverage was 36% of women in need, ranging from 10% in Nicaragua to 97% in Panama. 77 Even people in the health system might not receive the needed screening or early detection. In countries with HIV prevalence higher than 5%, WHO recommends that all pregnant women be tested for HIV. 78 In five of nine high-prevalence countries, more than 95% of pregnant women attending antenatal care were tested for HIV. However, despite an HIV prevalence of 27% in Swaziland and 12% in Mozambique, only 56% of women in Swaziland and 69% in Mozambique are tested during antenatal care (appendix 2).
Continuity of care is reflected by the ability of the health system to retain people in care and by the patient’s ability to see a clinician familiar with their medical history. Integration is the extent to which health services are delivered in a complementary and coherent manner.
These two dimensions are important for the management of non-communicable diseases and other chronic conditions, such as HIV, that require continuous patient support after diagnosis and a comprehensive treatment approach. 58 Across services including antenatal care, child vaccination, antiretroviral therapy, and mental health care, retention rates ranged from 87% for diphtheria-tetanus-pertussis (DTP3) vaccination in 83 LMICs to only 55% retention for mental health care in 12 LMICs (appendix 2). 79,80 Similarly, lapses in the follow-up of test results have also been reported and pose severe challenges for infectious conditions such as HIV and tuberculosis. 59,71 A systematic review 81 estimated patient losses to the system between diagnosis and treatment for tuberculosis to be as high as 18% in Africa and 13% in Asia. Regarding integration, all tuberculosis patients should be tested for HIV, because of risk factors shared between the two infections. 78 In the WHO African Region, where the burden of HIV-associated tuberculosis is highest, 82% of patients with tuberculosis were tested for HIV. 82 For people with life-threatening emergencies, such as labour complications, trauma, and stroke, treatment delays substantially increase mortality risk. Timeliness is also central for other conditions that can be cured if treated early—including many cancers—and conditions such as tuberculosis or diabetes, in which early treatment prevents transmission or disease progression. Time intervals from admission to surgery for traumatic fractures of the femur were found to be substantially longer in LMIC hospitals than in high-income country hospitals. 83 Numerous studies have described the delays that occur during labour complications in women deciding to seek care and in reaching health facilities—the so-called first and second delays. 
However, the third delay—in providing high-quality care once women reach health-care facilities—is emerging as an important contributor to maternal and newborn child mortality. 84 For example, a study 85 in India found that attending to women within 10 min of their arrival to the facility could have prevented 37% of recorded stillbirths. Additionally, the absence of immediate postpartum care can lead to serious obstetric complications being missed. Across 41 countries with a demographic and health survey, we found that only 41% of women delivering in a health-care facility reported someone checking on their health within 1 h of delivery (appendix 2). For infectious diseases, such as tuberculosis, making a timely diagnosis is crucial for interrupting transmission and optimising treatment outcomes. A review 86 of studies done in LMICs found that an average of 28·4 days passed between the first contact of patients with the health system and the date of tuberculosis diagnosis, ranging from 2 days in China to 87 days in Pakistan. Regarding cancer care, delays caused by both patient and health system contribute to advanced disease at presentation and high cancer mortality rates in LMICs. Studies 54,87,88 from Brazil, Ghana, Mexico, Peru, and Rwanda reported delays of up to 28 weeks between presentation to a doctor and definitive diagnoses of cervical or breast cancer. Data from the Mexican Institute of Social Security, the largest health system in Mexico, revealed that 51% of women with breast cancer waited more than 30 days between mammography and diagnosis, and 44% of women with cervical cancer waited more than 30 days between Pap smear and diagnosis. 89 Delays in initiating treatments further affected the prognosis of patients. According to the Mexican Institute of Social Security, as many as 70% of women with breast cancer and 61% of women with cervical cancer waited more than 21 days between receiving the diagnosis and beginning therapy. 
89 Similarly, a study 90 done in Buenos Aires hospitals, Argentina, found that the median time elapsed between diagnosis of breast cancer and treatment with chemotherapy was 76 days in public hospitals and 60 days in private hospitals. These delays are concerning because waiting more than 5 weeks before starting definitive treatment can worsen survival for cervical cancer, and delays in diagnosis longer than 12 weeks are considered suboptimal for breast cancer. 54,87
User experience
Competent care and competent health systems are necessary for achieving high-quality care, but a positive user experience is also important. In addition to having an intrinsic value, positive user experience can improve retention in care, adherence to treatments, and, ultimately, confidence in health systems. 91 Additionally, some studies have found that positive user experience is linked to better technical quality. 91,92 To address insufficient cross-national data on user experience, this Commission did an internet survey on user experience in 12 countries in Africa, Latin America, Asia, and the Middle East. Full results will be presented in forthcoming papers, but some of the key results of this survey are shown in figure 4, along with indicators from four other surveys done in 49 LMICs and 11 high-income countries (appendix 2). 70 We found that an average of 34% of people in LMICs reported poor user experience, citing a lack of attention or respect from facility staff (41%), long wait times (37%), poor communication (21%), or short time spent with providers (37%). This result on the short time spent with providers was echoed by a 2017 review 93 that found that primary care consultations lasted fewer than 5 min on average in LMICs.
Figure 4 User experience in 49 low-income and middle-income countries (LMICs) and 11 high-income countries
Dots represent country-specific means, vertical bars indicate median performance across countries, and boxes delineate the IQR. High-income countries do not contribute to the illustrated medians. Data are from the surveys indicated. AFRO=Afrobarometer survey done in 34 African LMICs (2011–13). HQSS=Commission-led internet survey done in 12 LMICs (2017). IDB=nationally representative phone survey on primary care access, use, and quality done by the Inter-American Development Bank in six Latin-American and Caribbean LMICs (2013). SPA=Service Provision assessment surveys done in ten LMICs (2007–16). CWF=International Health Policy Survey done by the Commonwealth Fund in 11 high-income countries (2013). Indicators are defined in appendix 1; country specific means are shown in appendix 2.
Panel 2: Beyond the numbers—experiences in the health system*
Interviews with patients help to paint a more comprehensive picture of their experiences within the health system. The World Bank’s landmark publication, Voices of the Poor,A1 in 2000 shared the narratives of individuals across the world and described the challenges that the poor face in not only accessing health care but also successfully navigating the health system. Since then, several qualitative studies have further illuminated the ways in which people receive differential treatment while seeking care. We did a rapid review of these studies (methods are described in appendix 1). The stories described in these studies highlight disparities in both competent care and user experience.
Patients across a wide range of low-income and middle-income countries have described the lack of competent care and health systems. In Egypt, a woman said that “at the hospital, they do nothing to people unless they are staff relatives, or rich people that have power or authority.”A1 A focus group participant in TanzaniaA2 stated that “they are very often saying that medicines are available or not available. When someone tells you they aren’t, it’s her siri (secret). She is the only one who knows. She decides when she sees you coming. … This really upsets us….
The obstacles are like these ones of medicines even if there are no medicines what makes me feel bad is the game.” Patients also reported improper examinations and care. A focus group participant in EthiopiaA3 described her delivery care: “they left the placenta inside me. Because they are impatient, they did not examine me. After I gave birth, I rested there for 5 h but no one came and asked me whether I was bleeding… After 3 days, my face got swollen… I almost died.” Studies also highlight poor user experience, including verbal abuse and neglect from health-care workers. According to a patient in Russia, “the hospital is like a prison”.A1 A person in GhanaA4 recounted that “people always say that the nurses are shouting too much, and saying bad things to them, and maybe they don’t want to treat them. They only care for those big people who have money to give them.” Poor patients, such as this respondent in Timor-Leste,A5 also frequently report disrespectful, discriminatory treatment from health-care workers: “Health workers yell at us like a slave… they give priority to the important people, rich and intellectual and neglecting the poor, no money, stupid and dirty…That is the reason why people do not want to go to the hospital although they have a letter of referral.”

*Panel references can be found in appendix 1.

Some differences across surveys are worth noting. In Afrobarometer survey countries, 42% of respondents reported never experiencing a lack of attention or respect, whereas in the internet survey, 75% of respondents reported respectful care at their last visit. Differences in countries and income groups (our survey was done in more middle-income countries than those of Afrobarometer), wording (“never experienced” was used in Afrobarometer surveys), time frames (past year vs last visit), and survey sampling (internet users have a higher average socioeconomic status than household respondents) might explain these differences. 
Differing expectations of quality can also influence the perception of user experience. No benchmarks exist for what constitutes good user experience. However, user ratings of communication and time spent with providers were consistently higher in high-income countries than in LMICs (figure 4), with only 11% of respondents reporting poor communication and 17% reporting insufficient time with providers (compared with 74% and 60% on average in the six Latin American and Caribbean countries surveyed by the Inter-American Development Bank). Disrespect and abuse of women during childbirth has been widely reported in LMICs, 9 including documented instances of physical abuse, non-consented clinical care, lack of confidentiality and dignity, discrimination, abandonment, and detention in facilities. A review 9 of studies showed a range of 19–98% of women reporting mistreatment during childbirth across LMICs, with 3–36% reporting physical abuse. Beyond being an indicator of poor-quality care, disrespect and abuse should be unacceptable in any health system. Nonetheless, these numbers can only tell part of the story. The quality of the processes of care, particularly of the user experience, is also reflected in the patient voices in panel 2.

Quality impacts

High-quality care—both competent care and positive user experience—can have an effect on people’s health, their confidence and trust in health systems, and economic outcomes. In this section, we present available evidence on morbidity and mortality linked to poor quality care. We also synthesise data on people’s confidence in health systems, and we address the potential economic benefits of high-quality care.

Health

Although the causes of death are often multifactorial, and are not solely influenced by health care, deaths from some conditions are highly dependent on quality of care and are regarded as sensitive indicators of how well a health system is functioning. 
For this Commission, we did an analysis of the mortality burden of poor-quality care across health conditions relevant to SDGs. 94 We compared mortality for conditions amenable to health care between LMICs and countries with well performing health systems, to estimate the mortality that can be attributed to poor-quality health systems. We estimated that 8·6 million deaths per year (uncertainty interval [UI] 8·5–8·8 million) in 137 LMICs are due to inadequate access to quality care. Of these, 3·6 million (UI 3·5–3·7 million) are people who did not access the health system, whereas 5·0 million (UI 4·9–5·2 million) are people who sought care but received poor-quality care. Poor-quality care resulted in 82 deaths per 100 000 people in LMICs—an annual mortality rate equivalent to that from cerebrovascular disease globally. 94 Cardiovascular deaths make up 33% of deaths amenable to health care (figure 5). 94 Ischaemic heart disease is the largest contributor to amenable cardiovascular disease deaths, with 1·4 million deaths due to poor-quality care and 260 000 due to non-utilisation of health systems. Of the 2 million deaths from neonatal conditions and tuberculosis that are amenable to health care, 56% occurred in people who used the health system, but did not receive good quality care. Across several other health priorities for which coverage is still low, including chronic respiratory disease, cancer, mental health, and diabetes, non-utilisation of health systems plays a larger role than poor-quality care, but this will change as access increases. Our results highlight that health systems could be more effective in saving lives across a spectrum of conditions by improving quality of care along with expanding coverage. An analysis done with similar methods for a shorter list of conditions found that, globally, 8·0 million deaths could be averted with access to high-quality care. 
95

Figure 5 Deaths from Sustainable Development Goal conditions due to poor-quality care and non-utilisation in 137 low-income and middle-income countries
External factor deaths are those due to poisonings and adverse medical events. Other infectious diseases deaths are those due to diarrhoeal diseases, intestinal infections, malaria, and upper and lower respiratory infections.

Maternal and newborn deaths are a particularly sensitive measure of health system quality, because many deaths stemming from labour complications can be averted with appropriate treatment. 96 Figure 6 shows the comparison of rates of maternal and newborn deaths in countries with similar, high coverage of skilled attendants during birth (80–90% of births). Countries were grouped by income to reduce the influence of social and economic determinants. Across countries with similar coverage, large disparities in maternal and neonatal mortality are apparent. The ratio of worst to best performing country for maternal mortality was 2·1 in low-income, 12·2 in lower-middle-income, and 5·7 in upper-middle-income countries; for neonatal mortality it was 1·4, 3·7, and 2·9, respectively, suggesting differences in quality of care.

Figure 6 Differences in maternal and neonatal mortality rates across low-income and middle-income countries with 80–90% skilled birth attendance coverage
Mortality estimates are from WHO, using 2015 modelled estimates. Skilled birth attendance is from the World Bank World Development Indicators, using the most recent data available in the past 10 years. Horizontal lines indicate Sustainable Development Goal targets. Few deaths in these countries are recorded in complete vital registration systems; global estimates must account for missing and unreliable data. Mortality estimates should be interpreted with caution because of uncertainty from measurement error. References can be found in appendix 1.

The frequency of stillbirths can also be reduced with high-quality care. 
97 An analysis done for this Commission—with use of the Lives Saved Tool—in 81 countries that are the focus of the Countdown to 2030 collaboration, estimated that 520 000 stillbirths could be prevented and 670 000 neonatal and 86 000 maternal lives could be saved in these countries by 2020 if adequate quality of care is provided at current levels of health system use (appendix 1). Because quality was measured by use of inputs to care rather than by processes of care, these figures might underestimate actual mortality. An older analysis that used different methods found similar effects on stillbirths, but more maternal and newborn lives saved. 98 In addition to improving the quality of labour and delivery care, improving the quality of antenatal care and family planning is crucial to reducing stillbirths. 97 Population-based cancer survival is also an indicator of overall health system effectiveness. 99 Using cancer registries from 71 countries, a 2018 study 99 found varying rates of cancer survival between countries and for different cancers. For example, most countries reported an increasing trend in 5-year net survival from breast cancer since 1995, but survival did not always increase in countries such as India, Thailand, and several eastern European countries. 99 More broadly, hospital mortality can be useful for gauging the quality of care in facilities, when adjusted for disease severity and underlying risk, and can provide useful insight on the quality of secondary care in a region or country, when aggregated. Delivering high-quality hospital care requires well functioning facility systems that include appropriate triage in emergency departments, rapid decision making for very sick patients, close inpatient monitoring, and rigorous infection prevention practices, among other elements. 
Studies in LMICs have revealed high institutional maternal, perioperative, and emergency department mortality rates and high in-hospital mortality rates in patients admitted for acute myocardial infarctions. For example, the WHO multicountry survey 100 on maternal and newborn health found intrahospital maternal mortality ratios that were 2–3 times higher than expected on the basis of case severity. High rates of perioperative and anaesthetic-related mortality were also found in LMIC hospitals, reflecting gaps in surgical and hospital care quality. 101–104 The African surgical outcomes study 101 found that patients in Africa were twice as likely to die after surgery compared with the global average, despite being younger, with a lower surgical risk profile, and undergoing less complex surgeries. Most of the deaths occurred post surgery, suggesting that many lives could be saved by effective surveillance for physiological deterioration in patients who have developed complications. Similarly, although the quality of emergency and trauma care in LMICs is understudied, one study found that mortality recorded in emergency departments in LMICs is many times higher than that generally reported in high-income countries, pointing to gaps in the quality and appropriateness of services being provided in these emergency departments. 105 In patients admitted with ST-segment elevation myocardial infarction in China, in-hospital mortality did not significantly change between 2001 and 2011, suggesting a need for improvements in quality. 64 Mortality alone does not capture the full burden of poor-quality care. People accessing poor-quality care can develop morbidities, including physical sequelae, persistent symptoms, reduced function, pain, and poor quality of life. For example, for many people in LMICs, access to health care does not result in control of manageable conditions such as hypertension, diabetes, HIV, tuberculosis, chronic lung diseases, and depression. 
Poor quality of care during childbirth can also result in morbidities with lifelong consequences. A study 106 of 1·7 million adults in China found that only 24% of patients under treatment for hypertension had achieved blood pressure control. A nationally representative study, 107 also from China, found that among patients receiving treatment for diabetes, only 40% had achieved adequate glycaemic control. Complications of diabetes such as blindness, kidney failure, and lower limb amputation can be largely averted through high-quality primary care. However, in 2016, the Mexican Social Security Institute reported 4518 major lower limb amputations in patients with diabetes, for an incidence of 120 per 100 000 patients. This continues a previously documented trend of increasing incidence of diabetic amputations and is higher than the comparable incidence in most, but not all, OECD countries. 89,108 According to 2017 UNAIDS estimates, 79 only 71% of people on antiretroviral therapy in LMICs have achieved viral suppression, and only ten countries have reached the 90% viral suppression target. Tuberculosis treatment success rates are also reflective of the quality of care, and only eight of the 30 countries with high tuberculosis burden have reached 90% first-line treatment success rate. 109 In countries with high drug-resistant tuberculosis burden, treatment success rates range between 50% and 85%. 109 These figures show a need for better follow-up, treatment, and counselling of patients with manageable conditions in LMICs. Obstetric fistula is a highly debilitating condition with severe social and health consequences. Women with fistula have leakage of urine or stool through the vagina and are ostracised because of this in some regions. 110 Fistulas typically develop in women with prolonged obstructed labour. 
Although cultural factors, such as child marriage, increase the risk of obstructed labour, the existence of fistulas on a wide scale, as documented in studies, is an indicator of poor quality obstetric care and a broader health system failure. 111 Using data from demographic and health surveys in 25 countries, we estimated the proportion of women who suffered from symptoms of an obstetric fistula among those whose last birth was attended by a skilled provider. In women whose last delivery was done with a skilled attendant, ten per 1000 women reported symptoms of an obstetric fistula, ranging from 0·54 per 1000 in Burkina Faso to 32 per 1000 in Pakistan (appendix 2). By contrast, obstetric fistulas have been almost eliminated in high-income countries. Another goal of treatment is remission or reduction of symptoms. In the WHO SAGE, only 50% of patients receiving treatment for chronic lung disease and only 7% receiving treatment for depression reported having no symptoms from the two diseases in the preceding 2 weeks (appendix 2). The Lancet Commission 36 on palliative care and pain relief quantified the global burden of serious health-related suffering and found that more than 80% of the global 61 million patients affected by serious health-related suffering live in LMICs.

Confidence in the system

The quality of care that people receive also has important consequences for their confidence and trust in their government and health system, which can affect their decisions of when and where to seek care. Figure 7 shows varying degrees of confidence and trust in health systems across 45 LMICs. Only 24% of people stated that they believe that their health system worked “pretty well” and that only minor changes were necessary to make it work better. 112 In comparison, 47% of respondents agreed with the same statement in 11 high-income countries, ranging from 24% in the USA to 61% in the UK (appendix 2). 
113 Differences in survey sampling and indicator wording might account for some of the variation across surveys. For example, the higher confidence in the ability to receive needed care seen in the internet survey led by this Commission might be explained partly by the higher socioeconomic status of internet users. Gallup World polls 114 also showed large gaps in satisfaction between low-income and high-income countries: in sub-Saharan Africa, northern Africa, and the Middle East, only 42–49% of respondents were satisfied with the availability of high-quality care near them, compared with 86% in northern Europe. Nonetheless, patient satisfaction should be interpreted with caution as a measure of quality (panel 3).

Figure 7 Confidence and trust in health systems in 45 low-income and middle-income countries (LMICs) and 11 high-income countries
Dots represent country-specific means, vertical bars indicate median performance across countries, and boxes delineate the IQR. High-income countries do not contribute to the illustrated medians. Data are from the surveys indicated. AFRO=Afrobarometer survey done in 34 African countries (2011–13). HQSS=Commission-led internet survey done in 12 LMICs (2017). IDB=nationally representative phone survey on primary care access, use, and quality done by the Inter-American Development Bank in six Latin-American and Caribbean LMICs (2013). CWF=International Health Policy Survey done by the Commonwealth Fund in 11 high-income countries (2013). Indicators are defined in appendix 1; country-specific means are shown in appendix 2.

Panel 3: Why are people satisfied with poor quality?*

Perhaps paradoxically, because of the prevalence of poor-quality health care, patients in low-income and middle-income countries tend to report high satisfaction with the care received. 
Across eight low-income countries, 79% of patients and caregivers reported being very satisfied with the care received during consultations in which providers did less than half of essential clinical actions (results in appendix 2). This percentage ranged from 75% for care of sick children to 85% for family planning (appendix 2). High satisfaction with health care is common across low-income and middle-income country surveys, but patient satisfaction as a measure of quality should be carefully interpreted. Although satisfaction is influenced by the quality of care, it is also influenced by care accessibility, costs, health status, expectations, immediate outcomes of care, and gratitude.A9 Additionally, satisfaction measures can be subject to substantial survey bias.A10 In the Commission’s internet survey of patient experience, we tested one factor thought to be influential in generating high satisfaction: low expectations for quality of care. Respondents were asked to rate the quality of care on the basis of short vignettes. A vignette that described a nurse changing the medication of a patient with hypertension without measuring blood pressure or asking about symptoms was rated as good to excellent quality of care by an average of 53% of 17 966 respondents across 12 countries, and as high as 62% of 1292 respondents in Senegal, suggesting a low threshold for what is considered to be good care (appendix 2). Low expectations of what constitutes good quality might be a consequence of the prevailing poor-quality care, low agency, and inadequate functioning mechanisms to hold systems accountable. Other studies have also shown that patient satisfaction surveys are influenced by acquiescence bias. 
Surveys framing statements in a positive way and inviting patients to agree or disagree will lead to positive responses much more frequently than surveys with more neutral statements.A10 More discussion on the utility of patient satisfaction as a measure of health system quality can be found in Section 4.

*Panel references can be found in appendix 1.

Other research has found that increased technical quality of health services, combined with responsive service delivery, fair treatment, better health outcomes, and financial risk protection, was associated with an increase in the probability of having trust in government. 29 Similarly, a better user experience (communication and time spent with providers) was associated with better trust in health systems in Latin America and the Caribbean. 112 Research suggests that quality, particularly that perceived by the patient, might have an effect on healthcare utilisation patterns, retention in care, and people’s decision to bypass facilities. 115,116 In the internet survey led by this Commission, more than half of patients who decided not to seek care in the preceding year (despite needing medical attention) stated that their decision was made for quality reasons (eg, poor provider knowledge, long wait times, or disrespect), as opposed to cost of care or distance to facilities. The highest proportion of patients was in Mexico, where 73% cited quality reasons for not seeking care. Similarly, a study 117 in Haiti found that higher quality primary care facilities were associated with higher utilisation. Perceived poor quality of care can also lead people to bypass certain facilities. Households might choose to travel further distances or pay more out of pocket to seek better quality care. 118,119 In India, many patients choose to seek care from the private sector, which is viewed as more competent than public facilities. 
India’s District Level Household and Facility Survey found that 51% of households bypassed their nearby public facility for their usual care; of these, 80% cited at least one quality concern as a reason (figure 8, appendix 1). Some people might also choose to bypass primary care facilities and seek care at hospitals or higher-level facilities for conditions that could be treated in primary care. 120 A survey 121 in China found that poor quality of care and lack of trust in primary care institutions were among the most common reasons for bypassing primary care and going directly to hospitals. Primary care is the cornerstone of a high-quality health system, serving as the main entry point for most concerns and playing a crucial role in coordinating care and ensuring continuity across health system platforms. Nonetheless, primary care facilities often fail to fulfil their role. Using facility surveys from nine countries, we built a primary care quality score based on three domains of quality—evidence-based care, competent systems, and user experience—and found an average score of only 0·41 out of 1, ranging from 0·32 on average in Ethiopia to 0·46 in Namibia (appendix 2). By contrast, some studies 122 have not found a relation between utilisation and measures of quality, such as doctors’ competence, probably because of information asymmetry. A crucial area for future research will be to estimate the demand response to higher quality of care, focusing on the role of information and perception of quality in influencing utilisation patterns.

Figure 8 Proportion of households that report quality concerns as reason for bypassing public facilities in districts in India
Data are from the fourth cycle of the District Level Household and Facility Survey done by the International Institute of Population Sciences from 2012 to 2014, in 21 states of India. 
A quality concern was defined as mentioning any of the following as a reason for bypassing government facilities: inadequate infrastructure, doctor not available, absent health workers, poor quality, drugs not available, inconvenient hours, long wait time, or distrust. In darker coloured districts, a higher proportion of households cited quality concerns.

Economic benefit

Improving health system quality can be justified on ethical, epidemiological, and economic grounds. Little evidence exists on the link between levels of quality of care and economic outcomes. Here, we describe three types of economic consequences that could be averted by high-quality health systems: macroeconomic effects of premature mortality, health system waste, and catastrophic or impoverishing health expenditures faced by households. A 2018 analysis 95 estimated the macroeconomic effect of mortality that could be prevented with access to high-quality care in LMICs. The analysis was done by use of two distinct approaches to quantify economic losses from preventable mortality. The first approach projected gross domestic product (GDP) losses over 15 years due to the consequences of mortality on labour force and physical capital accumulation. In 91 LMICs, amenable deaths due to insufficient good quality care would result in a projected cumulative loss of US$11·2 trillion (UI 8·6–15·2 trillion) between 2015 and 2030. This economic output loss was greatest in low-income countries, costing 2·6% of their GDP compared with 0·9% in upper-middle-income countries. 95 The second approach estimated the current value of total economic welfare losses on the basis of the concept of a statistical life, which attempts to capture the value placed on good health in and of itself. In 2015 alone, poor access to quality care resulted in an estimated $6·0 trillion of losses in 130 LMICs. 95 Upper-middle-income regions lost the least, whereas losses in sub-Saharan Africa accounted for more than 15% of GDP. 
This analysis shows that poor-quality care can result in a great macroeconomic burden that is inequitably distributed across countries. Beyond the economic losses from premature mortality, poor-quality care can also lead to important waste and inefficiency. Waste in health care has been defined as any “health-care spending that can be eliminated without reducing the quality of care”. 123 Health-care waste includes the overuse of unnecessary care or ineffective approaches, medical errors, unsafe care, poor coordination of care, misuse (including inappropriate hospital admissions and bypassing), fraud, and abuse. There have been few measurements of health-care waste attributable to poor-quality care in LMICs. However, evidence from high-income settings suggests that averting these costs could help LMICs make better use of scarce resources. For example, the annual costs of extra hospital stays and readmissions for treatments of surgical site infections were estimated to range between $3·5 billion and $10 billion in the USA and between €1·47 billion and €19·1 billion in Europe. 73 Similarly, the global economic effects of antimicrobial resistance remain largely unknown, but in the USA alone, its yearly cost to the health system is estimated to range between $21 billion and $34 billion. 124 Lastly, the global cost of unnecessary caesarean sections done each year is estimated to be $2·32 billion, which far surpasses the cost of needed caesarean sections. 125 Because care delivered in hospitals has a greater risk of complications and is more costly, inappropriate hospital admissions also represent a substantial burden to the health system. High-quality primary care can prevent the need for hospital admissions for several health conditions called ambulatory care-sensitive. 11 In the USA, $31 billion are spent annually on hospital admissions for these conditions. 
123 Better perceived quality and greater trust in health systems can also improve care-seeking patterns and reduce the bypassing of primary care facilities for overcrowded hospitals in LMICs. Finally, people living in countries with poorly functioning health systems, without appropriate financing mechanisms and insurance, risk suffering from catastrophic or impoverishing expenditures when seeking care. Out-of-pocket payments (ie, health spending made by patients themselves at the point of care) as a share of household consumption have been increasing worldwide. 126 In 2010, 808 million people (11·7% of the world’s population) incurred catastrophic health expenditures—ie, exceeding 10% of household consumption. 17 Catastrophic spending increased by 2 percentage points since 2000 and was associated with economic growth and per capita health spending. Nearly 100 million people are pushed into extreme poverty each year because of out-of-pocket expenses. 17 For poorer households, out-of-pocket payments often mean choosing between paying for health and paying for other necessities, such as food or rent, straining their day-to-day survival capacity and affecting their physical, social, and economic wellbeing. 127 High-quality health systems with appropriate financing mechanisms can enable facilities and providers to give affordable care to the population. To help reduce impoverishing and catastrophic expenditures, prepaid health expenditures should replace out-of-pocket payments. A study 128 published in 2018 found that the proportion of the population covered by health insurance schemes or by national or subnational health services was not associated with financial protection. Conversely, increased shares of prepayment in total health expenditure, typically achieved through taxes and mandatory contributions, were important for protecting people against catastrophic spending. 
128 The economic consequences we have described could be attenuated or averted in high-quality health systems. However, improving health system quality will require additional investments in many countries. Analyses have suggested that these will be substantial but affordable in most settings, excepting the poorest countries. In 2017, WHO published 129 an estimation of the cost of interventions and health-system strengthening strategies required for reaching all SDG-related health goals in 67 LMICs. WHO estimated additional annual costs of $263 billion, which would save 97 million lives from 2016 to 2030. The estimated total costs per person ranged from $112 in low-income countries to $536 in upper-middle-income countries. The Disease Control Priorities Project 14,130 estimated the costs for reaching 80% effective coverage for 218 interventions, to meet UHC targets, in 83 LMICs and found that an additional $260 billion per year would be required. This represents $76 per person in low-income countries and $110 in lower-middle-income countries; this investment would result in 6·2 million deaths averted by 2030. Further research is needed to measure the costs of specific quality improvement strategies, including those advanced by this Commission. A health systems view must also be used to understand quality. This section addressed health care that is delivered at different levels of the health system, including through community outreach, primary care, and hospital care, and the linkages between them—referral systems and emergency medical services. Figure 9 summarises evidence on quality across these key health system platforms.

Figure 9 Quality of care across health system platforms in low-income and middle-income countries (LMICs)
DALYs=disability-adjusted life-years. HDI=Human Development Index. References can be found in appendix 1.

Equity of high-quality care

We have thus far reviewed the available evidence on quality of care at a national or multinational level. 
However, these estimates mask important variations within countries. Equitable distribution of high-quality health care is essential to make the gains in health set out by the SDGs and ultimately contribute towards the realisation of the right to health. We now explore why some groups are more vulnerable to poor-quality care than others and who receives worse quality care.

Defining equity in the quality of health care

Braveman and Gruskin 131 defined health equity as “the absence of systematic disparities in health (or in the major social determinants of health) between groups with different levels of underlying social advantage/disadvantage—that is, wealth, power, or prestige”. This definition emphasises equitable health outcomes. The health-care system is one major determinant of health, and equitable access to the system is, therefore, important. But equitable access will not result in more equitable health outcomes unless all people—not just the privileged—are able to access high-quality services. Equity in the quality of health care can be defined as the absence of disparities in the quality of health services between individuals and groups with different levels of underlying social disadvantage.

Groups vulnerable to poor quality of care

In 1971, Julian Tudor Hart 132 stated that “the availability of good medical care tends to vary inversely with the need for it in the population served.” There is evidence of this inverse care law in many health systems—LMICs and high-income countries alike. For instance, tuberculosis has a strong socioeconomic gradient between countries, within countries, and within communities. 133 Drug resistance arises in areas with poor tuberculosis control programmes and among subpopulations that face barriers to quality treatment. 
Similarly, a systematic review 134 focused on diabetes showed that low individual socioeconomic status and deprivation in the residential area are associated with worse process indicators and intermediate outcomes, resulting in higher risks of microvascular and macrovascular complications. The 2030 agenda for sustainable development is built on principles of universality and aims to ensure that no one is systematically left behind. 135,136 This commitment is echoed in the World Health Assembly resolution number 69·11, 137 which calls for “health system strengthening for UHC, with a special emphasis on the poor, vulnerable, and marginalised segments of the population”. Therefore, an effective implementation demands the defining and targeting of those most vulnerable. 136 WHO’s definition of vulnerability encompasses the effects of “marginalisation, exclusion, and discrimination that contribute to poor health outcomes”. 138 Vulnerability can vary substantially, change over time, and be multidimensional. 139 Factors such as gender, ethnicity, displacement, disability, and health status can increase vulnerability of both individuals and communities. These factors are often fluid and have intersecting points, presenting serious obstacles to individuals in accessing high-quality health services. 139 However, many countries fail to recognise the existence and impact of intersecting discrimination. As a result, the experiences and needs of these populations are not integrated into national health strategies, further entrenching the discrimination and disadvantage that they face. In this Commission, we highlight three dimensions that might make people especially vulnerable to poor-quality care: settings of care, conditions, and demographic factors (figure 10). 
Within settings of care, vulnerability is greater for individuals on the margins of mainstream services or displaced from home, such as those who are in a humanitarian crisis or in refugee camps, internally displaced, living in informal settlements, prisoners, and migrant populations. People with stigmatised conditions can face worse treatment in the health system than others; these conditions can include HIV and AIDS, mental health and substance abuse disorders, and some reproductive health services such as abortion. Finally, previously recognised social and demographic factors that indicate asymmetric power, such as gender, age, sexual orientation, ethnic group, disability, and insurance coverage, can predispose people to experiencing poor-quality care. Figure 10 Dimensions of vulnerability to poor-quality care Reasons for poor-quality care in these three dimensions include the collapse of health services, insufficient financial and human resources, low patient empowerment, barriers to continuity of care, insufficient legislative controls, and breakdown in trust between patient and system. These dimensions of vulnerability, along with an understanding of why these groups could receive poor-quality care and suffer worse health outcomes than others, can inform policies and programmes that target specific vulnerability factors. Panel 4: Why quality of maternal mental health care might suffer for vulnerable groups: perinatal depression care in primary care setting in Nigeria* Women with perinatal depression can experience stigma associated with mental illness in some low-income and middle-income countries. People with mental disorders are often victims of discrimination and denial of basic rights.A37 They can also internalise shame, anticipate rejection and discrimination, and accept diminished expectations from others. These two forms of stigma, enacted and felt, have the effect of exposing individuals with mental disorders to poor and inequitable quality of care. 
Therefore, in the context of perinatal depression, stigma would increase the likelihood that those suffering are denied access to the basic and often rudimentary services available. A formative study done as part of the project Scaling up Care for Perinatal Depression for Improving Maternal and Infant Health in Nigeria assessed the factors that might promote or hinder the delivery of quality services to women with perinatal depression (appendix 1). All 23 facilities sampled had the lowest level of institutional support for continuous care for depression. Of the 218 patients who screened positive for perinatal depression by use of a validated tool, only three were identified by primary health-care workers. The treatment offered to these three patients was non-existent or grossly inadequate. None were provided with structured psychosocial interventions or offered specific follow-up to address their depression. However, 96% of the women in all sampled facilities reported that the quality of care provided in the clinics was good and of sufficient quality, and 98% reported that they were satisfied with the care they had received. The low capacity of all the sampled facilities to provide quality care for depression, and the extremely low detection rates of depression by primary health-care workers recorded in the study, showed important gaps in both the organisational structures and the manpower capacity of the front-line facilities to respond to common perinatal mental health conditions in a fully functional integrated chronic care model. Despite the objectively rated poor quality of service being provided, the women using these facilities still rated them highly regarding quality of care and personal satisfaction with the level of service provided. This paradox is an important indicator of the existing inequity in the system: people who have never experienced high-quality services set their expectations low and do not know how to demand higher-quality health care.
Source: Olatunde Ayinde and Oye Gureje. *Panel references can be found in appendix 1. Panel 4 and panel 5 illustrate how conditions (eg, mental health) and settings of care (eg, humanitarian crisis or refugee camps) can exacerbate poor-quality care and what might be done to address these inequalities. Who receives worse quality care? The monitoring and tracking of equity in health intervention coverage has been the focus of major international efforts. 140 Many studies 61,140,141 have shown that some population groups are systematically less likely to have access to or use health services for several conditions. However, there has been less work done on equity in the quality of care. As described earlier in this section, quality of care varies between and within countries. Quality of care can also vary between certain population groups and across conditions in the same area. For example, a study 142 in Kenya showed that the quality of labour and delivery care was generally low, but care available to the poor was substantially worse than that for wealthier people. Similarly, it was found that in Madhya Pradesh, India, poor people living in poor communities received especially poor-quality care. 143 Additionally, poor people throughout the world live and die with little to no palliative care or pain relief. 36 We disaggregated several indicators of quality in maternal and child health presented earlier in this section by wealth, urban and rural residence, maternal age, gender, and education (appendix 1); we also assessed variation in quality between the public and private sector. We found evidence that quality care is inequitably distributed across these stratifiers. Regarding evidence-based care, figure 11A shows the proportion of women and caregivers reporting different elements of antenatal and child health care by wealth quintiles. We found evidence of a wealth gradient across most of these indicators. 
Among women attending antenatal care with a skilled provider, wealthier women were more likely to report receiving antenatal care assessments and appropriate preventive treatments and more likely to be retained in care until the fourth antenatal care visit. For example, among women attending antenatal care, we found that the wealthiest were four times more likely to report blood pressure measurements and urine and blood tests than the poorest women in their country (relative index of inequality 4·0, 95% CI 3·9–4·1). 45 When seeking care at facilities for pneumonia, children in the wealthiest quintiles in low-income countries were more likely to receive antibiotics than those in the lowest; among all children who received the first diphtheria, tetanus, and pertussis vaccine dose, those from wealthier families were more likely to complete the vaccination series (receiving the third dose by age 1 year) than children from poorer families. These inequities tended to be larger in low-income countries than in lower-middle-income and upper-middle-income countries. Figure 11 Equity in maternal and child health-care quality and in user experience in low-income and middle-income countries (LMICs) (A) Data are from Demographic and Health Surveys and Multiple Indicator Cluster Surveys done in 90 LMICs (2007–16); wealth quintiles are pooled across countries and sampling weights are adjusted to weigh countries equally. (B) Data are from Demographic and Health Surveys and Multiple Indicator Cluster Surveys done in 91 LMICs (2007–16) and are weighted using individual-level survey weights. (C) Data from Commission-led internet survey in 12 LMICs (2017); proportion of respondents who classified their experience for each indicator as “good”, “very good”, or “excellent” (vs “fair” or “poor”) for their last outpatient visit within the prior 12 months; education levels are pooled across country. Indicators are defined in appendix 1. ORT=oral rehydration therapy. 
DTP=diphtheria tetanus pertussis vaccine.

Panel 5: Quality of humanitarian health services for populations affected by armed conflict and natural disasters*

During 2016, there were 49 active armed conflicts with about 170 million people affected, including 60 million refugees and internally displaced people throughout the world.A38–A40 Additionally, an estimated 200 million people are affected by natural disasters annually.A41 These crises cause excess morbidity and mortality through multiple pathways.A42 One of these is the disruption of what are often already weak public health systems. In most crises, the health system undergoes substantial degradation and fragmentation, with the void left by reduced government activities often filled by faith-based, private, and informal providers.A43 There are logistical, safety, and practical difficulties in undertaking research during times of conflict that have led to insufficient data on the quality of health services being provided in these situations.A44 However, methods that have been used to assess the quality of care showed low levels of competent care and user experience, issues with staff motivation, and better quality care for patients with less complicated conditions than for patients who were seriously ill.A45 During the past two decades, humanitarian actors have undertaken various, largely normative, initiatives to promote health-care quality. However, accountability and enforcement remain low, and few humanitarian agencies have implemented health governance systems. Here, we discuss several challenges that need to be tackled to advance the quality agenda in the humanitarian health sector. First, the pursuit of quality remains weak and needs to be incentivised. For example, donors of humanitarian activities should place greater emphasis and funding on strengthening the use and reporting of quality standards and performance metrics.
Failure to collect and report these data should have consequences for agencies, such as removal of permission to operate and loss of funding. Second, quality is impeded by insufficient capacity within the humanitarian health workforce. Efforts to professionalise the humanitarian health workforce need to be scaled up through training and updated technical standards and competency frameworks. Third, existing coordination mechanisms need to evolve into technical leadership arrangements, whereby, in exchange for the benefits of taking part in coordination (eg, access to specific funding pools), actors agree to operate according to a standard package of care and specific service quality standards. Fourth, governments need to explicitly consider crisis areas when implementing health interview and population surveys. The actors in these areas should collect data in a way that matches the quality indicators defined by the public health information systems, including assessment of confidence in the system. Lastly, health governance in the humanitarian systems remains weak. Robust governance arrangements, ideally interagency, need to be established to develop concrete accountability and liability in the humanitarian health sector.

Source: Bayard Roberts and Francesco Checchi. *Panel references can be found in appendix 1.

We also found important urban–rural differences in several of these quality indicators, whereby women and caregivers in urban areas were significantly more likely to report better maternal and child health-care quality than those in rural settings (figure 11B). These urban–rural differences were also largest in low-income countries. In terms of user experience, this Commission’s 12-country internet survey also showed that people with some primary education consistently rated their user experience as worse than did those with secondary education or higher (figure 11C).
The largest gap was found in the rating of the overall quality of the last outpatient visit, for which people with primary education or less reported significantly lower quality than did those with more education. A total of 34% of respondents reported that staff had treated them poorly because of their identity and, of those, 10% attributed this to their poverty (appendix 1). These inequalities could be underestimated because studies have shown that less educated people tend to be more accepting of the care they receive. 144,145 Additionally, adolescent women seeking maternal and child health care can also face particular stigma and poorer quality care (appendix 2). Among women attending antenatal care and delivering in health-care facilities, young adolescents were less likely to report receiving different elements of care than women aged 20–35 years. Younger mothers were less likely than others to receive post-partum checkups before discharge after giving birth in a health-care facility. The youngest adolescents (15-year-olds) appeared to be substantially less likely to receive all four recommended antenatal care visits, and their children were less likely to complete the diphtheria, tetanus, and pertussis vaccination series. An analysis of data from the STEPS survey on receipt of lifestyle advice from health-care providers among adults diagnosed with diabetes, hypertension, or hypercholesterolaemia found that women were less likely to receive advice about tobacco use and physical activity than men, and overall, those with no formal schooling were more likely to receive advice about tobacco use and dietary change than those with primary or secondary schooling. Individuals with secondary schooling were more likely to receive advice about physical activity, maintaining a healthy bodyweight, or losing weight than those with primary or no schooling. 
Additionally, evidence from the Prospective Urban Rural Epidemiology study 146 found that the use of medication for secondary prevention of coronary heart disease was extremely low, with people in the poorest countries having the lowest rates of use. Within countries, women and rural dwellers had lower use than men and urban dwellers; less educated patients were less likely to use antiplatelet drugs and statins than more educated patients. Quality can also differ between public and private facilities, but these differences vary across contexts. Such differences also depend on the types of provider included in the definition of private sector. In terms of evidence-based care and competent systems in the Democratic Republic of the Congo, Kenya, Rwanda, and Uganda, adherence to WHO guidelines for sick child care was higher in private facilities than in public ones. Additionally, adherence to checklists was higher among private providers than among public ones in a standardised patient study 52 in India. However, an analysis 147 of household surveys in 46 countries found that public and private sectors did similarly in terms of antenatal care quality. By contrast, a systematic review 148 in LMICs found that private sector providers (including unlicensed and uncertified providers) were less likely to follow medical standards of practice, had poorer patient outcomes, and reported lower efficiency than public sector providers, resulting partly from perverse incentives for unnecessary testing and treatment. For user experience, public providers did worse in terms of timeliness and hospitality to patients than private providers. 148 Nonetheless, quality can vary considerably within the same sector in a country. Additionally, country differences were found to be more influential than all other subnational factors combined in explaining variation in the quality of primary care services and labour and delivery care. 
38 This finding might point to the importance of structural factors in producing quality.

Panel 6: Section 3 key findings

• Previous right-to-health discussions did not sufficiently elaborate on the quality of health services promised to people
• Spending scarce resources on expanding access to services without ensuring quality is wasteful and inefficient; as countries embark on universal health coverage, services should be accompanied by a national guarantee of quality
• Quality improvement efforts should start in areas with the greatest quality deficits, with a focus on care received by disadvantaged populations
• There are concrete mechanisms available to improve health system accountability; this lies at the core of realising the right to the highest attainable standard of health for all people

Section 2 conclusion

The epidemic of poor-quality care described in this section casts doubt on the ability of legacy health systems to achieve the SDG health targets. Poor-quality care in LMICs is reflected by inadequate adherence to evidence-based care, negative patient experiences, unequal treatment and access to health services, and by deficiencies in safety, prevention, continuity, and timeliness, leading to poor health, adverse economic outcomes, and loss of trust and confidence in health systems. Additionally, poor and vulnerable groups appear to experience worse quality care. Despite the breadth of the evidence presented in this section, there were still many gaps in the availability of data on quality of care (appendix 2). Poor-quality care has been attributed to the poor knowledge and competence of providers and to fatigued or unmotivated health workers. However, the scale and range of the problem across countries, settings, and health conditions suggest that it is a manifestation of a broader systems failure. LMIC health facilities are underequipped, overcrowded, and frequently understaffed. Pre-service education and specialty trainings are inadequate.
Processes are inefficient or non-existent, including financial incentives and remuneration of providers, referral networks, and triage in emergency departments. These fragmented health-care systems are unable to support health workers in providing high-quality care.

Section 3: The ethical basis of high-quality health systems

The core principle of this Commission is that health systems are for people. This section asks: are they for all people? We review the right to high-quality care and provide insights into steps that national governments and communities can take to address the issue of equity and build a strong high-quality health system that targets the poor and vulnerable groups. The key findings of this section are shown in panel 6.

Implementing the right to high-quality care through a national quality guarantee

What is the right to quality care in settings with few resources?

The health and human rights agenda has been essential to motivating investments and actions to improve health in LMICs, as well as globally. This agenda historically emphasised inputs and access to care, but did not specify the quality of services provided. In 2000, 13 the UN Committee on Economic, Social, and Cultural Rights adopted general comment 14, which states that the right to the highest attainable standard of health includes availability, accessibility, acceptability, and quality. In a review for this Commission 149 of global health policy milestones since 2000, we found that the global discourse has been focused on access to care and foundations of quality, but not enough appears on processes of care or quality-specific impacts, such as trust or satisfaction. However, with the implementation of the 2007 WHO framework for action on strengthening health systems to improve health outcomes and the 2016 WHO framework on integrated, people-centred health services, the trend is moving in the direction of patient-centred care and measures of quality focused on processes of care.
Health systems should communicate the right to health through a national health plan, initiatives to ensure that the public knows its entitlements and how to realise them, and data on health system quality. 30 Are there ethical trade-offs between improving quality and expanding access? One reason that quality has lagged behind access in global health discussions is the perceived trade-off between expanding coverage and improving quality. A trade-off is a compromise between two or more desirable, but competing considerations and, thus, involves a sacrifice made in one dimension to obtain benefits or ensure respect for rights in other dimensions. 150 There was (and still is in many low-income countries) an understandable sense of urgency to expand essential services to the population at any cost—without an explicit focus on quality. This finding can be interpreted as the result of a trade-off made by decision makers: equitable access for all is better than access to high-quality services for some. Quality is essential to the equity agenda. We recognise that on the high end of care, such as expensive advanced technologies and medicines, provision of cheaper and somewhat less effective treatments can be an appropriate option in low-resource settings. One example is the use of the visual inspection with acetic acid method for cervical cancer screening instead of the more expensive and time consuming Papanicolaou smear and human papillomavirus co-testing. 151 However, we believe that a concern for equity implies access to a minimally assured level of quality for all. There are two reasons for this: ethical achievement of health outcomes and efficient use of resources. First, increased access will not translate to better health outcomes for disadvantaged people unless all people have access to high-quality services. Second, spending scarce resources to expand access without quality is wasteful and inefficient. 
Countries can build on their achievements in expanding coverage by improving the quality of services offered to meet the minimum quality level. They can then consider further expansion of quality services. As countries pursue UHC, approaches such as progressive universalism—a determination to include people who are poor from the beginning—have proven to be effective ways to target poor and vulnerable groups of society. 152,153 Brazil’s Family Health programme 154 and Mexico’s Seguro Popular initiative 155 are two examples of programmes designed to increase coverage first among disadvantaged groups. This Commission endorses this approach. Defining a national quality guarantee Many countries recognise the need to be accountable for the health care of the population. One clear manifestation of this is patients’ rights charters that outline a country’s approach to patient care and provide an ethical basis for care. Although these charters contain many of the same basic principles, such as legal and human rights guarantees, they vary substantially in length, scope, and detail. Patients’ rights charters are well intentioned, but not operational. South Africa is attempting to make its promises actionable through its National Health Insurance Policy, which underpins the establishment of a unified health system based on the principles of social solidarity, progressive universalism, equity, and health as a public good and a social investment (appendix 2). This Commission recommends that countries adopt a national quality guarantee—ie, quality sufficient to consistently produce a health benefit. This would be concrete and operational for covered services. What are the elements of such a guarantee? First, clearly poor-quality services, providing more harm or risks than benefits, fall below the thresholds of a guarantee. Second, the quality of services must be sufficient to generate health benefits. 
For example, a rural clinic should specify to the patient the level of services that it is competent in providing. Third, services must be provided in a respectful people-centred manner. An integral aspect of people-centred health systems is the relationship between provider and patient. Patient–provider relationships are shaped by societal norms and are susceptible to power imbalances. Pre-service and in-service training on respectful care is one way to improve the ethical competence of providers in low-income settings. 156 However, to end the poor treatment of patients and greatly improve health care, people-centred and patient-driven approaches that shift the power from the health-care system and providers to the patients are needed. 157 The quality guarantee should accompany any efforts to expand service coverage; in many countries, the movement to UHC is an excellent starting point. National standards for conditions covered by a UHC benefit package might include descriptions of adequate assessment and diagnosis, treatment and care, assurance of continuum of care, and referral. This is a corrective to the current UHC discussion that revolves around the pooling of funds to expand the coverage of populations and services while decreasing the cost. Without building in quality, the increased coverage will not result in health gains for people. Although many countries can do more to provide quality health services with existing funds, others will require additional funds. Data from WHO 4 show that global government spending on health as a percentage of all government expenditures rose by an average of 10% between 2000 and 2015; however, it was flat in lower-middle-income countries, and fell substantially in low-income countries—the very countries struggling with poor-quality care. Beyond these general considerations, countries need to undertake analyses and open discussions to specify their national standards. 
National guarantees should start with the reality of social norms and health system functions and be context-specific. 158 Guarantees will depend on budget, setting, disease type, intervention, and delivery platform. Current national standards are often defined and implemented through standard operating procedures or clinical practice guidelines. Standards included in the national quality guarantee should be developed by health policymakers and professionals, in collaboration with users and national regulatory agencies, to ensure that upholding the guarantee does not fall solely on providers. The guarantee is not intended to be punitive against individual providers; any redress mechanisms should be targeted to the appropriate level of the health system. Improving accountability for quality Over the past three decades, the concept of accountability in provision of health care has gained increased attention. However, accountability for quality in health care has been less explored. In this subsection, we refer to Brinkerhoff’s definition 159 of accountability, which encompasses both answerability and enforceability. The three general categories of accountability are financial, performance, and political or democratic. In this section, we use elements of financial and political or democratic accountability to discuss legal and social mechanisms. Performance accountability is discussed in the subsequent sections. For accountability to function, there must be actors responsible for activities, standards to define what actors should deliver, agents to hold actors to account, and tools or methods to do so. A review done for this Commission on the accountability ecosystem and its relation to the delivery of quality care (methods in appendix 1) supported the notion that accountability mechanisms can serve as a catalyst to initiate and sustain improvements in quality and advance the progressive realisation of the human right to health and quality health care. 
The review found that multiple accountability tools have been used, and documented in the peer-reviewed literature, to improve access to essential and effective health care (appendix 1). A key finding of the review was that single interventions do not have the power to induce large-scale change. Additionally, governance and coordination must be strengthened, resources must be planned and budgeted, and a performance monitoring system must make the information collected available. Therefore, to improve quality, countries need to devise accountability strategies that encompass elements of legal and social accountability. Legal accountability National governments are the primary agents for accountability. Human rights conventions can provide the basis for legislation that recognises the right to health and health care, and can be an essential and minimum foundation for approaches to improve access and quality of care. Meaningful legislation should not only recognise the right to health and health care, but also cater for the right to meaningful public participation, freedom of civil society, and freedom of information. Where such legislation exists, it can be used for accelerating action. Quasijudicial mechanisms exist in many LMICs, such as the ombudsman in South Africa tasked with addressing the system failures that led to the deaths of 94 mental health-care users. 160 Also in South Africa, the Treatment Action Campaign defeated the Government in the constitutional court to increase access to HIV treatment to mothers and newborn babies. A high court in Kenya awarded a woman 2·5 million Kenyan shillings for mistreatment and abuse during childbirth, which was caught on film. 161 Additionally, in Malawi and Mozambique, human rights concerns and entitlements were used by civil society organisations to expand national policy for maternal, newborn, and child health. 
162

Social accountability

Social accountability refers to approaches that involve communities, citizens, and service users directly; these approaches include attempts to increase community involvement, awareness, and demand generation for high-quality care. 163 A 2004 World Development Report 164 suggested that social accountability tools could be used to increase transparency and accountability, shortening the long route of democratic accountability between citizens and politicians. Multiple tools are available to foster social accountability. They include citizen report cards, community monitoring, social audits, participatory budgeting, citizen charters, and health committees. Mechanisms for creating and acting on such tools exist in LMICs today. Institutions tasked with reporting on quality-related indicators include the Health Data Advisory and Coordinating Committee in South Africa and the General Directorate for Quality Healthcare and Education in Mexico. 165 There are licensing and assessment activities with internal and occasionally public reporting, such as the Ideal Clinic in South Africa, Big Results Now project in Tanzania, and the Kenya Patient Safety Impact Evaluation. 166–168 Finally, direct public reporting of local progress can be effective, such as Imihigo, 169 the televised reporting of progress on commitments by local leaders in Rwanda, including maternal health outcomes. These social accountability mechanisms should be seen as complements to, rather than substitutes for, the legal approaches previously discussed.

Panel 7: Actions to support legal and social accountability

A literature review done for this Commission aimed to present findings on the accountability–quality relationship and explore how accountability mechanisms contribute to improvements in quality of care. The review focused on legal and social accountability mechanisms pertaining to reproductive, maternal, and child health.
The key findings were synthesised and the following actions were identified as important for effective and transparent accountability:

- Adopt and enact legislation that recognises the right to health and quality health care
- Invest in rights awareness and education at all levels, including among policy makers, parliamentarians, programme managers, service providers, and the public
- Share information on health system performance with the public and promote transparency of quality measurements
- Institutionalise mechanisms for remedy and redress, such as ombudsperson or tribunals
- Develop multipronged strategies for accountability for quality of care that combine legal, performance, and social accountability tools

Methods are described in appendix 1. Source: David Clarke, Rajat Khosla, Blerta Maliqi, Marcus Stahlhofer, and Bernadette Daelmans.

Panel 7 synthesises the key findings from the review on legal and social accountability and proposes actions to support effective and transparent accountability at the national level.

Section 3 conclusion

Health systems should give priority to poor and vulnerable groups of society to reduce inequities and expand the right to quality health care through progressive universalism. A movement towards UHC offers countries the opportunity to start on this path by expanding coverage tied to a national quality guarantee. Legal and social accountability mechanisms can assist in upholding these quality standards. Enacting accountability is predicated on insight into current health system quality. In the next section, we assess the purpose, status, and promise of health system quality measurement.

Section 4: Measuring health system quality

The key findings of this section are shown in panel 8.

Why measure health system quality?

Valid and reliable information is a necessary input to a high-quality health system. 
170,171 Multiple national, international, and global efforts are underway to identify measures to improve care delivery and amplify patient voices. These efforts include the National Quality Forum in the USA, the Health Data Collaborative, and initiatives undertaken by OECD, the Inter-American Development Bank, and the China Joint Study Partnership. 70,76,172–175 These efforts show that the measurement of health-care quality is a concern of populations and governments around the world; high-income settings, in particular, have invested in institutions to strengthen health system performance through measurement. Although some efforts, such as the Health Metrics Network, have included LMICs, country ownership of this agenda has been inconsistent, and progress on health system measurement remains incomplete. Panel 8: Section 4 key findings Accountability and action are the guiding purposes of quality measurement; measurement not used for these purposes can burden the health system. Current quality measurement is fragmented by disease, focused on inputs rather than outcomes, and poorly aligned to population health needs. Decision makers do not have timely information that provides a picture of the health system as a whole. National and global actors should seize three opportunities to improve measurement of health system quality: (1) measure effective coverage—use quality-corrected coverage metrics to track progress towards UHC; (2) adopt fewer, but better measures by shedding inefficient indicators and prioritising measures of system competence, user experience, and outcomes, including clinical and patient-reported health, confidence in the system, and economic benefit; (3) invest in country-led quality measurement, including strengthening national capacity for data use and policy translation, releasing an annual health system quality dashboard, and disaggregating results for vulnerable populations. 
Indeed, the findings described in Section 2 on healthcare quality in LMICs reveal crucial measurement gaps. Existing data on quality of care have largely been generated within vertical programmes, resulting in measures that have not been combined in ways that could illustrate quality of the health system as a whole, whether at local or national levels. 176 Systematic data on the performance of health system platforms (such as primary care) or on user experience, population confidence, and patient-reported health outcomes are scarce. Moreover, research on health system quality (including the policy and implementation research urgently needed to bring effective interventions to scale) has not kept pace with the magnitude of the challenge, reflecting inadequacies in measurement approaches and data use. A bibliometric search for quality-related research between 2000 and 2016 revealed that, although this type of research is increasing in LMICs, it remains overwhelmingly located in high-income countries (appendix 2). The demands made of health systems are growing: the burden of disease is shifting towards non-communicable diseases and injuries, 6 health emergencies are rising, 177 countries are actively moving towards UHC, 17 and people are demanding better services and outcomes. 119 The health priorities of the SDG 178 era, with ambitious targets of improved survival and quality of life for all, demand new approaches that promote accountability and action to drive broad health system improvements. To meet these challenges, measurement approaches need to be responsive to new health system demands, relevant to people, and efficient. At the heart of this reframing is the question: why measure and for whom? This Commission proposes two main purposes for the measurement of health system quality: accountability and action. 
Accountability requires the provision of information when questioned, whether for routine monitoring or detailed justification, paired with a mechanism for oversight. 159 This section focuses on measurement for performance accountability—how the health system delivers on its intentions—and social accountability—whether it is responsive to society. 159 The measurement of performance accountability should show results against benchmarks, support crossnational or subnational comparisons, disaggregate evidence for vulnerable subpopulations, and do this in or near real time. For both performance and social accountability, data will typically need to be representative of the target population, comparable, and systematic. Measurement should further include elements that are of high value to people; for example, they should include not only health outcomes such as survival, but also function, pain, and processes such as respectful treatment (panel 9). Panel 9: What different measures tell us about health system quality* Measures of health system quality have usually been organised into inputs (eg, workforce, tools, facilities), processes of care (eg, adherence to guidelines, communication), and outcomes (eg, morbidity, mortality).A46 In low-income and middle-income countries, many quality measurement and improvement efforts have emphasised inputs to health services. Inputs are foundational to health-care provision and are easily measured, but they provide narrow insight into quality of care. Studies have found weak associations between input measures and care competence,A47 particularly when facility size is considered.A25 The relation between input availability and the quality of care received can differ over the course of care delivery,A48 underscoring the need for motivated and competent providers and supportive systems for good care delivery. 
Similarly, multiple studiesA49,A50 attest to the know–do gap: the deficit between provider knowledge and the clinical care provided. These issues do not mean that input measures are unimportant; indeed, timely and specific information on inputs, such as stock levels and equipment functionality, is crucial for health service planning and operation and should be collected by health systems. However, these measures should not be used as indicators that health systems are providing high-quality care. Process measures can play an important role in illuminating the quality of care provided. These measures are immediate and relevant at the point of care, and they provide direct insight on care provision without risk adjustment, which makes them particularly valuable in assessing gaps or disparities in care for vulnerable subpopulations.A51 A judicious selection of process measures is essential, emphasising measures validated against the outcomes that matter to patients,A52,A53 whereas overmeasurement can divert provider time and weaken the quality and usefulness of data. The proliferation of process measures in high-income countries, for example, has increased the burden of measurement and resulted in unintended consequences, including fixation on the measure rather than the intent, reallocation of efforts towards meeting measurement targets and away from other essential tasks, and gaming (manipulation of the quality assessment system).A54,A55 Health outcome measures attest to the central goal of a health system—maintaining or improving health and wellbeing.A56,A57 However, these measures can be challenging to attribute directly to health system performance because of the involvement of multiple factors. 
Baseline risk information is required for valid comparisons of health outcomes over time or between facilities.A58 Despite this complexity, there is global recognition of the crucial need for health-system-sensitive and patient-focused outcome measurements, even in very low-income settings.A56 Health-system-sensitive outcomes include perioperative mortality, inpatient suicide, 5-year cancer survival, obstetric fistula, caesarean section, unsuppressed HIV viral load, uncontrolled blood pressure, lower extremity amputation in patients with diabetes, and hospitalisation due to ambulatory care-sensitive conditions. High-income settings are increasingly turning to patient-reported outcomes (PRO) as a means of realigning health care with patient values.A59 PRO measures have been used to improve monitoring, decision making, and patient–provider communication,A60 with evidence suggesting that the use of these measures improved patient perceptions of careA61 and led to better health outcomes for some conditions,A62 although their usefulness in aggregate has yet to be fully demonstrated.A63 Routine measurement and the use of health-system-sensitive outcome data and PRO are integral to achieving patient-centred health systems. *Panel references can be found in appendix 1. Measurement for action is at the heart of learning health systems. These measures should provide decision makers with answers to specific questions about the functioning of the health system and the quality of care delivered, help identify the targets and interventions for improvement, and monitor the results of the changes implemented. Quantitative data should be complemented by so-called soft intelligence, the insight on the context and processes of care delivery, to help inform action. 
179 The focus of measurement for action is likely to differ in a complex adaptive system: acting directly at the level of the process indicator (eg, attempting to address poor adherence to guidelines with printed reminders) might not yield expected effects if the indicator merely signals a deeper quality deficit at the health-system foundation level. 158,179 Measurement for action is discussed further in Section 5. Fulfilling either purpose of measurement—ie, for accountability or action—requires valid and reliable measures, transparency in information exchange, and an entity with the power to demand a response. Panel 10 outlines conditions required for measurement to induce change. To meet the SDG targets and improve health system quality by 2030, countries will need to embark on a measurement agenda that will take time and investment to fulfil. This agenda starts by knowing what is currently being measured. What is—and is not—measured in LMIC health systems today Multiple strategies have been used to capture the range of information needed to assess health system quality, including measuring population health needs, health outcomes, and health system performance. Table 2 describes the platforms in use and their best application; given the multiplicity of tools, central organisation and triangulation are needed to gain insight and act on these data. We, and others, have found that health system data collection is often costly, uncoordinated, and disconnected from decision making. 173,180 Tools and indicators are fragmented by disease and funding source, with inadequate harmonisation and few national plans for coordination and data use. 180,181 For example, 26 different bilateral, multilateral, governmental, and non-governmental organisations fund health information systems in Kenya, resulting in duplication of efforts and uneven distribution of resources within the country. 182 120 distinct digital health-related information systems operate in Tanzania. 
173 More than 1000 indicators are collected at the national level across the three major public health systems in Mexico, but only 27 overlap at least two health systems, preventing comparison and standardisation.

Table 2 Platforms for health system measurement

| Platform | Frequency | Level of collection | Relevant quality subdomains (Commission framework) | Best uses in measuring high-quality health systems |
| --- | --- | --- | --- | --- |
| Administrative data (eg, HMIS) | Routine | Individual level data aggregated by condition within facilities and then by geographical unit | Population (care seeking), competent care and systems, and health outcomes | Monitor facility and clinician performance; monitor health status at the community and district level |
| Electronic health records | Routine | Individual patient | Population (care seeking), competent care and systems, and health outcomes | Inform clinical care; monitor facility and clinician performance; monitor health status at the community and district level |
| Population surveys | Periodic or continuous* | Population | Population (care seeking), user experience, selected health outcomes, confidence, and economic benefit | Represent both users and non-users of the health system; permit analysis of equity for subpopulations; have potential to be adapted for innovations in measurement, such as patient experience and patient-reported outcomes |
| Facility assessments† | Periodic or continuous* | Health system | Workforce, tools; with observation or exit interviews: competent care, user experience, and confidence | Generate a representative assessment of health systems for subnational and national benchmarking; allow for assessment of user perspectives |
| Patient registries | Routine | Individual | Health outcomes, user experience, confidence | Monitor patient-reported experience and outcomes measurement over time |
| Vital and civic registries | Routine | Population | Health outcomes | Monitor population health status; form basis for policy guidance, projections, and planning |

HMIS=health management information system. 
Commission framework is depicted in figure 1. *Continuous household and facility survey methods that permit regular data synthesis, review, and health programme decision making have been proposed as an alternative to one-off surveys,A83 tested subnationally,A84 and adopted, for example, by Peru since 2004 and by Senegal in 2012. †Facility assessments can include audits of structural inputs, interviews with health-care workers, direct observation of care, and exit interviews. References can be found in appendix 1.

Panel 10: From measurement to action*

Measurement alone will not ensure health system quality. Actionable information must reach agents capable and empowered to use it to effect change in the health system. Freedom of information (the right to access information held by public bodies) was enshrined in the 1948 Universal Declaration of Human Rights and has been adopted into law by more than 90 countries.A64,A65 Applied to health systems, freedom of information demands transparency of data within the system and to the public.A66,A67 High-quality health systems are not automatically produced by governments. A regulatory system that engages an array of actors should hold the system to account for high-quality care. This system includes formal mechanisms such as audits, ombudsmen, and courts and informal actors such as patients, the press, professional organisations, and civil society.A67,A68 A range of barriers can inhibit the flow of information about health systems. 
Power differentials can stymie communication, restricting the transmission of and responsiveness to local knowledge;A69–A71 hierarchical norms and fear of reprisal can inhibit incident reporting about health-care failures;A72 and, ironically, a surfeit of indicators in routine measurement systems can prevent the ready understanding and use of locally relevant information.A69,A72–A74 Although governments often claim to want to reach users through open government initiatives, scant attention to how people understand and use information has led to an abundance of information but a minimal effect on care seeking and other outcomes.A75–A77 Countries have the opportunity to take better advantage of increasing health system data by building trust in data, promoting learning cultures within the health system, and ensuring freedom of information. Obligatory reporting with data audit trails or data quality assurance institutions can bolster confidence in the indicators generated.A74,A79 A culture of information and learning within and across health facilities can lead to greater transparency and action.A69,A80 For instance, facility audits and licensing exercises should include clear criteria for improvement and result in non-punitive responses, such as support for addressing deficiencies.A81,A82 To ensure freedom of information, formal protection for whistle-blowers is an important guarantee, although a culture of secrecy and professional protectionism should also be addressed.A72 The free operation of traditional and social media can provide external accountability levers.A67,A69 Open government initiatives are an initial step, but their success should be judged on the basis of information use, not on quantity of data released. One path to fulfilling these opportunities is the development of a national body for monitoring health system quality, informing the public, and identifying and responding to failures, to serve as a locus for measurement, accountability, and action. 
*Panel references can be found in appendix 1. The proliferation of indicators burdens health-care workers and systems. In sub-Saharan Africa, an estimated one-third of health-care providers’ time is spent on recording and reporting. 173 Health facility assessments cost a minimum of $100 000 per national survey and typically many times that amount, 183 but are rarely used for national planning. Furthermore, fragmentation of these and other data sources prevents the coherent assessment of health system performance, to say nothing of actions in response to the data. To understand how well this plethora of tools measures health system quality, we analysed multicountry health system indicator sets or surveys and sample national indicator sets from LMICs against this Commission’s quality framework (figure 1; appendix 1). Quality frameworks do not imply a need for equal measurement of each subdomain for all health services and conditions, but they do make apparent the multiple aspects of quality and highlight duplication and gaps. Measurement sets focused on the foundations of care, with global sets devoting 47% of indicators to this domain, crossnational sets devoting 70%, and national sets devoting 44% (figure 12). Inputs, such as tools and workforce, were the most commonly assessed subdomains and formed the entirety or bulk of the Service Availability and Readiness Assessment (SARA), Service Delivery Indicators, and Service Provision Assessments; our findings were consistent with existing research on the predominance of input measures in health system survey tools. 184 All assessed sets, except SARA, addressed competent care processes, particularly care delivered, such as oral rehydration solution for children with diarrhoea. Although global and national measurement sets included population health outcomes such as neonatal mortality rate, user experience and non-health effects were sparsely measured across all sets. 
Figure 12 Representation of quality subdomains in global, crossnational, and national measurement sets We mapped indicators against domains of the high-quality health systems framework (figure 1), identifying the single domain most relevant for each indicator. We additionally classified indicators as patient-reported if the data were collected with individual self-reports. Full methods are detailed in appendix 1. Cells are coloured by greatest number of indicators per row (source), with red indicating 0, orange and yellow the midrange, and green the maximum number observed for that measurement set. DHIS2=District Health Information System 2. DHS=Demographic and Health Surveys. HIS=Health Information System. HMIS=Health Management Information System. IMSS=Instituto Mexicano del Seguro Social. IPCHS=Integrated People-Centred Health System. ISSSTE=Instituto de Seguridad y Servicios Sociales de los Trabajadores del Estado. OECD=Organisation for Economic Co-operation and Development. SARA=Service Availability and Readiness Assessment. SDG=Sustainable Development Goals. SDI=Service Delivery Indicator Survey—health. SPA=Service Provision Assessment. * Population, governance, platforms, workforce, and tools. The extensive collection of input measures is problematic. When collected through surveys, input data are quickly out of date and thus lose usefulness for supply planning. Moreover, our analysis found that readiness metrics are only weakly connected to the content of care delivered. 35 Although the outcome indicators identified in this analysis are valuable for monitoring population health, we found few health-system-sensitive outcomes and almost no patient-reported outcomes. The remaining measures in global sets pertained to competent care and, to a lesser extent, systems. Much of this measurement is focused on a subset of conditions, mainly maternal and child health and infectious diseases. 
Even in these areas, the validity of indicators collected raised concerns: for example, household surveys are not well suited for identifying children who truly have pneumonia to estimate appropriate treatment, and maternal morbidity and mortality in hospitals greatly exceeded the estimated rates based on documented administration of essential interventions. 100,185 The validity of tools measuring clinical care is discussed in appendix 2. In summary, the available measures do not promote accountability for high-quality health systems. Globally funded facility surveys overmeasure inputs that provide inadequate value for accountability. At the national and global levels, health system measurement is insufficient to assess performance of the health system as a whole and inadequate for holding the system accountable to people for the user experience provided or the effect on impacts (health and non-health) that matter to patients.

Data quality

Data must be of adequate quality to be used for accountability or action. 186 Efforts in the past few years have identified dimensions of data quality such as completeness and timeliness, internal consistency, external consistency, and external comparisons, although assessment tools focus mainly on completeness and accuracy. 186 Routine health information systems, whether individual-level electronic health records or aggregate reporting such as the District Health Information System (DHIS) 2, provide information on the use and content of care that, if the data are of adequate quality, should form a crucial element of health system measurement for accountability. 187 34 LMICs (chiefly upper-middle-income, but including 13 low-income and lower-middle-income countries) had adopted national electronic health records systems by 2015. 188 41 LMICs, including 23 low-income countries, use DHIS2 at a national scale for aggregate reporting from electronic or paper registers in facilities. 
189 Notably, private sector facilities can be included under national health management information systems, although their participation and data completion are often low. 148 Barriers to robust implementation and use of electronic health records and DHIS include restricted ownership by end users, scarce training on data skills, lack of motivation and engagement by overburdened health workers, large numbers of indicators required, and inadequate functionality of electronic platforms. 181,190 As a result, data quality in routine health information systems is poor, with vertical programme assessments often identifying high prevalence of missing or inaccurate data. 181,191 New evidence from Kenya, Nigeria, and Mexico suggests that such deficiencies in data quality also pertain to indicators of health system quality (appendix 2). Moving forward: three opportunities to measure better Opportunity 1: Measure effective coverage Countries should incorporate measures of quality within a broader health system assessment to appropriately track the value of the health system. The geographic availability of facilities overstates health system performance: reduced mortality due to acute abdominal conditions was associated with proximity to well resourced hospitals in India, but not with access to lower-quality hospitals. 192 New analysis suggests that this relation also occurs for obstetric conditions, acute surgical conditions, and time-critical adult infections in India, but less certainly for myocardial infarction (appendix 2). Even basic process indicators provide greater insight into hospital capacity than the availability of a facility or equipment. Service coverage monitoring that does not explicitly include quality will similarly overestimate health system performance and will do so substantially in many cases because of quality deficits. 
Achieving UHC requires effective coverage, such that “people who need health services obtain them in a timely manner and at a level of quality necessary to obtain the desired effect and potential health gains.” 193 The current monitoring of UHC does not reflect this. Figure 13 lists the current coverage indicators for monitoring UHC specifically and the health-related SDGs more broadly. Only one of these indicators (effective treatment coverage for tuberculosis) captures the health system effect on population outcomes. Calculating effective coverage requires defining the population in need, access to care, and receipt of quality care. 194 In figure 13 we also provide illustrative effective coverage indicators to suggest directions for future monitoring, and indicators for additional conditions are in appendix 2. Research is ongoing to identify standard indicators for many SDG conditions. Some indicators are available but need to be better implemented (eg, HIV), others need to be refined by selecting the best indicators and determining efficient methods of collection (eg, maternal health), and others still need to be developed de novo or validated for use at scale in low-resource contexts (eg, substance use). Figure 13 Illustrative indicators for advancing Sustainable Development Goal (SDG) monitoring from coverage towards effective coverage Tier 1=priority action is implementation (routine or targeted, as for immunisation). Tier 2=priority action is determining efficiency in indicators and data collection. Tier 3=priority action is development of valid indicators for use at scale. IMCI=Integrated Management of Childhood Illness. *Excludes health indicators focusing on population outcomes alone. †Six indicators not shown: two primarily measuring determinants outside the health system (tobacco use and access to basic household sanitation) and four service capacity and access indicators. References can be found in appendix 1. 
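As an illustration only, the distinction drawn above between service coverage and effective coverage can be sketched in Python. The sketch assumes a deliberately simple definition: crude coverage is the share of the population in need that received the service, and effective coverage is the share of the population in need that received care meeting a quality standard. The class name, function names, and counts are hypothetical and are not taken from the Commission's data or from any standard indicator definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoverageCounts:
    """Hypothetical counts for one condition in one population."""
    in_need: int           # people who need the service
    received_care: int     # of those in need, people who received the service
    received_quality: int  # of those who received it, people whose care met the quality standard

    def __post_init__(self) -> None:
        # Each group must be a subset of the previous one.
        if self.in_need <= 0:
            raise ValueError("in_need must be positive")
        if not 0 <= self.received_quality <= self.received_care <= self.in_need:
            raise ValueError("expected 0 <= received_quality <= received_care <= in_need")


def crude_coverage(c: CoverageCounts) -> float:
    """Share of people in need who received the service, regardless of quality."""
    return c.received_care / c.in_need


def effective_coverage(c: CoverageCounts) -> float:
    """Share of people in need who received care of adequate quality."""
    return c.received_quality / c.in_need


def quality_gap(c: CoverageCounts) -> float:
    """How much crude coverage overstates performance under this definition."""
    return crude_coverage(c) - effective_coverage(c)


# Made-up numbers: 1000 people in need, 800 reached, 400 with adequate quality.
example = CoverageCounts(in_need=1000, received_care=800, received_quality=400)
print(f"crude coverage:     {crude_coverage(example):.2f}")
print(f"effective coverage: {effective_coverage(example):.2f}")
print(f"quality gap:        {quality_gap(example):.2f}")
```

With these made-up counts, crude coverage is 0.80 but effective coverage is only 0.40, which is the kind of overestimate the text attributes to coverage monitoring that does not explicitly include quality. How "in need" and "adequate quality" are operationalised for each condition is the substantive measurement question and is left open here.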
Care cascades are an extension of the concept of effective coverage: instead of a single number, cascades break performance along the continuum of care to allow analysis of health system function. 195 Cascade steps typically follow a patient population from health need through diagnosis, timely treatment, disease control, wellbeing, and survival. With each step conditional on the previous one, cascades illustrate health system failures in functions such as diagnosis, retention, and evidence-based care, while linking system performance to patient outcomes. Although specific indicators can vary across conditions (for example, disease control could be measured by viral load for HIV, blood pressure for hypertension, symptom-free days for major depressive disorder, and years without recurrence for breast cancer), the drop-offs in a disease-specific cascade can illustrate system-wide deficits: low rates of screening suggest failures in primary care as a first contact service, whereas poor outcomes among those on treatment implicate inadequate coordinated and continuous care. We provide examples and discussion in appendix 2. Opportunity 2: Fewer, better metrics For effective measurement of accountability and action, health system assessments must be reoriented away from measures that are poorly fit for purpose and towards people. A people-centred measurement means thinking about individuals across the life course and the total sum of their health system experiences rather than discrete services. 196 Panel 11: Innovation in patient experience and outcome measurement* The examples in this panel describe proof-of-principle testing of patient-reported indicators in low-income and middle-income countries. Shared investment, innovation, and learning will be needed to validate and define the use at scale of patient-reported measures for action and accountability. 
Measuring maternal care experience: companion of choice The Quality of Care Network for maternal and newborn health is leading efforts to standardise measures of childbirth care experience. Labour companion of choice is one of the quality measures for emotional support and is recommended in four WHO guidelines to date.A86,A87 Evidence shows that women who received continuous labour support might be more likely to give birth vaginally, be satisfied with their birth experience, and be less likely to have caesarean birth or use pain medication.A88 Labour companions can also play a role in the prevention of mistreatment of the woman during childbirth by serving as an advocate, witness, and safeguard. A process indicator would be the proportion of women who wanted and had a companion supporting them during labour, childbirth, and immediate post-partum period in a health facility, based on observation or facility or population survey. Currently, nine countries in the network are in the process of including and testing different mechanisms for three common experience of care indicators (including labour companion) as part of large-scale quality improvement efforts for maternal and newborn health.A89 Measuring patient-reported outcome measures (PROMs) for pregnancy and childbirth in Nairobi, Kenya The objective was to understand the application of value-based health-care principles in a low-resource setting; specifically, to test a model for collecting PROMs in pregnancy and childbirth in a low-resource setting, to determine feasibility and scalability of using mobile platforms to measure PROMs, and to identify how to engage patients in collecting PROMs and motivate health-care providers to measure outcomes. Outcome variables to pilot were selected from the pregnancy and childbirth standard set of the International Consortium for Health Outcomes Measurement, on the basis of importance, feasibility, acceptability (cultural and social), and literacy. 
Patient-reported outcomes included health status (incontinence, pain with intercourse), breastfeeding (success with breastfeeding), mental health (ante-partum or postpartum depression), and satisfaction with care during pregnancy, labour, and after birth. Five facilities providing antenatal, delivery, and postnatal care services were involved and patient liaison officers were trained to support patient enrolment, maintain engagement, and oversee follow-up. Real-time collection of medical and financial data was done with M-TIBA, a mobile health wallet that tracks patients through the health system. PROM items were administered using text messages through mSurvey. 173 of 200 women enrolled, with survey completion rates near 90% through 6 weeks post delivery. See appendix 1 for full methods. Sources: Özge Tunçalp and Meghan Bohren; Ishtar Al-Shammari and David Ljungman. *Panel references can be found in appendix 1. Processes of care and quality impacts must be better measured to have health systems that are truly for people, with three areas for improved measurement: positive user experience, patient-reported outcomes, and non-health effects of care. OECD countries are moving towards standard crosscutting measures of patient experience, particularly communication and patient voice. 175 Wide adaptation and validation of these measures would enable global comparisons. Other areas of user focus and respect pose more challenges for measurement, such as dignity, privacy, and non-discrimination. Vertical programmes with long experience in measuring such domains, including family planning, maternal care, and HIV care, can offer insight. 9,197,198 Similar efforts to make patient-reported outcome measures more broadly useful are underway, including a focus of the OECD on population outcome measures such as quality of life. 
175 The International Consortium for Health Outcomes Measurement released a standard set of outcomes (including patient-reported outcomes) for hypertension, with an explicit focus on LMICs. 199 Standard sets of patient-reported outcome measures for general adult and paediatric health are in development. The enhanced use of these measures will require clarity about minimum supporting data, such as risk factors. 200 Panel 11 highlights efforts to adapt and apply patient-reported measures in LMICs. Available measurements of confidence or trust in health systems fall short of the importance of this domain in shaping population behaviour and health outcomes. Satisfaction with health care or the health system is a commonly used measure and, from a legal and rights perspective, it reflects the ultimate judgment of the consumer. 201 Satisfaction is associated with objective measures of process quality (eg, clinician competence) and with health outcomes (eg, mortality). 92,202 However, satisfaction is also strongly influenced by a host of other factors, including user demographics and health, past care experiences, expectations, and potentially courtesy bias. 202 This might explain some of the counterintuitive findings on user satisfaction. For example, satisfaction is often high for demonstrably poor-quality services, particularly for users with lower education or less experience with high-quality health care (panel 3). Conversely, people might express dissatisfaction when they expect, but do not receive services that are not indicated, such as antibiotics for the common cold. Improved health literacy can reduce this mismatch. Although user satisfaction gives an important perspective, other measures should be considered that might capture people’s confidence more directly. 
These could include trust in the health system, confidence that people can get the care they need, endorsement of the system as is (vs requiring major reform), and metrics that reveal preference, such as bypassing and loss to follow-up. 37 The development and validation of measures for trust in the health system relevant for LMICs should be part of the global research agenda. The links between health system quality and economic gains were detailed previously. The effects of health system quality on economic gains are largely mediated by health status (eg, incidence of surgical site infection or antibiotic-resistant disease and ability to function for work or school) and confidence in the health system. Measurement should focus on health and confidence themselves, while research quantifying links between these outcomes and economic impact is undertaken. Direct pathways include affordability that shapes individual costs of care and low system competence generating wasteful, unnecessary procedures. Measurement of cost has advanced notably in the SDG era: catastrophic out-of-pocket spending on health-care costs is the indicator for SDG 3.8.2, financial protection within UHC, 126,128 and medical impoverishment provides an indication of how well financial protection for health services has been linked with poverty alleviation. 126,128 Indicators of health system waste, such as excess caesarean sections, might signal poor system quality, although few measures have been defined for this with adequate benchmarks for national assessment in LMICs to date. Measures of system competence are a key area for innovation, both in identification of essential indicators and in use of these to produce a coherent view of system function. Elements of system competence include safety, prevention and detection, continuity and integration, timely action, and population health management. 
Platforms within the health system—community outreach, primary care, hospital care, emergency medical services, and referral systems—can similarly be assessed for overall functionality. Work published in 2016 from the Institute for Healthcare Improvement 203 proposed system measures for consideration in high-income settings. These measures include childhood immunisations, timely ambulatory care, preventable hospitalisations, hospital-acquired conditions, and serious reportable events (serious harm or death of a patient due to a healthcare error). In lower-income countries, consistent and accurate measurement of hospital mortality for selected services would be an important advance. 204 One approach for system competence measurement is to consider conditions or procedures that require functional integration within a health-care platform and identify process or outcome measures therein as tracer indicators. For example, indicators such as blood transfusion delay, surgical site infection, and perioperative mortality rate provide insight into hospital care quality as a whole. 73,102,205 Although perioperative mortality rates are collectable in countries of all income levels, virtually no LMICs have outcome surveillance in place. A focus on bellwether procedures and definition of standard methods in collection and reporting of both perioperative mortality rates and surgical site infections would reduce heterogeneity in measurements and facilitate their uptake into existing health system measurement. 73,104 Similarly, timely trauma care is an indicator of prehospital care, such as emergency medical services and hospital functioning. Multiple studies have assessed time from injury to admission or admission to surgery, but measurement remains heterogeneous. 83 Efforts to improve measures of system competence should include their potential use for accountability and triggering action. 
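As a rough illustration of the tracer-indicator idea above, the following Python sketch computes a perioperative mortality rate and a surgical site infection rate per facility from case-level records. The record fields, the in-hospital definition of death, and the sample facilities are assumptions made for illustration only; they are not definitions given by this Commission, and a national programme would need to fix its own standard definitions.

```python
from dataclasses import dataclass


@dataclass
class SurgicalCase:
    """One surgical procedure (hypothetical record layout)."""
    facility: str
    died_before_discharge: bool      # assumed definition of perioperative death
    surgical_site_infection: bool


def tracer_rates(cases):
    """Return {facility: (n_cases, mortality_rate, infection_rate)}."""
    totals = {}
    for case in cases:
        n, deaths, infections = totals.get(case.facility, (0, 0, 0))
        totals[case.facility] = (
            n + 1,
            deaths + int(case.died_before_discharge),
            infections + int(case.surgical_site_infection),
        )
    # Every facility in `totals` has at least one case, so n is never zero.
    return {
        facility: (n, deaths / n, infections / n)
        for facility, (n, deaths, infections) in totals.items()
    }


# Illustrative data only: four cases at facility "A", one at facility "B".
sample_cases = [
    SurgicalCase("A", False, False),
    SurgicalCase("A", True, False),
    SurgicalCase("A", False, True),
    SurgicalCase("A", False, False),
    SurgicalCase("B", False, False),
]
rates = tracer_rates(sample_cases)
```

Reporting the number of cases alongside each rate matters because rates from low-volume facilities are unstable; a real analysis would likely add confidence intervals and case-mix adjustment.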
Opportunity 3: Invest for country-led quality measurement The current fragmented approach to health system measurement results in substantial efforts and investments expended for little data use. 176,206,207 Progress on the measurement challenges and proposals described will require a shift to country-led quality measurement. 208 This Commission calls on global, regional, and national donors to invest in national institutions for health system quality measurement. Such national bodies should be tasked with assessing available measurement against national priorities for health system quality, refining the measurement toolkit to better address the full high-quality health system needs, creating an annual public dashboard of health system quality performance, and assisting with policy translation of the results. Building such an institution or arrangement requires enriching human capacity at all levels of the health system and concentrating advanced capacity in data science at the national level. Without improved numeracy at local levels and data management capacity at district or subnational levels, data quality will not be sufficient to support the activities of the national institution. Building more advanced measurement capacity—including more masters-level and doctoral-level researchers—within such a national institution will be necessary to address the current challenges of health system measurement and future ones, as population health and health systems evolve. Investing in a central institution with the authority to translate a national policy on health system quality into priorities for measurement and to both centralise data and disseminate findings is crucial to make measurements responsive, relevant, and efficient, particularly for countries with increasingly decentralised health systems. Having a single source of knowledge of quality deficits can also provide a clear basis for accountability of system failures and patient safety lapses. 
209 A truly national view of health system quality requires measurement from the private sector. The exclusion of private providers restricts health system assessments, particularly in countries with substantial private sectors, such as India. For example, an analysis of population coverage of first-level hospitals in Karnataka state, India, found that 45% of the population had access to at least one public hospital within a 25 km catchment area, whereas 91% had access when private hospitals 210 were included in the analysis (two-step floating catchment area method; appendix 2). Nonetheless, information on the capacity and quality of private facilities—or even their number and location—is scarce. 204 Some health system assessments, such as the District Level Household Survey 4 in India, are restricted to public facilities, and routine health information systems can be compulsory for public providers only. A review done for this Commission identified multiple mechanisms for measuring private sector quality, including regulation, national information systems and surveys; purchaser-driven, consumer-driven, or network-driven measurement; and voluntary external assessments. Private-sector providers sometimes express a willingness to share data, but without strong mechanisms and incentives, little sharing occurs in practice. 211 Future research on models for integrating data across the public and private sectors to enhance efficiency, transparency, and accountability is warranted. The development of a national policy and strategy for health system quality is a prerequisite to country-led measurement and is discussed further in the next section. 212 Assessing measurement approaches against the standards defined in the national strategy will provide insight on gaps and inefficiencies in measuring quality. Another responsibility of a national institution for health system measurement is the development of the quality measurement toolkit. 
The toolkit can differ by context and resource availability, but should include three tiers: foundational systems, routine data, and targeted studies. The first tier consists of vital registries to track population births and deaths, supply chain management, and human resources information systems, including provider payment tracking. These elements are fundamental for a sound understanding of the population and the capacity of the health system. The second tier is routine data collection through electronic health records or health information systems; many measures for effective coverage and system competence can be derived from routine health information systems. Accuracy and parsimony are essential to these measurements, because of not only their importance, but also their high potential burden. The third tier consists of targeted health system studies, which include health facility and population surveys and patient registries to probe more deeply into health needs and system performance. Facility assessments must be more agile and responsive to national priorities, with increased emphasis on measures that might be hard to capture in a routine system, including timeliness and accuracy of care delivery and patient experience. Patient registries can be developed as a subset of facility assessments to provide information on health outcomes and patient perspectives over time for priority groups or conditions. 213 Population surveys, ideally linked to health facility assessments or routine health system data, can be broadened to address the range of conditions reflected in the SDG agenda and to provide the voice of users and non-users on their needs and outcomes. These surveys will continue to be instrumental in providing data for equity assessment, particularly in lower-income countries. When optimised, the combination of these data sources has powerful potential to advance the quality of health systems. 
The matrix of tools will differ by context, because one of the aims of the SDG era is for all countries to own their data systems and to define their data needs within a common framework. National ownership of tools at all tiers is important for the results to be integrated and used. 208 Regional and global partners can facilitate and catalyse this work by providing public goods of centralised evidence and tools. These can include repositories for available indicators, evidence and guidance on the role of measurement platforms and methods for triangulating across them (eg, in effective coverage estimation), and tools for synthesising insight for dissemination. Regional collaborations might prove beneficial for sharing learning and avoiding duplication of efforts, particularly for small countries. Initiatives such as the Quality of Care Network and the Health Data Collaborative are important steps in this direction. Finally, this Commission recommends that countries compile an open-access health system dashboard for monitoring progress towards a high-quality health system. The dashboard would track health system quality with use of data from multiple sources. The dashboard would be people-facing and should reflect what matters most to people: health and wellbeing, user experience, system competence, confidence, and economic benefit. An example dashboard is shown in figure 14, featuring these recommended areas and illustrative indicators of each domain to show how such information might be presented. Effective coverage indicators can signal areas of underperformance by geography, condition, or vulnerability, whereas care cascades for conditions that illustrate overall system functioning can be used to identify strengths and failure points. Indicators should be selected and adapted to each country as described previously. The dashboard should evolve to reflect changing health and health system priorities. 
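The care cascades mentioned above can be sketched in a few lines of Python. This is a minimal illustration, assuming an ordered list of stage counts that starts with the population in need; the function names, stage labels, and numbers are hypothetical and are not taken from figure 14 or from any country's data.

```python
def care_cascade(stages):
    """Summarise an ordered care cascade.

    `stages` is a list of (stage_name, count) pairs whose first entry is
    the population in need. Each returned row holds the share of need
    reached and the share retained from the previous stage.
    """
    if not stages or stages[0][1] <= 0:
        raise ValueError("the first stage must have a positive count")
    base = stages[0][1]
    previous = base
    rows = []
    for name, count in stages:
        rows.append({
            "stage": name,
            "count": count,
            "share_of_need": count / base,
            "retained_from_previous": count / previous if previous else 0.0,
        })
        previous = count
    return rows


def largest_drop(rows):
    """Return the stage with the lowest retention from the preceding stage."""
    return min(rows[1:], key=lambda r: r["retained_from_previous"])["stage"]


# Hypothetical hypertension cascade for one district (illustrative numbers).
cascade = care_cascade([
    ("in need", 1000),
    ("diagnosed", 500),
    ("treated", 400),
    ("controlled", 100),
])
failure_point = largest_drop(cascade)
```

Here the final row shows that 10% of people in need reach control, and the weakest step is the transition from treatment to control; a dashboard could flag that step as the failure point.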
Efforts are already under way to contribute elements to such a dashboard, from real-time views of staff absenteeism in facilities in India 214 to open data platforms in Kenya. 215 Providing information is not in and of itself sufficient; information must be accompanied by appropriate context for public consumption and clear mechanisms for engagement and response by all people, whether they are members of the public, the press, or health system actors, such as medical associations. 216

Figure 14 Sample high-quality health system dashboard with illustrative indicators

Countries should present overall dashboard results and results disaggregated by subnational regions and dimensions of vulnerability (settings of care, disease type, or demographics, as discussed in Section 3), as well as results for public and private sectors. This Commission recommends that the dashboard be released from 2021 onwards. It could reflect gaps in data availability and quality particularly early in its usage, before measurement platforms are realigned to provide a full system perspective. Missing information should not prevent the public release of what is available as input into mechanisms of social accountability. Public release of health system quality information is an important way of building trust in health system transparency, in addition to providing means for self-scrutiny by health system agents. 217 A high-quality health system dashboard is an essential step in a cycle of accountability and a trigger towards universal action for improvement.

Section 5: Improving health systems at scale

The key findings of this section are shown in panel 12.

Expanding the solution space

Despite some impressive health gains in LMICs in the past several decades, this Commission’s analysis showed that health systems are beset by poor-quality care. 
The pervasiveness of poor quality suggests that the cause is not a few weak providers or clinics, but rather that whole health systems are underperforming. To successfully address the endemic nature of poor-quality care and to give providers the right support to deliver the competent and respectful care that people deserve, this Commission calls for an ambitious improvement agenda that moves beyond targeting the manifestations of poor quality and aims to transform health systems. However, strategies for quality improvement in LMICs have generally focused on a narrow set of solutions, such as increasing health system inputs and changing people’s behaviours and routines at the point of care—ie, the lowest (micro) level of the health system. A 2018 review of primary care quality found that, globally, 72% of strategies targeted the micro level (figure 15; appendix 1). Although interventions aimed directly at facilities and staff can be motivational and promote local commitment to quality, 218 people tend to revert to entrenched ways of doing things, especially when surrounding systems do not support transformation. 23 The application of multiple micro-level interventions might lead to deleterious effects, with interventions clashing at the point of care because implementing them consumes a large amount of attention from managers, potentially detracting from other priorities. 219,220 This raises the challenge of how to situate micro-level efforts as part of broader reforms that will improve health systems. Figure 15 Types of interventions and levels targeted to improve quality of primary health care according to published literature from 2008 to 2017 Panel 12: Section 5 key findings Addressing the quality deficits in many countries today will require expanding the solution space— the feasible set of solutions that satisfy the constraints of the problems—for improvement to include macro-level, meso-level, and micro-level interventions. 
Countries should invest in the foundations of high-quality health systems by considering four universal actions: governing for quality; redesigning service delivery to maximise quality; transforming the health workforce to provide high-quality, respectful care; and igniting people’s demand for high-quality care. Several commonly used approaches, such as accreditation and performance-based financing, have not been consistently effective in improving quality. District-led collaborative learning has the potential to foster improved quality through better system functioning and communication, but more research on the most effective models is needed. Research on strategies to directly improve health worker and facility performance found that most micro-level solutions have modest effect sizes. Studies tend to be small and brief, limiting conclusions about sustainability and effects at scale. This Commission recommends that selected meso-level and micro-level interventions be implemented alongside efforts to improve the foundations of health systems. Development partners should support health system reforms that improve the foundations of high-quality care. Monitoring and evaluation of the impact of all improvement efforts at national and subnational level is needed to drive learning and improvement. A transformative quality improvement agenda is based on the recognition that health systems are complex adaptive systems, defined as systems in which many component parts interact in unexpected ways and often produce unanticipated results. 221,222 Complex adaptive systems are resistant to change, and diffuse and isolated interventions, especially at the micro level, are unlikely to result in large-scale improvements. 221,223 An example of this is the proliferation of point-of-care technologies for health, few of which have been taken to scale or shown to have had an effect on health in LMICs. 
At the same time, evidence 23 from health and other sectors shows that complex adaptive systems can thrive if actors within the system have a shared vision, clear rules, and space to allow evolution and learning. Research 224 in behavioural economics noted that successful systems create a choice architecture that supports intended goals and reduces harmful variation. Choice architecture comprises the elements of a system that influence choices and behaviour, including information flow, incentives, presentation of choices, and decision-making contexts. 224 Nudging, or steering people in a particular direction while preserving their choice, is a common behavioural economics strategy, but the broader notion is to align motivations, incentives, oversight, and management across levels to promote the best actions. We propose a new improvement approach that addresses the scope of the quality challenge and recognises the complex adaptive nature of health systems. This approach emphasises macro-level reforms—what we call universal actions—that can not only establish and cascade systemic change across all levels of the health system, but also include a role for targeted meso-level and micro-level strategies. Macro-level strategies are best able to directly tackle the social, political, economic, and organisational structures that shape a health system. Meso-level (subnational) interventions address quality of care through the coordination and management of a network of facilities and communities. Interventions at this level are also well positioned to improve communication and learning between facilities and across levels of the health system. Micro-level interventions aim to directly influence the performance of the staff or the operations of a facility. Appendix 2 includes examples of interventions at the three levels of the health system. 
System-wide improvements in quality of care will require effort from providers, health system administrators, and communities, but they begin with a political commitment from heads of state and ministers. Global development partners can and should assist, but they should not drive this agenda. Contributions from across the health system, including the private sector, and from sectors outside of health will be crucial. Early gains in quality are likely to be visible within a few years, though meaningful improvement might take longer. People everywhere have a right to receive effective and respectful care—the time to get started is now. Universal actions for improving quality This Commission recommends four universal actions to improve health system quality: governing for quality, redesigning service delivery to optimise quality, transforming the health workforce, and igniting people’s demand for quality (figure 16). These actions are based on successes and failures from all countries, best practices from high-performing health systems, research and evaluation, and the experience and deliberation of the Commissioners. This Commission sees these universal actions as the start of a paradigm shift towards a more ambitious health system improvement agenda. Beyond the universal actions, countries can select additional targeted opportunities that fit their needs and context. All universal and targeted actions are predicated on having adequate health system inputs, such as staff, medication, and equipment. The optimal composition, design, and implementation of the improvement agenda will vary by country, because approaches that work in one setting might not work in another. Countries need to monitor the implementation of this agenda to permit adaptation and assess the effects on health and other valued outcomes. 
Figure 16 Universal actions for improving quality of care Universal action 1: Govern for quality Health-system-wide change demands that the improvement and maintenance of quality be woven into the fabric of a health system. Governing for quality means reframing the pursuit of quality health care from a peripheral activity to the mandate of a health system, and making sure that a commitment to quality is actually translated from paper to actions that improve the health of people. 225,226 Governing for quality includes several elements: adopting a national quality policy and strategy, improving capacity for management at all levels of the health system, strengthening regulation and accountability, and collecting and learning from health system data. Governing for quality requires high-level political commitment to a shared vision for improving quality of care and translating this commitment into action across the health system (panel 13). Well aligned policies and strategies should be based on this vision, locally accepted definitions of quality, and national goals for improved outcomes. 212 In response to requests from countries for guidance on how to design and implement these healthcare policies and strategies, WHO produced the National Quality Policy and Strategy Handbook. 212 The handbook outlines eight elements of the strategy and argues that quality must be elevated nationally and become a priority across sectors. These policies, and the strategy linked to them, should ideally outline the roles and responsibilities of the organisational bodies and actors that participate in sustaining and improving quality of care. A plan for coordinating these elements is also needed, so that quality improvement programmes are harmonised to maximise learning and results at the system level. 
227 For example, an analysis 228 of surveys from 310 health system leaders in Mexico identified insufficient coordination of quality improvement agendas and an unclear system of roles and responsibilities as key barriers to the translation of federal policies into improved quality of care. A shared vision, policies, strategy, coordination, and implementation must all be developed successfully to design a choice architecture for health systems that directs patients and providers towards decisions that produce quality care and good health outcomes. Improving the quality of the health system requires action from multiple sectors and stakeholders. Governing for quality includes managing these relationships and convening stakeholders under the shared vision of making large-scale sustainable improvements in quality and health outcomes. 229 Inclusive processes that bring a diversity of voices together to solve problems are complex and difficult to manage, but they help to make action on quality possible, foster innovation, and lead to more comprehensive solutions. 230 Building partnerships means aligning all stakeholders, including international donors, with national needs and priorities, which is a challenging goal. For example, in 2016, only 16% of development assistance for health went to the strengthening of health systems, despite evidence showing that condition-specific funding can compromise overall quality of health care and crowd out existing health services. 231–233 Adopting a national quality policy and strategy, and engaging stakeholders around it, requires not only strong leadership skills, but also good management at all levels of the health system to effectively use available resources to realise the vision of high-quality health care. 
234 Middle management at the district or regional level could play an important role at the intersection of policy and implementation, although management capacity interventions at all levels have been linked to better health sector performance. 235,236 Although the literature consistently points towards the importance of good management across health system levels, insufficient attention has been paid to creating the capacity for health-care management in LMICs. 235,236 Data from multiple LMIC settings showed that management is a key factor that differentiates between high-performing and low-performing facilities. 237,238 Bradley and colleagues 235 outlined eight key management competencies and recommended designing training programmes for management professionals to achieve them. These key management competencies are: strategic thinking and problem solving, human resource management, financial management, operations management, performance management and accountability, governance and leadership, political analysis and dialogue, and community and user assessment and engagement. Examples of effective training programmes 239,240 exist in various settings, including Ethiopia, 239 where hospital performance improved under the management of graduates of the Masters in hospital and health care administration programme. Panel 13: Governing for quality: lessons from Nepal and Argentina* Absence of multistakeholder commitment leads to minimal quality improvement in Nepal In 2007, Nepal endorsed the Policy on Quality Assurance in Health-care Services, with the objective of ensuring “quality of services provided by governmental, non-governmental, and private sector according to set standards” and to establish an “autonomous body to ensure impartial decision regarding health services.” 11 years later, the success of this policy remains mixed. Why did the policy have low impact? 
The policy was created without a shared vision and buy-in from stakeholders, including the Ministry of Health. Important partners, such as the Ministry of Education, did not provide critical inputs. The policy designers also did not create consensus on a definition of quality or agree on indicators against which to measure progress. A centrepiece of the policy, the establishment of an autonomous body for quality of care, never materialised. A quality assurance section was established in the Department of Health Services, but it has little leverage over other units in the Ministry. The absence of a political commitment and involvement of all stakeholders has meant that the objectives of the Policy on Quality Assurance in Health-care Services have been largely unrealised, and health institutions continue to deliver subpar care quality. A90 Governing for quality through strong accountability in Argentina In 2005, Argentina implemented a public supplementary insurance programme, SUMAR, designed to increase access to quality health care for uninsured children and pregnant women and to address large disparities in infant and maternal mortality rates. A91,A92 The programme is credited with decreasing the probability of low birthweight among beneficiaries by 19%. A91 In the setting of Argentina’s national decentralised health system, SUMAR’s success depended on high-level political commitments, buy-in from provincial governments, and well designed reporting pathways to ensure accountability. A presidential decree established the programme and provincial governments confirmed it under a collaborative agreement with their respective providers. The agreement is renewed yearly with review of procedures for expenditures and goals to be achieved. Federal commitments and provincial implementations were aligned through clear standards, and multidisciplinary oversight bodies monitored performance. A provincial level programmatic office regularly reported to the federal level. 
Local accountability was increased through the centralised monitoring of transferred funds to the provinces. The provinces were then responsible for enrolling beneficiaries, organising the provision of services, and paying providers. Source: Amit Aryal and Franziska Fuerst. Source: Programme SUMAR, Argentina. *Panel references can be found in appendix 1. To improve and guarantee quality care, good leadership and management competences must be buttressed by regulatory structures that create accountability. Strong regulatory mechanisms (ie, so-called regulation with teeth) and transparency through good monitoring, measurement, and reporting practices support accountability both internally within the health sector and externally with civil society and citizens. 225,241 The accountability mechanisms, in turn, should be operated by leadership and management that can pull together a complex array of regulatory domains (eg, workforce, facilities, products, and service delivery) that might be administered by multiple institutions. Lessons from the regulation of medicines suggest that multipronged collaborative approaches that include a suite of regulations, mechanisms for legal redress, and training of inspectors in the public and private sector are most likely to be effective in mixed health systems. 242 These accountability mechanisms should also include monitoring of the flow of providers between private and public practice. 243 Two first steps that are yet to be taken in many LMICs are gathering accurate descriptive data about private health care (see Section 4) and maintaining the capacity for ongoing monitoring. Local regulations that apply to private health care vary considerably and need to be explored in detail. Finally, regulatory bodies that can enforce compliance across public and private sector institutions are often severely under-resourced, do not have basic capacity, and will need to be strengthened. 
244 Governing for quality also means recognising the importance of, and making space for, civil society in regulating the quality of care. Professional organisations that regulate their members have an important role to play in health system quality by promoting high-quality performance of their members and by sanctioning them when they fail to meet minimum standards. Self-regulation is underused in LMICs, where professional organisations mainly advocate for their membership. Experience in high-income health systems has shown that the privilege and responsibility of self-regulation promotes professionalism, strengthens professionals’ sense of accountability to people, and reduces transaction costs for governments. For example, in Canada, 245 physicians successfully self-govern all aspects of the profession, from setting nationally uniform entrance exams to monitoring and remediating substandard clinical practice among practising physicians. However, self-regulation is not without its challenges, as exemplified by the UK, 246 which has moved towards joint government–professional oversight because of a series of widely publicised physician scandals. When professional groups have primary fiduciary responsibility, care should be taken to involve both practising clinicians and citizens in governance and to avoid unnecessary fragmentation of regulatory responsibilities. 247 Professional organisations can also promote quality through continuing medical education and engaging directly with governments to address quality concerns. For example, the Philippine Medical Association has more than a century of experience in agitating for improvements in medical education, health facility infrastructure, and the regulation of pharmaceuticals. 248 Social participation in health care, especially for the most marginalised, has intrinsic value as a human right and instrumental value in improving health care and keeping systems accountable.
249 People and communities are experts in their local experience and, with skilled support, can wield this knowledge to help create highly valued solutions to health-care problems. 250,251 Social participation can also increase the uptake and sustainability of services. 252,253 Although the composition of civil society varies by country, it is their diversity of perspectives, the opportunities for participation and action, and the availability of accurate and understandable information that will make this sector effective in holding governments accountable for high-quality health care. 243,249,252,253 Civil society can be particularly powerful when adopting a human rights framework for advocacy. 253 For example, in Uganda, 254 the Center for Health, Human Rights, and Development regularly uses legal avenues to challenge policy makers on issues such as essential medicines, safe and respectful maternity care, and fair treatment of patients with disabilities. Institutional accreditation uses external evaluators to assess facility performance against health-care standards. Although frequently cited as a quality accountability mechanism, a scoping review of reviews done by this Commission found that the direct effect of institutional accreditation on quality of care is uncertain (appendix 1). In a systematic review of improvement strategies, median effect sizes for institutional accreditation were modest: 7·1 percentage point improvements in quality outcomes were reported (appendix 1). 255 However, accreditation can indirectly affect quality through improved management, professional development, and capacity of facilities to promote change. 256 Improvement entails the continuous production of relevant data, which measures performance and outcomes, and the translation of those data into action—a learning system. 
226,257 This learning system facilitates the development of programmes and reforms based on the best available evidence (whether global, regional, or local data) and best practices. New initiatives should embed measurement, evaluation, and plans for how the results could be disseminated effectively to the people responsible for ongoing data use to inform adaptation of services. Learning systems should also identify best performers, as discussed in Section 2, and determine the basis for their success. This set of intentional processes for actively learning and improving the health system is a goal that should be articulated and demonstrated first by the actions of senior leadership and subsequently echoed by middle management and the front-line staff. This system goal should become the primary guiding principle that creates the motivation for system improvement over time and for which health system actors hold themselves accountable. 258 Planners should design better systems on the basis of lessons learned and then link back to system managers, supervisors, and front-line staff to support improvement. Developing well functioning learning systems is especially important because of the imperfect evidence base for quality improvement interventions and the large variation in effect sizes found between studies and contexts. Learning systems ensure that planners can make course corrections based on context-specific data. A meso-level strategy that illustrates this approach is the quality improvement collaborative, which we describe in the following subsection. An analysis done by this Commission regarding five country experiences on governing for quality revealed practical lessons for operationalising the described principles (methods are described in appendix 1). District and facility-level health workers might be unaware of national quality policies and strategies or might not understand the implications of those on their daily work. 
The dissemination and translation of policies and strategies needs to be formally assigned, built into the job descriptions of public sector administrators, and included in performance reviews of these individuals. Additionally, the workforce might experience distracting and overwhelming policy crowding, with poorly coordinated and sometimes conflicting mandates. Countries are encouraged to review all policies affecting front-line workers; overlapping or conflicting policies can then be pruned, leaving a policy set that is coherent from the perspective of the service provider. For example, a nurse in primary care seeing a patient with diabetes and latent tuberculosis would benefit from having a single quality policy, not separate documents on diabetes and tuberculosis. Informants from all levels of the health system discussed the challenges of good system-wide data use in the Commission analysis. Data generation and translation must start at the local level, but for system-wide improvements to occur, these data need to be coordinated centrally. We suggest the creation of planned spaces for information exchange, such as district-led meetings to learn from the evidence generated. Success stories of improvements made possible by accurate data collection and skilled data translation can be shared with front-line health workers to motivate continued quality care and improvement.
Universal action 2: Redesign service delivery to optimise quality
Most LMIC health systems were originally designed to provide basic episodic care, especially for infectious diseases. Many systems have not adapted to the changing landscape and challenges of caring for people with chronic diseases, mental health conditions, and more complex injuries and illnesses.
20 Hospitals and healthcare facilities with advanced diagnostic and treatment capabilities are overcrowded with stable patients who could be treated in primary care facilities, whereas many first-level health clinics are expected to handle cases that are beyond their scope, with slow or non-functioning referral for emergencies. 259,260 Poorly organised health systems lose lives, waste scarce resources, and squander the good will of populations. To address this, this Commission calls for a quality-focused service delivery redesign: a reorganisation of services within the health system to efficiently maximise health outcomes and user confidence, rather than only geographic access to clinics. Service delivery redesign capitalises on existing health system assets to provide services at the appropriate level and achieve the highest quality of care possible. First, some services should be shifted to primary care. Reflecting the core principles of continuity, coordination, comprehensiveness, and first contact, competent primary care is ideal for treatment of chronic and stable conditions that require sustained engagement with the health system (eg, non-communicable diseases and stable HIV or tuberculosis infection), preventive care (eg, immunisation, antenatal or routine child care, and growth monitoring), and low acuity and algorithmic services (eg, care of minor child and adult illnesses and injuries). 20,36 Palliative care can also be expertly delivered close to home by primary care and in partnership with families, community caregivers, and spiritual supporters. 36 Examples of the partial implementation of quality-focused service delivery in LMICs reveal the benefits of shifting these services to primary levels. In HIV care, stable patients are managed in primary care clinics with impressive results, and new patients can initiate treatment in their own communities. 
261 As a result, centralised specialty centres are less crowded, allowing higher-skilled providers to focus on more complicated cases, such as HIV treatment failures. 260 A multicountry meta-analysis 262 of 39 090 patients with HIV showed that patients in primary care were half as likely to be lost to follow-up as patients treated at a centralised HIV clinic. In tuberculosis care, community-based models are also substantially less costly to implement. 263 Uncomplicated non-communicable diseases are especially well suited for care at the primary level, where providers can more effectively monitor chronic disease over time and build relationships that form the foundation for effective communication and counselling regarding crucial lifestyle modifications. 20 An important caveat is that current primary care models in many LMICs are outdated and ill-suited for these new tasks. New thinking is needed on primary care functions, capacities, and connections with specialised services, especially in urban settings. 20,264 For example, experience from high-income settings suggests that non-visit care, in the form of virtual or phone visits, has the potential to extend the reach of primary care for low-acuity conditions. 265 Acute or chronic conditions with higher risk of mortality or severe morbidity are best assessed at a hospital with emergency capacity. The correct health system level for some surgeries should be determined on the basis of the availability of specific technical skills and of laboratory, imaging, and intensive care infrastructure; the acuity of the condition; and the projected procedure volume. Complex or rare conditions are ideally managed in tertiary, highly specialised, care centres. Childbirth is one situation that benefits from care at hospitals with surgical and specialised newborn care services, because complications can arise without warning and require rapid, highly skilled care.
266 However, in low-income countries, a substantial proportion of obstetric and newborn care is provided in primary care facilities without adequate expertise or surgical capacity. 267 For women and newborn babies who develop complications in primary care clinics, poorly functioning referral and transport to a higher level facility mean a much greater risk of morbidity and mortality. 267,268 Guided by this logic, many high-income and middle-income countries mandate that all women deliver in, or next to, hospitals with surgical and advanced newborn care services. 269 The structural deficits in highly skilled health workers and surgery at primary care levels might explain why the Better Birth trial, 39,270 a large randomised controlled study, found that implementing a safe childbirth checklist and coaching for nurses and midwives at primary care centres in India did not reduce maternal and newborn morbidity or mortality. We examined the practical implications of shifting delivery care to hospitals in a geographic modelling that linked facilities with pregnant women in six LMICs (Malawi, Haiti, Tanzania, Kenya, Namibia, and Nepal; methods are described in appendix 1). We found that delivery care redesign would result in substantial gains in technical quality for care of pregnant women without reducing interpersonal quality and with minimal reductions in 2 h access to care. For example, in Tanzania, hospitals score twice as high as primary care facilities on a basic measure of childbirth quality and, therefore, quality of care would improve by moving all deliveries to hospitals. Although this would increase the average distance from a delivery facility for rural dwellers, only 27% of pregnant women would live more than 2 h away from a delivery facility in Tanzania, compared with a current 17%. In the remaining countries, 1% to 7% of women lost 2 h access to care. 
This redesign can also produce efficiency gains because resources could be redirected from providing obstetric care in thousands of facilities to improving quality in fewer hospitals, promoting care integration across facilities, working with communities, and enabling transport to hospitals. Strong interfacility communication and referral networks are crucial to the success of quality-focused redesign, along with investments and participation from non-health-care sectors. Tools to facilitate redesign that warrant consideration include improved transportation (eg, community taxi services and ambulances), 271 communication (district-led learning, discussed in the following subsection), measures to reduce access barriers to high-quality facilities (eg, vouchers and maternity waiting homes), 272,273 and public education to enhance population understanding of the right place for care. 274 Local context, with a focus on facilitating access to high-quality care for the most marginalised subpopulations, should drive the mix of interventions and incentives. Planning for quality-focused service delivery redesign in any country would require analyses of patient volumes, bed and surgical capacity, provider competence in existing hospital facilities, and potential upgrades to existing health-care centres to permit high-quality care, as well as attention to transport, costs, and building community demand. 275
Universal action 3: Transform the health workforce
The data in Section 2 showed that providers often do less than half of recommended evidence-based care measures and that rates of diagnostic accuracy are low across health conditions and countries. A Commission analysis showed that this is also true of providers in their first 3 years of practice, suggesting a probable role of poor preservice education in provider performance (appendix 1).
276 Low knowledge and competence of the health workforce is at risk of worsening over the coming years because of the rapid expansion of health workforce training institutions, resulting in dilution of already insufficient faculty and curricular resources. 277,278 Despite this threat to health-care quality in LMICs, improving the education of health-care professionals has not been a central part of the improvement discourse. 279 In the previously mentioned review of primary care quality improvement, only 16 of 379 articles addressed the preservice education of health professionals (figure 15). Fixing these gaps through in-service training is not an effective antidote, 280 and reforms in professional education are required to adequately equip these professionals to provide high-quality care. The Lancet commissions 277,281 on health professionals for a new century and on the future of health in sub-Saharan Africa highlighted key steps to address the quality gap of the health-care workforce. First, the education of health professionals should focus on achieving competence through active learning, early clinical exposure, and problem-based learning. Competency should be defined by the gaps and needs of each individual country and include domains beyond the technical skills of providers. Ethical, respectful, and compassionate care, and the fundamentals of systems thinking and quality improvement should be additional core competencies. Dysfunctional systems will continue unless the workforce is prepared to improve them. Second, the chronic understaffing of many health-care professional schools in LMICs must be addressed, along with support of high-quality teaching, for the quality of clinical education to improve. 278 Possible solutions include increasing salaries, expanding professional development opportunities, using state policy levers to require practising clinicians to teach trainees, and providing small incentives, such as free housing or telecommunications. 
278 Third, health education institutions should establish student recruitment and retention policies to increase the representativeness of the student population. 252,282,283 Evidence has shown that care interactions between providers and patients who are racially, culturally, ethnically, or linguistically similar are associated with higher perceived quality of care, satisfaction, and improved medical communication. 284,285 These changes within institutions of higher learning must be supported by good governance and quality-informed policy making. Intersectoral coordination between ministries of health and education would create a more direct link between the production of a health workforce and the needs of the health system. 281 Finally, health-care providers also need a work environment in which they can succeed beyond graduation. Many health-care providers face challenging conditions, including inadequate and delayed salaries, heavy workloads, ambiguous responsibilities, no opportunities for growth, and poor treatment by colleagues and patients. 276,286,287 Not only do these conditions result in burnout, mental distress, and poor retention for providers, but they also result in poorer quality care. 287–289 Motivated providers are less likely to make poor decisions or medical errors and are more likely to be empathic towards patients. 290 Good working conditions, regular pay, clinical support, and opportunities to learn and grow are essential to maintain a workforce that is motivated and committed to providing high-quality care. 286,291,292 WHO recommended a set 293 of decent employment policies to support providers, including ensuring occupational health and safety, fair terms for workers, merit-based career development, and a positive practice environment. In addition to broader policies, a review 294 published in 2017 recommended a set of steps for facilities to foster joy and engagement in their own workforce.
These include an initial process of inquiry to understand workforce priorities, followed by identifying and removing the primary annoyances, initiating simple fixes, and using improvement science methods to spur larger-scale change to create a fundamentally more satisfying and happier work environment. Although early reports suggest that sense of purpose can be strengthened through these approaches, much of this work has started in the past few years and the effectiveness of these interventions on improving quality of care in LMICs remains to be determined.
Universal action 4: Ignite population demand for high-quality care
High-quality health systems respond to people’s expectations, but if those expectations have been dampened by a history of disempowerment and poor-quality care, that response will not translate into better health care. 295 Section 2 shows that when expectations are low, quality ratings of objectively poor care are high. This discrepancy lets health systems disregard issues of quality. Beyond putting pressure on systems to improve, generating demand for quality through information sharing would increase health system accountability (see universal action 1) and has an ethical foundation: for patients to be autonomous decision makers, they must have access to usable information about the quality of their care. 296 This is imperative because of the information and power asymmetry that exists between patients and providers. Finally, this Commission’s recommendation is based on evidence that people who already demand higher quality in LMICs and actively make decisions can extract higher quality care from their health systems. 118,119,297,298 National quality improvement strategists are encouraged to explore demand-side approaches that raise people’s expectations of quality. Very few improvement programmes are explicitly designed to raise demand for quality care. We used those few programmes to draw lessons on this understudied improvement opportunity.
Participatory women’s groups are a well documented 299 example, and improved outcomes for women and children in communities with these groups are believed to be partly due to participants demanding better care, such as safe hygienic practices during childbirth. Community monitoring programmes can generate demand for quality, although few high-quality studies exploring this outcome exist (see Section 3). 300 A programme 301,302 in rural Uganda, for example, combined information sharing about quality care at local facilities with community participation and found reductions in neonatal deaths and improvements in measures of facility process quality 4 years after implementation. A study 303 in Uttar Pradesh, India, showed that quality during prenatal visits was improved by sharing information about health and social service entitlements with pregnant women. A preliminary body of qualitative research 304 also suggested that demand generation for quality might be especially well suited to improving user experience. Panel 14 includes examples of the use of advocacy to generate demand for high-quality care from the White Ribbon Alliance. These interventions are based on sharing information with people and treating them as active agents in the health system. They are unlikely to work without system-level support that encourages patient-centredness, power-sharing, communication, and inclusion. 300 Importantly, this supporting of people to be active agents should be done with careful attention to marginalised populations. The intersection of multiple sources of vulnerability is likely to make some groups less able and prepared to act on quality information than others. To prevent the exacerbation of existing disparities, particular attention must be paid to rural, less educated, and impoverished populations (see Section 3). 
Interventions that might raise expectations and demand for quality often include social interaction through groups, committees, or meetings; this component is supported by social network science and evidence showing that people learn about quality from each other. 305,306 This insight from social network science also suggests that demand generation interventions might take advantage of the increasing presence of interactive social media platforms in LMICs. Figure 14 gives an example of a people-facing dashboard that can be used to share information with populations. More country examples of improvement through the four universal actions can be found in appendix 2.
Targeted opportunities
In conjunction with system-wide reform through the universal actions just described, countries will likely require additional context-specific interventions, which we call targeted opportunities. Beyond increasing health system inputs, this subsection reviews the most commonly used quality improvement interventions, but does not aim to present a comprehensive list. As mentioned in Section 2, the cost of interventions is not addressed in this report. We used several sources, including the Health Care Provider Performance Review (HCPPR)—a systematic review of health worker performance improvement strategies in LMICs (appendix 1). 255 HCPPR was designed to develop evidence-based guidance on strategies to improve health worker performance and includes published and unpublished studies in any language from the 1960s to early 2016. Although the HCPPR is the most comprehensive and up-to-date review on the subject, it has limitations common to all reviews, such as implementation strength that varies across studies, and the unknown degree to which results from controlled study settings can be generalised to real-world programmes. The full review reported effect sizes for combinations of strategies, which are not discussed in detail here.
Macro level improvement: financing for quality
Health financing and provider payment can be used to leverage greater quality from the health system. Of the four core financing functions (revenue mobilisation, pooling, purchasing, and benefit design), purchasing—or the allocation of funds to providers—has the greatest direct influence on quality of care and we focus on it here. 252 Strategic purchasing refers to funding providers on the basis of information about populations and providers to achieve performance goals. Examples include provider payment strategies and selective contracting of facilities on the basis of quality. 307 Although most doctors and nurses are assumed to be motivated by altruism, they also seek a competitive wage. Input-based (eg, salary and capitation) and output-based (eg, fees-for-service, per case, or pay-for-performance) payments tend to exert opposite effects on providers’ intensity of effort, with input-based payments disincentivising and output-based payments promoting the number and intensity of services, leading to the duelling challenges of under-treatment and overtreatment described in Section 2. A mix of input and output financing might therefore be the best strategy to prevent undue attention to incentivised elements.
Panel 14: Lessons in generating community demand for quality care from the White Ribbon Alliance (WRA)
Uganda
In 2011, WRA Uganda mobilised local advocacy teams in three underserved districts to bring attention to the poor quality of obstetric services in the country. The teams, comprising district leaders, health officers, community members, and midwives, assessed the status of facilities and found that none of the districts met the minimum requirements for treating complications: they had insufficient lifesaving commodities, skilled health workers, and infrastructure.
On the basis of similar findings, WRA launched the Act Now to Save Mothers campaign to educate citizens on their rights and responsibilities related to quality health care. Community members participated in district planning and budget hearings and town halls. In one town hall, more than 2000 community members signed and presented a petition to district representatives and parliament demanding improvement. Community members served as citizen journalists, reporting on progress and budget allocations. The campaign resulted in increased procurement of essential medicines and equipment, increases in salary for—and recruitment of—additional health workers, and the reconstruction of dilapidated facilities.
Tanzania
After a woman died in childbirth in 2013, in Rukwa region, Tanzania, because no blood supply was available, communities demonstrated in protest at the poor-quality care. WRA Tanzania brought together citizens and decision makers to ensure that these concerns were heard. They worked with religious leaders and village health teams to raise awareness among community members, and they supported district and regional policy makers to respond and act on citizen demands. The Parliamentarian Group for Safe Motherhood was mobilised to add their support to the citizens’ voices. In 2015, Rukwa leaders expanded emergency maternal health services from only 10% to 50% of health centres. On the basis of this success, WRA Tanzania expanded their efforts nationally, resulting in the government approving an historic 50% increase in funding for maternal, newborn, and child health to support expansion of facilities, blood banks, and the recruitment of health workers throughout the country.
Lessons
Mobilise around existing political commitments for improvement
Educate citizens about rights, responsibilities, and how to advocate
Use data to create pressure for accountability
Identify champions to amplify the voices of people
Support decision makers to respond to citizen demand and collaborate with them to make change
Source: Kristy Kade, Betsy McCallon, Rose Mlay, Robina Biteyi.
Aligning financing and provision arrangements is crucial to the success of strategic purchasing. For example, facilities subject to selective contracting should be able to make the necessary purchasing and hiring decisions for improvement. In some countries, facilities might not have sufficient managing authority and legal changes will be required. Output-based financing is also data intensive and can be a substantial burden for providers. To align different payment methods and incentives, a strong data system to capture information on provider payments is crucial. 308 Such systems produce information with multiple uses beyond strategic purchasing. For example, in high-income countries, insurance claims data offer information on services rendered, fees received, and diagnoses made that can be used by payers, insurers, and researchers. One prominent form of strategic purchasing that has been widely implemented in LMICs is performance-based financing. Performance-based financing describes a set of approaches designed to improve health care by paying providers and facilities for the quantity and quality of care, though many programmes complement the financial incentives with direct improvement elements, such as training or supervision. 184,309–311 Most performance-based financing programmes in LMICs incentivise primarily the quantity of services and, although they appear to increase utilisation of care and service volume, the effect of performance-based financing on quality is less clear.
184,310 Several impact evaluations are forthcoming from the World Bank’s Health Results Innovation Trust Fund, a large funder of performance-based financing. 312,313 Overall, evidence to date suggests that performance-based financing has insufficient potential as a standalone strategy for system-wide quality improvement. Performance-based financing might modestly improve quality compared with no intervention in some contexts, but does not always outperform unconditional financing; the effect appears to be driven not by the financing mechanism as much as the equipment and workforce interventions. 314–317 Compared with alternative interventions, performance-based financing incurs unique costs for performance verification that can account for 10% to 15% of operating costs, including the cost of staff time. 318 There can also be unintended consequences of performance-based financing programmes, including reports that providers have threatened patients to report positive outcomes, but these programmes can also improve unrewarded dimensions of quality, including patient satisfaction. 314,316,319 Overall, performance-based financing does not appear to be highly cost-effective, especially vis-à-vis unconditional additional financing. 316,317 The HCPPR showed that interventions that include financial incentives are associated with a median increase in quality of 7·6 percentage points.
Panel 15: Case studies of learning at the district level*
Midwifery Coordination Alliance Teams (MCAT) in Cambodia
The MCAT programme is an area-based approach that started in select provinces in Cambodia, in 2009, and has since spread to all midwives, health centres and hospitals in all 98 districts of the country. The programme aims to strengthen collaboration between primary health care and hospital providers. All primary care midwives in a meso-level area meet doctors and midwives from the local hospital every 3 months for data review, updates, problem solving, and refresher trainings.
The sessions are duplicated over 2 consecutive days to accommodate all health centre midwives without closing any services. Providers view and discuss data, such as maternal and perinatal deaths or near-misses, contraceptive uptake and mix, case-fatality of common conditions, and supervision results. The meetings include participatory learning sessions with simulations and discussions of current clinical procedures and guidelines. Supply chain issues, for example, surface and are solved with feedback and joint problem solving. The teams also foster local innovations. For example, midwives in one province instituted a clinical hotline, which enabled health centre midwives to call a hospital midwife for advice on emergency referrals and follow-up. This idea has spread to other districts. Another MCAT team suggested and instituted an extension of the livebirth incentive to also include appropriate emergency referrals. Although a formal evaluation has not been done, the MCAT programme provides a team approach to gradually improving care of maternal and newborn complications in the district, and it is believed to be a factor in Cambodia’s large health gains for women and children.A93,A94 Area-based planning and vertical integration in meso-America The Salud Mesoamerica Initiative (SMI) aims to reduce maternal and child health inequities through a results-based funding model to improve quality and effective coverage in seven Central American countries and in Chiapas state, Mexico. The programme features area-based plans within each health district to translate national plans to local teams. These plans include locally-tailored targets, activities, and timelines. 
Local implementers review their progress with national stakeholders every 3 months, fostering an experience-based learning environment.A95 SMI provides technical assistance to countries to create quality improvement strategies and standards through problem identification, prioritisation of areas for improvement, and development of improvement plans. Countries developed tools for data collection and analysis to support learning. Each country has adapted implementation to fit their priorities and systems. In Belize, the process started at the comprehensive emergency obstetric and newborn care level, then gradually added basic and ambulatory levels of care to allow teams to have a more holistic view of the health network. Teams collect data and review their own progress each month, and every quarter, a quality improvement officer reviews the performance to allocate a small incentive to teams through a Quality Innovation Fund. The quality improvement officer also offers supportive supervision, shares challenges and best practices, and helps teams to develop and test new ideas. Independent evaluation results showed that all indicators across levels of care have improved relative to baseline, with gains ranging from 30 to 85 percentage points. Source: Som Hun and Jerker Liljestrand. Source: Emma Margarita Iriarte and Jennifer Nelson. *Panel references can be found in appendix 1. For financial incentives, including performance-based financing programmes, to be successful, economic theory and research suggest that rewards will have smaller effects than penalties because of the tendency to avoid losses, and incentives are most effective when closely linked to processes under the direct control of providers. 320,321 Extrinsic incentives might crowd out intrinsic motivation, underlining the importance of aligning incentives with professional expectations and work norms. 
320 Finally, aligning provider incentives with specific goals of care coordination or effective treatment, as opposed to inputs, offers potential for quality improvement. 322

Meso-level interventions: district-led learning

District administrations and networks of facilities can be harnessed into learning systems that accelerate improvements in health-care performance with the potential for scale. This level of the health system is well positioned to facilitate systematic group learning among facilities of similar types and across tiers of the health system. District-led, area-based learning and planning brings together providers and administrators responsible for a catchment area to solve clinical and system problems, harmonise approaches, maximise often scarce resources, and create better communication and referral between facilities (panel 15). 323

Formal quality improvement collaboratives involve the use of teams from multiple health-care sites that work to improve performance on a specific topic by collecting and using data to test ideas with so-called plan-do-study-act cycles supported by coaching and learning sessions. Systematic reviews 324,325 of quality improvement collaboratives in predominantly high-income countries showed modest improvements, particularly when addressing a clear gap between evidence and practice on straightforward aspects of care. Evidence from LMICs is more scarce, leading this Commission to undertake a subanalysis of quality improvement collaboratives based on the HCPPR systematic review. Overall, the quality of evidence on quality improvement collaboratives from LMICs is low. Effect sizes for these collaboratives combined with clinical training were very large (mean range 52·4 to 111·7 percentage points) although how generalisable they are is uncertain, as three of the four reports targeted the same clinical outcome (postpartum haemorrhage), which was amenable to simple changes.
The effectiveness of quality improvement collaboratives was more variable when implemented without training and when addressing other areas of care. Results on improving health worker practices ranged from modestly to highly effective (4·3 percentage points for continuous outcomes and 30·2 percentage points for percentage outcomes). For patient health outcomes, quality improvement collaboratives had no effect (1·4 percentage points for continuous outcomes and 0·3 percentage points for percentage outcomes). 255

Quality improvement collaboratives are not static structures, and they have been implemented and adapted in several ways to achieve their stated aims. Some common adaptations include their use for the generation of new ideas and for empowerment of health-care workers. In addition to understanding the effect of district-led learning on clinical practice and patient outcomes, the effects of this approach on communication, health worker motivation, and team dynamics are currently being explored. 326

Micro-level interventions: directly improving provider and facility performance

Strategies that target the micro-level are presented here as opportunities to complement and extend broader systems-level reforms. For example, the improved education of health professionals can be reinforced through facility-level refresher training. This Commission recommends that, where possible, micro-level interventions should not be implemented in isolation or instead of strategies that assess and improve the foundations of health systems.

We present in table 3 results of approaches to improve health worker performance, focusing on the six strategies tested by the largest number of studies and that included at least four low or moderate risk-of-bias studies. 255 All six strategies target the health system at the micro-level. Moderate effect sizes were found for training (9·7 percentage points) and supervision (11·2 percentage points).
The combination of training and supervision had larger improvement effects, at 17·8 percentage points. Providing only printed information or job aids to health workers and only implementing mHealth (a mobile wireless technology) strategies tended to be largely ineffective.

Table 3: Selected results of strategies to improve health worker performance from the Health Care Provider Performance Review 255

| | Training only | Training plus supervision* | Supervision* only | Printed information or job aid for health workers only | Information communication technology (mHealth) only | Training plus supervision plus strengthening infrastructure* |
| Median effect size for percentage outcomes, percentage points (IQR) | 9·7 (5·5–21·3) | 17·8 (5·5–25·9) | 11·2 (5·8–25·6) | 1·5 (–4·5 to 6·1) | 1·0 (–2·8 to 10·3) | 9·4 (–0·1 to 40·5) |
| Study comparisons for percentage outcomes (comparisons with low or moderate risk of bias) | 76 (32) | 26 (11) | 16 (8) | 8 (5) | 4 (4) | 4 (1) |
| Median effect size for continuous outcomes, percentage points (IQR) | 17·5 (0·1–23·7) | 11·1 (7·3–60·4) | –3·0 (no IQR) | –3·4 (no IQR) | –38·9 (no IQR) | 64·3 (31·9–88·7) |
| Study comparisons for continuous outcomes (comparisons with low or moderate risk of bias) | 16 (8) | 8 (3) | 3 (1) | 3 (1) | 1 (1) | 4 (4) |
| Countries in which studies were done for both outcomes types (number of WHO regions) | 2 (6) | 17 (5) | 13 (5) | 7 (3) | 4 (1) | 6 (3) |
| Three most common categories of study outcomes for both continuous and percentage measures | Treatment, counselling, assessment | Treatment, counselling, assessment | Treatment, counselling, universal precautions | Treatment, documentation, case management† | Counselling, case management,† treatment | Treatment, diagnosis, referral |
| Median baseline outcome value for percentage outcomes, % (IQR) | 43·0 (19·0–70·0) | 25·2 (9·2–52·5) | 53·8 (36·0–63·5) | 32·8 (24·8–58·6) | 36·5 (5·4–60·6) | 31·2 (7·7–55·6) |
| Median number of health facilities in intervention group for percentage outcomes (IQR) | 6 (1–20) | 7 (2–32) | 7 (5–24) | 8 (5–10) | 38 (35–47) | 11 (2–21) |
| Median duration of study follow-up‡ for percentage outcomes, in months (IQR) | 4·0 (1·3–6·0) | 4·5 (2·0–6·0) | 5·0 (2·0–9·0) | 1·9 (1·0–4·5) | 7·3 (4·1–9·4) | 1·6 (1·0–2·4) |

Strategies for health facility-based health workers were tested by at least four studies with low or moderate risk of bias, from at least one outcome group (percentage outcomes or continuous outcomes). Percentage outcomes are expressed as a percentage (eg, percentage of patients treated correctly) and continuous outcomes are outcomes that could not be expressed as a percentage (eg, average number of medicines prescribed per patient); for details, see appendix 1.
* Supervision is either strengthened routine supervision visits (in terms of frequency or supervision quality) or other supervision-like strategies, such as audit with feedback; strengthening infrastructure is the provision of medicines or equipment, or otherwise improvement of conditions in health facilities.
† The case management group of outcomes reflect multiple steps of the case-management process (eg, percentage of patients correctly diagnosed and treated).
‡ Study follow-up time was defined as the time from when the strategy was initially implemented to the last eligible follow-up measure; for most study comparisons, the follow-up time was the same for all study outcomes; when follow-up time varied among outcomes for a given study comparison, the longest follow-up time was used in the analysis.

Most strategies that focused on improving the practices of lay or community health workers were tested by a single study and the quality of evidence was generally low. Again, training alone tended to have modest effect sizes (median of 7·3 percentage points). Strategies that included mHealth had a median effect size of 8·7 percentage points.
Strategies that included training and supervision had a median effect size of 9·6 percentage points, and strategies that included training and community support approaches, such as patient education, had a median effect size of 22·7 percentage points.

Despite the scope and range of the studies, it is difficult to draw conclusions about how generalisable these strategies are. Studies tended to include only small numbers of health facilities in the intervention group (often fewer than ten) and short post-intervention follow-up times (median of 7 months or less). Effect sizes for strategies tested by multiple studies included in the HCPPR also varied considerably, which might be due to study biases, random variation, and considerable heterogeneity of study methods and context. For example, the effectiveness of complex, multifaceted interventions with at least four strategy components varied substantially (from nearly 0 to 61 percentage points), which suggests that complex strategies are sometimes, but not always, more effective than simpler ones, and clearly more work is needed in designing and testing these approaches. The variability of effect sizes for a given strategy also shows the difficulty in predicting how effective a strategy will be in a given context. Therefore, it is important for programmes to monitor the effectiveness of any strategy implemented in the field, not just in the context of research.

Additionally, the duration of strategy effectiveness is uncertain. Using HCPPR data, we modelled the effect of follow-up time on strategy effectiveness, using random-effects models (or fixed-effect models, if dealing with fewer than ten studies or outcomes) adjusted for baseline performance (appendix 1). We found that the effect of supervision appeared to increase over time, but we found no evidence of a time trend for group problem solving. Results for training were inconclusive.
Our ability to examine time trends was limited by the small number of studies per strategy with repeated post-intervention measures.

Section 5 conclusion

All reforms for quality will need to be country-led, starting with a vision for quality health care shared and actively supported by heads of state and their ministers. Many of these political leaders have already made commitments to UHC, but without improving quality, that promise is an empty one. Sequencing improvement efforts to first target populations who have the worst quality of care and health outcomes will also be important to realising high-quality UHC. 135,327 Global partners are encouraged to support these efforts by aligning with each country’s priorities, not funding flotillas of small-scale interventions over short project-cycles, and instead selectively investing in fewer health system reforms over a longer time period. Reorienting research priorities to support country-led efforts for health system quality improvements is also sorely needed.

Every country will decide how to implement the systemic changes needed for high-quality care. This Commission recommends a careful assessment of the foundations of the national health system, consideration of the four universal actions we have presented, and a tailored strategy that addresses quality gaps and maximises existing assets. Finally, countries should not expect that every improvement initiative will succeed, but leaders should not be discouraged. The process of iteratively adapting and developing effective solutions for a given context takes time. The key is to get started, monitor progress, and learn from both successes and failures.

Section 6: Recommendations

In this Section, we identify opportunities for national governments, civil society, global partners, and researchers to contribute to a global effort towards high-quality health systems.

National governments

(1) Invest in health systems and make them more accountable to people.
National governments need to invest in high-quality health systems for their own people, and they must also be accountable to people for their performance. This requires legislating for people’s right to quality health care, educating the population and health system stakeholders about these rights, enacting strong regulation and standard setting, sharing actionable information on health system performance, and creating mechanisms for remedy and redress. These actions should be complemented by social accountability mechanisms that promote the participation of the population in health system decisions. Countries will know they are on the way towards a high-quality, accountable health system when policy makers choose to receive their health care in their own public institutions.

(2) Look beyond the government health sector. Building high-quality health systems requires strong primary, secondary, and professional education, solid road and transport networks, and reliable communication infrastructure. Partnering with other sectors will be essential to create the conditions for health system reform. Involvement of private health-care providers and institutions can expand people’s choices and might spur the system to improve user focus; these private providers will need to be effectively regulated and incentivised to produce desired impacts.

(3) Embed quality of care in UHC. Quality should be at the core of UHC initiatives, alongside coverage and financial protection. For this, countries should begin by establishing a national quality guarantee for services provided through UHC that specifies the level of competence and user experience that people can expect in the health system. To ensure that poor people benefit from improved services, expansion should start with them. Progress on UHC should be measured through effective (quality-corrected) coverage.

(4) Measure better. Quality measurement should be parsimonious, timely, and transparent.
Health systems should report their performance to the public annually using dashboards on health and wellbeing, user experience, system competence, and population confidence. Data should further be disaggregated across regions and vulnerable groups. Countries need to update their health system data toolkits, and they can begin by shedding uninformative indicators and instruments and improving data quality in existing systems. The high-quality health systems toolkit should include vital registries and real-time health system intelligence systems on supply chain and human resources, reliable routine information systems, and targeted studies, such as rapid facility surveys and updated population surveys. Investing in national institutions and expertise for measurement and translation of evidence to policy is crucial for making use of the data. Health and data literacy are also crucial for health-care users.

(5) Improve quality by starting with four universal actions. This Commission recommends that countries consider four universal actions to shift the trajectory toward high-quality health systems. Additional targeted opportunities in areas such as health financing, district-level learning, and others can complement these efforts. All strategies need careful monitoring and evaluation to measure their effect and allow local adaptations.

The first action is to govern for quality; this means creating a shared vision for a high-quality and learning health system with a national quality policy strategy and mechanisms for implementation and accountability. These should be developed in partnership with the private sector, civil society, and in collaboration with non-health sectors. A learning system needs accurate and timely data and a health system leadership committed to improvement. Improving data literacy for health workers and consumers will be needed to make use of the data.
The second action is quality-focused service delivery redesign; this requires reorganising health services to maximise health outcomes rather than solely geographic access to clinics. Primary care clinics should not tackle serious or rare health needs with elevated risk of mortality, such as deliveries, but should instead expand on their core competencies: integrated and continuous care for stable patients and community outreach and prevention. Governments and civil society need to work together to ensure that people can reach the care they need, when they need it, and that they receive respectful care; a range of strategies within and beyond the health sector are available.

The third action is transforming the workforce, starting with a move to competency-based clinical education that includes active learning, early clinical exposure, and problem-based learning. The curriculum should include ethics, respectful care, and core quality concepts. Classroom instruction needs to be buttressed with role-modelling and supervision in practice settings. The workforce should be supported with good working conditions, regular pay, and clinical mentorship and be provided with opportunities to learn and grow. Health workers and their professional associations must redouble efforts to maintain and enforce high standards of practice to earn and keep the public’s trust.

The fourth action is igniting demand for quality, which requires educating people about their health entitlements according to national resources and using targeted quality reporting, social networks, and mobile technologies to empower people to become active patients who seek good quality care and motivate health workers to provide it.

Civil society and non-governmental organisations

(1) Demand more from providers and health systems. People need to inform themselves about their rights and entitlements in the health system, including the right to competent care, respect, information, privacy, consent, and confidentiality.
People enrolled in UHC programmes need to understand their benefit packages and care options, and to communicate their needs and preferences to their providers. They should make use of redress options when care falls below the quality standard.

(2) Agitate for change and hold systems to account. Civil society should insist on transparent sharing of health system capacity and performance. They should press for greater social accountability through citizen report cards, community monitoring, social audits, participatory budgeting, citizen charters, and health committees. However, social accountability is not a replacement for government-led accountability; the two are most successful in improving health system performance when combined.

Global bilateral, multilateral, and foundation partners

(1) Invest in national institutions to produce evidence on health system quality. Many LMICs do not have effective institutions to carry out the functions in metrics, research, and evidence-based planning required for a learning health system. Global partners should support the international and national training of data scientists and help build institutional capability through mentoring and sharing of organisational best practices. Policy uptake of analytical findings from local institutions can in turn build confidence in and demand for locally generated evidence.

(2) Support development of health system quality measures. LMICs do not have measures that capture the elements of health system performance that matter to people and can inform improvement. Continued support of vital registries and health information systems to measure health and impacts is crucial. Agile facility surveys and real-time measures that capture quality of care and people’s voice and that can be linked to population health data are needed. Global repositories of validated comparable measures, instruments, and best practices in analysis can be a valuable resource.
Panel 16: High-quality health systems research agenda

Measuring and analysing quality
• Develop and validate quality measures suitable for resource-constrained settings for: health outcomes that can be attributed to health systems and patient-reported outcomes; competent care for mothers and newborn babies, cardiovascular diseases, chronic respiratory diseases, diabetes, cancer, mental health, and injuries; user experience, including respect, dignity, and autonomy; system and platform competence (eg, timeliness, safety, and integration), including quality of community outreach, primary care, and hospital care
• Understand the extent and causes of variations in quality: identify best performing countries, regions, and facilities and determine the factors contributing to their higher-quality care; explore causes of poor quality across different contexts
• Assess equity of quality care across dimensions of vulnerability, including setting of care, demographics, and disease type
• Analyse the effect of quality care on health, confidence, and economic outcomes, including patient-reported outcomes, demand for health care and bypassing, health system waste, and catastrophic and impoverishing expenditures

Improving quality
• Test the effect of innovations in the preservice education of health professionals on delivery of competent and respectful care
• Evaluate effects of quality-centred health service design on health, user experience, equity of care, and health system function
• Explore individual and combinations of interventions to generate community demand for quality, including dissemination of locally relevant information and innovations that use new technologies
• Refine the best design for district-level learning strategies (eg, quality improvement collaboratives and other approaches)
• Analyse the effects of legal, performance, and social mechanisms to promote accountability in low-income and middle-income countries
• Test management innovations and intrinsic and extrinsic approaches to motivate providers
• Measure the costs and cost-effectiveness of improvement approaches and their sustainability

Methods and tools
• Develop an agile facility survey for rapid measurement of health system quality that focuses on measures that matter: competent care and systems and user experience
• Update population surveys to measure a broader range of health conditions
• Explore new technologies to improve accuracy and reduce the burden of process and outcome measurements (eg, wearable trackers, big data analytics)
• Expand and validate methods for measuring effective coverage
• Develop new methods to test system competence over time, such as tracer patients
• Incorporate implementation science in assessments of health system improvement strategies to understand what works, why, and in what contexts
• Expand the use of qualitative methods and approaches from social sciences, such as political and management science, in describing and diagnosing quality failures and successes
• Expand sample sizes and extend length of time in studies of all improvement strategies, to characterise the generalisability and sustainability of these approaches
• Include patient experience and patient-reported outcomes in improvement research

(3) Include quality in tracking progress of global initiatives. Progress on UHC and the SDGs should include both coverage and quality, and effective (quality-corrected) coverage brings these concepts together. Several appropriate indicators are available while others require validation; more work is needed to build these indicators into global measure sets.

(4) Channel donor funding to universal actions for improvement. Large-scale improvements such as health education reform or service delivery redesign are costly, and some low-income countries will require external financing to undertake them.
More generally, funders should align their support with country strategies that promote the evolution toward a higher-quality health system and avoid funding a multitude of small-scale or vertical initiatives, which contributes to policy and programming confusion and reduces resources for large-scale action.

(5) Fund research on system-wide improvement strategies. Rigorous evaluation of improvement reforms is needed to gauge the effect of investing in reforms on health. Research can inform future national investment, help develop local capacity, and benefit other countries with similar contexts. Creating platforms for regional learning through networks and meetings can promote dissemination of success and avoid the replication of failed ideas.

Researchers

(1) Measure quality and evaluate quality improvement. Research is not a luxury: mismeasurement, reliance on assumptions rather than evidence, and the replication of failed ideas cost lives, squander trust, and waste resources. Data on health-care quality in LMICs do not reflect the current disease burden; for example, we know little about quality of care for diabetes and cardiovascular diseases and almost nothing about respiratory disease, cancer, mental health, injuries, and surgery. Adolescents and older adults are less visible in the available data than other age groups. Available data give a better picture of episodic, routine care than of longitudinal services or treatment of acute events, such as maternal or newborn complications or medical and surgical emergencies. Filling in these gaps will require better routine data collection and new research. In assessing quality improvement, we found that the evidence base for many popular approaches is surprisingly weak. Rigorous assessments of all improvement strategies, ideally with implementation science methods, will be essential to justify their scale-up. This Commission’s research priorities are shown in panel 16.
Conclusion

Although health systems will look different in different settings, all people should be able to count on receiving high-quality care that will improve their health and earn their trust. It is time to rethink our past approaches: to ask more from and invest more in this crucial determinant of health.

                Author and article information

                Contributors
                Karen.zamboni@lshtm.ac.uk
                ulrika.baker@ki.se
                mukta.tyagi@iiphh.org
                joanna.schellenberg@lshtm.ac.uk
                z.hill@ucl.ac.uk
                Claudia.hanson@ki.se
                Journal
                Implement Sci
                Implementation Science : IS
                BioMed Central (London)
                1748-5908
                4 May 2020
                2020
                : 15
                : 27
                Affiliations
                [1] GRID grid.8991.9, ISNI 0000 0004 0425 469X, Department of Disease Control, London School of Hygiene and Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
                [2] GRID grid.4714.6, ISNI 0000 0004 1937 0626, Department of Public Health Sciences, Karolinska Institutet, Stockholm, Sweden
                [3] GRID grid.10595.38, ISNI 0000 0001 2113 2211, Department of Family Medicine, College of Medicine, University of Malawi, Blantyre, Malawi
                [4] Public Health Foundation, Kavuri Hills, Madhapur, Hyderabad, India
                [5] GRID grid.83440.3b, ISNI 0000 0001 2190 1201, Institute for Global Health, University College London, London, UK
                Author information
                http://orcid.org/0000-0003-3478-8636
                Article
                978
                10.1186/s13012-020-0978-z
                7199331
                32366269
                7e23f6e2-a2db-4303-9e28-0e196fe7bc81
                © The Author(s). 2020

                Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

                History
                : 22 June 2019
                : 2 March 2020
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/100010409, Children's Investment Fund Foundation;
                Award ID: G-1601-00920
                Funded by: FundRef http://dx.doi.org/10.13039/501100000265, Medical Research Council;
                Award ID: MR/N013638/1
                Categories
                Systematic Review

                Medicine
                quality improvement, evaluation, realist synthesis, context, mechanism of change
