Assessments of studies meant to evaluate the effectiveness of interventions, programs, and policies can serve an important role in the interpretation of research results. However, evidence suggests that available quality assessment tools have poor measurement characteristics and can lead to opposing conclusions when applied to the same body of studies. These tools tend to (a) be insufficiently operational, (b) rely on arbitrary post-hoc decision rules, and (c) result in a single number to represent a multidimensional construct. In response to these limitations, a multilevel and hierarchical instrument was developed in consultation with a wide range of methodological and statistical experts. The instrument focuses on the operational details of studies and results in a profile of scores instead of a single score to represent study quality. A pilot test suggested that satisfactory between-judge agreement can be obtained using well-trained raters working in naturalistic conditions. Limitations of the instrument are discussed, but these are inherent in making decisions about study quality given incomplete reporting and in the absence of strong, contextually based information about the effects of design flaws on study outcomes.