Automatic Determination of the Need for Intravenous Contrast in Musculoskeletal MRI Examinations Using IBM Watson’s Natural Language Processing Algorithm
Abstract
<p class="first" id="Par1">Magnetic resonance imaging (MRI) protocoling can be time-
and resource-intensive,
and protocols can often be suboptimal, depending on the expertise or preferences
of the protocoling radiologist. Providing a best-practice recommendation for an MRI
protocol has the potential to improve efficiency and decrease the likelihood of a
suboptimal or erroneous study. The goal of this study was to develop and validate
a machine learning-based natural language classifier that can automatically assign
the use of intravenous contrast for musculoskeletal MRI protocols based upon the free-text
clinical indication of the study, thereby improving the efficiency of the protocoling
radiologist and potentially decreasing errors. We utilized a deep learning-based natural
language classification system from IBM Watson, a question-answering supercomputer
that gained fame after challenging the best human players on
<i>Jeopardy!</i> in 2011. We compared this solution to a series of traditional machine
learning-based
natural language processing techniques that utilize a term-document frequency matrix.
Each classifier was trained with 1240 MRI protocols plus their respective clinical
indications and validated with a test set of 280. Ground truth of contrast assignment
was obtained from the clinical record. For evaluation of inter-reader agreement, a
blinded second reader radiologist analyzed all cases and determined contrast assignment
based on only the free-text clinical indication. In the test set, Watson demonstrated
overall accuracy of 83.2% when compared to the original protocol. This was similar
to the overall accuracy of 80.2% achieved by an ensemble of eight traditional machine
learning algorithms based on a term-document matrix. When compared to the second reader’s
contrast assignment, Watson achieved 88.6% agreement. When evaluating only the subset
of cases where the original protocol and second reader were concordant (
<i>n</i> = 251), agreement climbed further to 90.0%. The classifier was relatively
robust
to spelling and grammatical errors, which were frequent. Implementation of this automated
MR contrast determination system as a clinical decision support tool may save considerable
time and effort for the radiologist while potentially decreasing error rates, and requires
no change in order entry or workflow.
</p><div class="section">
<a class="named-anchor" id="d5115546e130">
<!--
named anchor
-->
</a>
<h5 class="section-title" id="d5115546e131">Electronic supplementary material</h5>
<p id="d5115546e133">The online version of this article (10.1007/s10278-017-0021-3)
contains supplementary
material, which is available to authorized users.
</p>
</div>
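The abstract's baseline approach, classification over a term-document frequency matrix, can be illustrated with a minimal sketch. The clinical indications below are invented examples, and a multinomial naive Bayes classifier stands in for the paper's ensemble of eight algorithms; this is not the authors' pipeline, only a demonstration of the term-document technique.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase word tokens; real pipelines may also stem or drop stopwords.
    return re.findall(r"[a-z]+", text.lower())

# Toy training data: free-text clinical indications paired with a
# contrast decision. These examples are hypothetical.
train = [
    ("rule out osteomyelitis of the foot", "contrast"),
    ("soft tissue mass in the thigh", "contrast"),
    ("evaluate for infection or abscess", "contrast"),
    ("acl tear suspected after skiing injury", "no_contrast"),
    ("chronic knee pain, evaluate meniscus", "no_contrast"),
    ("rotator cuff tear evaluation", "no_contrast"),
]

# Term-document frequency counts: per-class document and term tallies.
class_docs = Counter()
term_counts = defaultdict(Counter)
vocab = set()
for text, label in train:
    class_docs[label] += 1
    for tok in tokenize(text):
        term_counts[label][tok] += 1
        vocab.add(tok)

def predict(text):
    """Multinomial naive Bayes with Laplace smoothing over the
    term-document counts; a simple stand-in for the paper's ensemble."""
    best_label, best_score = None, float("-inf")
    total_docs = sum(class_docs.values())
    for label in class_docs:
        # Log prior plus log likelihood of each token under this class.
        score = math.log(class_docs[label] / total_docs)
        total_terms = sum(term_counts[label].values())
        for tok in tokenize(text):
            count = term_counts[label][tok] + 1  # Laplace smoothing
            score += math.log(count / (total_terms + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(predict("suspected abscess with infection"))  # -> contrast
print(predict("meniscus tear of the knee"))         # -> no_contrast
```

On a held-out set, overall accuracy of this kind of classifier is simply the fraction of predictions matching the recorded contrast assignment, which is how the 80.2% and 83.2% figures in the abstract are computed.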