Chest radiograph interpretation is critical for the detection of thoracic diseases, including tuberculosis and lung cancer, which affect millions of people worldwide each year. This time-consuming task typically requires expert radiologists to read the images, leading to fatigue-based diagnostic error and to a lack of diagnostic expertise in areas of the world where radiologists are unavailable. Recently, deep learning approaches have achieved expert-level performance in medical image interpretation tasks, powered by large network architectures and fueled by the emergence of large labeled datasets. The purpose of this study was to compare the performance of a deep learning algorithm with that of practicing radiologists in detecting pathologies in chest radiographs.
We developed CheXNeXt, a convolutional neural network that concurrently detects the presence of 14 different pathologies, including pneumonia, pleural effusion, pulmonary masses, and nodules, in frontal-view chest radiographs. CheXNeXt was trained and internally validated on the ChestX-ray8 dataset, with a held-out validation set of 420 images sampled to contain at least 50 cases of each of the original pathology labels. On this validation set, the majority vote of a panel of 3 board-certified cardiothoracic specialist radiologists served as the reference standard. We compared CheXNeXt's discriminative performance on the validation set with that of 9 radiologists using the area under the receiver operating characteristic curve (AUC). The radiologists comprised 6 board-certified radiologists (average experience 12 years, range 4–28 years) and 3 senior radiology residents from 3 academic institutions. We found that CheXNeXt achieved radiologist-level performance on 11 pathologies but not on the remaining 3. The radiologists achieved statistically significantly higher AUCs on cardiomegaly, emphysema, and hiatal hernia, with AUCs of 0.888 (95% confidence interval [CI] 0.863–0.910), 0.911 (95% CI 0.866–0.947), and 0.985 (95% CI 0.974–0.991), respectively, whereas CheXNeXt's AUCs were 0.831 (95% CI 0.790–0.870), 0.704 (95% CI 0.567–0.833), and 0.851 (95% CI 0.785–0.909), respectively. CheXNeXt performed better than the radiologists in detecting atelectasis, with an AUC of 0.862 (95% CI 0.825–0.895), statistically significantly higher than the radiologists' AUC of 0.808 (95% CI 0.777–0.838); there were no statistically significant differences in AUC for the other 10 pathologies. The average time to interpret the 420 images in the validation set was substantially longer for the radiologists (240 minutes) than for CheXNeXt (1.5 minutes). The main limitations of our study are that neither CheXNeXt nor the radiologists were permitted to use patient history or review prior examinations and that evaluation was limited to a dataset from a single institution.
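To make the evaluation protocol concrete, the following Python sketch shows one way to compute a per-pathology AUC with a nonparametric bootstrap 95% CI from majority-vote reference labels and model probabilities. It is a minimal illustration under our own assumptions; the function name, the use of scikit-learn, and the choice of 1,000 bootstrap replicates are not taken from the study's statistical methods.

import numpy as np
from sklearn.metrics import roc_auc_score

def auc_with_bootstrap_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    # y_true: binary reference-standard labels for one pathology
    #         (e.g., the majority vote of the 3 specialist radiologists).
    # y_score: the model's predicted probability for that pathology.
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    rng = np.random.default_rng(seed)
    point = roc_auc_score(y_true, y_score)
    aucs = []
    n = len(y_true)
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, size=n)   # resample the 420 cases with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue                       # skip resamples containing only one class
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return point, (lo, hi)

Repeating this once per pathology yields point estimates and intervals of the form reported above (e.g., 0.862 [0.825–0.895] for atelectasis).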
In this study, we developed and validated a deep learning algorithm that classified clinically important abnormalities in chest radiographs at a performance level comparable to that of practicing radiologists. Once tested prospectively in clinical settings, the algorithm could expand patient access to chest radiograph diagnostics.
In their study, Pranav Rajpurkar and colleagues test a deep learning algorithm that classifies clinically important abnormalities in chest radiographs.
Chest radiography is the most common medical imaging test in the world and is critical for diagnosing common thoracic diseases.
Radiograph interpretation is a time-consuming task, and there is a shortage of qualified, trained radiologists in many healthcare systems.
The performance of deep learning algorithms developed for diagnostic chest radiograph interpretation has not been compared with that of expert radiologists.
We developed a deep learning algorithm to concurrently detect 14 clinically important pathologies in chest radiographs.
The algorithm can also localize the parts of the image most indicative of each pathology (a minimal sketch of such a setup follows this list).
We evaluated the algorithm against 9 practicing radiologists on a validation set of 420 images for which the majority vote of 3 cardiothoracic specialist radiologists served as the ground truth.
The algorithm achieved performance equivalent to the practicing radiologists on 10 pathologies, better on 1 pathology, and worse on 3 pathologies.
Radiologists labeled the 420 images in 240 minutes on average, and the algorithm labeled them in 1.5 minutes.
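The sketch referenced above illustrates one plausible way to build such a multi-label classifier and extract a class activation map for localization. It is written in PyTorch; the DenseNet-121 backbone, layer names, and output head are illustrative assumptions, not the study's exact architecture or training configuration.

import torch
import torch.nn as nn
import torchvision.models as models

class ChestXrayClassifier(nn.Module):
    NUM_PATHOLOGIES = 14

    def __init__(self):
        super().__init__()
        # DenseNet-121 is an illustrative backbone choice.
        self.backbone = models.densenet121(weights=None)
        in_features = self.backbone.classifier.in_features
        # Replace the 1,000-way ImageNet head with 14 independent logits,
        # one per pathology (multi-label, not multi-class).
        self.backbone.classifier = nn.Linear(in_features, self.NUM_PATHOLOGIES)

    def forward(self, x):
        # x: (batch, 3, H, W) frontal-view radiographs; a sigmoid per output
        # gives an independent probability for each of the 14 pathologies.
        return torch.sigmoid(self.backbone(x))

    @torch.no_grad()
    def class_activation_map(self, x, pathology_idx):
        # Class activation mapping (Zhou et al., 2016): weight the final
        # convolutional feature maps by the classifier weights for one pathology.
        feats = torch.relu(self.backbone.features(x))        # (batch, 1024, h, w)
        w = self.backbone.classifier.weight[pathology_idx]   # (1024,)
        cam = torch.einsum("bchw,c->bhw", feats, w)
        cam -= cam.amin(dim=(1, 2), keepdim=True)            # rescale to [0, 1]
        cam /= cam.amax(dim=(1, 2), keepdim=True).clamp(min=1e-8)
        return cam

For training, one would typically apply nn.BCEWithLogitsLoss to the raw logits rather than the sigmoid outputs; the heatmap can be upsampled to the input resolution and overlaid on the radiograph to highlight the regions most indicative of each pathology.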
Deep learning algorithms can diagnose certain pathologies in chest radiographs at a level comparable to that of practicing radiologists on a single-institution dataset.
After clinical validation, algorithms such as the one presented in this work could be used to increase access to rapid, high-quality chest radiograph interpretation.