1. INTRODUCTION
Interstitial lung disease (ILD) comprises diverse conditions characterized by inflammation and fibrosis of the interstitium [1]. Because these disorders exhibit overlapping clinical, radiological, and pathological features, differential diagnosis is challenging, even for experienced physicians [2]. Recent data have indicated rising morbidity and mortality from ILD. Between 2000 and 2017, the mortality rate due to idiopathic pulmonary fibrosis (IPF) in the United States increased by 9.85% (from 18.81 per 100,000 to 20.66 per 100,000) [3]. Diagnosing ILD in early stages is critical for determining appropriate treatment strategies. In contrast, missed diagnosis can give rise to potentially life-threatening complications.
In recent decades, high-resolution computed tomography (HRCT) has emerged as the primary modality used for diagnosing ILD, particularly fibrotic lung diseases. Trained radiologists rely on visual evaluation of medical images to detect, characterize, and diagnose diseases. Nevertheless, this assessment is somewhat subjective and varies with physicians’ education and experience. For instance, even experienced chest radiologists may achieve only moderate agreement in detecting honeycombing [4, 5], an essential component of usual interstitial pneumonia (UIP).
Since the 1990s, image analysis using artificial intelligence (AI) has rapidly advanced, driven by the introduction of deep learning (DL) algorithms based on neural networks and increased computer processing power. DL techniques have been applied in lesion detection, segmentation, and classification of ILD in HRCTs. Instead of using qualitative reasoning, DL algorithms recognize intricate patterns in imaging data and can automatically offer quantitative evaluations. Convolutional neural networks (CNNs), the prominent DL algorithms used for image analysis, are particularly adept at HRCT image analysis in ILD. Increasing evidence indicates that CNNs exhibit performance comparable or even superior to that of experts in diagnosing and managing ILD. By integrating AI into clinical processes as a tool to assist physicians, more precise and reproducible radiological evaluations can be achieved. Consequently, AI assistance is highly desirable for achieving objective and timely diagnosis of ILD.
The objective of this review is to summarize recent advancements in DL applications for ILD classification and prognosis evaluation. Moreover, we explore barriers to translating these findings into clinical practice, and provide insights and recommendations.
2. ARTIFICIAL INTELLIGENCE OVERVIEW
AI has revolutionized medical image analysis and has shown promising performance in various computer vision tasks. Machine learning (ML) is a subfield of AI, and DL is in turn a subfield of ML; together they have become the state of the art in image analysis. Unlike traditional ML methods, DL algorithms can automatically learn the relevant image features without a need for manual definition by human experts. Such algorithms usually use multiple layers of processing that enhance feature extraction and characterization [6]. Owing to advances in neural networks and increased computer processing power, the performance of DL is continually improving and is currently considered comparable to or better than that of humans in image classification tasks (e.g., pneumonia recognition) [7]. Table 1 summarizes five frequently used DL algorithms.
Table 1. Summary of five frequently used DL algorithms.
Algorithm | Typical Applications | Advantages | Disadvantages |
---|---|---|---|
Convolutional neural networks | Image classification, object detection, image segmentation, face recognition | | |
Recurrent neural networks | Language modeling, text generation, speech recognition | | |
Long short-term memory | Machine translation, natural language processing, complex time series analysis | | |
Generative adversarial networks | Image generation, style transfer, data augmentation | | |
Deep belief networks | Feature extraction, image recognition, video recognition, motion capture | | |
CNNs, the principal deep neural networks used for image analysis, were proposed by LeCun et al. [8] in the 1990s, and their structure is loosely modeled on biological neural systems. CNNs have shown remarkable capabilities in image analysis, surpassing all other image classification algorithms in the ImageNet Large Scale Visual Recognition Challenge [9]. A basic CNN comprises three essential components: convolution layers, pooling layers, and fully connected layers. These networks consist of large numbers of computational units (neurons) that communicate with one another through weighted connections (analogous to axons). During training, the algorithm is run multiple times on a training dataset, and the weight assigned to each connection is adjusted to minimize errors in the algorithm’s outputs. After completion of the training process, the algorithm is tested on an independent dataset to gauge its performance [10]. Numerous AI systems using CNN algorithms have been developed to detect and diagnose various conditions, including lung nodules [11, 12]. AI-aided interpretation has been found to achieve 6.4% greater detection accuracy than that of nine radiologists in pulmonary nodule detection on chest radiographs [13].
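To make the three components named above concrete, the following is a minimal sketch of a single forward pass through one convolution layer, one pooling layer, and one fully connected layer. It uses only the Python standard library; the toy image, kernel, and weights are arbitrary illustrative values (assumptions, not taken from any cited study), and no training step is shown.

```python
import math

def conv2d(image, kernel):
    """Valid 2D cross-correlation of a 2D list with a small kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(fmap):
    """Element-wise rectified linear activation."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling with a square window."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

def dense_softmax(features, weights, biases):
    """Fully connected layer followed by a numerically stable softmax."""
    logits = [sum(w * f for w, f in zip(ws, features)) + b
              for ws, b in zip(weights, biases)]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(image, kernel, weights, biases):
    """Convolution -> ReLU -> pooling -> flatten -> fully connected."""
    fmap = max_pool(relu(conv2d(image, kernel)))
    features = [v for row in fmap for v in row]
    return dense_softmax(features, weights, biases)

# Toy example: 5x5 image, 2x2 kernel -> 4x4 map -> 2x2 pooled -> 4 features.
image = [[float((i + j) % 3) for j in range(5)] for i in range(5)]
kernel = [[1.0, -1.0], [-1.0, 1.0]]
weights = [[0.1, 0.2, 0.3, 0.4], [-0.1, 0.0, 0.1, 0.2]]
biases = [0.0, 0.0]
probs = forward(image, kernel, weights, biases)
```

In a trained network the kernel and the fully connected weights would be learned from data rather than fixed by hand, and many kernels and layers would be stacked.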
CNNs are designed to exploit repeating local patterns, and ILD images typically exhibit such patterns [14]. Therefore, CNNs are being specifically used for the analysis of HRCT images in ILD. CNNs divide the intricate task of interpreting ILD images into several simplified tasks, including measuring organs or lesions (segmentation), identifying abnormal regions throughout the image (detection), diagnosing detected lesions (classification), and predicting pathology or prognosis on the basis of images.
3. CLASSIFICATION
3.1 Classifications of ILD patterns
DL is becoming the approach of choice for classifying ILD patterns in radiological data [15]. Anthimopoulos et al. [16] first designed and assessed a CNN model specifically for differentiating between healthy tissue and typical ILD patterns, including ground glass opacity, micronodules, consolidation, reticulation, honeycombing, and combinations of ground glass opacity and reticulation. This model exhibited a classification performance as high as 85.5%, thus demonstrating the potential of CNNs in ILD classification. The same research team subsequently introduced a CNN architecture that captures textural variations inherent to ILD patterns. By leveraging transfer learning from several non-medical source databases, this model achieved an absolute improvement of approximately 2% over the previous result [17]. Although the gain was modest, it suggests that the training strategy of a network is as important as its architecture. Transfer learning has been shown to effectively address data scarcity issues, but the ensemble and model compression used in this method are relatively intricate. One customized CNN architecture proposed by Huang et al. [18] has recently shown favorable benchmark performance in the classification of ILD patterns, exceeding that of most state-of-the-art models. Additionally, the researchers have further enhanced performance by using a novel two-stage transfer learning strategy that effectively transfers knowledge acquired from both the source and intermediate domains.
Extensive research has attempted to increase the accuracy of algorithms in distinguishing ILD patterns, particularly similar patterns. For instance, Kim et al. [19] increased a CNN’s classification accuracy from 81.27% to 95.12% by increasing the number of convolutional layers. Notably, the incidence of misclassification decreased substantially in instances of pathological ambiguity, such as differentiation between normal and emphysema cases. These findings suggest that deeper DL architectures may improve diagnostic capabilities for ILD.
Most previous studies have used patch-based image representations, which are generally effective for ILD classification [16, 19, 20]. Gao et al. [21] have presented a novel method that uses whole lung images as a holistic input to classify ILD patterns, and can capture visual details and spatial context that might be ignored in image patch-based characterization. That study used three attenuation ranges to detect abnormal lung patterns, thereby achieving enhanced visibility or visual separation among all six ILD disease categories.
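The two input representations contrasted above can be sketched as follows: cutting a slice into fixed-size patches, and rescaling a whole slice into three attenuation windows that are stacked as channels. This is a minimal illustration only; the Hounsfield unit (HU) window limits in `WINDOWS` are placeholder values chosen for the example and are not claimed to be the ranges used in [21].

```python
def extract_patches(image, size, stride):
    """Cut a 2D slice (list of rows) into square patches of a given size."""
    patches = []
    for i in range(0, len(image) - size + 1, stride):
        for j in range(0, len(image[0]) - size + 1, stride):
            patches.append([row[j:j + size] for row in image[i:i + size]])
    return patches

def window(image, low, high):
    """Clip HU values to [low, high] and rescale them to [0, 1]."""
    span = high - low
    return [[(min(max(v, low), high) - low) / span for v in row]
            for row in image]

# Illustrative attenuation windows in HU (assumed values for this sketch).
WINDOWS = {
    "low": (-1400, -950),
    "normal": (-1400, 200),
    "high": (-160, 240),
}

def three_channel(image):
    """Represent a whole slice as three rescaled attenuation channels."""
    return [window(image, low, high) for low, high in WINDOWS.values()]

# Toy 4x4 "slice" with HU-like values.
slice_hu = [[-1000, -900, -800, -700],
            [-600, -500, -400, -300],
            [-200, -100, 0, 100],
            [200, 300, -1500, -1200]]
patches = extract_patches(slice_hu, size=2, stride=2)
channels = three_channel(slice_hu)
```

A patch-based classifier would receive each element of `patches` separately, whereas a holistic classifier would receive `channels` for the entire slice at once.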
Radiologic assessment of ILD requires experience and expertise, and inter-observer variability is high, even among experienced radiologists. Recently, Choe et al. [22] have conducted a pioneering investigation applying content-based image retrieval (CBIR) to ILD diagnosis. In that study, the top three reference CT images with diagnostic significance were extracted from the database through comparison of the extent and distribution of disease patterns in different regions quantified by the DL algorithm. After implementation of CBIR, diagnostic accuracy improved among radiologists with varying levels of experience, and inter-reader agreement also increased. This method increased confidence in the final diagnosis of ILD through reliance on not only the radiologist’s perceptions and experience, but also support from AI algorithms. Additionally, rather than relying solely on radiological data as individual inputs, modern models incorporating clinical and laboratory information are being developed [23]. Mei et al. have built a model based on initial CT images and associated clinical data, thus yielding a more comprehensive algorithm for classifying five types of ILD accurately [24]. Five models were devised, and the joint CNN model achieved the highest proficiency in the classification of ILD subtypes. This model precisely predicted five ILD subtypes and demonstrated superior diagnostic performance to that of a senior thoracic radiologist and a senior pulmonologist in identifying UIP in the test set. Therefore, when clinical information and relevant CT scans are accessible, this DL system can aid clinicians in diagnosis and classification of patients with ILD.
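The retrieval step described above, in which the three most similar reference cases are returned for a query scan, can be sketched as a nearest-neighbor search over quantified pattern extents. This is a simplified illustration: the feature vectors, case identifiers, and the choice of Euclidean distance are assumptions made for the example, not the similarity measure reported in [22].

```python
import math

def distance(a, b):
    """Euclidean distance between two pattern-extent vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve_top_k(query, database, k=3):
    """Return the k database cases whose pattern extents are closest to the query."""
    ranked = sorted(database, key=lambda case: distance(query, case["extent"]))
    return ranked[:k]

# Hypothetical database: each case stores the fraction of lung occupied by
# three patterns (e.g., ground glass opacity, reticulation, honeycombing).
database = [
    {"id": "A", "extent": [0.50, 0.30, 0.20]},
    {"id": "B", "extent": [0.10, 0.10, 0.80]},
    {"id": "C", "extent": [0.45, 0.35, 0.20]},
    {"id": "D", "extent": [0.90, 0.05, 0.05]},
    {"id": "E", "extent": [0.40, 0.30, 0.30]},
]
query = [0.50, 0.30, 0.20]
top_three = [case["id"] for case in retrieve_top_k(query, database)]
```

A practical system would compute the extents per lung region and would likely use a learned or weighted similarity, but the ranking logic is the same.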
3.2 Classification of pulmonary fibrosis
Precise diagnosis of IPF, a chronic and progressive ILD, is crucial to facilitate timely commencement of antifibrotic therapy and, when applicable, enrollment in clinical trials. A confident diagnosis of IPF may be made in the correct clinical context when the CT shows a pattern of definite or probable UIP [25, 26]. However, radiological evaluation of IPF remains challenging, primarily because of significant inter-observer variability, even among experienced radiologists [4, 27].
DL algorithms can apply specialized expertise to particular issues. Walsh et al. [28] first conducted a case-cohort study to develop and evaluate a DL method for IPF classification based on criteria specified by two international idiopathic pulmonary fibrosis guideline statements. The algorithm achieved an accuracy of 73.3%, exceeding that of most thoracic radiologists (70.7%), in classifying cases according to the 2011 ATS/ERS/JRS/ALAT IPF guidelines. Moreover, the algorithm was retrained on the basis of the 2018 Fleischner Society criteria for UIP and achieved performance comparable to that of thoracic radiologists. Christe et al. [29] have introduced a machine learning-assisted computer-aided detection algorithm capable of classifying IPF with accuracy similar to that of radiologists, in accordance with the 2018 Fleischner Society criteria. Shaish et al. [30] first designed a DL model for the classification of ILD by using histopathology as a reference standard instead of relying on radiologists’ interpretations. The researchers used virtual lung wedge resection as input to a CNN, and observed that this method achieved moderate accuracy in predicting histopathologic UIP pattern, with performance comparable to that of humans. Likewise, in a retrospective study in 198 patients with biopsy-confirmed ILD conducted by Bratt et al. [31], a DL model was used to enhance the noninvasive evaluation of atypically presenting IPF through predicting UIP histopathology from CT images. This DL model achieved superior diagnostic performance to that of visual assessment (AUC, 0.87 vs 0.80, P = 0.03) and exhibited higher reproducibility.
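Several of the comparisons above are reported as the area under the receiver operating characteristic curve (AUC). As a minimal sketch of what that number measures, the function below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores are invented toy values, not data from any cited study.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy example: 1 = histopathologic UIP, 0 = not UIP (hypothetical labels).
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
auc = roc_auc(labels, scores)
```

The pairwise form is quadratic in the number of cases and is meant for clarity; rank-based implementations are preferable for large test sets.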
Most recently, intensive studies have focused on increasing the potential of the DL model in differentiating between IPF and non-IPF on HRCT images. Yu et al. [32] have built a two-stage model integrating a multi-scale, domain knowledge-guided attention model to ensure explainability and a random forest model to increase accuracy in making the final decision. In another study, Refaee et al. [33] developed three models involving handcrafted radiomics, DL, and ensemble models for the classification of IPF and non-IPF on HRCTs. The ensemble models achieved better performance than the radiologists. Hence, the combination of DL and handcrafted radiomics models may be a promising approach for supporting radiologists in diagnosing IPF.
The gold standard for diagnosing ILD is a dynamic and comprehensive approach involving multidisciplinary discussion (MDD). This approach emphasizes close collaboration among clinicians, radiologists, and pathologists to ensure accurate diagnosis. However, few DL studies have integrated CT images and clinical information to diagnose IPF. Furukawa et al. [34] first developed a multimodal AI for differentiating IPF from other ILDs. This algorithm used CT findings and clinical data to increase the accuracy of IPF diagnosis, mirroring the way in which MDD teams arrive at a diagnosis by integrating these data. The model showed higher diagnostic agreement in IPF diagnosis (κ = 0.67) than international MDD teams (κ = 0.53) and respiratory physicians (κ = 0.41). This algorithm may serve as a promising tool for IPF diagnosis by furnishing reproducible, nearly instantaneous reports with human-level accuracy, although future multi-center research is warranted to develop a more robust algorithm.
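The agreement values quoted above are kappa (κ) statistics, which correct raw agreement for the agreement expected by chance. The sketch below assumes unweighted Cohen's kappa between two sets of categorical diagnoses; the exact variant used in [34] is not stated here and may differ, and the example diagnoses are invented.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equally long lists of labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(count_a) | set(count_b)
    # Chance agreement: product of each rater's marginal frequencies.
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)

# Toy example: reference diagnoses versus model output (hypothetical).
reference = ["IPF", "IPF", "non-IPF", "non-IPF"]
model = ["IPF", "non-IPF", "non-IPF", "non-IPF"]
kappa = cohen_kappa(reference, model)
```

Here raw agreement is 75%, but half of that would be expected by chance given the label frequencies, so κ is considerably lower than the raw figure.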
4. PREDICTING ILD PROGNOSIS
Given the unpredictability of progression and the short median survival (2–5 years), identifying patients with ILD who exhibit rapid disease progression is crucial. Nevertheless, predicting the disease course of individual patients poses a formidable challenge.
Most current guidelines suggest that pulmonary function tests and chest HRCT are essential for ILD patient follow-up. Early studies proposed several multidimensional indexes for the initial stratification of patients with IPF according to possible prognosis. The Composite Physiologic Index has a straightforward structure composed primarily of spirometric volumes and the diffusing capacity of the lung for carbon monoxide [35]. Ley et al. [36] have introduced the Gender, Age, Physiology (GAP) model to predict mortality in people with IPF, which incorporates four common variables: sex, age, and two lung physiology variables (forced vital capacity and diffusing capacity of the lung for carbon monoxide). Over the past decade, visual scoring has been the most frequently used approach for evaluating either overall disease status or the extent of specific CT patterns. Nevertheless, the primary constraint of visual scoring lies in the considerable interobserver variability [37]. Jacob et al. have reported the first evidence that ML is superior to radiologists in predicting the mortality of patients with IPF [38]. Kim et al. [39] have used recent quantitative texture-based scores to assess initial alterations in HRCT scans to predict IPF progression. They have used a threshold from visual confirmation and have found that structural alterations of 4% or more in paired HRCT images from patients with IPF can potentially predict the decline in lung function in 1–2 years.
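As an illustration of how a multidimensional index such as the one by Ley et al. combines its four variables, the sketch below assigns points for sex, age, forced vital capacity (FVC), and diffusing capacity (DLCO) and maps the total to a stage. The point cut-offs follow the commonly cited form of the index as an assumption of this example; they are not given in the text above and should be verified against [36] before any use.

```python
def gap_points(sex, age, fvc_pct, dlco_pct):
    """Sum of index points; dlco_pct is None when DLCO cannot be performed.

    Cut-offs are assumed from the commonly cited form of the index.
    """
    points = 1 if sex == "male" else 0
    # Age in years.
    points += 0 if age <= 60 else 1 if age <= 65 else 2
    # FVC as percent of predicted.
    points += 0 if fvc_pct > 75 else 1 if fvc_pct >= 50 else 2
    # DLCO as percent of predicted.
    if dlco_pct is None:
        points += 3
    elif dlco_pct > 55:
        points += 0
    elif dlco_pct >= 36:
        points += 1
    else:
        points += 2
    return points

def gap_stage(points):
    """Map total points (0 to 8) to stage I, II, or III."""
    return "I" if points <= 3 else "II" if points <= 5 else "III"

# Hypothetical patient: 63-year-old man, FVC 60% and DLCO 40% of predicted.
example_points = gap_points("male", 63, 60, 40)
example_stage = gap_stage(example_points)
```

The appeal of such indexes is that they need only routinely collected variables; the imaging-based models discussed in this section aim to add information beyond these.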
In the past several years, DL algorithms have been widely used as an important technique in evaluating ILD prognosis. For instance, Walsh et al. [40] have evaluated the prognostic precision of the DL algorithm Systematic Objective Fibrotic Imaging Analysis Algorithm (SOFIA). This algorithm has better prognostic predictive ability for individuals with progressive fibrotic lung disease than either assessments performed by expert radiologists or outcomes derived from guideline-based histologic patterns. The success of SOFIA in this context underscores the potential of DL algorithms to process complex medical data and extract valuable insights that can aid in clinical decision-making. However, although DL algorithms can provide valuable assistance, they should always be used in conjunction with medical expertise and human judgment, to ensure accurate and safe diagnoses and prognoses. In a recent study by Chassagnon et al. [41], lung shrinkage was identified through elastic registration of CT images integrated with a DL classifier. This combined method was used to evaluate deterioration due to ILD in individuals with systemic sclerosis, and achieved an accuracy of 80% and 83% in depicting morphologic and functional worsening, respectively. This study fills gaps in previous longitudinal follow-up studies, which focused predominantly on the appraisal of ILD extent while disregarding the potential effect of lung shrinkage. Similarly, Si-Mohamed et al. [42] have found that the median annual lung volume loss on CT is greater in individuals with IPF than in those without (155.7 mL vs 50.7 mL, P < 0.0001), and that a relative annual CT volume loss higher than 9.4% is associated with a reduced mean survival time (2.0 years vs. 2.8 years) in IPF patients.
Nam et al. [43] have applied commercial DL software to quantify the extent of pulmonary fibrosis in chest CT scans from patients with IPF. Assessment of the prognostic significance of the CT volumetric parameters revealed that normal and fibrotic lung volume proportions can serve as independent predictors of overall survival after adjustment for clinical and physiological factors. Most recently, Mei et al. [24] have used two distinct time-series models, formulated by using retrospectively gathered clinical data and quantitative CT scans, to comprehensively analyze 3-year survival rates. The researchers incorporated medication use and additional therapeutic details into the patients’ clinical histories to further enhance the prediction of 3-year survival rate. A Transformer model was trained on patient data accumulated within 1, 2, and 3 years. The model’s sensitivity increased from 54.55% at the end of the first year to 72.73% by the end of the third year. By providing 3-year survival predictions, the model can dynamically furnish personalized insights regarding current and prospective patient treatment outcomes.
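The relative annual CT volume loss threshold reported by Si-Mohamed et al. can be illustrated with a short calculation. The sketch assumes that relative annual loss is defined as the volume lost between two scans, expressed as a percentage of the baseline volume and divided by the interval in years; the exact definition in [42] may differ, and the volumes used are invented.

```python
# Threshold on relative annual CT volume loss (percent per year) quoted in the text.
THRESHOLD_PCT = 9.4

def relative_annual_volume_loss(baseline_ml, followup_ml, years):
    """Percent of baseline lung volume lost per year between two CT scans."""
    if baseline_ml <= 0 or years <= 0:
        raise ValueError("baseline volume and interval must be positive")
    return (baseline_ml - followup_ml) / baseline_ml / years * 100.0

def exceeds_threshold(baseline_ml, followup_ml, years):
    """True when the relative annual loss is above the quoted threshold."""
    return relative_annual_volume_loss(baseline_ml, followup_ml, years) > THRESHOLD_PCT

# Hypothetical patient: 4000 mL at baseline, 3600 mL one year later.
loss = relative_annual_volume_loss(4000.0, 3600.0, 1.0)
flagged = exceeds_threshold(4000.0, 3600.0, 1.0)
```

In practice the lung volumes would come from an automated segmentation of each CT scan, so the reliability of this figure depends on consistent inspiration level and segmentation quality across time points.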
5. DISCUSSION AND PERSPECTIVES
ILD comprises a diverse spectrum of conditions that are major causes of morbidity and mortality. Tables 2 and 3 summarize the literature applying CNN techniques in ILD classification and prognosis evaluation on HRCT. Numerous studies applying DL to ILD have reported superior performance to conventional techniques or better performance than radiologists in diagnosing IPF or predicting ILD prognosis. However, several challenges persist in this field. First, developing DL models for ILD with high accuracy requires a substantial image sample size [46]. Normal lungs and various ILD patterns may exhibit similar appearance, and the same ILD pattern can present significant variations across different participants. The volume of training data plays a crucial role in enhancing the precision and reliability of DL algorithms. Moreover, the development of public databases could expand the use of CBIR, to enable applications providing ILD diagnostic assistance [22]. Nonetheless, acquiring a sufficient quantity of medical images to train these algorithms poses several challenges. Unlike the natural images in mainstream image analysis datasets (e.g., ImageNet, on which architectures such as AlexNet, GoogLeNet, and VGGNet were benchmarked), high-quality radiological images are difficult and costly to secure, thus posing a substantial bottleneck in the field of medical image analysis.
Table 2. Applications of DL in classifying ILD patterns.
Application | Authors | Year | Data source | Model and method | Key findings |
---|---|---|---|---|---|
Classification of ILD patterns and subtypes | Anthimopoulos et al. [16] | 2016 | HUG database + proprietary | A CNN consisting of five convolutional layers with 2×2 kernels and LeakyReLU activations, followed by one average pooling layer with a size equal to that of the final feature maps, and three dense layers. | Pattern sensitivity ranged from 69% (honeycombing) to 99% (consolidation). |
Classification of ILD patterns and subtypes | Christodoulidis et al. [17] | 2017 | HUG database + proprietary | Multi-source transfer learning is used. | The method improved performance by an absolute 2% over the previous performance (0.8557) of the same CNN in [16]. |
Classification of ILD patterns and subtypes | Kim et al. [19] | 2018 | Proprietary | A CNN with six learnable layers, consisting of four convolution layers and two fully connected layers. | As the number of convolution layers increased, the classification accuracy of the CNN improved from 81.27% to 95.12%. |
Classification of ILD patterns and subtypes | Wang et al. [20] | 2018 | HUG database | A multi-scale and rotation-invariant convolutional neural network is used. | All tissue categories achieved >85% classification rates. |
Classification of ILD patterns and subtypes | Gao et al. [21] | 2018 | HUG database | The algorithm takes the entire image as a holistic input. | In the holistic image classification, the overall accuracy was 68.6%. |
Classification of ILD patterns and subtypes | Huang et al. [18] | 2020 | HUG database | A new CNN architecture with a novel two-stage transfer learning strategy is used. | The performance was improved by the proposed two-stage transfer learning method. |
Classification of ILD patterns and subtypes | Choe et al. [22] | 2022 | Proprietary | This CBIR system for chest CT images uses DL. | Diagnostic accuracy improved in all readers after application of CBIR (before vs after CBIR, 46.1% vs 60.9%, respectively). |
Classification of ILD patterns and subtypes | Mei et al. [24] | 2023 | Proprietary | A CNN model is built via transfer learning by using pre-trained weights from a RadImageNet CNN Inception-ResNet-V2 (IRV2). | The joint model outperformed a senior thoracic radiologist and a senior pulmonologist in diagnosing UIP. It also performed as well as all human readers in sensitivity in diagnosing CHP, sarcoidosis, NSIP, and other ILD. |
Classification of pulmonary fibrosis | Walsh et al. [28] | 2018 | Proprietary | A DL algorithm developed using TensorFlow on a 3XS DL G10 computer is used. | The model achieved greater accuracy (73.3%) than most thoracic radiologists (70.7%). |
Classification of pulmonary fibrosis | Christe et al. [29] | 2019 | Proprietary | The INTACT system was designed by biomedical engineers and trained by chest radiologists and pulmonologists. | Reader 1, reader 2, and INTACT achieved similar accuracy for classifying pulmonary fibrosis into the original four categories: 0.6, 0.54, and 0.56, respectively (P > 0.45). |
Classification of pulmonary fibrosis | Shaish et al. [30] | 2021 | Proprietary | Virtual lung wedge resection in patients with ILD is used as input to a CNN. | The model achieved a sensitivity of 74% and specificity of 58% in the testing cohort. |
Classification of pulmonary fibrosis | Yu et al. [44] | 2021 | Proprietary | This time/memory-efficient IPF diagnosis model uses axial chest CT and DK. | Incorporating DK in the training of DL models increased the overall accuracy from 0.89 to 0.91 for the baseline CNN model. |
Classification of pulmonary fibrosis | Refaee et al. [33] | 2022 | Proprietary | An HCR model, a DL model, and an ensemble of the HCR and DL models are used. | The ensemble models (85.3%) performed better than two radiologists and one pulmonologist (66.7%) on the external test set. |
Classification of pulmonary fibrosis | Bratt et al. [31] | 2022 | Proprietary | This DL model was trained on a heterogeneous data set of scans from multiple institutions. | The model performance was superior to that of radiologists in predicting histopathologic diagnosis (AUC, 0.87 vs 0.80, respectively). |
Classification of pulmonary fibrosis | Yu et al. [32] | 2023 | Proprietary | This two-stage model combines explainability achieved by a DL approach and accuracy achieved by a machine learning technique. | When both high- and moderate-resolution attention were included, under certain hyperparameter settings, the model achieved the highest AUC among all experiments (AUC ± SD = 0.99 ± 0.01). |
Abbreviations: DL, deep learning; CNN, convolutional neural network; ILD, interstitial lung disease; IPF, idiopathic pulmonary fibrosis; UIP, usual interstitial pneumonia; CHP, chronic hypersensitivity pneumonitis; NSIP, nonspecific interstitial pneumonia; HUG, Hospitals of Geneva; CBIR, content-based image retrieval; HCR, handcrafted radiomics; DK, domain knowledge; AUC, area under curve; SD, standard deviation.
Table 3. Applications of DL in predicting ILD prognosis.
Authors | Year | Objective | No. of patients | Model | Key findings |
---|---|---|---|---|---|
Jacob et al. [38] | 2017 | Prediction of IPF mortality | 283 | CALIPER computer algorithm | CALIPER-derived parameters (pulmonary vessel volume and honeycombing) had greater prognostic accuracy than traditional visual CT scores. |
Walsh et al. [40] | 2022 | Prediction of progressive fibrotic lung disease | 504 | SOFIA, a deep CNN loosely based on the InceptionResNet-v2 architecture | SOFIA achieved better outcome prediction than expert radiologist evaluation or guideline-based histologic patterns. |
Chassagnon et al. [41] | 2020 | Diagnosis of lung shrinkage and functional worsening of ILD in patients with systemic sclerosis | 212 | Combination of elastic registration of CT scans with a DL classifier | The DL classifier depicted morphologic and functional worsening with an accuracy of 80% and 83%, respectively. |
Si-Mohamed et al. [42] | 2022 | Exploration of prognostic value of annual CT volume loss in IPF | 560 | Commercially available software, a U-net-based DL algorithm | A relative annual CT volume loss above 9.4% was associated with a significantly diminished mean survival time (2.0 years versus 2.8 years) in IPF. |
Nam et al. [43] | 2022 | Prediction of IPF overall survival | 161 | Fully automatic, commercial DL software | CT-Norm% and CT-Fib% were independent prognostic factors for overall survival in IPF. |
Handa et al. [45] | 2021 | Prediction of IPF prognosis | 465 | AI-based quantitative CT image analysis software | Bronchial volumes and normal lung volumes were independently associated with survival after adjustment for sex, age, and lung physiology stage of IPF. |
Mei et al. [24] | 2023 | Evaluation of 3-year survival rate | 234 | Transformer model | The model became more sensitive when more follow up information was available, increasing in sensitivity from 54.55% to 72.73% at the end of year 1 and the end of year 3, respectively. |
Abbreviations: AI, artificial intelligence; DL, deep learning; CNN, convolutional neural network; ILD, interstitial lung disease; IPF, idiopathic pulmonary fibrosis; SOFIA, systematic objective fibrotic imaging analysis algorithm; CT-Norm%, normal lung volume proportion; CT-Fib%, fibrotic lung volume proportion.
Although several large-scale radiological imaging databases are currently available, most are specific to conditions other than ILD. Examples include MR-Net, which focuses on magnetic resonance images of the knee; MURA, designed for musculoskeletal radiographs; and Chest X-ray, which specializes in chest radiographs [47]. The HUG database, one of the few public ILD databases, contains only 128 CT studies [48]. Therefore, international efforts will be critical to construct large-scale, balanced databases specifically designed for ILD. These databases should include comprehensive information, encompassing images and clinical data, and should adhere to uniform imaging standards, such as HRCT, to ensure effective training of DL models for ILD.
Several technological solutions can compensate for data scarcity to some extent. These methods include techniques such as transfer learning, data augmentation, and generative adversarial networks (GANs). For instance, transfer learning can leverage pre-existing knowledge from extensive datasets, thereby avoiding the need for large-scale data specific to the given task. Fine-tuning, a common technique for transfer learning, enables the nuanced adaptation of generalized models to specific requirements, and decreases the time required to develop and process new DL models. Notably, ensuring the similarity of the datasets is an essential prerequisite in considering fine-tuning methods [49]. Another promising technique expected to be increasingly used in the future creates synthetic medical images from GANs to potentially supply unlimited numbers of images created from one or more image databases [50]. Ensuring that the quality of generated data matches that of real data is critical.
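Of the techniques listed above, data augmentation is the simplest to illustrate. The sketch below generates the eight flipped and rotated variants of a square image patch, a common way to enlarge a small training set. Whether such transformations are appropriate depends on the task; they are assumed here to be acceptable for texture-like patches and are not drawn from any specific cited study.

```python
def flip_horizontal(patch):
    """Mirror a 2D patch left to right."""
    return [row[::-1] for row in patch]

def rotate90(patch):
    """Rotate a 2D patch by 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]

def augment(patch):
    """Return the 8 variants: 4 rotations, each with and without a flip."""
    variants = []
    current = [list(row) for row in patch]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants

# Toy 2x2 patch with distinct values so that every variant is unique.
patch = [[1, 2], [3, 4]]
augmented = augment(patch)
```

Each variant keeps the label of the original patch, so one annotated patch yields eight training examples at no additional annotation cost.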
DL algorithms are becoming progressively more efficient and complex, and capable of executing more tasks. However, standardized methods for validation are lacking. It is increasingly acknowledged that DL algorithms should be tested on publicly available datasets before clinical deployment. Because many DL models developed for ILD diagnosis and prognosis are trained on small nonpublic datasets, a public ILD dataset would enable the validation of diverse research models, thereby facilitating identification of the most effective ones. Another major concern for DL algorithms is their generalizability [51]. The performance of DL algorithms may be robust on the datasets on which they were initially trained and tested, but may deteriorate markedly on new data from other sources. Consequently, developing a large and diverse dataset could enhance the training process and improve the algorithms’ generalization ability, thereby making them more reliable and effective in real-world clinical applications. Medical experts should establish multifaceted evaluation criteria to assess the clinical utility of these algorithms, because accuracy does not necessarily indicate clinical efficacy. Finally, deploying DL models at scale requires consideration of costs and energy consumption. Lightweight models may address these challenges by reducing model parameters and complexity.
In future work, ensemble learning may offer a promising approach for more accurate and efficient ILD management. This method combines multiple models to leverage their strengths and mitigate individual weaknesses, thus improving model detection accuracy and decreasing training time [52]. This method can also effectively handle imbalanced datasets, thereby increasing sensitivity to rare ILD patterns. Additionally, DL is occasionally considered a “black box” because of challenges associated with interpretability. Decoding the image features used by deep neural networks for prediction is critical for biomarker development in patients with established pulmonary fibrosis [40].
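The ensemble idea described above can be sketched with soft voting, in which the class probabilities produced by several models are averaged (optionally with weights) before the final label is chosen. This is one common ensembling scheme, used here as an assumption; the model outputs and class names are invented for illustration.

```python
def soft_vote(prob_lists, weights=None):
    """Weighted average of per-model class probability vectors."""
    if not prob_lists:
        raise ValueError("at least one model output is required")
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_classes = len(prob_lists[0])
    return [sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
            for c in range(n_classes)]

def predict(prob_lists, class_names, weights=None):
    """Return the class with the highest averaged probability."""
    averaged = soft_vote(prob_lists, weights)
    best = max(range(len(averaged)), key=averaged.__getitem__)
    return class_names[best]

# Hypothetical outputs of three models for one scan: [P(IPF), P(non-IPF)].
outputs = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
classes = ["IPF", "non-IPF"]
averaged = soft_vote(outputs)
label = predict(outputs, classes)
```

Averaging tends to dampen the errors of any single model, which is the intuition behind the accuracy gains attributed to ensembles; the component models still have to be trained individually.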
In conclusion, ILD is a clinically significant and difficult-to-manage problem, and DL offers distinctive advantages in diagnosing and predicting ILD prognosis. Further research efforts should focus on developing a high-performance DL architecture that could be deployed on any computer station and be made available to non-academic centers. Prospective studies to validate the clinical relevance of these tools are warranted before their use in routine clinical practice.