      Is Open Access

      Prediction of myopia development among Chinese school-aged children using refraction data from electronic medical records: A retrospective, multicentre machine learning study

      research-article


          Abstract

          Background

          Electronic medical records provide large-scale real-world clinical data for use in developing clinical decision systems. However, sophisticated methodology and analytical skills are required to handle the large-scale datasets necessary for the optimisation of prediction accuracy. Myopia is a common cause of vision loss. Current approaches to control myopia progression are effective but have significant side effects. Therefore, identifying those at greatest risk who should undergo targeted therapy is of great clinical importance. The objective of this study was to apply big data and machine learning technology to develop an algorithm that can predict the onset of high myopia, at specific future time points, among Chinese school-aged children.

          Methods and findings

          Real-world clinical refraction data were derived from electronic medical record systems in 8 ophthalmic centres from January 1, 2005, to December 30, 2015. The variables of age, spherical equivalent (SE), and annual progression rate were used to develop an algorithm to predict SE and onset of high myopia (SE ≤ −6.0 dioptres) up to 10 years in the future. Random forest machine learning was used for algorithm training and validation. Electronic medical records from the Zhongshan Ophthalmic Centre (a major tertiary ophthalmic centre in China) were used as the training set. Ten-fold cross-validation and out-of-bag (OOB) methods were applied for internal validation. The remaining 7 independent datasets were used for external validation. Two population-based datasets, which had no participant overlap with the ophthalmic-centre-based datasets, were used for multi-resource validation testing. The main outcomes and measures were the area under the curve (AUC) values for predicting the onset of high myopia over 10 years and the presence of high myopia at 18 years of age. In total, 687,063 multiple visit records (≥3 records) of 129,242 individuals in the ophthalmic-centre-based electronic medical record databases and 17,113 follow-up records of 3,215 participants in population-based cohorts were included in the analysis. Our algorithm accurately predicted the presence of high myopia in internal validation (the AUC ranged from 0.903 to 0.986 for 3 years, 0.875 to 0.901 for 5 years, and 0.852 to 0.888 for 8 years), external validation (the AUC ranged from 0.874 to 0.976 for 3 years, 0.847 to 0.921 for 5 years, and 0.802 to 0.886 for 8 years), and multi-resource testing (the AUC ranged from 0.752 to 0.869 for 4 years). 
With respect to the prediction of high myopia development by 18 years of age, as a surrogate of high myopia in adulthood, the algorithm provided clinically acceptable accuracy over 3 years (the AUC ranged from 0.940 to 0.985), 5 years (the AUC ranged from 0.856 to 0.901), and even 8 years (the AUC ranged from 0.801 to 0.837). Meanwhile, our algorithm achieved clinically acceptable prediction of the actual refraction values at future time points, which is supported by the regressive performance and calibration curves. Although the algorithm achieved balanced and robust performance, concerns about the compromised quality of real-world clinical data and over-fitting issues should be cautiously considered.
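The main outcome above is the AUC, and internal validation used ten-fold cross-validation. The following is a minimal sketch of both ideas in plain Python, not the authors' code (that is in S1 Code). The function names `auc` and `k_fold_splits` and the toy numbers are assumptions made for illustration, and no random forest is trained here.

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    labels: sequence of 0/1 values (1 = high myopia present at the target time).
    scores: predicted risk, where a higher score means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Fraction of positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_splits(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; each sample is held out exactly once."""
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    for fold in range(k):
        test = indices[fold::k]  # every k-th sample, starting at `fold`
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


# Toy example with made-up labels and scores (not study data).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
print(auc(toy_labels, toy_scores))  # 0.75
```

In a real evaluation, a model would be fitted on each `train` index set and scored on the matching `test` set, giving one AUC per fold.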

          Conclusions

          To our knowledge, this study, for the first time, used large-scale data collected from electronic health records to demonstrate the contribution of big data and machine learning approaches to improved prediction of myopia prognosis in Chinese school-aged children. This work provides evidence for transforming clinical practice, health policy-making, and precise individualised interventions regarding the practical control of school-aged myopia.

          Abstract

          Therapies to control myopia progression confer significant side effects and should be targeted to those at highest risk. Here, Yizhi Liu and colleagues report a machine learning algorithm that predicts the progression of myopia, into early adulthood, among Chinese school-aged children.

          Author summary

          Why was this study done?
          • Myopia has reached epidemic levels among young adults in East and Southeast Asia, affecting an estimated 80%–90% of high school graduates, with approximately 20% of them having high myopia. Various interventions, including atropine eyedrops and orthokeratology, have been proposed to control myopia progression; however, these approaches confer significant side effects. Identifying those at greatest risk who should undergo targeted therapy is the most important clinical challenge faced by ophthalmologists and optometrists.

          • Electronic medical records provide large-scale real-world clinical data for use in developing clinical decision systems. Taking school-aged myopia, the most prevalent eye disease, as an example, it would be of great value to use ophthalmic-centre-based electronic medical records to develop a big-data-driven clinical prediction algorithm based on machine learning.

          What did the researchers do and find?
          • This study analysed 687,063 longitudinal electronic medical records from the largest ophthalmic centres in China and developed and validated individualised, machine-learning-based models for predicting myopia.

          • Our model predicted spherical equivalent and onset of high myopia at 18 years of age at a clinically acceptable accuracy and as early as 8 years in advance.
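The abstract names three predictors (age, spherical equivalent, and annual progression rate) and defines high myopia as SE ≤ −6.0 dioptres. A minimal sketch of how such inputs could be derived from repeated refraction records follows. The `Visit` record, the function names, and the first-to-last-visit formula for the progression rate are assumptions for illustration; the abstract does not state how the authors computed the rate. The spherical equivalent formula (sphere plus half the cylinder) is the standard clinical definition.

```python
from dataclasses import dataclass

HIGH_MYOPIA_THRESHOLD_D = -6.0  # from the abstract: SE <= -6.0 dioptres


@dataclass
class Visit:
    """Hypothetical record shape for one refraction measurement."""
    age_years: float
    sphere_d: float
    cylinder_d: float


def spherical_equivalent(visit):
    """Standard definition: sphere plus half the cylinder, in dioptres."""
    return visit.sphere_d + visit.cylinder_d / 2.0


def is_high_myopia(se_d):
    return se_d <= HIGH_MYOPIA_THRESHOLD_D


def build_features(visits):
    """Return age, SE, and annual progression rate at the latest visit.

    Assumption: the rate is the SE change between the first and last visits
    divided by the elapsed years (negative = becoming more myopic).
    """
    if len(visits) < 3:
        # The study included individuals with at least 3 records.
        raise ValueError("need at least 3 visits")
    ordered = sorted(visits, key=lambda v: v.age_years)
    first, last = ordered[0], ordered[-1]
    elapsed = last.age_years - first.age_years
    if elapsed <= 0:
        raise ValueError("visits must span a positive time interval")
    se_first = spherical_equivalent(first)
    se_last = spherical_equivalent(last)
    return {
        "age_years": last.age_years,
        "se_d": se_last,
        "rate_d_per_year": (se_last - se_first) / elapsed,
    }


# Made-up visits for one child (not study data).
toy_visits = [
    Visit(8.0, -1.00, -0.50),
    Visit(9.0, -1.75, -0.50),
    Visit(10.0, -2.50, -0.50),
]
print(build_features(toy_visits))
```

A feature dictionary like this would be the input row for a prediction model, and `is_high_myopia` applied to a later measurement would give the outcome label.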

          What do these findings mean?
          • The algorithm, which was trained and validated using a large real-world dataset, was able to predict the presence of high myopia with clinically acceptable accuracy among Chinese school-aged populations.

          • Large-scale, long-term electronic medical records and machine learning algorithms provide unique opportunities for the development of prediction models for progressive diseases, such as myopia in school-aged children.

          • Our findings have great potential to change current approaches used to manage school myopia by paediatric and general ophthalmologists as well as general practitioners and optometrists, who are often the first point of care.


                Author and article information

                Journal
                PLoS Medicine (PLoS Med)
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1549-1277, 1549-1676
                6 November 2018
                Volume 15, Issue 11, e1002674
                Affiliations
                [1 ] State Key Laboratory of Ophthalmology, Clinical Research Center for Ocular Disease, Zhongshan Ophthalmic Centre, Sun Yat-sen University, Guangzhou, China
                [2 ] School of Public Health, Sun Yat-sen University, Guangzhou, China
                [3 ] School of Mathematics, Sun Yat-sen University, Guangzhou, China
                [4 ] Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
                [5 ] UCL Institute of Ophthalmology, University College London and Moorfields Eye Hospital, London, United Kingdom
                [6 ] First Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
                [7 ] Laboratory of Immunology, National Eye Institute, National Institutes of Health, Bethesda, Maryland, United States of America
                [8 ] ARC Centre of Excellence in Vision Science, Research School of Biology, College of Medicine, Biology and Environment, Australian National University, Canberra, Australian Capital Territory, Australia
                [9 ] Centre for Eye Research Australia, University of Melbourne, Royal Victorian Eye and Ear Hospital, East Melbourne, Victoria, Australia
                Academic Editor affiliation: University of California San Francisco, United States
                Author notes

                The authors have no conflicts of interest to declare.

                Author information
                http://orcid.org/0000-0003-4672-9721
                http://orcid.org/0000-0001-7158-3782
                http://orcid.org/0000-0002-9398-4330
                http://orcid.org/0000-0002-6664-3442
                http://orcid.org/0000-0003-4108-9593
                Article
                Manuscript ID: PMEDICINE-D-18-01484
                DOI: 10.1371/journal.pmed.1002674
                PMC ID: 6219762
                PubMed ID: 30399150
                3f5d4f50-1d33-4a3a-b7d7-a18201db41e1
                © 2018 Lin et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 2 May 2018
                Accepted: 13 September 2018
                Page count
                Figures: 5, Tables: 3, Pages: 17
                Funding
                Funded by: National Key R&D Program of China; Award ID: 2018YFC0116500
                Funded by: National Natural Science Foundation of China (funder ID: http://dx.doi.org/10.13039/501100001809); Award IDs: 91546101, 81822010
                Funded by: Youth Pearl River Scholar in Guangdong; Award ID: 2016
                This study was funded by the National Key R&D Program of China (2018YFC0116500), the National Natural Science Foundation of China (91546101, 81822010), the Guangdong Science and Technology Innovation Leading Talents (2017TX04R031), and Youth Pearl River Scholar in Guangdong (2016). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Medicine and Health Sciences > Ophthalmology > Visual Impairments > Myopia
                Physical Sciences > Mathematics > Applied Mathematics > Algorithms
                Research and Analysis Methods > Simulation and Modeling > Algorithms
                Physical Sciences > Mathematics > Applied Mathematics > Algorithms > Machine Learning Algorithms
                Research and Analysis Methods > Simulation and Modeling > Algorithms > Machine Learning Algorithms
                Computer and Information Sciences > Artificial Intelligence > Machine Learning > Machine Learning Algorithms
                Computer and Information Sciences > Artificial Intelligence > Machine Learning
                Research and Analysis Methods > Database and Informatics Methods > Health Informatics > Electronic Medical Records
                Research and Analysis Methods > Mathematical and Statistical Techniques > Mathematical Functions
                People and Places > Population Groupings > Ethnicities > Chinese People
                People and Places > Population Groupings > Age Groups > Children
                People and Places > Population Groupings > Families > Children
                Custom metadata
                This study uses electronic medical record data, which cannot be shared according to the Personal Information Protection Law in the People's Republic of China. This study also uses two population-based cohorts: data from the Guangzhou Outdoor Activity Longitudinal (GOAL) Trial and data from the Refractive Error Longitudinal Study (RELS), which are presented in S1 Data. The source code of this study is presented in S1 Code. All input and output parameters required for replication of this study are described in the paper and Supporting Information files.
