
      Machine Learning Data Imputation and Classification in a Multicohort Hypertension Clinical Study

      Research article

          Abstract

          Health-care initiatives are pushing the development and utilization of clinical data for medical discovery and translational research studies. Machine learning tools implemented for Big Data have been applied to detect patterns in complex diseases. This study focuses on hypertension and examines phenotype data from a major clinical study, the Minority Health Genomics and Translational Research Repository Database, composed of self-reported African American (AA) participants, combined with related cohorts. Prior genome-wide association studies of hypertension in AAs presumed that the increased disease burden in susceptible populations is due to rare variants, but genomic analyses of hypertension, even those designed to focus on rare variants, have yielded only marginal genome-wide results across many studies. Machine learning and other nonparametric statistical methods have recently been shown to uncover relationships in complex phenotypes, genotypes, and clinical data. We trained neural networks on phenotype data to impute missing values and thereby increase the usable size of a clinical data set. Validity was established by showing the effect on performance when the expanded data set was used to associate phenotype variables with patients' case/control status. Data mining classification tools were then used to generate association rules.
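          The abstract does not specify the authors' implementation; the following is a minimal sketch, assuming a scikit-learn style workflow, of neural-network imputation of missing phenotype values followed by a simple case/control association check. The toy data, variable names, and model settings are hypothetical, not taken from the study.

          ```python
          # Sketch (not the authors' pipeline): impute missing phenotype values with a
          # neural-network regressor inside scikit-learn's IterativeImputer, then check
          # that case/control classification still works on the expanded data.
          import numpy as np
          from sklearn.experimental import enable_iterative_imputer  # noqa: F401
          from sklearn.impute import IterativeImputer
          from sklearn.neural_network import MLPRegressor
          from sklearn.linear_model import LogisticRegression
          from sklearn.model_selection import cross_val_score

          rng = np.random.default_rng(0)

          # Toy stand-in for a phenotype matrix (e.g., BMI, blood pressure, lipids);
          # y is a synthetic hypertension case/control label.
          X = rng.normal(size=(500, 6))
          y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)
          X[rng.random(X.shape) < 0.2] = np.nan  # ~20% of entries missing

          imputer = IterativeImputer(
              estimator=MLPRegressor(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
              max_iter=10,
              random_state=0,
          )
          X_filled = imputer.fit_transform(X)

          # Sanity check: association of the imputed phenotypes with case/control status.
          clf = LogisticRegression(max_iter=1000)
          print("CV accuracy on imputed data:", cross_val_score(clf, X_filled, y, cv=5).mean())
          ```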

          Most cited references (35)


          Sample Size and Statistical Power Calculation in Genetic Association Studies

          A sample size with sufficient statistical power is critical to the success of genetic association studies in detecting the causal genes of human complex diseases. Genome-wide association studies require much larger sample sizes to achieve adequate statistical power. We estimated the statistical power with increasing numbers of markers analyzed and compared the sample sizes required in case-control studies and case-parent studies. We computed the effective sample size and statistical power using Genetic Power Calculator. An analysis using a larger number of markers requires a larger sample size. Testing a single-nucleotide polymorphism (SNP) marker requires 248 cases, while testing 500,000 SNPs and 1 million markers requires 1,206 cases and 1,255 cases, respectively, under the assumption of an odds ratio of 2, 5% disease prevalence, 5% minor allele frequency, complete linkage disequilibrium (LD), a 1:1 case/control ratio, and a 5% error rate in an allelic test. Under a dominant model, a smaller sample size is required to achieve 80% power than under other genetic models. We found that a much smaller sample size was required with a strong effect size, a common SNP, and increased LD. In addition, studying a common disease in a case-control study with a 1:4 case/control ratio is one way to achieve higher statistical power. We also found that case-parent studies require more samples than case-control studies. Although we have not covered all plausible cases in study design, the estimates of sample size and statistical power computed under various assumptions in this study may be useful for determining the sample size when designing a population-based genetic association study.
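            As a rough illustration of the kind of calculation described (not a reproduction of the paper's Genetic Power Calculator results, which also account for prevalence, genetic model, and LD), the sketch below estimates the sample size for a two-group allele-frequency comparison at 80% power with a Bonferroni-corrected alpha. The statsmodels usage and parameter values are assumptions for illustration, so the numbers will not match the paper's exactly.

            ```python
            # Rough power/sample-size illustration for an allelic case-control test
            # with Bonferroni correction for the number of markers tested.
            from statsmodels.stats.power import NormalIndPower
            from statsmodels.stats.proportion import proportion_effectsize

            maf_controls = 0.05   # minor allele frequency in controls
            odds_ratio = 2.0      # per-allele odds ratio
            odds_cases = odds_ratio * maf_controls / (1 - maf_controls)
            maf_cases = odds_cases / (1 + odds_cases)

            for n_markers in (1, 500_000, 1_000_000):
                alpha = 0.05 / n_markers  # Bonferroni-corrected significance level
                effect = proportion_effectsize(maf_cases, maf_controls)
                n_alleles = NormalIndPower().solve_power(
                    effect_size=effect, alpha=alpha, power=0.8, ratio=1.0
                )
                # Each case contributes two alleles, so halve for a crude per-person count.
                print(f"{n_markers:>9} markers: ~{n_alleles / 2:.0f} cases (1:1 case/control)")
            ```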

            A modified formula for calculating low-density lipoprotein cholesterol values

            Background: The Friedewald formula (FF) is useful for calculating serum low-density lipoprotein cholesterol (LDL-C) values, but it shows considerable deviation and has limitations, especially in hypertriglyceridemia. We propose a modified formula that is more suitable for LDL-C calculation.

            Methods: 2180 cases were classified into three groups according to their triglyceride (TG) concentrations (A: < 200 mg/dl, n = 1220; B: 200-400 mg/dl, n = 480; C: 400-1000 mg/dl, n = 480). LDL-C concentrations were measured or estimated by 1) direct measurement (DM); 2) the FF; and 3) our modified Friedewald formula (MFF): LDL-C (mg/dl) = Non-HDL-C × 90% - TG × 10%.

            Results: Linear regression showed a significant correlation (P < 0.001) between the measured and calculated LDL-C values. Bland-Altman plots indicated that DM and MFF were in better agreement than DM and FF. The LDL-C/Non-HDL-C ratio in FF-calculated values was significantly lower (P < 0.05) than in MFF-calculated or DM values, while no significant difference was found between MFF and DM. In Group A and Group B, 4.26% and 14.79% of the MFF-calculated values deviated by more than 20% from those measured by DM; these percentages were significantly lower than with FF, where 7.30% and 25.63% were observed, respectively (P < 0.01 and P < 0.001). The MFF-calculated values were all positive, even in Group C.

            Conclusions: Compared with the FF calculation, serum LDL-C values estimated by our modified formula are closer to those measured by a direct assay. The modification significantly diminishes the interference caused by hypertriglyceridemia.
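            A small worked example of the two formulas quoted above (Friedewald: LDL-C = TC - HDL-C - TG/5; modified: LDL-C = Non-HDL-C × 90% - TG × 10%). The concentrations are made-up values in mg/dl, not data from the study.

            ```python
            # Illustrative LDL-C calculation: classic Friedewald formula (FF) versus
            # the modified formula (MFF). All concentrations in mg/dl.
            def ldl_friedewald(total_chol: float, hdl: float, tg: float) -> float:
                """Friedewald: LDL-C = TC - HDL-C - TG/5."""
                return total_chol - hdl - tg / 5.0

            def ldl_modified(total_chol: float, hdl: float, tg: float) -> float:
                """Modified formula: LDL-C = Non-HDL-C * 90% - TG * 10%."""
                non_hdl = total_chol - hdl
                return non_hdl * 0.90 - tg * 0.10

            tc, hdl, tg = 220.0, 45.0, 450.0  # hypothetical hypertriglyceridemic sample
            print("FF :", ldl_friedewald(tc, hdl, tg))  # 220 - 45 - 90 = 85.0
            print("MFF:", ldl_modified(tc, hdl, tg))    # 175*0.9 - 45 = 112.5
            ```

            With this hypothetical high-TG profile the two estimates diverge substantially, which is the regime the modification targets.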

              The impact of low serum triglyceride on LDL-cholesterol estimation.

              Most clinical laboratories directly measure serum triglyceride, total cholesterol, and high-density lipoprotein cholesterol. They indirectly calculate the low-density lipoprotein cholesterol value using the Friedewald equation. Although high serum triglyceride (>400 mg/dL or 4.52 mmol/L) invalidates the low-density lipoprotein cholesterol calculation with this formula, the effect of low serum triglyceride (<100 mg/dL or 1.13 mmol/L) on its accuracy is less well defined. Two hundred thirty serum samples were assayed during a one-year period. In 115 samples the triglyceride level was below 100 mg/dL, and in 115 samples from age- and sex-matched patients the triglyceride level was 150 - 350 mg/dL (1.69 - 3.95 mmol/L). In both groups total cholesterol was above 250 mg/dL (6.46 mmol/L). In each sample, total cholesterol, high-density lipoprotein cholesterol, and triglyceride were directly measured in duplicate, and low-density lipoprotein cholesterol was both measured directly and calculated with the Friedewald equation. Statistical analysis showed that when triglyceride is <100 mg/dL, calculated low-density lipoprotein cholesterol is significantly overestimated (average: 12.17 mg/dL or 0.31 mmol/L), whereas when triglyceride is between 150 and 300 mg/dL no significant difference between calculated and measured low-density lipoprotein cholesterol is observed. In patients with low serum triglyceride and undesirably high total cholesterol levels, the Friedewald equation may overestimate the low-density lipoprotein cholesterol concentration, and it should either be assayed directly or calculated by a modified Friedewald equation. Using linear regression modeling, we propose a modified equation.

                Author and article information

                Journal: Bioinformatics and Biology Insights (Bioinform Biol Insights)
                Publisher: Libertas Academica
                ISSN: 1177-9322
                Published: 2015 (online 09 May 2016)
                Volume: 9 (Suppl 3), Pages: 43-54
                Affiliations
                [1] Physiology Department, Morehouse School of Medicine, Atlanta, GA, USA.
                [2] Director of Cardiovascular Research Institute (CVRI), Morehouse School of Medicine, Atlanta, GA, USA.
                Article
                Article ID: bbi-suppl.3-2015-043
                DOI: 10.4137/BBI.S29473
                PMC: PMC4862746
                PMID: 27199552
                UUID: 595da849-8910-4bbe-ae2c-b61070d00f21
                © 2015 the author(s), publisher and licensee Libertas Academica Ltd.

                This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 license.

                History: 05 June 2015; 21 September 2015; 23 September 2015
                Categories: Original Research; Bioinformatics & Computational Biology
                Keywords: artificial neural network, data imputation, machine learning, hypertension
