
      Effective use of latent semantic indexing and computational linguistics in biological and biomedical applications

      review-article


          Abstract

          Text mining is rapidly becoming an essential technique for the annotation and analysis of large biological data sets. Biomedical literature currently grows at a rate of several thousand papers per week, making automated information retrieval the only feasible way to manage this expanding corpus. With the increasing prevalence of open-access journals and the constant growth of publicly available repositories of biomedical literature, literature mining has become much more effective at extracting biomedically relevant data. In recent years, text mining of popular databases such as MEDLINE has evolved from basic term searches to more sophisticated natural language processing techniques, indexing and retrieval methods, structural analysis and integration of literature with associated metadata. In this review, we focus on Latent Semantic Indexing (LSI), a computational linguistics technique increasingly used for a variety of biological purposes. It is noted for its ability to consistently outperform benchmark Boolean text searches and co-occurrence models at information retrieval and for its power to extract indirect relationships within a data set. LSI has been used successfully to formulate new hypotheses, generate novel connections from existing data, and validate empirical data.
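The idea of extracting indirect relationships can be illustrated with a minimal sketch. It is not the implementation used in any of the studies discussed; the five toy documents, the function names, and the choice of k = 2 are all assumptions made for illustration. LSI is normally computed with a truncated singular value decomposition from a numerical library; here, to stay dependency-free, the top singular directions are approximated by power iteration with deflation on the document-by-document matrix.

```python
import math

def term_document_matrix(docs):
    """Rows are terms, columns are documents; entries are raw term counts."""
    vocab = sorted({w for d in docs for w in d.split()})
    return vocab, [[d.split().count(t) for d in docs] for t in vocab]

def top_eigenpairs(G, k, iters=500):
    """Top-k eigenpairs of a symmetric positive semidefinite matrix,
    found by power iteration with deflation (stand-in for a library SVD)."""
    n = len(G)
    G = [row[:] for row in G]  # work on a copy
    pairs = []
    for _ in range(k):
        v = [1.0] * n
        for _ in range(iters):
            w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(G[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflate: subtract the component just found so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                G[i][j] -= lam * v[i] * v[j]
    return pairs

def lsi_document_vectors(A, k):
    """Document coordinates in the k-dimensional latent space (rows of V_k * S_k)."""
    n_docs = len(A[0])
    # G = A^T A; its eigenvalues are the squared singular values of A.
    G = [[sum(row[i] * row[j] for row in A) for j in range(n_docs)]
         for i in range(n_docs)]
    pairs = top_eigenpairs(G, k)
    return [[math.sqrt(max(lam, 0.0)) * v[d] for lam, v in pairs]
            for d in range(n_docs)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy corpus (hypothetical): documents 0 and 2 share no terms,
# but both are linked to document 1.
docs = [
    "insulin glucose",
    "glucose diabetes",
    "diabetes metformin",
    "neuron synapse",
    "synapse axon",
]
vocab, A = term_document_matrix(docs)
raw = [[row[d] for row in A] for d in range(len(docs))]  # raw term-count vectors
lsi = lsi_document_vectors(A, k=2)

print("raw cosine, doc 0 vs doc 2:", cosine(raw[0], raw[2]))
print("LSI cosine, doc 0 vs doc 2:", round(cosine(lsi[0], lsi[2]), 3))
print("LSI cosine, doc 0 vs doc 3:", round(cosine(lsi[0], lsi[3]), 3))
```

In this toy corpus a term-matching comparison scores documents 0 and 2 as unrelated because they share no words, whereas in the two-dimensional latent space they point in nearly the same direction, since both co-occur with terms in document 1. Documents from the unrelated second topic remain orthogonal.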

          Most cited references (23)

          Gene Ontology: tool for the unification of biology

          Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

            ABNER: an open source tool for automatically tagging genes, proteins and other entity names in text.

            ABNER (A Biomedical Named Entity Recognizer) is an open source software tool for molecular biology text mining. At its core is a machine learning system using conditional random fields with a variety of orthographic and contextual features. The latest version is 1.5, which has an intuitive graphical interface and includes two modules for tagging entities (e.g. protein and cell line) trained on standard corpora, for which performance is roughly state of the art. It also includes a Java application programming interface allowing users to incorporate ABNER into their own systems and train models on new corpora.

              Gene clustering by latent semantic indexing of MEDLINE abstracts.

              A major challenge in the interpretation of high-throughput genomic data is understanding the functional associations between genes. Previously, several approaches have been described to extract gene relationships from various biological databases using term-matching methods. However, more flexible automated methods are needed to identify functional relationships (both explicit and implicit) between genes from the biomedical literature. In this study, we explored the utility of Latent Semantic Indexing (LSI), a vector space model for information retrieval, to automatically identify conceptual gene relationships from titles and abstracts in MEDLINE citations. We found that LSI identified gene-to-gene and keyword-to-gene relationships with high average precision. In addition, LSI identified implicit gene relationships based on word usage patterns in the gene abstract documents. Finally, we demonstrate here that pairwise distances derived from the vector angles of gene abstract documents can be effectively used to functionally group genes by hierarchical clustering. Our results provide proof-of-principle that LSI is a robust automated method to elucidate both known (explicit) and unknown (implicit) gene relationships from the biomedical literature. These features make LSI particularly useful for the analysis of novel associations discovered in genomic experiments. The 50-gene document collection used in this study can be interactively queried at http://shad.cs.utk.edu/sgo/sgo.html.
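The clustering step described in this abstract, grouping genes by distances derived from vector angles, can be sketched as follows. This is an illustrative sketch only: the gene labels and two-dimensional vectors are hypothetical stand-ins for gene abstract documents in an LSI space, and average linkage is an assumed choice, since the abstract does not state which linkage the authors used.

```python
import math

def angular_distance(u, v):
    """Angle in radians between two vectors (0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.acos(cos)

def average_linkage(vectors, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    members have the smallest mean pairwise angular distance."""
    clusters = [[name] for name in vectors]

    def dist(c1, c2):
        total = sum(angular_distance(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical latent-space vectors for four gene documents.
gene_vectors = {
    "gene_a": [0.9, 0.1],
    "gene_b": [0.8, 0.2],
    "gene_c": [0.1, 0.9],
    "gene_d": [0.2, 0.7],
}
groups = average_linkage(gene_vectors, n_clusters=2)
print(groups)
```

Using the angle rather than Euclidean distance makes the grouping depend on the direction of each vector, that is, on the pattern of word usage, and not on its length, which tends to reflect how much text a document contains.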

                Author and article information

                Journal
                Frontiers in Physiology (Front Physiol)
                Frontiers Media S.A.
                ISSN: 1664-042X
                29 October 2012
                30 January 2013
                2013
                Volume 4, article 8
                Affiliations
                [1] Laboratory of Neuroscience, Receptor Pharmacology Unit, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
                [2] Laboratory of Clinical Investigation, Metabolism Unit, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
                Author notes

                Edited by: Firas H. Kobeissy, University of Florida, USA

                Reviewed by: Bilal Fadlallah, University of Florida, USA; Fadi A. Zaraket, American University of Beirut, Lebanon; Dan Xia, Harvard Medical School, USA

                *Correspondence: Stuart Maudsley, Laboratory of Neuroscience, Receptor Pharmacology Unit, National Institute on Aging, National Institutes of Health, 251 Bayview Blvd., Baltimore, MD 21224, USA. e-mail: maudsleyst@mail.nih.gov

                This article was submitted to Frontiers in Systems Biology, a specialty of Frontiers in Physiology.

                Article
                DOI: 10.3389/fphys.2013.00008
                PMCID: 3558626
                PMID: 23386833
                f26a2841-8de9-4e5a-a2d3-9c22e67d2194
                Copyright © 2013 Chen, Martin, Daimon and Maudsley.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.

                History
                Received: 12 October 2012
                Accepted: 09 January 2013
                Page count
                Figures: 2, Tables: 0, Equations: 0, References: 45, Pages: 6, Words: 4958
                Categories
                Physiology
                Mini Review Article

                Anatomy & Physiology
                latent semantic indexing, data mining, computational linguistics, molecular interactions, drug discovery
