Abstract
Natural language processing (NLP) systems have been developed to provide access to
the tremendous body of data and knowledge that is available in the biomedical domain
in the form of natural language text. These NLP systems are valuable because they
can encode and amass the information in the text so that it can be used by other automated
processes to improve patient care and our understanding of disease processes and treatments.
Zellig Harris proposed a theory of sublanguage that laid the foundation for natural
language processing in specialized domains. He hypothesized that the informational
content and structure of a domain's text form a specialized language that can be delineated in the form
of a sublanguage grammar. The grammar can then be used by a language processor to
capture and encode the salient information and relations in text. In this paper, we
briefly summarize his language and sublanguage theories. We also summarize
our prior research on the sublanguage grammars we developed
for two biomedical domains. These grammars illustrate how Harris' theories
provide a basis for the development of language processing systems in the biomedical
domain. The two domains discussed, with their associated sublanguages, are the clinical
domain, where the text consists of patient reports, and the biomolecular domain, where
the text consists of complete journal articles.