Transcriptomic and cellular decoding of regional brain vulnerability to neurogenetic disorders


Abstract

Neurodevelopmental disorders have a heritable component and are associated with region-specific alterations in brain anatomy. However, it is unclear how genetic risks for neurodevelopmental disorders are translated into spatially patterned brain vulnerabilities. Here, we integrated cortical neuroimaging data from patients with neurodevelopmental disorders caused by genomic copy number variations (CNVs) with gene expression data from healthy subjects. For each of the six investigated disorders, we show that spatial patterns of cortical anatomy changes in youth are correlated with cortical spatial expression of CNV genes in neurotypical adults. By transforming normative bulk-tissue cortical expression data into cell-type expression maps, we link anatomical change maps in each analysed disorder to specific cell classes as well as the CNV-region genes they express. Our findings reveal organizing principles that regulate the mapping of genetic risks onto regional brain changes in neurogenetic disorders, and will enable screening for candidate molecular mechanisms from readily available neuroimaging data.
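The core comparison described above, correlating a regional anatomical change map with the regional expression of a gene set, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes expression is already summarised as a regions-by-genes matrix in the same cortical parcellation as the change map, scores a gene set by its mean z-scored expression per region, and uses a plain Pearson correlation. The cell-class step is approximated by treating each class as a hypothetical set of marker genes; the abstract does not specify how the cell-type expression maps were derived. All function names, gene symbols and class labels below are illustrative.

```python
import numpy as np


def zscore_columns(expr: np.ndarray) -> np.ndarray:
    """Z-score each gene (column) across regions (rows)."""
    return (expr - expr.mean(axis=0)) / expr.std(axis=0)


def gene_set_map(expr: np.ndarray, genes: list[str], gene_set: set[str]) -> np.ndarray:
    """Regional score of a gene set: mean z-scored expression of its member genes.

    expr has shape (n_regions, n_genes) and `genes` names its columns.
    Assumes every selected gene varies across regions (non-zero std).
    """
    cols = [i for i, g in enumerate(genes) if g in gene_set]
    if not cols:
        raise ValueError("no genes from the set are present in the expression matrix")
    return zscore_columns(expr[:, cols]).mean(axis=1)


def spatial_correlation(change_map: np.ndarray, expr: np.ndarray,
                        genes: list[str], gene_set: set[str]) -> float:
    """Pearson correlation across regions between a change map and a gene-set map."""
    return float(np.corrcoef(change_map, gene_set_map(expr, genes, gene_set))[0, 1])


def cell_class_correlations(change_map: np.ndarray, expr: np.ndarray, genes: list[str],
                            markers: dict[str, set[str]]) -> dict[str, float]:
    """Correlate the change map with one marker-gene map per (hypothetical) cell class."""
    return {cls: spatial_correlation(change_map, expr, genes, gs)
            for cls, gs in markers.items()}


if __name__ == "__main__":
    # Toy data: 5 regions and 2 hypothetical genes.
    change = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    genes = ["GENE_A", "GENE_B"]
    # GENE_A rises with the change map; GENE_B falls as the change map rises.
    expr = np.column_stack([2.0 * change, change[::-1]])
    print(spatial_correlation(change, expr, genes, {"GENE_A"}))
    print(cell_class_correlations(change, expr, genes,
                                  {"class_x": {"GENE_A"}, "class_y": {"GENE_B"}}))
```

In the toy example, the GENE_A map correlates with the change map at close to +1 and the GENE_B map at close to -1, so "class_x" ranks above "class_y". Averaging z-scores keeps highly expressed genes from dominating a gene-set score.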

Abstract

How neurodevelopmental disorder-associated risk genes are translated into spatially patterned brain vulnerabilities is unclear. Here, the authors show that disorder-specific patterns of neuroanatomical changes are aligned to brain expression maps of disease risk genes in healthy subjects.
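Judging whether such an alignment is stronger than chance requires a null distribution. The simplest option, shown below as an assumed illustration rather than the authors' procedure, shuffles region labels. Because neighbouring cortical regions are not independent, this label-shuffling null tends to overstate significance for brain maps; spatially constrained alternatives such as spin-based permutation tests are generally preferred for cortical data. The function name and defaults are hypothetical.

```python
import numpy as np


def permutation_p_value(x: np.ndarray, y: np.ndarray, n_perm: int = 10_000,
                        seed: int = 0) -> tuple[float, float]:
    """Two-sided permutation p-value for the Pearson correlation of two regional maps.

    Region labels of x are shuffled independently of y. This ignores spatial
    autocorrelation, so it is only a naive baseline for cortical maps.
    """
    rng = np.random.default_rng(seed)
    observed = np.corrcoef(x, y)[0, 1]
    null = np.array([np.corrcoef(rng.permutation(x), y)[0, 1] for _ in range(n_perm)])
    # The +1 terms count the observed statistic as one draw from the null,
    # so the p-value can never be exactly zero.
    p = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
    return float(observed), float(p)
```

With 999 permutations the smallest attainable p-value is 1/1000, which bounds how strongly any single map comparison can be declared significant.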

Related collections

NeuroImaging Methods


Author and article information

Contributors

Jakob Seidlitz:

ORCID: http://orcid.org/0000-0002-8164-7476

jakob.seidlitz@nih.gov

Armin Raznahan: raznahana@mail.nih.gov

Journal

Journal ID (nlm-ta): Nat Commun

Journal ID (iso-abbrev): Nat Commun

Title: Nature Communications

Publisher: Nature Publishing Group UK (London)

ISSN (Electronic): 2041-1723

Publication date (Electronic): 3 July 2020

Publication date PMC-release: 3 July 2020

Publication date Collection: 2020

Volume: 11

Electronic Location Identifier: 3358

Affiliations

[1] Developmental Neurogenomics Unit, National Institute of Mental Health, Bethesda, MD, USA (ISNI 0000 0004 0464 0574; GRID grid.416868.5)

[2] Department of Psychiatry, University of Cambridge, Cambridge, UK (ISNI 0000000121885934; GRID grid.5335.0)

[3] School of Mathematical Sciences, Queen Mary University of London, London, UK (ISNI 0000 0001 2171 1133; GRID grid.4868.2)

[4] The Alan Turing Institute, London, UK (ISNI 0000 0004 5903 3632; GRID grid.499548.d)

[5] McConnell Brain Imaging Centre, Montreal Neurological Institute and Hospital, Montreal, QC, Canada (ISNI 0000 0004 0646 3639; GRID grid.416102.0)

[6] McGill Centre for Integrative Neuroscience, McGill University, Montreal, QC, Canada (ISNI 0000 0004 1936 8649; GRID grid.14709.3b)

[7] Department of Neurology, Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA (ISNI 0000 0000 9632 6718; GRID grid.19006.3e)

[8] Department of Psychiatry and Biobehavioral Sciences, Semel Institute, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA (ISNI 0000 0000 9632 6718; GRID grid.19006.3e)

[9] Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA (ISNI 0000 0000 9632 6718; GRID grid.19006.3e)

[10] Departments of Pediatrics and Physiology, University of Tennessee Health Science Center and Le Bonheur Children’s Foundation Research Institute, Memphis, TN, USA (ISNI 0000 0004 0386 9246; GRID grid.267301.1)

[11] Pediatrics and Developmental Neuropsychiatry Branch, National Institute of Mental Health, NIH, Bethesda, MD, USA (ISNI 0000 0004 0464 0574; GRID grid.416868.5)

[12] Unit on Metabolism and Neuroendocrinology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH, Bethesda, MD, USA

[13] Department of Psychology, Drexel University, Philadelphia, PA, USA (ISNI 0000 0001 2181 3113; GRID grid.166341.7)

[14] Institute of Psychiatry, King’s College London, London, UK (ISNI 0000 0001 2322 6764; GRID grid.13097.3c)

[15] Cambridgeshire and Peterborough NHS Foundation Trust, Huntingdon, UK (ISNI 0000 0004 0412 9303; GRID grid.450563.1)

Author information

Jakob Seidlitz http://orcid.org/0000-0002-8164-7476

Sarah E. Morgan http://orcid.org/0000-0002-1261-5884

Rafael Romero-Garcia http://orcid.org/0000-0002-5199-4573

François M. Lalonde http://orcid.org/0000-0002-4945-0032

Casey Paquola http://orcid.org/0000-0002-0190-4103

Daniel H. Geschwind http://orcid.org/0000-0003-2896-3450

Declan G. Murphy http://orcid.org/0000-0002-6664-7451

Article

Publisher ID: 17051

DOI: 10.1038/s41467-020-17051-5

PMC ID: 7335069

PubMed ID: 32620757

SO-VID: 233230cd-2bab-40f9-b66b-e32ced680e6c

License:

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

History

Date received: 15 November 2019

Date accepted: 11 May 2020

Funding

Funded by: FundRef https://doi.org/10.13039/100000025, U.S. Department of Health & Human Services | NIH | National Institute of Mental Health (NIMH)

Award ID: 89-M-006

Award recipient: Jakob Seidlitz

Funded by: FundRef https://doi.org/10.13039/100009633, U.S. Department of Health & Human Services | NIH | Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD)

Award ID: 08-CH-213

Award recipient: Jakob Seidlitz

Custom metadata

ScienceOpen disciplines: Uncategorized

Keywords: cognitive neuroscience, developmental disorders, molecular neuroscience

