Exome sequencing of 20,979 individuals with epilepsy reveals shared and distinct ultra-rare genetic risk across disorder subtypes

Authors

Epi25 Collaborative, Patrick May

Abstract

Identifying genetic risk factors for highly heterogeneous disorders such as epilepsy remains challenging. Here we present, to our knowledge, the largest whole-exome sequencing study of epilepsy to date, with more than 54,000 human exomes, comprising 20,979 deeply phenotyped patients from multiple genetic ancestry groups with diverse epilepsy subtypes and 33,444 controls, to investigate rare variants that confer disease risk. These analyses implicate seven individual genes, three gene sets and four copy number variants at exome-wide significance. Genes encoding ion channels show strong association with multiple epilepsy subtypes, including epileptic encephalopathies and generalized and focal epilepsies, whereas most other gene discoveries are subtype specific, highlighting distinct genetic contributions to different epilepsies. Combining results from rare single-nucleotide/short insertion and deletion variants, copy number variants and common variants, we offer an expanded view of the genetic architecture of epilepsy, with growing evidence of convergence among different genetic risk loci on the same genes. Top candidate genes are enriched for roles in synaptic transmission and neuronal excitability, particularly postnatally and in the neocortex. We also identify shared rare variant risk between epilepsy and other neurodevelopmental disorders. Our data can be accessed via an interactive browser, which we hope will facilitate diagnostic efforts and accelerate follow-up studies.

Code availability

No custom code was used in this study. For sequence data generation, we used GATK version 3.4 and version 3.6 (GATK nightly-2015-07-31-g3c929b0, 3.4-89-ge494930 and 3.6-0-g89b7209), Picard version 1.1431 and VerifyBamID version 1.0.0. Sample and variant QC was performed using functions in Hail 0.1 and 0.2 (website: https://www.hail.is; documentation: https://hail.is/docs/0.1/ and https://hail.is/docs/0.2/; GitHub repository: https://github.com/hail-is/hail). Variant annotation was performed using the Ensembl Variant Effect Predictor (VEP) version 85 as implemented in Hail 0.1, with the LOFTEE annotation included by default (https://github.com/konradjk/loftee/tree/27b0040f524348baa7f3257f1ce58993529e09ef/). For phenotyping data, case record forms were hosted on the REDCap platform version 14 and entered into the Epi25 data repository (https://github.com/Epi25/epi25-edc). For gene burden analysis, we used the R (version 3.6.1) package logistf version 1.26.0 (https://cran.r-project.org/web/packages/logistf/index.html) to implement the Firth regression model. Additional processing and visualization were performed using R functions in the tidyverse collection, version 1.3.0 (https://www.tidyverse.org/packages/).
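To illustrate the kind of model referred to above, the following is a minimal Python/NumPy sketch of Firth-penalized logistic regression applied to a single-gene burden test. It is not the logistf implementation used in the study and not the authors' pipeline. For illustration only, it assumes that each individual is coded by a binary carrier indicator (1 if they carry at least one qualifying variant in the gene) and that no covariates are included; the functions `firth_fit`, `firth_burden_test` and `expand_table` are hypothetical names. Firth's method maximizes the log-likelihood plus half the log-determinant of the Fisher information, which keeps estimates finite when, for example, no controls carry a variant. The p-value here comes from a penalized likelihood-ratio test in which the carrier coefficient is fixed at zero in the null model.

```python
import math

import numpy as np


def _penalized_loglik(X, y, beta):
    """Firth-penalized log-likelihood: l(beta) + 0.5 * log det(X^T W X)."""
    eta = X @ beta
    # Bernoulli log-likelihood, using logaddexp for log(1 + exp(eta))
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    p = 1.0 / (1.0 + np.exp(-eta))
    info = X.T @ ((p * (1.0 - p))[:, None] * X)  # Fisher information
    _, logdet = np.linalg.slogdet(info)
    return loglik + 0.5 * logdet


def firth_fit(X, y, free=None, max_iter=200, tol=1e-10, max_step=5.0):
    """Maximize the Firth-penalized likelihood; coefficients not in `free` stay at 0."""
    k = X.shape[1]
    free = np.ones(k, dtype=bool) if free is None else np.asarray(free, dtype=bool)
    beta = np.zeros(k)
    current = _penalized_loglik(X, y, beta)
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        info = X.T @ (w[:, None] * X)
        # Diagonal of the hat matrix H = W^1/2 X (X^T W X)^-1 X^T W^1/2
        h = w * np.einsum("ij,ji->i", X, np.linalg.solve(info, X.T))
        # Firth-modified score: X^T (y - p + h * (1/2 - p))
        score = X.T @ (y - p + h * (0.5 - p))
        step = np.zeros(k)
        step[free] = np.linalg.solve(info[np.ix_(free, free)], score[free])
        largest = np.max(np.abs(step))
        if largest > max_step:  # cap very large steps
            step *= max_step / largest
        # Step halving (up to 30 times) while the penalized likelihood drops
        for _ in range(30):
            candidate = beta + step
            value = _penalized_loglik(X, y, candidate)
            if value >= current - 1e-12:
                break
            step /= 2.0
        beta, current = candidate, value
        if np.max(np.abs(step)) < tol:
            break
    return beta, current


def firth_burden_test(carrier, case):
    """Hypothetical burden test: case status ~ intercept + carrier status."""
    carrier = np.asarray(carrier, dtype=float)
    y = np.asarray(case, dtype=float)
    X = np.column_stack([np.ones_like(carrier), carrier])
    beta_full, ll_full = firth_fit(X, y)
    # Null model: carrier coefficient fixed at 0, penalty still uses the full X
    _, ll_null = firth_fit(X, y, free=[True, False])
    stat = max(2.0 * (ll_full - ll_null), 0.0)
    # Upper tail of a chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return math.exp(beta_full[1]), p_value


def expand_table(case_carriers, n_cases, control_carriers, n_controls):
    """Hypothetical helper: expand carrier counts into per-individual arrays."""
    carrier = np.concatenate([
        np.ones(case_carriers), np.zeros(n_cases - case_carriers),
        np.ones(control_carriers), np.zeros(n_controls - control_carriers),
    ])
    case = np.concatenate([np.ones(n_cases), np.zeros(n_controls)])
    return carrier, case


if __name__ == "__main__":
    # Hypothetical counts: 5 of 1,000 cases and 0 of 1,000 controls are carriers
    carrier, case = expand_table(5, 1000, 0, 1000)
    odds_ratio, p_value = firth_burden_test(carrier, case)
    print(f"OR = {odds_ratio:.2f}, P = {p_value:.3g}")
```

For a two-group design such as this one, the Firth odds ratio equals the ordinary estimate after adding 0.5 to each cell of the 2×2 carrier-by-status table. With the hypothetical counts above, that gives a finite odds ratio of (5.5/0.5)/(995.5/1000.5) ≈ 11.06, whereas ordinary maximum likelihood would diverge. This behaviour is one reason Firth regression is suited to ultra-rare variant counts.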

Data availability

We provide summary-level data at the variant and gene level in an online browser for visualization and download (https://epi25.broadinstitute.org/). There are no restrictions on the aggregated data released on the browser. Full results from the exome-wide burden analysis are also available in Supplementary Data 1 and 4. WES data from Epi25 cohorts are available via the NHGRI’s controlled-access AnVIL platform (https://anvilproject.org/; dbGaP accession number: phs001489). Data availability of non-Epi25 control cohorts is provided in the Supplementary Information. Source data are provided with this paper. Publicly available datasets analyzed in this study include: Gene family (https://zenodo.org/records/3582386); CORUM protein complexes (https://mips.helmholtz-muenchen.de/corum/); Protein Data Bank (https://www.rcsb.org/; structure analyzed in Fig. 3c: https://www.rcsb.org/structure/6x3z); BrainSpan (https://www.brainspan.org/); Gene Ontology (https://geneontology.org/); and ChEA3 (https://maayanlab.cloud/chea3/).