Trans-ancestry polygenic models for the prediction of LDL blood levels: an analysis in UK Biobank and Taiwan Biobank
Authors
Emadeldin Saeed Fathy Sayed Hassanin, Ko-Han Lee, Tzung-Chien Hsieh, Rana Aldisi, Yi-Lun Lee, Dheeraj Reddy Bobbili, Peter Krawitz, Patrick May, Chien-Yu Chen, Carlo Maj
Abstract
Background Polygenic risk scores (PRS) have been proposed for risk stratification in clinical and research settings. PRS predictions are often biased toward the population of the available genome-wide association studies (GWAS), which is typically of European ancestry. This study assesses the performance differences of ancestry-specific PRS and tests whether multi-ancestry PRS enhance the generalizability of LDL cholesterol predictions in the East Asian population.
Methods We computed ancestry-specific and multi-ancestry PRS for LDL using data from the global lipid consortium, accounting for population-specific linkage disequilibrium patterns with the PRS-CSx method. We first conducted an ancestry-wide analysis using the UK Biobank dataset (n=423,596) and then applied the same models to the Taiwan Biobank dataset (TWB, n=68,978). PRS performance was assessed by linear regression with adjustment for age, sex, and principal components. PRS strata were used to assess the extent to which a PRS categorization can stratify individuals by LDL cholesterol level in East Asian samples.
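The covariate-adjusted evaluation described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes that "covariate-adjusted R²" means the incremental R² (the R² of a model with covariates plus PRS minus the R² of a covariates-only model), which is one common convention. The function names `r_squared` and `incremental_r2`, the number of principal components, and the synthetic data are all hypothetical.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept added here)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def incremental_r2(prs, covariates, y):
    """Gain in R^2 from adding the PRS to a covariates-only model.

    Assumption: this difference is what is meant by 'covariate-adjusted R^2'.
    """
    base = r_squared(covariates, y)
    full = r_squared(np.column_stack([covariates, prs]), y)
    return full - base


# Synthetic stand-in data (hypothetical; not UK Biobank or TWB values).
rng = np.random.default_rng(0)
n = 5000
age = rng.uniform(40, 70, n)
sex = rng.integers(0, 2, n).astype(float)
pcs = rng.normal(size=(n, 4))   # stand-in for genetic principal components
prs = rng.normal(size=n)        # standardized polygenic score
ldl = 3.0 + 0.01 * age + 0.1 * sex + 0.25 * prs + rng.normal(scale=0.8, size=n)

covariates = np.column_stack([age, sex, pcs])
gain = incremental_r2(prs, covariates, ldl)
print(f"incremental R^2 of PRS: {gain:.3f}")
```

Because the two models are nested, the incremental R² cannot be negative. That makes it a convenient way to compare several PRS against the same covariate baseline.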
Results Population-specific PRS better predicted LDL levels within the target population, but multi-ancestry PRS were more generalizable. In the TWB dataset, covariate-adjusted R² values were 9.3% for ancestry-specific PRS, 6.7% for multi-ancestry PRS, and 4.5% for European-specific PRS. Similar trends (8.6%, 7.8%, and 6.2%, respectively) were observed in the smaller East Asian population of the UK Biobank (n=1,480). Consistent with the R² values, PRS stratification in East Asians (TWB) captured heterogeneous LDL blood cholesterol levels across PRS strata. The mean difference in LDL levels between the lowest and highest EAS_PRS deciles was 0.82, compared to 0.59 for EUR_PRS and 0.76 for multi-ancestry PRS. Notably, the mean LDL values in the top decile of multi-ancestry PRS were comparable to those of EAS_PRS (3.543 vs. 3.541, P=0.86).
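The decile comparison reported above can be illustrated with a short sketch. It is a hypothetical example on synthetic data, not the study's pipeline: the helper `decile_means`, the effect size, and the simulated LDL values are assumptions chosen only to show how a gap between the lowest and highest PRS deciles might be computed.

```python
import numpy as np


def decile_means(prs, ldl):
    """Mean LDL within each PRS decile (index 0 = lowest, 9 = highest)."""
    prs = np.asarray(prs, dtype=float)
    ldl = np.asarray(ldl, dtype=float)
    # Nine interior cut points at the 10th, 20th, ..., 90th percentiles.
    edges = np.quantile(prs, np.linspace(0.1, 0.9, 9))
    # Assign each individual a decile label from 0 to 9.
    decile = np.searchsorted(edges, prs, side="right")
    return np.array([ldl[decile == d].mean() for d in range(10)])


# Synthetic stand-in data (hypothetical; not TWB measurements).
rng = np.random.default_rng(42)
n = 5000
prs = rng.normal(size=n)
ldl = 3.0 + 0.25 * prs + rng.normal(scale=0.8, size=n)

means = decile_means(prs, ldl)
gap = means[-1] - means[0]
print(f"top-minus-bottom decile difference in mean LDL: {gap:.2f}")
```

Running the same function on each score (for example EAS_PRS, EUR_PRS, and the multi-ancestry PRS) would give directly comparable decile gaps, provided the scores are evaluated on the same individuals.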
Conclusions Our analysis of the PRS prediction model for LDL cholesterol further underscores the issue of PRS generalizability across populations. Our targeted analysis of the East Asian (EAS) population showed that integrating non-European genotyping data, accounting for population-specific linkage disequilibrium, and considering meta-analyses of non-European-based GWAS alongside well-powered European-based GWAS can enhance the generalizability of LDL PRS.
Source code
The scripts used to analyse the data are available from LCSB GitLab.
Data availability
The generated ancestry-specific and multi-ancestry PRS weights for LDL (excluding UK Biobank samples) are available on Zenodo.