MagicalRsq-X: A cross-cohort transferable genotype imputation quality metric
Research output: Contribution to journal › Journal article › Research › peer-review
Standard
MagicalRsq-X: A cross-cohort transferable genotype imputation quality metric. / Sun, Quan; Yang, Yingxi; Rosen, Jonathan D; Chen, Jiawen; Li, Xihao; Guan, Wyliena; Jiang, Min-Zhi; Wen, Jia; Pace, Rhonda G; Blackman, Scott M; Bamshad, Michael J; Gibson, Ronald L; Cutting, Garry R; O'Neal, Wanda K; Knowles, Michael R; Kooperberg, Charles; Reiner, Alexander P; Raffield, Laura M; Carson, April P.; Rich, Stephen S.; Rotter, Jerome I.; Loos, Ruth J F; Kenny, Eimear; Jaeger, Byron C; Min, Yuan-I; Fuchsberger, Christian; Li, Yun.
In: American Journal of Human Genetics, Vol. 111, No. 5, 2024, p. 990-995.
RIS
TY - JOUR
T1 - MagicalRsq-X
T2 - A cross-cohort transferable genotype imputation quality metric
AU - Sun, Quan
AU - Yang, Yingxi
AU - Rosen, Jonathan D
AU - Chen, Jiawen
AU - Li, Xihao
AU - Guan, Wyliena
AU - Jiang, Min-Zhi
AU - Wen, Jia
AU - Pace, Rhonda G
AU - Blackman, Scott M
AU - Bamshad, Michael J
AU - Gibson, Ronald L
AU - Cutting, Garry R
AU - O'Neal, Wanda K
AU - Knowles, Michael R
AU - Kooperberg, Charles
AU - Reiner, Alexander P
AU - Raffield, Laura M
AU - Carson, April P.
AU - Rich, Stephen S.
AU - Rotter, Jerome I.
AU - Loos, Ruth J F
AU - Kenny, Eimear
AU - Jaeger, Byron C
AU - Min, Yuan-I
AU - Fuchsberger, Christian
AU - Li, Yun
N1 - Copyright © 2024 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
PY - 2024
Y1 - 2024
N2 - Since genotype imputation was introduced, researchers have been relying on the estimated imputation quality from imputation software to perform post-imputation quality control (QC). However, this quality estimate (denoted as Rsq) performs less well for lower-frequency variants. We recently published MagicalRsq, a machine-learning-based imputation quality calibration, which leverages additional typed markers from the same cohort and outperforms Rsq as a QC metric. In this work, we extended the original MagicalRsq to allow cross-cohort model training and named the new model MagicalRsq-X. We removed the cohort-specific estimated minor allele frequency and included linkage disequilibrium scores and recombination rates as additional features. Leveraging whole-genome sequencing data from TOPMed, specifically participants in the BioMe, JHS, WHI, and MESA studies, we performed comprehensive cross-cohort evaluations for predominantly European and African ancestral individuals based on their inferred global ancestry with the 1000 Genomes and Human Genome Diversity Project data as reference. Our results suggest MagicalRsq-X outperforms Rsq in almost every setting, with 7.3%-14.4% improvement in squared Pearson correlation with true R², corresponding to 85-218 K variant gains. We further developed a metric to quantify the genetic distances of a target cohort relative to a reference cohort and showed that such metric largely explained the performance of MagicalRsq-X models. Finally, we found MagicalRsq-X saved up to 53 known genome-wide significant variants in one of the largest blood cell trait GWASs that would be missed using the original Rsq for QC. In conclusion, MagicalRsq-X shows superiority for post-imputation QC and benefits genetic studies by distinguishing well and poorly imputed lower-frequency variants.
KW - Humans
KW - Polymorphism, Single Nucleotide
KW - Software
KW - Genotype
KW - Gene Frequency
KW - Cohort Studies
KW - Linkage Disequilibrium
KW - Genome-Wide Association Study/methods
KW - Genome, Human
KW - Quality Control
KW - Machine Learning
KW - Whole Genome Sequencing/standards
U2 - 10.1016/j.ajhg.2024.04.001
DO - 10.1016/j.ajhg.2024.04.001
M3 - Journal article
C2 - 38636510
VL - 111
SP - 990
EP - 995
JO - American Journal of Human Genetics
JF - American Journal of Human Genetics
SN - 0002-9297
IS - 5
ER -