MagicalRsq-X: A cross-cohort transferable genotype imputation quality metric

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

MagicalRsq-X: A cross-cohort transferable genotype imputation quality metric. / Sun, Quan; Yang, Yingxi; Rosen, Jonathan D; Chen, Jiawen; Li, Xihao; Guan, Wyliena; Jiang, Min-Zhi; Wen, Jia; Pace, Rhonda G; Blackman, Scott M; Bamshad, Michael J; Gibson, Ronald L; Cutting, Garry R; O'Neal, Wanda K; Knowles, Michael R; Kooperberg, Charles; Reiner, Alexander P; Raffield, Laura M; Carson, April P.; Rich, Stephen S.; Rotter, Jerome I.; Loos, Ruth J F; Kenny, Eimear; Jaeger, Byron C; Min, Yuan-I; Fuchsberger, Christian; Li, Yun.

In: American Journal of Human Genetics, Vol. 111, No. 5, 2024, p. 990-995.

Harvard

Sun, Q, Yang, Y, Rosen, JD, Chen, J, Li, X, Guan, W, Jiang, M-Z, Wen, J, Pace, RG, Blackman, SM, Bamshad, MJ, Gibson, RL, Cutting, GR, O'Neal, WK, Knowles, MR, Kooperberg, C, Reiner, AP, Raffield, LM, Carson, AP, Rich, SS, Rotter, JI, Loos, RJF, Kenny, E, Jaeger, BC, Min, Y-I, Fuchsberger, C & Li, Y 2024, 'MagicalRsq-X: A cross-cohort transferable genotype imputation quality metric', American Journal of Human Genetics, vol. 111, no. 5, pp. 990-995. https://doi.org/10.1016/j.ajhg.2024.04.001

APA

Sun, Q., Yang, Y., Rosen, J. D., Chen, J., Li, X., Guan, W., Jiang, M-Z., Wen, J., Pace, R. G., Blackman, S. M., Bamshad, M. J., Gibson, R. L., Cutting, G. R., O'Neal, W. K., Knowles, M. R., Kooperberg, C., Reiner, A. P., Raffield, L. M., Carson, A. P., ... Li, Y. (2024). MagicalRsq-X: A cross-cohort transferable genotype imputation quality metric. American Journal of Human Genetics, 111(5), 990-995. https://doi.org/10.1016/j.ajhg.2024.04.001

Vancouver

Sun Q, Yang Y, Rosen JD, Chen J, Li X, Guan W et al. MagicalRsq-X: A cross-cohort transferable genotype imputation quality metric. American Journal of Human Genetics. 2024;111(5):990-995. https://doi.org/10.1016/j.ajhg.2024.04.001

Author

Sun, Quan ; Yang, Yingxi ; Rosen, Jonathan D ; Chen, Jiawen ; Li, Xihao ; Guan, Wyliena ; Jiang, Min-Zhi ; Wen, Jia ; Pace, Rhonda G ; Blackman, Scott M ; Bamshad, Michael J ; Gibson, Ronald L ; Cutting, Garry R ; O'Neal, Wanda K ; Knowles, Michael R ; Kooperberg, Charles ; Reiner, Alexander P ; Raffield, Laura M ; Carson, April P. ; Rich, Stephen S. ; Rotter, Jerome I. ; Loos, Ruth J F ; Kenny, Eimear ; Jaeger, Byron C ; Min, Yuan-I ; Fuchsberger, Christian ; Li, Yun. / MagicalRsq-X: A cross-cohort transferable genotype imputation quality metric. In: American Journal of Human Genetics. 2024 ; Vol. 111, No. 5. pp. 990-995.

Bibtex

@article{886fc95a1b8e481ab742d1c434cf325b,
title = "MagicalRsq-X: A cross-cohort transferable genotype imputation quality metric",
abstract = "Since genotype imputation was introduced, researchers have been relying on the estimated imputation quality from imputation software to perform post-imputation quality control (QC). However, this quality estimate (denoted as Rsq) performs less well for lower-frequency variants. We recently published MagicalRsq, a machine-learning-based imputation quality calibration, which leverages additional typed markers from the same cohort and outperforms Rsq as a QC metric. In this work, we extended the original MagicalRsq to allow cross-cohort model training and named the new model MagicalRsq-X. We removed the cohort-specific estimated minor allele frequency and included linkage disequilibrium scores and recombination rates as additional features. Leveraging whole-genome sequencing data from TOPMed, specifically participants in the BioMe, JHS, WHI, and MESA studies, we performed comprehensive cross-cohort evaluations for predominantly European and African ancestral individuals based on their inferred global ancestry with the 1000 Genomes and Human Genome Diversity Project data as reference. Our results suggest MagicalRsq-X outperforms Rsq in almost every setting, with 7.3%-14.4% improvement in squared Pearson correlation with true $R^2$, corresponding to 85-218 K variant gains. We further developed a metric to quantify the genetic distances of a target cohort relative to a reference cohort and showed that such metric largely explained the performance of MagicalRsq-X models. Finally, we found MagicalRsq-X saved up to 53 known genome-wide significant variants in one of the largest blood cell trait GWASs that would be missed using the original Rsq for QC. In conclusion, MagicalRsq-X shows superiority for post-imputation QC and benefits genetic studies by distinguishing well and poorly imputed lower-frequency variants.",
keywords = "Humans, Polymorphism, Single Nucleotide, Software, Genotype, Gene Frequency, Cohort Studies, Linkage Disequilibrium, Genome-Wide Association Study/methods, Genome, Human, Quality Control, Machine Learning, Whole Genome Sequencing/standards",
author = "Quan Sun and Yingxi Yang and Rosen, {Jonathan D} and Jiawen Chen and Xihao Li and Wyliena Guan and Min-Zhi Jiang and Jia Wen and Pace, {Rhonda G} and Blackman, {Scott M} and Bamshad, {Michael J} and Gibson, {Ronald L} and Cutting, {Garry R} and O'Neal, {Wanda K} and Knowles, {Michael R} and Charles Kooperberg and Reiner, {Alexander P} and Raffield, {Laura M} and Carson, {April P.} and Rich, {Stephen S.} and Rotter, {Jerome I.} and Loos, {Ruth J F} and Eimear Kenny and Jaeger, {Byron C} and Yuan-I Min and Christian Fuchsberger and Yun Li",
note = "Copyright {\textcopyright} 2024 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.",
year = "2024",
doi = "10.1016/j.ajhg.2024.04.001",
language = "English",
volume = "111",
pages = "990--995",
journal = "American Journal of Human Genetics",
issn = "0002-9297",
publisher = "Cell Press",
number = "5",

}

RIS

TY - JOUR

T1 - MagicalRsq-X

T2 - A cross-cohort transferable genotype imputation quality metric

AU - Sun, Quan

AU - Yang, Yingxi

AU - Rosen, Jonathan D

AU - Chen, Jiawen

AU - Li, Xihao

AU - Guan, Wyliena

AU - Jiang, Min-Zhi

AU - Wen, Jia

AU - Pace, Rhonda G

AU - Blackman, Scott M

AU - Bamshad, Michael J

AU - Gibson, Ronald L

AU - Cutting, Garry R

AU - O'Neal, Wanda K

AU - Knowles, Michael R

AU - Kooperberg, Charles

AU - Reiner, Alexander P

AU - Raffield, Laura M

AU - Carson, April P.

AU - Rich, Stephen S.

AU - Rotter, Jerome I.

AU - Loos, Ruth J F

AU - Kenny, Eimear

AU - Jaeger, Byron C

AU - Min, Yuan-I

AU - Fuchsberger, Christian

AU - Li, Yun

N1 - Copyright © 2024 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

PY - 2024

Y1 - 2024

N2 - Since genotype imputation was introduced, researchers have been relying on the estimated imputation quality from imputation software to perform post-imputation quality control (QC). However, this quality estimate (denoted as Rsq) performs less well for lower-frequency variants. We recently published MagicalRsq, a machine-learning-based imputation quality calibration, which leverages additional typed markers from the same cohort and outperforms Rsq as a QC metric. In this work, we extended the original MagicalRsq to allow cross-cohort model training and named the new model MagicalRsq-X. We removed the cohort-specific estimated minor allele frequency and included linkage disequilibrium scores and recombination rates as additional features. Leveraging whole-genome sequencing data from TOPMed, specifically participants in the BioMe, JHS, WHI, and MESA studies, we performed comprehensive cross-cohort evaluations for predominantly European and African ancestral individuals based on their inferred global ancestry with the 1000 Genomes and Human Genome Diversity Project data as reference. Our results suggest MagicalRsq-X outperforms Rsq in almost every setting, with 7.3%-14.4% improvement in squared Pearson correlation with true R², corresponding to 85-218 K variant gains. We further developed a metric to quantify the genetic distances of a target cohort relative to a reference cohort and showed that such metric largely explained the performance of MagicalRsq-X models. Finally, we found MagicalRsq-X saved up to 53 known genome-wide significant variants in one of the largest blood cell trait GWASs that would be missed using the original Rsq for QC. In conclusion, MagicalRsq-X shows superiority for post-imputation QC and benefits genetic studies by distinguishing well and poorly imputed lower-frequency variants.

AB - Since genotype imputation was introduced, researchers have been relying on the estimated imputation quality from imputation software to perform post-imputation quality control (QC). However, this quality estimate (denoted as Rsq) performs less well for lower-frequency variants. We recently published MagicalRsq, a machine-learning-based imputation quality calibration, which leverages additional typed markers from the same cohort and outperforms Rsq as a QC metric. In this work, we extended the original MagicalRsq to allow cross-cohort model training and named the new model MagicalRsq-X. We removed the cohort-specific estimated minor allele frequency and included linkage disequilibrium scores and recombination rates as additional features. Leveraging whole-genome sequencing data from TOPMed, specifically participants in the BioMe, JHS, WHI, and MESA studies, we performed comprehensive cross-cohort evaluations for predominantly European and African ancestral individuals based on their inferred global ancestry with the 1000 Genomes and Human Genome Diversity Project data as reference. Our results suggest MagicalRsq-X outperforms Rsq in almost every setting, with 7.3%-14.4% improvement in squared Pearson correlation with true R², corresponding to 85-218 K variant gains. We further developed a metric to quantify the genetic distances of a target cohort relative to a reference cohort and showed that such metric largely explained the performance of MagicalRsq-X models. Finally, we found MagicalRsq-X saved up to 53 known genome-wide significant variants in one of the largest blood cell trait GWASs that would be missed using the original Rsq for QC. In conclusion, MagicalRsq-X shows superiority for post-imputation QC and benefits genetic studies by distinguishing well and poorly imputed lower-frequency variants.

KW - Humans

KW - Polymorphism, Single Nucleotide

KW - Software

KW - Genotype

KW - Gene Frequency

KW - Cohort Studies

KW - Linkage Disequilibrium

KW - Genome-Wide Association Study/methods

KW - Genome, Human

KW - Quality Control

KW - Machine Learning

KW - Whole Genome Sequencing/standards

U2 - 10.1016/j.ajhg.2024.04.001

DO - 10.1016/j.ajhg.2024.04.001

M3 - Journal article

C2 - 38636510

VL - 111

SP - 990

EP - 995

JO - American Journal of Human Genetics

JF - American Journal of Human Genetics

SN - 0002-9297

IS - 5

ER -

ID: 392988387