Efficient approaches for large-scale GWAS with genotype uncertainty
Research output: Contribution to journal › Journal article › Research › peer-review
Standard
Efficient approaches for large-scale GWAS with genotype uncertainty. / Jørsboe, Emil; Albrechtsen, Anders.
In: G3: Genes, Genomes, Genetics (Bethesda), Vol. 12, No. 1, jkab385, 2022.
RIS
TY - JOUR
T1 - Efficient approaches for large-scale GWAS with genotype uncertainty
AU - Jørsboe, Emil
AU - Albrechtsen, Anders
PY - 2022
Y1 - 2022
AB - Association studies using genetic data from SNP-chip-based imputation or low-depth sequencing data provide a cost-efficient design for large-scale association studies. We explore methods for performing association studies applicable to such genetic data and investigate how using different priors when estimating genotype probabilities affects the association results. Our proposed method, ANGSD-asso's latent model, models the unobserved genotype as a latent variable in a generalized linear model framework. The software is implemented in C/C++ and can be run multi-threaded. ANGSD-asso is based on genotype probabilities, which can be estimated using either the sample allele frequency or the individual allele frequencies as a prior. We explore through simulations how genotype probability-based methods compare with using genetic dosages. Our simulations show that, in a structured population, using the individual allele frequency prior gives better power than using the sample allele frequency prior. In scenarios where sequencing depth and phenotype are correlated, ANGSD-asso's latent model has higher statistical power and less bias than using dosages. When additional covariates are added to the linear model, ANGSD-asso's latent model has higher statistical power and less bias than other methods that accommodate genotype uncertainty, while also being much faster. This is shown with imputed data from UK Biobank and simulations.
KW - admixture
KW - association mapping
KW - case-control study
KW - next-generation sequencing
KW - quantitative traits
KW - GENOME-WIDE ASSOCIATION
KW - POPULATION STRATIFICATION
KW - IMPUTATION
KW - REGRESSION
U2 - 10.1093/g3journal/jkab385
DO - 10.1093/g3journal/jkab385
M3 - Journal article
C2 - 34865001
VL - 12
JO - G3: Genes, Genomes, Genetics (Bethesda)
JF - G3: Genes, Genomes, Genetics (Bethesda)
SN - 2160-1836
IS - 1
M1 - jkab385
ER -
ID: 291215962
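
The abstract contrasts two ways of using genotype probabilities: collapsing them into a dosage, or treating the genotype as a latent variable. The following is a minimal sketch of that distinction for a quantitative trait with normal errors. It is not ANGSD-asso's implementation (which is written in C/C++), and all function and parameter names here (`dosage`, `latent_loglik`, `dosage_loglik`, `intercept`, `effect`, `sd`) are hypothetical, chosen only for illustration.

```python
import math


def dosage(gp):
    """Expected genotype E[G] from probabilities (P(G=0), P(G=1), P(G=2))."""
    return gp[1] + 2.0 * gp[2]


def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and sd at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def latent_loglik(ys, gps, intercept, effect, sd):
    """Log-likelihood with the unobserved genotype summed out.

    Each individual contributes sum_g P(G=g) * N(y; intercept + effect*g, sd).
    """
    total = 0.0
    for y, gp in zip(ys, gps):
        lik = sum(p * normal_pdf(y, intercept + effect * g, sd)
                  for g, p in enumerate(gp))
        total += math.log(lik)
    return total


def dosage_loglik(ys, gps, intercept, effect, sd):
    """Log-likelihood when the dosage is plugged in as if it were observed."""
    total = 0.0
    for y, gp in zip(ys, gps):
        total += math.log(normal_pdf(y, intercept + effect * dosage(gp), sd))
    return total
```

When every genotype is known with certainty the two likelihoods coincide; they differ only when the probabilities are spread over several genotypes, which is the setting the abstract is concerned with.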