Pairagon+N-SCAN_EST: a model-based gene annotation pipeline

Novo Nordisk Foundation Center for Basic Metabolic Research


Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Pairagon+N-SCAN_EST: a model-based gene annotation pipeline. / Arumugam, Manimozhiyan; Wei, Chaochun; Brown, Randall H; Brent, Michael R.

In: Genome Biology (Online Edition), Vol. 7 Suppl 1, 2006, p. S5.1-10.


Harvard

Arumugam, M, Wei, C, Brown, RH & Brent, MR 2006, 'Pairagon+N-SCAN_EST: a model-based gene annotation pipeline', Genome Biology (Online Edition), vol. 7 Suppl 1, pp. S5.1-10. https://doi.org/10.1186/gb-2006-7-s1-s5

APA

Arumugam, M., Wei, C., Brown, R. H., & Brent, M. R. (2006). Pairagon+N-SCAN_EST: a model-based gene annotation pipeline. Genome Biology (Online Edition), 7 Suppl 1, S5.1-10. https://doi.org/10.1186/gb-2006-7-s1-s5

Vancouver

Arumugam M, Wei C, Brown RH, Brent MR. Pairagon+N-SCAN_EST: a model-based gene annotation pipeline. Genome Biology (Online Edition). 2006;7 Suppl 1:S5.1-10. https://doi.org/10.1186/gb-2006-7-s1-s5

Author

Arumugam, Manimozhiyan ; Wei, Chaochun ; Brown, Randall H ; Brent, Michael R. / Pairagon+N-SCAN_EST: a model-based gene annotation pipeline. In: Genome Biology (Online Edition). 2006 ; Vol. 7 Suppl 1. pp. S5.1-10.

Bibtex

@article{e7262336260f4a02943b727c5131849c,

title = "Pairagon+N-SCAN_EST: a model-based gene annotation pipeline",

abstract = "This paper describes Pairagon+N-SCAN_EST, a gene annotation pipeline that uses only native alignments. For each expressed sequence it chooses the best genomic alignment. Systems like ENSEMBL and ExoGean rely on trans alignments, in which expressed sequences are aligned to the genomic loci of putative homologs. Trans alignments contain a high proportion of mismatches, gaps, and/or apparently unspliceable introns, compared to alignments of cDNA sequences to their native loci. The Pairagon+N-SCAN_EST pipeline's first stage is Pairagon, a cDNA-to-genome alignment program based on a PairHMM probability model. This model relies on prior knowledge, such as the fact that introns must begin with GT, GC, or AT and end with AG or AC. It produces very precise alignments of high quality cDNA sequences. In the genomic regions between Pairagon's cDNA alignments, the pipeline combines EST alignments with de novo gene prediction by using N-SCAN_EST. N-SCAN_EST is based on a generalized HMM probability model augmented with a phylogenetic conservation model and EST alignments. It can predict complete transcripts by extending or merging EST alignments, but it can also predict genes in regions without EST alignments. Because they are based on probability models, both Pairagon and N-SCAN_EST can be trained automatically for new genomes and data sets.",

keywords = "Base Sequence, Computational Biology, DNA, Complementary, Expressed Sequence Tags, Genes, Genome, Human, Genomics, Humans, Models, Statistical, Open Reading Frames, Phylogeny, RNA, Messenger, Sequence Alignment, Software",

author = "Manimozhiyan Arumugam and Chaochun Wei and Brown, {Randall H} and Brent, {Michael R}",

year = "2006",

doi = "10.1186/gb-2006-7-s1-s5",

language = "English",

volume = "7 Suppl 1",

pages = "S5.1--10",

journal = "Genome Biology (Online Edition)",

issn = "1474-7596",

publisher = "BioMed Central Ltd.",

}

RIS

TY - JOUR

T1 - Pairagon+N-SCAN_EST: a model-based gene annotation pipeline

AU - Arumugam, Manimozhiyan

AU - Wei, Chaochun

AU - Brown, Randall H

AU - Brent, Michael R

PY - 2006

Y1 - 2006

N2 - This paper describes Pairagon+N-SCAN_EST, a gene annotation pipeline that uses only native alignments. For each expressed sequence it chooses the best genomic alignment. Systems like ENSEMBL and ExoGean rely on trans alignments, in which expressed sequences are aligned to the genomic loci of putative homologs. Trans alignments contain a high proportion of mismatches, gaps, and/or apparently unspliceable introns, compared to alignments of cDNA sequences to their native loci. The Pairagon+N-SCAN_EST pipeline's first stage is Pairagon, a cDNA-to-genome alignment program based on a PairHMM probability model. This model relies on prior knowledge, such as the fact that introns must begin with GT, GC, or AT and end with AG or AC. It produces very precise alignments of high quality cDNA sequences. In the genomic regions between Pairagon's cDNA alignments, the pipeline combines EST alignments with de novo gene prediction by using N-SCAN_EST. N-SCAN_EST is based on a generalized HMM probability model augmented with a phylogenetic conservation model and EST alignments. It can predict complete transcripts by extending or merging EST alignments, but it can also predict genes in regions without EST alignments. Because they are based on probability models, both Pairagon and N-SCAN_EST can be trained automatically for new genomes and data sets.


KW - Base Sequence

KW - Computational Biology

KW - DNA, Complementary

KW - Expressed Sequence Tags

KW - Genes

KW - Genome, Human

KW - Genomics

KW - Humans

KW - Models, Statistical

KW - Open Reading Frames

KW - Phylogeny

KW - RNA, Messenger

KW - Sequence Alignment

KW - Software

U2 - 10.1186/gb-2006-7-s1-s5

DO - 10.1186/gb-2006-7-s1-s5

M3 - Journal article

C2 - 16925839

VL - 7 Suppl 1

SP - S5.1-10

JO - Genome Biology (Online Edition)

JF - Genome Biology (Online Edition)

SN - 1474-7596

ER -

ID: 43976012
