Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner.

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner. / Lu, David V; Brown, Randall H; Arumugam, Manimozhiyan; Brent, Michael R.

In: Bioinformatics, Vol. 25, No. 13, 07.2009, p. 1587-1593.

Harvard

Lu, DV, Brown, RH, Arumugam, M & Brent, MR 2009, 'Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner', Bioinformatics, vol. 25, no. 13, pp. 1587-1593. https://doi.org/10.1093/bioinformatics/btp273

APA

Lu, D. V., Brown, R. H., Arumugam, M., & Brent, M. R. (2009). Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner. Bioinformatics, 25(13), 1587-1593. https://doi.org/10.1093/bioinformatics/btp273

Vancouver

Lu DV, Brown RH, Arumugam M, Brent MR. Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner. Bioinformatics. 2009 Jul;25(13):1587-1593. https://doi.org/10.1093/bioinformatics/btp273

Author

Lu, David V ; Brown, Randall H ; Arumugam, Manimozhiyan ; Brent, Michael R. / Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner. In: Bioinformatics. 2009 ; Vol. 25, No. 13. pp. 1587-1593.

Bibtex

@article{51249aab0e90477b8342a2e41399ad41,
title = "{Pairagon}: a highly accurate, {HMM}-based {cDNA}-to-genome aligner",
abstract = "MOTIVATION: The most accurate way to determine the intron-exon structures in a genome is to align spliced cDNA sequences to the genome. Thus, cDNA-to-genome alignment programs are a key component of most annotation pipelines. The scoring system used to choose the best alignment is a primary determinant of alignment accuracy, while heuristics that prevent consideration of certain alignments are a primary determinant of runtime and memory usage. Both accuracy and speed are important considerations in choosing an alignment algorithm, but scoring systems have received much less attention than heuristics. RESULTS: We present Pairagon, a pair hidden Markov model based cDNA-to-genome alignment program, as the most accurate aligner for sequences with high- and low-identity levels. We conducted a series of experiments testing alignment accuracy with varying sequence identity. We first created 'perfect' simulated cDNA sequences by splicing the sequences of exons in the reference genome sequences of fly and human. The complete reference genome sequences were then mutated to various degrees using a realistic mutation simulator and the perfect cDNAs were aligned to them using Pairagon and 12 other aligners. To validate these results with natural sequences, we performed cross-species alignment using orthologous transcripts from human, mouse and rat. We found that aligner accuracy is heavily dependent on sequence identity. For sequences with 100% identity, Pairagon achieved accuracy levels of >99.6%, with one quarter of the errors of any other aligner. Furthermore, for human/mouse alignments, which are only 85% identical, Pairagon achieved 87% accuracy, higher than any other aligner. AVAILABILITY: Pairagon source and executables are freely available at http://mblab.wustl.edu/software/pairagon/",
keywords = "Animals, Base Sequence, {DNA, Complementary}, {DNA, Complementary: chemistry}, Genomics, Genomics: methods, Humans, Markov Chains, Mice, Rats, Sequence Alignment, Sequence Alignment: methods",
author = "Lu, {David V} and Brown, {Randall H} and Arumugam, Manimozhiyan and Brent, {Michael R}",
year = "2009",
month = jul,
doi = "10.1093/bioinformatics/btp273",
language = "English",
volume = "25",
pages = "1587--1593",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "13",

}

RIS

TY - JOUR

T1 - Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner.

AU - Lu, David V

AU - Brown, Randall H

AU - Arumugam, Manimozhiyan

AU - Brent, Michael R

PY - 2009/7

Y1 - 2009/7

AB - MOTIVATION: The most accurate way to determine the intron-exon structures in a genome is to align spliced cDNA sequences to the genome. Thus, cDNA-to-genome alignment programs are a key component of most annotation pipelines. The scoring system used to choose the best alignment is a primary determinant of alignment accuracy, while heuristics that prevent consideration of certain alignments are a primary determinant of runtime and memory usage. Both accuracy and speed are important considerations in choosing an alignment algorithm, but scoring systems have received much less attention than heuristics. RESULTS: We present Pairagon, a pair hidden Markov model based cDNA-to-genome alignment program, as the most accurate aligner for sequences with high- and low-identity levels. We conducted a series of experiments testing alignment accuracy with varying sequence identity. We first created 'perfect' simulated cDNA sequences by splicing the sequences of exons in the reference genome sequences of fly and human. The complete reference genome sequences were then mutated to various degrees using a realistic mutation simulator and the perfect cDNAs were aligned to them using Pairagon and 12 other aligners. To validate these results with natural sequences, we performed cross-species alignment using orthologous transcripts from human, mouse and rat. We found that aligner accuracy is heavily dependent on sequence identity. For sequences with 100% identity, Pairagon achieved accuracy levels of >99.6%, with one quarter of the errors of any other aligner. Furthermore, for human/mouse alignments, which are only 85% identical, Pairagon achieved 87% accuracy, higher than any other aligner. AVAILABILITY: Pairagon source and executables are freely available at http://mblab.wustl.edu/software/pairagon/

KW - Animals

KW - Base Sequence

KW - DNA, Complementary

KW - DNA, Complementary: chemistry

KW - Genomics

KW - Genomics: methods

KW - Humans

KW - Markov Chains

KW - Mice

KW - Rats

KW - Sequence Alignment

KW - Sequence Alignment: methods

U2 - 10.1093/bioinformatics/btp273

DO - 10.1093/bioinformatics/btp273

M3 - Journal article

VL - 25

SP - 1587

EP - 1593

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 13

ER -

ID: 43976320