Latest File Release

genewaltz (1.0), 2010-07-10 18:42

GeneWaltz Wiki

GeneWaltz is a gene-finding program. The genes detected by conventional gene-finding programs have included some that are not actually genes. GeneWaltz improves on this point.

Background

Identifying protein-coding regions in genomic sequences is an essential step in genome analysis. It is well known that the proportion of false positives among genes predicted by current methods is high, especially when the exons are short. These false positives are problematic because they waste the time and resources of experimental studies.

Methods

We developed GeneWaltz, a new filtering method that reduces the risk of false positives in gene finding. GeneWaltz utilizes a codon-to-codon substitution matrix that was constructed by comparing protein-coding regions from orthologous gene pairs between the mouse and human genomes. Using this matrix, a scoring scheme was developed that assigns higher scores to coding regions and lower scores to non-coding regions. Regions with high scores were considered candidate coding regions. One-dimensional Karlin-Altschul statistics were used to test the significance of the coding regions identified by GeneWaltz.
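The scoring idea above can be illustrated with a minimal Python sketch. It is not the GeneWaltz implementation: the matrix values, the default penalty, the function names, and the parameters `lam` and `k` are all hypothetical, and the input is assumed to be two aligned sequences already split into codons. The sketch scores each aligned codon pair, finds the highest-scoring contiguous segment, and applies the standard one-dimensional Karlin-Altschul approximation P(M >= s) ≈ 1 - exp(-K n e^(-λs)) for n scored positions.

```python
import math

# Toy codon-pair scores (hypothetical values, NOT the published matrix).
TOY_SCORES = {
    ("ATG", "ATG"): 5,
    ("GCT", "GCT"): 4,
    ("GCT", "GCC"): 2,  # synonymous change
}
DEFAULT_SCORE = -3  # assumed penalty for any pair not listed above


def codon_pair_scores(query_codons, target_codons):
    """Score each aligned codon pair with the toy matrix."""
    return [TOY_SCORES.get((q, t), DEFAULT_SCORE)
            for q, t in zip(query_codons, target_codons)]


def max_scoring_segment(scores):
    """Return (best_score, start, end) of the highest-scoring contiguous
    run, using Kadane's algorithm; end is exclusive."""
    best, best_start, best_end = 0, 0, 0
    running, run_start = 0, 0
    for i, s in enumerate(scores):
        if running <= 0:
            # A non-positive prefix cannot help; restart the run here.
            running, run_start = s, i
        else:
            running += s
        if running > best:
            best, best_start, best_end = running, run_start, i + 1
    return best, best_start, best_end


def segment_p_value(score, n, lam, k):
    """Karlin-Altschul approximation of P(max segment score >= score)
    for n scored positions; lam and k are assumed to be given."""
    return -math.expm1(-k * n * math.exp(-lam * score))
```

A candidate region would then be kept only if its p-value falls below a chosen threshold. In practice, λ and K have to be derived from the score distribution of the actual matrix; the values passed in here are placeholders.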

Results

The proportion of false positives among genes predicted by GENSCAN and Twinscan was high, especially when the exons were short. GeneWaltz significantly reduced the ratio of false positives to all positives predicted by GENSCAN and Twinscan, especially when the exons were short.

Conclusions

GeneWaltz will be helpful in experimental genomic studies. GeneWaltz binaries and the matrix are available online at http://en.sourceforge.jp/projects/genewaltz/.

For details, see http://www.biodatamining.org/content/3/1/6