DAWGPAWS is a suite of scripts written in Perl that are designed to assist a Distributed Annotation Working Group (DAWG) in the sequence annotation of BAC-sized contigs. Since this suite of software was initially written to annotate randomly sampled wheat BACs, it is referred to as a Pipeline to Annotate Wheat Sequences (PAWS). Although these programs were initially designed for wheat, the scripts can be applied to nearly any eukaryotic sequence annotation pipeline.
The scripts in the DAWGPAWS suite are designed to facilitate high-throughput generation of computational results for both gene annotation and transposable element annotation using a number of existing sequence annotation programs. These results are converted from the native output of the individual annotation programs to the standard GFF file format. Since a GFF file is a simple tab-delimited text file, multiple computational results for a single sequence can easily be concatenated using standard text manipulation tools such as cat; proof that cats and dawgs can play well together. DAWGPAWS provides additional tools to facilitate using the Apollo Genome Annotation Curation Tool for visualization and curation of the computational results.
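Concatenation works because every GFF line is a self-contained, nine-column, tab-delimited record. The following minimal Python sketch is not part of DAWGPAWS, which is written in Perl. The helper names `gff_line` and `merge_gff`, the sequence name `BAC001`, and the source labels `gene_prog` and `te_prog` are all hypothetical. The sketch formats features as GFF lines and concatenates the results of two programs just as cat would. It then optionally sorts the records by position, a step that cat alone does not perform.

```python
def gff_line(seqid, source, feature, start, end,
             score=".", strand=".", frame=".", attributes="."):
    """Format one feature as a nine-column, tab-delimited GFF record.

    GFF coordinates are 1-based and inclusive; "." marks an empty field.
    """
    fields = [seqid, source, feature, str(start), str(end),
              str(score), strand, str(frame), attributes]
    return "\t".join(fields)


def merge_gff(*results, sort=True):
    """Concatenate several lists of GFF lines, as `cat a.gff b.gff` would.

    The optional sort (by sequence name, then start coordinate) is an
    extra convenience of this sketch; plain concatenation keeps input order.
    """
    merged = [line for result in results for line in result]
    if sort:
        merged.sort(key=lambda line: (line.split("\t")[0],
                                      int(line.split("\t")[3])))
    return merged


# Hypothetical results from a gene predictor and a transposable element finder.
gene_hits = [gff_line("BAC001", "gene_prog", "exon", 1200, 1450, strand="+")]
te_hits = [gff_line("BAC001", "te_prog", "match", 300, 900, strand="-")]
for line in merge_gff(gene_hits, te_hits):
    print(line)
```

Because no record depends on any other line, merging evidence from many programs never requires rewriting the individual records.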
DAWGPAWS is used through a series of command line programs. A command line interface was chosen because these programs will usually be run remotely on high-performance machines running some version of the Unix/Linux operating system. Since a command line interface has been written for all programs, no special knowledge of Perl is required to use this suite. However, because some of the Perl scripts rely on external Perl modules, a working knowledge of Perl will be helpful when installing DAWGPAWS.
How to Cite DAWGPAWS
A manuscript has been submitted describing the DAWGPAWS program. Until this manuscript is published, please refer to the DAWGPAWS SourceForge website when describing your use of this program.
JC Estill and JL Bennetzen. 2009. The DAWGPAWS Pipeline for the Annotation of Genes and Transposable Elements in Plant Genomes. http://dawgpaws.sourceforge.net/
Supported Annotation Programs
DAWGPAWS includes scripts to work with computational evidence from the following standalone programs and web sites. You can get information about each program by clicking on the annotation program name, and you can get the documentation for the corresponding DAWGPAWS script by clicking on the script name. For full information on installing and using these programs, see the DAWGPAWS manual.
These annotation programs are not included in the DAWGPAWS program and must be installed separately.
Here are some key features of DAWGPAWS:
· FASTA file manipulation to prepare sequences for the annotation pipeline
· Annotation of gaps in the sequence assemblies
· Execution of ab initio gene annotation and transposable element annotation programs in a high-throughput pipeline
· BLAST pipeline suitable for use in a cluster computing framework
· Conversion of output from annotation software to the common GFF file format
· Conversion of GFF files to the game.xml format
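The first two features, preparing FASTA input and annotating assembly gaps, can be sketched in a few lines of Python. This sketch assumes that gaps are represented as runs of N (or n) in the assembled sequence, and it reports each run as a GFF feature of type `gap`. The function names `parse_fasta`, `find_gaps` and `gaps_to_gff`, the source label `gap_sketch`, and the `gap_len` attribute are hypothetical and are not taken from the DAWGPAWS scripts.

```python
import re


def parse_fasta(text):
    """Parse FASTA text into {sequence_id: sequence}.

    The ID is the first word after '>'; sequence lines are joined.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in records.items()}


def find_gaps(seq, min_len=1):
    """Return 1-based, inclusive (start, end) pairs for runs of N/n in seq."""
    return [(m.start() + 1, m.end())
            for m in re.finditer(r"[Nn]+", seq)
            if m.end() - m.start() >= min_len]


def gaps_to_gff(seqid, seq, source="gap_sketch", min_len=1):
    """Report each gap as an unstranded GFF 'gap' feature."""
    lines = []
    for start, end in find_gaps(seq, min_len):
        attributes = f"gap_len={end - start + 1}"
        lines.append("\t".join([seqid, source, "gap", str(start), str(end),
                                ".", ".", ".", attributes]))
    return lines


# Hypothetical two-line FASTA record containing two gaps.
seqs = parse_fasta(">BAC001 test contig\nACGTNNNNACGT\nNNACG\n")
for line in gaps_to_gff("BAC001", seqs["BAC001"]):
    print(line)
```

The `min_len` parameter allows short ambiguous runs to be ignored, so that only true assembly gaps are reported.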
Help is available in all command line programs through command line flags, although help information and full manual files are still under development for some programs. To access help, run the following, where progname is the name of the individual program:
· progname --usage: generate a basic usage message with information on the required arguments
· progname --help: generate a more extensive help message that includes the required arguments and all options
· progname --man: view the full manual for the program
· Manual pages for the individual programs are available on the DAWGPAWS web site and are included in the Release-1.0 download of the DAWGPAWS suite.