Burrows-Wheeler Aligner 0.6.1

Burrows-Wheeler Aligner (BWA) is an efficient program that aligns relatively short nucleotide sequences against a long reference sequence such as the human genome.

The software implements two algorithms, bwa-short and BWA-SW. The former works for query sequences shorter than 200bp and the latter for longer sequences up to around 100kbp.

Both algorithms do gapped alignment. They are usually more accurate and faster on queries with low error rates. Please see the BWA manual page for more information.

Does BWA align 454 reads?

 Yes and no. The BWA-SW component of BWA works well on 454 reads of about 200bp or longer. It achieves alignment accuracy similar to SSAHA2 while being much faster. BWA-SW also works for shorter reads, but with lower sensitivity. In addition, BWA-SW does not support paired-end alignment.

What is the maximum query sequence length in alignment?

 It is recommended to use bwa-short only on reads shorter than 200bp. Although bwa-short can in principle handle queries of up to a few kbp, its performance degrades on them. For long reads, BWA-SW is better.

 The BWA-SW component can align a BAC sequence (about 150kbp) against the human genome. The speed in terms of aligned bases per time unit is comparable to the speed of 1kbp read alignment. In principle, BWA-SW should be able to align a query sequence of a few Mbp at a similar speed, but I have not tried.
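 The length guidance above can be written down as a small helper. This is only a sketch: the function name and the hard 200bp cutoff are assumptions made for illustration, and it returns the algorithm names used in this FAQ, not BWA command-line syntax.

```python
# Hypothetical helper: maps a read length to the algorithm this FAQ recommends.
# The strict 200bp cutoff is an illustrative assumption, not a BWA setting.
SHORT_READ_LIMIT_BP = 200


def recommend_algorithm(read_length_bp):
    """Return "bwa-short" for reads under 200bp, otherwise "BWA-SW"."""
    if read_length_bp <= 0:
        raise ValueError("read length must be positive")
    if read_length_bp < SHORT_READ_LIMIT_BP:
        return "bwa-short"
    return "BWA-SW"


def summarize_batch(read_lengths):
    """Count how many reads in a batch fall under each recommendation."""
    counts = {"bwa-short": 0, "BWA-SW": 0}
    for length in read_lengths:
        counts[recommend_algorithm(length)] += 1
    return counts
```

 For a batch that mixes both kinds of reads, summarize_batch shows how the reads split across the two recommendations.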

What is the tolerance of sequencing errors?

 The bwa-short algorithm is mainly designed for sequencing error rates below 2%. Although users can ask it to tolerate more errors by tuning command-line options, its performance degrades quickly. Note that for Illumina reads, bwa-short may optionally trim low-quality bases from the 3'-end before alignment and is thus able to align more reads with a high error rate in the tail, which is typical of Illumina data.

 BWA-SW tolerates more errors given a longer alignment. Simulation suggests that BWA-SW may work well given 2% error for a 100bp alignment, 3% error for a 200bp alignment, 5% for a 500bp alignment and 10% for a 1000bp or longer alignment.
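 The simulation figures above can be kept in a small lookup table. The sketch below is an illustration only: the function name is made up, and treating the four quoted points as a step function (no interpolation between them) is an assumption, not something the simulation establishes.

```python
# The four (minimum alignment length, error rate) points quoted in this FAQ.
# Reading them as a step function is an assumption made for this sketch.
SIMULATED_TOLERANCE = [
    (1000, 0.10),
    (500, 0.05),
    (200, 0.03),
    (100, 0.02),
]


def suggested_max_error(alignment_length_bp):
    """Return the quoted error rate for the largest length point not
    exceeding the given alignment length, or None below 100bp, where
    the FAQ gives no figure for BWA-SW."""
    for min_length, error_rate in SIMULATED_TOLERANCE:
        if alignment_length_bp >= min_length:
            return error_rate
    return None
```

 For example, a 250bp alignment falls back to the 200bp figure of 3%, which is the conservative reading of the quoted numbers.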

Does BWA find chimeric reads?

 Yes, the BWA-SW component is able to find chimeras. BWA usually reports one alignment for each read but may output two or more alignments if the read/contig is a chimera.

Does BWA call SNPs like MAQ?

 No, BWA only does alignment. Nonetheless, it outputs alignments in the SAM format, which is supported by several generic SNP callers such as samtools and GATK.

I see one read in a pair has high mapping quality, but the other read has zero. Is this right?

 This is correct. Mapping quality is assigned to each individual read, not to a read pair. It is possible that one read can be mapped unambiguously, but its mate falls in a tandem repeat and thus its accurate position cannot be determined.
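 To see this in a SAM file, the mapping quality (column 5) can be listed separately for each mate. The sketch below uses the standard SAM flag bits 0x40 (first in pair) and 0x80 (second in pair). The function name and the two example records are made up for illustration and are not real BWA output.

```python
# Two made-up SAM records for one read pair (illustrative, not BWA output).
# Columns: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
EXAMPLE_SAM = [
    "@SQ\tSN:chr1\tLN:100000",
    "read1\t99\tchr1\t1000\t60\t50M\t=\t1200\t250\t*\t*",
    "read1\t147\tchr1\t1200\t0\t50M\t=\t1000\t-250\t*\t*",
]


def mapq_by_mate(sam_lines):
    """Map each read name to {mate number: MAPQ}.

    Mate number is 1 if flag bit 0x40 is set, 2 if bit 0x80 is set,
    and 0 for records that carry neither bit.
    """
    result = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, mapq = fields[0], int(fields[1]), int(fields[4])
        if flag & 0x40:
            mate = 1
        elif flag & 0x80:
            mate = 2
        else:
            mate = 0
        result.setdefault(qname, {})[mate] = mapq
    return result
```

 In the example, the first mate has mapping quality 60 and the second has 0, which is the situation described in the question.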

I see a read that extends past the end of a chromosome and is flagged as unmapped (flag 0x4). What is happening here?

 Internally BWA concatenates all reference sequences into one long sequence. A read may be mapped to the junction of two adjacent reference sequences. In this case, BWA will flag the read as unmapped, but you will still see the position, CIGAR and all the tags. A better solution would be to choose an alternative position or to trim the part of the alignment that extends past the end, but this is complicated to implement and is not available at the moment.
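 Such records can be picked out of a SAM file by looking for the unmapped bit (0x4) on a record that still carries a position and a CIGAR string. This is a minimal sketch: the function name and the example records are hypothetical, and whether a given hit really spans a junction still has to be checked against the reference lengths.

```python
# Made-up SAM records for illustration (not real BWA output).
EXAMPLE_SAM = [
    "readA\t0\tchr1\t500\t37\t50M\t*\t0\t0\t*\t*",     # ordinary mapped read
    "readB\t4\tchr1\t99980\t0\t50M\t*\t0\t0\t*\t*",    # unmapped, but has POS and CIGAR
    "readC\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",             # unmapped, no coordinates
]


def unmapped_with_coordinates(sam_lines):
    """Return (name, reference, position, CIGAR) for records that have
    flag bit 0x4 set yet still report a position and a CIGAR."""
    hits = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, pos, cigar = int(fields[1]), int(fields[3]), fields[5]
        if flag & 0x4 and pos > 0 and cigar != "*":
            hits.append((fields[0], fields[2], pos, cigar))
    return hits
```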

Does BWA work on reference sequences longer than 4GB in total?

 No, this is not possible and will not be supported in the near future due to the technical complexity involved.

Errata

The suffix array interval of an empty string should be [0,n-1], where n is the length of the database string, not [1,n-1] as stated in Li and Durbin (2009 and 2010). Correspondingly, we need to define O(a,-1)=0 and revise the pseudocode in Figure 3 of Li and Durbin (2009). The BWA implementation is actually correct. The mistake occurs only in the papers. We apologize for the confusion and thank Nils Homer and Abel Antonio Carrion Collado for pointing this out.
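The effect of the corrected interval can be checked with a naive backward search on a toy string. The sketch below is not BWA's implementation: it builds the suffix array by plain sorting, counts occurrences by scanning, assumes that n counts the terminating "$", and uses made-up function names. The k0 parameter exists only to compare the corrected lower bound (0) with the one printed in the papers (1).

```python
def build_index(seq):
    """Naive suffix array, BWT and C(a) table for seq + "$"."""
    x = seq + "$"  # "$" sorts before A, C, G, T in ASCII
    sa = sorted(range(len(x)), key=lambda i: x[i:])
    bwt = "".join(x[i - 1] for i in sa)  # x[-1] is "$" when i == 0
    # C(a): number of symbols in seq that are lexicographically smaller than a
    c = {a: sum(1 for ch in seq if ch < a) for a in set(seq)}
    return sa, bwt, c


def occ(bwt, a, i):
    """O(a, i): occurrences of a in bwt[0..i]; O(a, -1) is 0."""
    return bwt[: i + 1].count(a)


def backward_search(seq, pattern, k0=0):
    """Return sorted start positions of pattern in seq.

    k0 is the lower bound of the interval for the empty string:
    0 per the erratum, 1 as printed in the papers.
    """
    sa, bwt, c = build_index(seq)
    k, l = k0, len(bwt) - 1
    for a in reversed(pattern):
        if a not in c:
            return []
        k = c[a] + occ(bwt, a, k - 1) + 1
        l = c[a] + occ(bwt, a, l)
        if k > l:
            return []
    return sorted(sa[i] for i in range(k, l + 1))
```

On the string GOOGOL, searching for L with k0=0 finds position 5, while k0=1 finds nothing: starting at 1 counts the first BWT symbol, which is the last character of the string, so a single-character match at the very end is missed.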

last updated on:
May 4th, 2012, 6:52 GMT
price:
FREE!
developed by:
Li H. and Durbin R
license type:
GPL v3 
category:
ROOT \ Science and Engineering \ Bioinformatics

What's New in This Release:
  • Bugfix: duplicated alternative hits in the XA tag.
  • Bugfix: when trimming is enabled, bwa-aln trims 1bp less.
  • Disabled the color-space alignment. 0.6.x is not working with SOLiD reads at present.
  • Bugfix: segfault due to excessive ambiguous bases.