Bio::Grep is a Perl extension for searching in DNA and Protein sequences.
SYNOPSIS
use Bio::Grep;
my $sbe = Bio::Grep->new('Vmatch');
# define the location of the suffix arrays
$sbe->settings->datapath('data');
mkdir($sbe->settings->datapath);
# now generate a suffix array. you have to do this only once.
$sbe->generate_database({
    file        => 'ATH1.cdna',
    description => 'AGI Transcripts',
});
# search in this suffix array
$sbe->settings->database('ATH1.cdna');
# search for the reverse complement and allow 2 mismatches
$sbe->settings->query('UGAACAGAAAG');
$sbe->settings->reverse_complement(1);
$sbe->settings->mismatches(2);
# or you can use a FASTA file with queries
# $sbe->settings->query_file('Oligos.fasta');
# $sbe->search();
# Alternatively, you can specify the settings in the search call.
# This also resets everything except the paths and the database
# (because it is likely that they don't change when search is called
# multiple times)
$sbe->search({
    query              => 'AGAGCCCT',
    reverse_complement => 1,
    mismatches         => 1,
});
my @ids;
# output some information
while ( my $res = $sbe->next_res ) {
    print $res->sequence->id . "\n";
    print $res->alignment_string() . "\n";
    push @ids, $res->sequence_id;
}
# get the gene sequences of all matches as a Bio::SeqIO object
# (to generate a FASTA file, for example)
my $seqio = $sbe->get_sequences(@ids);
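The reverse_complement and mismatches settings used in the synopsis can be illustrated without any back-end installed. The following is a minimal plain-Perl sketch, not Bio::Grep's implementation: the helper names reverse_complement and find_with_mismatches, the target string, and the choice to treat U as T are all assumptions made for illustration. It scans every window naively, whereas the suffix array back-ends avoid exactly that kind of linear scan.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse complement of a DNA/RNA string.
# Assumption for this sketch: U is treated like T.
sub reverse_complement {
    my ($seq) = @_;
    my $rc = reverse uc $seq;       # scalar context: reverses the string
    $rc =~ tr/ACGTU/TGCAA/;         # complement each base
    return $rc;
}

# Hypothetical helper: naive scan reporting every window of $text that
# differs from $pattern in at most $max_mismatches positions.
sub find_with_mismatches {
    my ( $text, $pattern, $max_mismatches ) = @_;
    my $len = length $pattern;
    my @hits;
    for my $pos ( 0 .. length($text) - $len ) {
        my $window = substr $text, $pos, $len;
        # String XOR gives "\0" wherever the two characters agree,
        # so counting the non-"\0" bytes gives the Hamming distance.
        my $diff       = $window ^ $pattern;
        my $mismatches = ( $diff =~ tr/\0//c );
        push @hits, { pos => $pos, mismatches => $mismatches }
            if $mismatches <= $max_mismatches;
    }
    return @hits;
}

# Made-up target sequence, for illustration only.
my $target = 'GGGCTTTCTGTTCATT';
my $query  = reverse_complement('UGAACAGAAAG');

for my $hit ( find_with_mismatches( $target, $query, 2 ) ) {
    printf "hit at offset %d with %d mismatch(es)\n",
        $hit->{pos}, $hit->{mismatches};
}
```

A linear scan like this costs time proportional to the text length times the pattern length for every query, which is why an index built once and reused pays off on large sequence collections.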
Bio-Grep is a collection of Perl modules for searching in DNA and protein sequences. It supports several back-ends, most importantly some (enhanced) suffix array implementations. No single suffix array tool currently works in all scenarios (for example whole-genome, protein, and RNA data), so Bio::Grep provides a common API to the most popular tools. This makes it easy to switch between tools or to combine them.
Requirements:
· Perl