HTML::LinkExtractor can extract links from an HTML document.
It works much like HTML::LinkExtor, except that in addition to each URL you also get the link text.
Example (please run the examples):
use HTML::LinkExtractor;
use Data::Dumper;
my $input = q{If <a href="http://perl.com/"> I am a LINK!!! </a>};
my $LX = HTML::LinkExtractor->new();
# parse() takes a file name or a reference to a string of HTML, so pass \$input
$LX->parse(\$input);
print Dumper($LX->links);
__END__
# the above example will yield
$VAR1 = [
          {
            '_TEXT' => '<a href="http://perl.com/"> I am a LINK!!! </a>',
            'href' => bless(do{\(my $o = 'http://perl.com/')}, 'URI::http'),
            'tag' => 'a'
          }
        ];
HTML::LinkExtractor will also correctly extract nested link-type tags.
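A minimal sketch of the nested case, using only the calls shown in the example above. The input string and its image URL are hypothetical, chosen to put an img tag inside an a tag. No expected output is shown, because the exact set of records and fields depends on the installed version of the module.

```perl
use strict;
use warnings;
use HTML::LinkExtractor;
use Data::Dumper;

# Hypothetical input: an <img> (itself a link-type tag) nested inside an <a>.
my $input = q{<a href="http://perl.com/"><img src="http://perl.com/logo.png" alt="Perl"></a>};

my $LX = HTML::LinkExtractor->new();
$LX->parse(\$input);    # pass a reference to the HTML string

# links() returns an array reference of hash references, one per extracted
# link; the 'tag' key names the element each record came from.
for my $link (@{ $LX->links }) {
    print "found link-type tag: $link->{tag}\n";
}

# Dump the full records to inspect every field that was captured.
print Dumper($LX->links);
```

Running this and reading the dumped records is a quick way to check how nested tags are reported for your own markup.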
Requirements:
· Perl