Softpedia
 



    WWW::Scraper::Lite 15

    Developer: Roger Pettett
    License / Price: GPL v3 / FREE
    Last Updated: December 30th, 2011, 02:40 GMT
    Category: ROOT / Programming / Perl Modules


    WWW::Scraper::Lite description

    A framework for scraping results from search engines

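    WWW::Scraper::Lite is an HTTP scraper module written in Perl. Its crawl method takes a start URL and a hash that maps XPath expressions to callbacks; each callback receives the scraper object and the nodes matched by its expression.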

    SYNOPSIS

     use strict;
     use warnings;
     use WWW::Scraper::Lite;

     my $domain  = 'http://devsite.local/';
     my $scraper = WWW::Scraper::Lite->new();
     $scraper->crawl($domain,
       {
         '//a' => sub { # handler for all 'a' tags
           my ($scraper, $nodes) = @_;
           $scraper->enqueue(grep { $_ =~ m{^\Q$domain\E} }            # only this domain (\Q..\E matches the URL literally)
                             map  { $scraper->url_remove_anchor($_) }  # only index pages without #anchor
                             map  { $scraper->url_make_absolute($_) }  # indexer needs absolute URLs
                             map  { $_->{href} }                       # pull href out of the 'a' DOM node
                             @{$nodes});
         },
         '/*' => sub { # handler for all content
           my ($scraper, $nodes) = @_;
           print $scraper->{current}->{response}->content; # do something useful with HTTP response
         },
       }
     );
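
The link-filtering chain in the '//a' handler can be tried on its own, without the module or a network connection. The following is a minimal sketch using only core Perl; same_site_links is a hypothetical helper, not part of WWW::Scraper::Lite, and it assumes the URLs passed in are already absolute (resolving relative links would need something like the URI module). It also drops duplicates, which the synopsis does not show.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Scraper::Lite): reduce a list of
# absolute URLs to the unique, anchor-free ones that start with $domain.
sub same_site_links {
    my ($domain, @urls) = @_;
    my %seen;
    return grep { !$seen{$_}++ }                        # drop duplicates, keep first-seen order
           grep { index($_, $domain) == 0 }             # literal prefix match on the domain
           map  { my $u = $_; $u =~ s/#.*\z//s; $u }    # strip the #anchor from a copy
           grep { defined }                             # 'a' tags without an href give undef
           @urls;
}

my @links = same_site_links(
    'http://devsite.local/',
    'http://devsite.local/about#team',
    'http://devsite.local/about',
    'http://elsewhere.example/',
    undef,
    'http://devsite.local/',
);
print "$_\n" for @links;
```

Using index for the prefix test sidesteps regex metacharacters in the domain (the dots in a hostname would otherwise match any character).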




    Requirements:

    · Perl
    · strict
    · warnings
    · LWP::UserAgent
    · HTML::TreeBuilder::XPath
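
Whether the requirements above are installed can be checked before trying the module. This is a rough sketch in core Perl; module_available is a hypothetical helper introduced here for illustration, and it only tests that each module loads, not which version is present.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, else 0.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;  # require dies if the file is not in @INC
}

for my $module (qw(strict warnings LWP::UserAgent HTML::TreeBuilder::XPath)) {
    printf "%-26s %s\n", $module, module_available($module) ? 'ok' : 'MISSING';
}
```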

      

