    Web::Scraper 0.36

    Developer: Tatsuhiko Miyagawa
    License / Price: Perl Artistic License / FREE
    Last Updated: December 14th, 2011, 10:14 GMT
    Category: ROOT / Programming / Perl Modules


    Web::Scraper description

    Web Scraping Toolkit using HTML and CSS Selectors or XPath expressions

    Web::Scraper is a web scraping toolkit inspired by scrAPI, its Ruby equivalent. It provides a DSL-like interface for traversing HTML documents and returning a neatly arranged Perl data structure.

    The scraper and process blocks define which segments of a document to extract. Segments can be selected with CSS selectors or with XPath expressions.
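
To make the two selector styles concrete, here is a minimal sketch that runs against an in-memory HTML string instead of a live URL. The markup and the key names (title, links) are made up for illustration. It assumes that scrape() accepts HTML content as a plain string and that an expression starting with "/" is treated as XPath while anything else is read as a CSS selector.

```perl
use strict;
use warnings;
use Web::Scraper;

# Hypothetical HTML fragment, used only for illustration.
my $html = <<'HTML';
<html><body>
  <h1 class="title">Hello</h1>
  <ul>
    <li><a href="/one">One</a></li>
    <li><a href="/two">Two</a></li>
  </ul>
</body></html>
HTML

my $page = scraper {
    # CSS selector: text of the h1 element with class "title".
    process 'h1.title', title => 'TEXT';
    # XPath expression: href attribute of every link inside a list item,
    # collected into an array because the key ends in "[]".
    process '//li/a', 'links[]' => '@href';
};

my $res = $page->scrape($html);

print "$res->{title}\n";
print "$_\n" for @{ $res->{links} };
```

A key with a trailing "[]" collects every match into an array reference, while a plain key keeps only the first match.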

    SYNOPSIS

     use URI;
     use Web::Scraper;

     # First, create your scraper block
     my $tweets = scraper {
         # Parse all LIs with the class "status", store them into a resulting
         # array 'tweets'. We embed another scraper for each tweet.
         process "li.status", "tweets[]" => scraper {
             # And, in that array, pull in the elements with the class
             # "entry-content", "entry-date" and the link
             process ".entry-content", body => 'TEXT';
             process ".entry-date", when => 'TEXT';
             process 'a[rel="bookmark"]', link => '@href';
         };
     };

     my $res = $tweets->scrape( URI->new("http://twitter.com/miyagawa") );

     # The result has the populated tweets array
     for my $tweet (@{$res->{tweets}}) {
         print "$tweet->{body} $tweet->{when} (link: $tweet->{link})\n";
     }
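
The synopsis depends on a live page whose markup may no longer match these selectors. As a hedged, offline variant, the sketch below feeds the same nested scraper a hand-written HTML string. The markup, the example.com links, and the sample text are invented for illustration; only the class names and the scraper definition come from the synopsis. It assumes scrape() accepts HTML content as a plain string.

```perl
use strict;
use warnings;
use Web::Scraper;

# Made-up markup that reuses the class names from the synopsis.
my $html = <<'HTML';
<html><body><ol>
  <li class="status">
    <span class="entry-content">First post</span>
    <span class="entry-date">10:14</span>
    <a rel="bookmark" href="http://example.com/status/1">permalink</a>
  </li>
  <li class="status">
    <span class="entry-content">Second post</span>
    <span class="entry-date">10:20</span>
    <a rel="bookmark" href="http://example.com/status/2">permalink</a>
  </li>
</ol></body></html>
HTML

# Same structure as the synopsis: the outer block selects each status item,
# the inner scraper extracts the fields of that item.
my $tweets = scraper {
    process 'li.status', 'tweets[]' => scraper {
        process '.entry-content',    body => 'TEXT';
        process '.entry-date',       when => 'TEXT';
        process 'a[rel="bookmark"]', link => '@href';
    };
};

# Scrape the string directly; no network access is involved.
my $res = $tweets->scrape($html);

for my $tweet (@{ $res->{tweets} }) {
    print "$tweet->{body} $tweet->{when} (link: $tweet->{link})\n";
}
```

Each element of the tweets array is a hash reference with the keys body, when, and link, which is the shape the loop in the synopsis expects.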



    Requirements:

    · Perl

      

