    Text::Scraper 0.02

    Developer: Chris McEwan
    License / Price: Perl Artistic License / FREE
    Last Updated: August 22nd, 2007, 22:05 GMT
    Category: ROOT / Programming / Libraries


    Text::Scraper description

    Text::Scraper extracts structured data from (un)structured text.

    SYNOPSIS

    use Text::Scraper;

    use LWP::Simple;
    use Data::Dumper;

    #
    # 1. Get our template and source text
    #
    my $tmpl = Text::Scraper->slurp(*DATA);
    my $src = get('http://search.cpan.org/recent')
        or die "Could not fetch http://search.cpan.org/recent\n";

    #
    # 2. Extract data from source
    #
    my $obj = Text::Scraper->new(tmpl => $tmpl);
    my $data = $obj->scrape($src);

    #
    # 3. Do something really neat...(left as exercise)
    #
    print "Newest Submission: ", $data->[0]{submissions}[0]{name}, "\n\n";
    print "Scraper model:\n", Dumper($obj), "\n\n";
    print "Parsed model:\n", Dumper($data) , "\n\n";

    __DATA__

    <div class=path><center><table><tr>
    <?tmpl stuff pre_nav ?>
    <td class=datecell><span><big><b> <?tmpl var date_string ?> </b></big></span></td>
    <?tmpl stuff post_nav ?>
    </tr></table></center></div>

    <ul>
    <?tmpl loop submissions ?>
    <li><a href="<?tmpl var link ?>"><?tmpl var name ?></a>
    <?tmpl if has_description ?>
    <small> -- <?tmpl var description ?></small>
    <?tmpl end has_description ?>
    </li>
    <?tmpl end submissions ?>
    </ul>
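
The access path used in step 3 suggests that scrape() returns an array of hashes, with the loop named submissions becoming a nested array. The sketch below hand-builds a structure of that shape so the traversal can be tried without network access. Every value is a placeholder, and the presence of the date_string and link keys is an assumption based on the template variable names, not on documented module output. How the conditional has_description block is represented is not shown in the source, so it is left out.

```perl
use strict;
use warnings;

# Hypothetical result, written by hand to match the access path
# $data->[0]{submissions}[0]{name} from the synopsis.
# All values are placeholders, not real scraped output.
my $data = [
    {
        date_string => 'EXAMPLE DATE',
        submissions => [
            { link => '/example/link-1', name => 'Example-Dist-0.01' },
            { link => '/example/link-2', name => 'Other-Dist-1.00'   },
        ],
    },
];

# Same lookup as step 3 of the synopsis.
my $newest = $data->[0]{submissions}[0]{name};
print "Newest Submission: $newest\n";

# Walk the nested loop data.
for my $item (@{ $data->[0]{submissions} }) {
    print "$item->{name} => $item->{link}\n";
}
```

Because the loop data is a plain array reference, the usual Perl idioms (scalar @{...}, map, grep) apply to it directly.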

    ABSTRACT

    Text::Scraper provides a fully functional base class for quickly developing screen scrapers and other text extraction tools. Programmatically generated text, such as dynamic web pages, is trivially reverse engineered.

    Using templates, the programmer is freed from staring at fragile, heavily escaped regular expressions, mapping capture groups to named variables or wrestling with the DOM and badly formed HTML. In addition, extracted data can be hierarchical, which is beyond the capabilities of vanilla regular expressions.
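
For contrast, here is a rough sketch of the hand-written approach the paragraph above describes, using only core Perl. The sample HTML is made up to resemble the list in the synopsis template. Note how the capture groups have to be mapped to names by position, and how the result is a flat list per match with no nesting.

```perl
use strict;
use warnings;

# Made-up sample input, shaped like the <ul> block in the synopsis template.
my $html = <<'HTML';
<ul>
<li><a href="/dist/Foo-1.0">Foo 1.0</a>
<small> -- An example module</small>
</li>
<li><a href="/dist/Bar-0.2">Bar 0.2</a>
</li>
</ul>
HTML

my @submissions;

# One regex per list item; the description part is optional.
while ($html =~ m{<li><a href="(.*?)">(.*?)</a>\s*(?:<small> -- (.*?)</small>\s*)?</li>}gs) {
    # Capture groups are positional, so names must be assigned by hand.
    my %item = (link => $1, name => $2);
    $item{description} = $3 if defined $3;
    push @submissions, \%item;
}

for my $item (@submissions) {
    my $desc = exists $item->{description} ? $item->{description} : '(none)';
    print "$item->{name} [$item->{link}]: $desc\n";
}
```

Any change to the page markup means editing the pattern and rechecking which group number maps to which field, which is the maintenance burden a template is meant to remove.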

    Text::Scraper's functionality overlaps with two existing CPAN modules: Template::Extract and WWW::Scraper.
    Text::Scraper is much more lightweight than either and has a more general application domain than the latter. It has no dependencies on other frameworks, modules or design decisions. On average, Text::Scraper benchmarks around 250% faster than Template::Extract and uses significantly less memory.

    Unlike both existing modules, Text::Scraper generalizes its functionality to allow the programmer to refine template capture groups beyond (.*?), fully redefine the template syntax and introduce new template constructs bound to custom classes.
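
To illustrate what refining a capture group beyond (.*?) can buy, the following sketch compiles a flat template into a regular expression. It is not Text::Scraper's implementation or API: compile_template is a hypothetical helper written for this example, it handles only var tags, and the template and sample strings are invented.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical helper, NOT part of Text::Scraper: turn a flat template into
# a regex. Each <?tmpl var NAME ?> becomes a named capture group that uses
# the pattern supplied for NAME, or the default .*? when none is given.
sub compile_template {
    my ($tmpl, $patterns) = @_;
    $patterns //= {};
    my $regex = '';
    # A capturing group in split keeps the variable names in the result.
    my @parts = split /<\?tmpl\s+var\s+(\w+)\s*\?>/, $tmpl;
    while (@parts) {
        $regex .= quotemeta shift @parts;      # literal text, escaped
        last unless @parts;
        my $name = shift @parts;
        my $pat  = $patterns->{$name} // '.*?';
        $regex .= "(?<$name>$pat)";
    }
    return qr/$regex/s;
}

my $tmpl   = '<b><?tmpl var version ?></b> updated <?tmpl var date ?>.';
my $loose  = compile_template($tmpl);
my $strict = compile_template($tmpl, {
    version => '\d+\.\d+',
    date    => '\d{4}-\d{2}-\d{2}',
});

for my $text ('<b>0.02</b> updated 2007-08-22.', '<b>beta</b> updated soon.') {
    for my $case ([loose => $loose], [strict => $strict]) {
        my ($label, $re) = @$case;
        if ($text =~ $re) {
            say "$label: version=$+{version} date=$+{date}";
        } else {
            say "$label: no match for '$text'";
        }
    }
}
```

The default pattern accepts both sample strings, while the refined patterns accept only the one whose fields look like a version number and a date, so malformed input is rejected at extraction time instead of surfacing later as bad data.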


    Requirements:

    · Perl

      


