scRUBYt! 0.3.4

scRUBYt! is a simple to learn and use, yet very powerful web extraction framework in Ruby.

GPL (GNU General Public License) 
Peter Szinek and David Krmpotic
With the concise, easy-to-use DSL provided by scRUBYt!, you can navigate the Web and extract, query, transform, and save data from the pages of interest.

About Ruby:

Ruby is a dynamic, reflective, general purpose object-oriented programming language that combines syntax inspired by Perl with Smalltalk-like features. Ruby originated in Japan during the mid-1990s and was initially developed and designed by Yukihiro "Matz" Matsumoto.

Ruby supports multiple programming paradigms (including functional, object oriented and imperative), and features a dynamic type system and automatic memory management; it is therefore similar in varying respects to Python, Perl, Lisp, Dylan, and CLU.

In its current, official implementation, written in C, Ruby is a single-pass interpreted language. As there is currently no specification of the Ruby language, this implementation is considered the de facto reference. As of 2008, there are a number of alternative implementations of the Ruby language, including Rubinius, JRuby, YARV, and IronRuby, each of which takes a different approach, with JRuby providing just-in-time compilation functionality.

The language was created by Yukihiro Matsumoto, who started working on Ruby on February 24, 1993, and released it to the public in 1995. The name "Ruby" refers to the gemstone and came from a joke within Matsumoto's circle of friends alluding to the name of the Perl programming language.

As of December 2007, the latest stable version of the reference implementation is 1.8.6. Apart from the reference, several other virtual machines are being developed for Ruby. These include JRuby, a port of Ruby to the Java platform, IronRuby, an implementation for the .NET Framework produced by Microsoft, and Rubinius, an interpreter modeled after self-hosting Smalltalk virtual machines.



What's New in This Release:

Script pattern; possibility to evaluate custom function on the input of the pattern
Constant pattern; constant patterns can be added with the syntax: pattern 'Hello world', :type => :constant
Text pattern; foundation for new output method: to_flat_xml for creating feed-like flat XMLs instead of hierarchical ones
to_flat_xml with spec delimiters splits up the concatenated hash results
Change in the semantics of the "div[stuff]" style examples: divs that contain "stuff" are now matched, rather than only divs whose whole text is "stuff"
generalization is false by default
Possibility to define arbitrary delimiter for to_hash (used when the result contains commas)
Changes in the logging module (credit: Tim Fletcher):
  Extract the logging into a class to allow for filtering
  Allow the logger to be set to nil (to disable logging), and have this as the default
  Logging now has to be explicitly enabled, as follows:
    Scrubyt.logger =
  Allow loggers to point to streams other than STDERR
  Add comments, unit tests, and todos
Changes in the download pattern:
  Possibility to specify an array of files that should be ignored during downloading (e.g. 'nopicture.gif')
  Timeouts during downloads are handled instead of crashing
  Fixed downloading when the filename contains no '.'
  Fixed downloading for more URL types that were not working before
New option: example_type. Possibility to force example type (instead of leaving it to scRUBYt! to guess)
Entirely new test suite using rcov; tests are added continuously; the goal is to achieve full coverage
Fixed the infamous regexp bug which caused the pricegrabber scenario (among other things) to fail
Do not evaluate the detail pattern twice
Fixed dependencies (namely parse_tree_reloaded)
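
The change to "div[stuff]" style examples listed above (matching divs that contain "stuff" rather than only divs whose whole text is "stuff") can be illustrated with a minimal plain-Ruby sketch. This is not scRUBYt! code: the helper names whole_text_match? and contains_match? and the sample texts are hypothetical, chosen only to show the difference between the two matching rules.

```ruby
# Illustrative sketch only: these helpers are hypothetical and are not scRUBYt! API.

# Earlier rule as described in the changelog: the element's whole text
# must equal the example string.
def whole_text_match?(element_text, example)
  element_text.strip == example
end

# Rule described for this release: the element's text only needs to
# contain the example string.
def contains_match?(element_text, example)
  element_text.include?(example)
end

# Made-up sample texts standing in for the text of three divs.
div_texts = ["stuff", "some stuff here", "unrelated"]

whole_text_hits = div_texts.select { |text| whole_text_match?(text, "stuff") }
contains_hits   = div_texts.select { |text| contains_match?(text, "stuff") }

puts "whole-text matches: #{whole_text_hits.inspect}"
puts "contains matches:   #{contains_hits.inspect}"
```

Under the containment rule, an example string picks up elements with surrounding text as well, so an example can be a short, distinctive fragment rather than the element's full text.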

Last updated on May 18th, 2008
