scRUBYt! is a web extraction framework written in Ruby that is simple to learn and use, yet very powerful. Its concise, easy-to-use DSL lets you navigate the Web and extract, query, transform and save data from the pages of interest.
Ruby is a dynamic, reflective, general purpose object-oriented programming language that combines syntax inspired by Perl with Smalltalk-like features. Ruby originated in Japan during the mid-1990s and was initially developed and designed by Yukihiro "Matz" Matsumoto.
Ruby supports multiple programming paradigms (including functional, object-oriented and imperative), and features a dynamic type system and automatic memory management; it is therefore similar in varying respects to Python, Perl, Lisp, Dylan, and CLU.
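As a small illustration of these paradigms, the sketch below computes the same sum of squares in an imperative, a functional and an object-oriented style. The method and class names (sum_of_squares_imperative, sum_of_squares_functional, NumberList) are made up for this example and are not part of Ruby or scRUBYt!.

```ruby
# Imperative style: update an accumulator step by step.
def sum_of_squares_imperative(numbers)
  total = 0
  numbers.each do |n|
    total += n * n
  end
  total
end

# Functional style: compose map and reduce without mutating anything.
def sum_of_squares_functional(numbers)
  numbers.map { |n| n * n }.reduce(0, :+)
end

# Object-oriented style: bundle the data and the behaviour in a class.
# NumberList is a hypothetical name used only for this sketch.
class NumberList
  def initialize(numbers)
    @numbers = numbers
  end

  def sum_of_squares
    @numbers.sum { |n| n * n }
  end
end

# Dynamic typing: the same methods accept integers and floats alike.
puts sum_of_squares_imperative([1, 2, 3])       # prints 14
puts sum_of_squares_functional([1.5, 2])        # prints 6.25
puts NumberList.new([1, 2, 3]).sum_of_squares   # prints 14
```

None of the methods declares argument types; whether a call works is decided at run time by the objects that are passed in.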
In its current, official implementation, written in C, Ruby is a single-pass interpreted language. As there is currently no specification of the Ruby language, this implementation is considered the de facto reference. As of 2008, there are a number of alternative implementations of the Ruby language, including Rubinius, JRuby, YARV, and IronRuby, each of which takes a different approach, with JRuby providing just-in-time compilation functionality.
The language was created by Yukihiro Matsumoto, who started working on Ruby on February 24, 1993, and released it to the public in 1995. "Ruby" was named after the gemstone, following a joke within Matsumoto's circle of friends that alluded to the name of the Perl programming language.
As of December 2007, the latest stable version of the reference implementation is 1.8.6. Apart from the reference, several other virtual machines are being developed for Ruby. These include JRuby, a port of Ruby to the Java platform, IronRuby, an implementation for the .NET Framework produced by Microsoft, and Rubinius, an interpreter modeled after self-hosting Smalltalk virtual machines.
What's New in This Release:
· Script pattern; possibility to evaluate a custom function on the input of the pattern
· Constant pattern; constant patterns can be added with the syntax: pattern 'Hello world', :type => :constant
· Text pattern; foundation for new output method: to_flat_xml for creating feed-like flat XMLs instead of hierarchical ones
· to_flat_xml with spec delimiters splits up the concatenated hash results
· Change in the semantics of the "div[stuff]" style examples: divs which contain "stuff" are now matched, rather than only divs whose whole text is "stuff"
· generalization is false by default
· Possibility to define arbitrary delimiter for to_hash (used when the result contains commas)
· Changes in the logging module (credit: Tim Fletcher):
  · Extract the logging into a class to allow for filtering
  · Allow the logger to be set to nil (to disable logging), and have this as the default
  · Logging now has to be explicitly enabled, as follows: Scrubyt.logger = Scrubyt::Logger.new
  · Allow loggers to point to streams other than STDERR
  · Add comments, unit tests, and todos
· Changes in the download pattern:
  · Possibility to specify an array of files that should be ignored during downloading (e.g. 'nopicture.gif')
  · Timeouts during downloads are now handled instead of causing a crash
  · Fixed downloading in case the filename contains no '.'
  · Fixed downloading for more URL types that were not working before
· New option: example_type. Possibility to force example type (instead of leaving it to scRUBYt! to guess)
· Entirely new test suite using rcov; tests are added continuously; the goal is to achieve full coverage
· Fixed the infamous regexp bug which caused the pricegrabber scenario (among other things) to fail
· Do not evaluate the detail pattern twice
· Fixed dependencies (namely parse_tree_reloaded)
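
The opt-in logging described in the release notes above (a logger that is nil by default and can point to streams other than STDERR) can be sketched in plain Ruby. This is only an illustration of the pattern, not scRUBYt!'s actual code: the MiniScraper module and its log helper are hypothetical, and Ruby's standard Logger class stands in for Scrubyt::Logger.

```ruby
require 'logger'
require 'stringio'

# Hypothetical stand-in for a library with opt-in logging; not scRUBYt! code.
module MiniScraper
  class << self
    # nil by default, which means logging is disabled.
    attr_accessor :logger

    # Forward a message to the logger only if one has been set.
    def log(level, message)
      logger.public_send(level, message) if logger
    end
  end
end

# No logger has been set yet, so this call does nothing.
MiniScraper.log(:info, 'dropped silently')

# Any IO-like object can receive the output, not just STDERR.
LOG_BUFFER = StringIO.new
MiniScraper.logger = Logger.new(LOG_BUFFER)
MiniScraper.log(:info, 'fetching page')

puts LOG_BUFFER.string
```

With a nil default, users who never enable logging get no output at all, while those who do can choose the destination stream when they create the logger.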