VoDoo/Stream 1.3

Expressive and extensible formalism for transducers

What's new in VoDoo/Stream 1.3:

  • Namespace support has been designed. Matching can then be done using
    the element name and/or the corresponding namespace.
  • Review of the transformation process without compatibility problems
    with transducers written with previous versions. This change ...
GPL (GNU General Public License) 
Didier Plaindoux
VoDoo/Stream is software that provides a high-level, expressive and extensible formalism for writing transducers for any kind of format. It is mainly based on three major paradigms: a stream layer for tokenization, an automata layer for recognition, and a rule-based document transformation built on streams and automata.

The VoDoo/Stream project is based on three concepts:

- Stream
- Automata
- Transducers

# The first one, inspired by event-based programming styles such as SAX or the generic lexer in Objective Caml, provides a stream-based denotation of data.
# The second one provides expressive, classical automata for matching and recognizing patterns when analyzing streams.
# The last one is a high-level structuring of automata that provides an expressive mechanism for data transformation.

Finally, an XSLT-like language is defined to express data transformations.

Stream representation

A stream is a simple formalism based on opening and closing a level, labels and text. Using this simple grammar we provide a simple stream denotation of trees (XML, for example, where the stream is produced by a dedicated SAX handler). The currently supported formats are XML and free text. More formalisms can be supported using the stream extension facility. A stream interpretation is also provided for the Document Object Model, so a stream can represent pure text, an ad hoc stream, or DOM-based data.
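
As a rough illustration of this representation, the sketch below turns an XML fragment into a flat stream of open, close and text events using Python's standard xml.sax module. The event tuples and the StreamHandler and to_stream names are assumptions made for this example only; VoDoo/Stream itself is a Java project and its actual API is not shown here.

```python
import xml.sax


class StreamHandler(xml.sax.ContentHandler):
    """Collects SAX callbacks as a flat list of stream events (hypothetical model)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Opening a level, carrying the label and its attributes.
        self.events.append(("open", name, dict(attrs.items())))

    def endElement(self, name):
        # Closing the level opened by the matching "open" event.
        self.events.append(("close", name))

    def characters(self, content):
        # Text between labels; whitespace-only chunks are skipped for brevity.
        if content.strip():
            self.events.append(("text", content))


def to_stream(xml_text):
    """Parse an XML string and return its stream denotation as a list of events."""
    handler = StreamHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


if __name__ == "__main__":
    for event in to_stream('<doc><title lang="en">Hello</title></doc>'):
        print(event)
```

Free text fits the same model as a stream made only of text events, which is one way to read the claim that XML and plain text share a single formalism.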

In comparison, the StAX approach is a low-level XML matching integration based on a token-stream representation of XML fragments. Using the Stream representation with a classical switch/case conditional structure is similar to the StAX approach, but such integration is too low level: it does not provide an expressive layer for XML management and is in fact at the same level as SAX.
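
To make the "too low level" point concrete, here is a minimal sketch of the switch/case style applied to a token stream. The tuple layout ("open", label), ("close", label), ("text", value) and the collect_titles function are hypothetical and chosen for this example; they are not taken from VoDoo/Stream or from StAX.

```python
def collect_titles(events):
    """Low-level dispatch over a token stream, in the style of a switch/case loop."""
    titles = []
    inside_title = False  # parsing state has to be tracked by hand
    for event in events:
        kind = event[0]
        if kind == "open":
            inside_title = event[1] == "title"
        elif kind == "close":
            inside_title = False
        elif kind == "text" and inside_title:
            titles.append(event[1])
    return titles


if __name__ == "__main__":
    stream = [
        ("open", "doc"),
        ("open", "title"),
        ("text", "Streams"),
        ("close", "title"),
        ("text", "body text"),
        ("close", "doc"),
    ]
    print(collect_titles(stream))
```

Even for this single query the code needs an explicit state flag, and every additional pattern would need more of them. That bookkeeping is what the automata and transducer layers are meant to hide.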

Automata for Stream recognition

Automata provide a high-level mechanism for pattern recognition and variable binding. An automaton is represented as a DAG with specific attributes for variable denotations, and it can either find a pattern in a stream or match a given stream. An automaton is built from a stream written in an extended formalism that includes patterns such as repetition, any kind of label or text, and choice. This stream is analysed to produce a directed acyclic graph, which is then used to generate the automaton (the classical approach).
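
The pattern constructs listed above (repetition, "any", choice, variable binding) and the difference between matching and finding can be sketched as follows. Note that this is a small backtracking matcher over a pattern list, not the DAG-based automaton generation the project describes; the pattern encoding and the match_here, matches and find names are assumptions made for this example.

```python
def match_here(pattern, tokens, pos, bindings):
    """Yield (end, bindings) for every way `pattern` matches a prefix of tokens[pos:]."""
    if not pattern:
        yield pos, bindings
        return
    head, rest = pattern[0], list(pattern[1:])
    kind = head[0]
    if kind == "lit":  # one specific token
        if pos < len(tokens) and tokens[pos] == head[1]:
            yield from match_here(rest, tokens, pos + 1, bindings)
    elif kind == "any":  # any single token
        if pos < len(tokens):
            yield from match_here(rest, tokens, pos + 1, bindings)
    elif kind == "bind":  # any single token, stored under a variable name
        if pos < len(tokens):
            bound = {**bindings, head[1]: tokens[pos]}
            yield from match_here(rest, tokens, pos + 1, bound)
    elif kind == "choice":  # try each alternative sub-pattern in order
        for alternative in head[1]:
            yield from match_here(list(alternative) + rest, tokens, pos, bindings)
    elif kind == "star":  # zero repetitions, or one repetition then the star again
        yield from match_here(rest, tokens, pos, bindings)
        for mid, bound in match_here(head[1], tokens, pos, bindings):
            if mid > pos:  # skip empty repetitions, which would loop forever
                yield from match_here([head] + rest, tokens, mid, bound)
    else:
        raise ValueError(f"unknown pattern element: {kind}")


def matches(pattern, tokens):
    """Return the bindings if `pattern` matches the whole stream, else None."""
    for end, bindings in match_here(pattern, tokens, 0, {}):
        if end == len(tokens):
            return bindings
    return None


def find(pattern, tokens):
    """Return (start, end, bindings) for the first occurrence of `pattern`, else None."""
    for start in range(len(tokens) + 1):
        for end, bindings in match_here(pattern, tokens, start, {}):
            return start, end, bindings
    return None


if __name__ == "__main__":
    item = [("lit", ("open", "item")), ("bind", "value"), ("lit", ("close", "item"))]
    whole = [("lit", ("open", "list")), ("star", item), ("lit", ("close", "list"))]
    stream = [("open", "list"),
              ("open", "item"), ("text", "a"), ("close", "item"),
              ("open", "item"), ("text", "b"), ("close", "item"),
              ("close", "list")]
    print(matches(whole, stream))
    print(find(item, stream))
```

In this sketch a variable bound inside a repetition keeps the value from the last iteration. That is a choice made for brevity here; the source does not say how VoDoo/Stream handles this case.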

Transducer for Stream transformation

Transducers are in fact ordered sets of rules. A rule has a selection part and a body. A selection can deal with paths (tree visitor) and with the current entity. The first kind of entity is the tree node, which can be selected by filtering on its name or attributes. The second kind of entity is the string, which can be filtered using the usual pattern matching. A body is a piece of Java code that can choose whether or not to continue parsing (recursive descent).
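
The rule structure described above can be sketched as an ordered list of (selection, body) pairs. This is an illustration only: the bodies here are Python functions standing in for the Java code bodies, path-based selection is left out, and the Transducer, element, text and demo_transducer names are hypothetical rather than part of VoDoo/Stream.

```python
import re
import xml.etree.ElementTree as ET


class Transducer:
    """An ordered set of rules; the first rule whose selection accepts an entity wins."""

    def __init__(self):
        self.rules = []

    def rule(self, selection):
        def register(body):
            self.rules.append((selection, body))
            return body
        return register

    def apply(self, entity):
        for selection, body in self.rules:
            if selection(entity):
                return body(self, entity)
        return ""  # no rule selected this entity

    def descend(self, node):
        """Continue parsing: apply the rules to a node's text and children in order."""
        parts = []
        if node.text:
            parts.append(self.apply(node.text))
        for child in node:
            parts.append(self.apply(child))
            if child.tail:
                parts.append(self.apply(child.tail))
        return "".join(parts)


def element(name, **attributes):
    """Select tree nodes by name and, optionally, by attribute values."""
    def selection(entity):
        return (isinstance(entity, ET.Element) and entity.tag == name
                and all(entity.get(k) == v for k, v in attributes.items()))
    return selection


def text(pattern):
    """Select strings that contain a match for a regular expression."""
    regex = re.compile(pattern)
    def selection(entity):
        return isinstance(entity, str) and regex.search(entity) is not None
    return selection


def demo_transducer():
    t = Transducer()
    t.rule(element("doc"))(lambda t, node: t.descend(node))
    # Rule order matters: the more specific "em" rule must come first.
    t.rule(element("em", kind="strong"))(lambda t, node: "**" + t.descend(node) + "**")
    t.rule(element("em"))(lambda t, node: "*" + t.descend(node) + "*")
    t.rule(element("secret"))(lambda t, node: "")  # body stops the descent here
    t.rule(text(r""))(lambda t, s: s)  # the empty pattern accepts every string
    return t


if __name__ == "__main__":
    source = '<doc>Hello <em kind="strong">big</em> <em>small</em><secret>x</secret> world</doc>'
    print(demo_transducer().apply(ET.fromstring(source)))
```

Each body decides for itself whether to call descend, which is how the sketch models a rule that continues or stops the recursive descent.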

Transducer Stream Processor language: XSP

Finally, a transducer language called XSP, expressed in XML, is defined. This language has a bootstrap definition in XML (only for XML and text transformation for the moment). The XSP definition has been extended to provide rules supporting code written in any language that has a BSF handler (JavaScript, BeanShell, JRuby, Jython ...).

Last updated on October 16th, 2008
