VoDoo/Stream is software that provides a high-level, expressive and extensible formalism for writing transducers over any kind of format. It is built on three layers: a stream layer for tokenization, an automata layer for recognition, and a rule-based document transformation layer built on streams and automata.
The VoDoo/Stream project is based on three concepts:
# The first, inspired by event-based programming styles such as SAX or the generic lexer in Objective Caml, provides a stream-based denotation of data.
# The second provides expressive, classical automata to match and recognize patterns when analyzing streams.
# The third is a high-level structuring of automata that provides an expressive mechanism for data transformation.
Finally, an XSLT-like language is defined to express data transformations.
A stream is a simple formalism based on opening a level, closing a level, labels and text. This simple grammar is enough to denote a simple tree, such as an XML document, as a stream (for XML, the stream is produced by a dedicated SAX handler). The currently supported formats are XML and free text. Other formalisms can be supported through the stream extension facility. A stream interpretation is also provided for the Document Object Model, so a stream can be backed by plain text, an ad hoc stream or DOM-based data.
By comparison, the StAX approach is a low-level integration of XML matching based on a token-stream representation of XML fragments. Using the Stream representation with a classical switch/case structure is similar to the StAX approach, but such an integration is too low level: it does not provide an expressive layer for XML management and in fact sits at the same level as SAX.
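The stream formalism described above (open a level, close a level, labels, text) can be pictured with a minimal Java sketch. Every name here (`StreamSketch`, `Event`, `render`, `isWellNested`) is hypothetical and chosen for illustration only; this is not the VoDoo/Stream API. It also shows the low-level switch/case style of consumption that the comparison with StAX refers to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from VoDoo/Stream.
final class StreamSketch {
    enum Kind { OPEN, CLOSE, TEXT }

    // One stream event: a label for OPEN/CLOSE, raw characters for TEXT.
    static final class Event {
        final Kind kind;
        final String value;
        private Event(Kind kind, String value) { this.kind = kind; this.value = value; }
        static Event open(String label)  { return new Event(Kind.OPEN, label); }
        static Event close(String label) { return new Event(Kind.CLOSE, label); }
        static Event text(String chars)  { return new Event(Kind.TEXT, chars); }
    }

    // Low-level consumption with a plain switch: rebuild the markup.
    // Text is appended as-is (no escaping), to keep the sketch short.
    static String render(List<Event> events) {
        StringBuilder out = new StringBuilder();
        for (Event e : events) {
            switch (e.kind) {
                case OPEN:  out.append('<').append(e.value).append('>');  break;
                case CLOSE: out.append("</").append(e.value).append('>'); break;
                case TEXT:  out.append(e.value);                          break;
            }
        }
        return out.toString();
    }

    // A stream denotes a tree only if every CLOSE matches the latest open level.
    static boolean isWellNested(List<Event> events) {
        List<String> stack = new ArrayList<>();
        for (Event e : events) {
            if (e.kind == Kind.OPEN) {
                stack.add(e.value);
            } else if (e.kind == Kind.CLOSE) {
                if (stack.isEmpty() || !stack.remove(stack.size() - 1).equals(e.value)) {
                    return false;
                }
            }
        }
        return stack.isEmpty();
    }

    public static void main(String[] args) {
        List<Event> doc = Arrays.asList(
            Event.open("p"), Event.text("hello "),
            Event.open("b"), Event.text("world"), Event.close("b"),
            Event.close("p"));
        System.out.println(render(doc));
        System.out.println(isWellNested(doc));
    }
}
```

Working at this level means every consumer has to track nesting and context by hand, which is the gap the automata and transducer layers are meant to fill.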
Automata for Stream recognition
Automata provide a high-level mechanism for pattern recognition and variable binding. An automaton is built from a stream written in an extended formalism that includes patterns such as repetition, any label, any text and choice. This stream is analysed to produce a directed acyclic graph (DAG), with specific attributes denoting the variables, from which the automaton is generated (the classical approach). The resulting automaton can be used either to find a pattern in a given stream or to match the stream against it.
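To make the pattern constructs above concrete (repetition, choice, any text, variable binding, find versus match), here is a small Java sketch. It is deliberately not a compiled automaton or DAG: it is a backtracking matcher written with continuations, swapped in because it fits in a few lines. Tokens are plain strings, with markup tokens written as `<x>` and `</x>`, and all names (`PatternSketch`, `tok`, `anyText`, `star`, and so on) are assumptions made for this example, not VoDoo/Stream identifiers. Here `matches` is taken to mean "the pattern covers the whole stream" and `find` to mean "the pattern occurs somewhere in it", which is one plausible reading of the text.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a backtracking matcher, not the VoDoo/Stream automaton.
final class PatternSketch {
    interface Cont { boolean apply(int pos); }
    interface Pattern {
        // Try to match from pos; on success call k with the position reached.
        boolean match(List<String> in, int pos, Map<String, String> env, Cont k);
    }

    // Exactly one given token.
    static Pattern tok(String expected) {
        return (in, pos, env, k) ->
            pos < in.size() && in.get(pos).equals(expected) && k.apply(pos + 1);
    }

    // Any text token (anything not starting with '<'), bound to a variable.
    static Pattern anyText(String var) {
        return (in, pos, env, k) -> {
            if (pos >= in.size() || in.get(pos).startsWith("<")) return false;
            String old = env.put(var, in.get(pos));
            if (k.apply(pos + 1)) return true;
            // Undo the binding when the rest of the pattern fails.
            if (old == null) env.remove(var); else env.put(var, old);
            return false;
        };
    }

    static Pattern seq(Pattern... ps) {
        return (in, pos, env, k) -> seqFrom(ps, 0, in, pos, env, k);
    }

    private static boolean seqFrom(Pattern[] ps, int i, List<String> in, int pos,
                                   Map<String, String> env, Cont k) {
        if (i == ps.length) return k.apply(pos);
        return ps[i].match(in, pos, env, next -> seqFrom(ps, i + 1, in, next, env, k));
    }

    static Pattern choice(Pattern a, Pattern b) {
        return (in, pos, env, k) -> a.match(in, pos, env, k) || b.match(in, pos, env, k);
    }

    // Greedy repetition; "next > pos" prevents looping on empty matches.
    static Pattern star(Pattern p) {
        return new Pattern() {
            public boolean match(List<String> in, int pos, Map<String, String> env, Cont k) {
                return p.match(in, pos, env, next -> next > pos && this.match(in, next, env, k))
                    || k.apply(pos);
            }
        };
    }

    // Match: the pattern must cover the whole stream.
    static boolean matches(Pattern p, List<String> in, Map<String, String> env) {
        return p.match(in, 0, env, end -> end == in.size());
    }

    // Find: index of the first position where the pattern matches, or -1.
    static int find(Pattern p, List<String> in, Map<String, String> env) {
        for (int start = 0; start <= in.size(); start++) {
            if (p.match(in, start, env, end -> true)) return start;
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("<list>", "<item>", "a", "</item>", "</list>");
        Pattern item = seq(tok("<item>"), anyText("v"), tok("</item>"));
        Pattern list = seq(tok("<list>"), star(item), tok("</list>"));
        Map<String, String> env = new HashMap<>();
        System.out.println(matches(list, in, env) + " " + env);
    }
}
```

A real automaton would avoid the repeated work that backtracking can cause, which is presumably one reason the project compiles patterns to a graph first.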
Transducer for Stream transformation
A transducer is an ordered set of rules. A rule has a selection part and a body. A selection can refer to paths (in the style of a tree visitor) and to the current entity. The first kind of entity is the tree node, which can be selected by filtering on its name or attributes. The second kind is the string, which can be filtered using ordinary pattern matching. A body is a piece of Java code that decides whether or not to continue parsing (recursive descent).
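The rule model described above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the VoDoo/Stream or XSP API: `TransducerSketch`, `Node`, `Body`, `rule`, `descend` and `demo` are hypothetical names, the path is simplified to the list of ancestor element names, attributes are left out, and the first rule whose selection accepts the current entity is the one that fires.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch: none of these names come from VoDoo/Stream.
final class TransducerSketch {
    // Either an element (name set) or a string entity (text set).
    static final class Node {
        final String name;
        final String text;
        final List<Node> children;
        private Node(String name, String text, List<Node> children) {
            this.name = name; this.text = text; this.children = children;
        }
        static Node element(String name, Node... children) {
            return new Node(name, null, Arrays.asList(children));
        }
        static Node textNode(String chars) {
            return new Node(null, chars, new ArrayList<Node>());
        }
    }

    // A body is ordinary Java code; it chooses whether to descend further.
    interface Body {
        void run(Node node, List<String> path, TransducerSketch t, StringBuilder out);
    }

    private static final class Rule {
        final BiPredicate<List<String>, Node> selection;
        final Body body;
        Rule(BiPredicate<List<String>, Node> selection, Body body) {
            this.selection = selection; this.body = body;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    TransducerSketch rule(BiPredicate<List<String>, Node> selection, Body body) {
        rules.add(new Rule(selection, body));
        return this;
    }

    // Rules are ordered: the first matching selection wins.
    void apply(Node node, List<String> path, StringBuilder out) {
        for (Rule r : rules) {
            if (r.selection.test(path, node)) { r.body.run(node, path, this, out); return; }
        }
    }

    // Recursive descent: continue with the children of an element.
    void descend(Node node, List<String> path, StringBuilder out) {
        List<String> childPath = new ArrayList<>(path);
        childPath.add(node.name);
        for (Node child : node.children) apply(child, childPath, out);
    }

    String transform(Node root) {
        StringBuilder out = new StringBuilder();
        apply(root, new ArrayList<String>(), out);
        return out.toString();
    }

    static TransducerSketch demo() {
        return new TransducerSketch()
            // Node filter on the name; the empty body stops parsing this subtree.
            .rule((path, n) -> "note".equals(n.name), (n, path, t, out) -> { })
            // String entities filtered with ordinary pattern matching.
            .rule((path, n) -> n.text != null && n.text.matches("\\d+"),
                  (n, path, t, out) -> out.append('[').append(n.text).append(']'))
            .rule((path, n) -> n.text != null, (n, path, t, out) -> out.append(n.text))
            // Path filter: a title directly under a book.
            .rule((path, n) -> "title".equals(n.name) && !path.isEmpty()
                      && "book".equals(path.get(path.size() - 1)),
                  (n, path, t, out) -> { out.append("Title: "); t.descend(n, path, out); out.append('\n'); })
            // Fallback for any other element: just keep descending.
            .rule((path, n) -> n.name != null, (n, path, t, out) -> t.descend(n, path, out));
    }

    public static void main(String[] args) {
        Node doc = Node.element("book",
            Node.element("title", Node.textNode("Dune")),
            Node.element("year", Node.textNode("1965")),
            Node.element("note", Node.textNode("skip me")));
        System.out.println(demo().transform(doc));
    }
}
```

Because rule order decides which selection wins, the specific rules are listed before the generic fallbacks in this sketch.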
Transducer Stream Processor language: XSP
What's New in This Release:
· Namespace support has been designed, so matching can now use the element name, the corresponding namespace, or both.
· The transformation process has been reviewed without breaking compatibility with transducers written for previous versions. This change increases expressivity and the stream management possibilities. It is now possible to dispatch analyses as in any LL parser, catching elements with a content filter and sibling content.
· Locations have been added to make errors easier to track when parsing an XML file or any other kind of document. Each document now has a location that is maintained during transducing operations and can be used to link locations.
· XSP has been extended with XML synthesis and manipulation, providing an XML-to-XML transformation paradigm.
· JEM has been rewritten using the latest improvements made to parsing and the extension for embedded XML terms.