2.4.1 GPL (GNU General Public License)    
A PEG Parser-Interpreter in Python





Python is a nice scripting language. It even gives you access to its own parser and compiler. It also gives you access to various other parsers for special purposes, such as XML and string templates.

But sometimes you may want to have your own parser. This is what pyPEG is for.

For a quick overview of what's happening, please read the article on my blog about how to parse an arbitrary language to XML with pyPEG.

What is PEG?

PEG means Parsing Expression Grammar. It's something like the idea of Regular Expressions, but for context-free languages; you'll find a very clear explanation in the Wikipedia article about PEG.

With PEGs you can describe the same languages as with BNF (and the two notations even look similar).
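
One practical difference from BNF is that a choice in a PEG is ordered: alternatives are tried from left to right, and the first one that matches wins. The following is a minimal sketch of that behaviour using only the standard re module. The helper name ordered_choice is made up for this illustration and is not part of pyPEG.

```python
import re

def ordered_choice(alternatives, text):
    """Try each pattern in order; return what the first matching one consumed."""
    for pattern in alternatives:
        match = re.match(pattern, text)
        if match:
            return match.group(0)
    return None  # no alternative matched at the start of text

# The order of the alternatives decides the result.
print(ordered_choice([r"=", r"=="], "== 0"))   # -> =
print(ordered_choice([r"==", r"="], "== 0"))   # -> ==
```

So when one alternative is a prefix of another, the longer one should usually come first.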

What is a Parser-Interpreter?

Most common parsers do not use PEGs with plain top-down parsing; they use table-driven LL(n) parsing or bottom-up LR(n) parsing. This leads to the idea of implementing parser generators.

Because LR(n) or LL(n) parsers need a DFA (or parse tables) to be computed first, you usually let a parser generator do this for you. The result is a parser implementation for the BNF grammar you gave as input. One could call a parser generator a compiler from BNF to a parser implementation.

A Parser-Interpreter works as an interpreter instead of being such a compiler. Just give it your grammar as input, and it parses the described language out of the text. No program is generated.
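
To make the idea concrete, here is a small sketch of such an interpreter in plain Python. It is not pyPEG's implementation, only an illustration under a few assumptions: a grammar is ordinary Python data, where a function names a rule, a tuple is a sequence, a list is an ordered choice, and strings and compiled regular expressions are terminals. The function name interpret and the tiny assignment grammar are invented for this example.

```python
import re

def interpret(rule, text, pos=0):
    """Match `rule` against `text` at `pos`; return (tree, new_pos) or None."""
    if callable(rule):                       # named rule: call it to get its body
        result = interpret(rule(), text, pos)
        if result is None:
            return None
        tree, pos = result
        return (rule.__name__, tree), pos
    if isinstance(rule, list):               # ordered choice: first match wins
        for alternative in rule:
            result = interpret(alternative, text, pos)
            if result is not None:
                return result
        return None
    if isinstance(rule, tuple):              # sequence: every part must match
        children = []
        for part in rule:
            result = interpret(part, text, pos)
            if result is None:
                return None
            tree, pos = result
            children.append(tree)
        return children, pos
    # Terminals: skip leading whitespace, then match a literal or a regex.
    while pos < len(text) and text[pos].isspace():
        pos += 1
    if isinstance(rule, str):
        return (rule, pos + len(rule)) if text.startswith(rule, pos) else None
    match = rule.match(text, pos)            # compiled regular expression
    return (match.group(0), match.end()) if match else None

# A tiny grammar, given directly as data. No parser code is generated from it.
def number():     return re.compile(r"\d+")
def name():       return re.compile(r"[A-Za-z_]\w*")
def value():      return [number, name]
def assignment(): return name, "=", value, ";"

tree, end = interpret(assignment, "x = 42;")
print(tree)
```

The grammar is walked each time text is parsed, which is the difference from a parser generator that would first translate the grammar into a separate parser program.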

Using pyPEG

That means: using pyPEG is very easy ;-) If you know regular expressions already, you will learn to use pyPEG quickly.

A small sample

An example: think of a simple language like this one:

function fak(n) {
    if (n==0) { // 0! is 1 by definition
        return 1;
    } else {
        return n * fak(n - 1);
    };
}

A pyPEG grammar for that language looks like the following code (see also the sample script):

import re
from pyPEG import keyword  # assumes the pyPEG 1.x module layout

# Terminals: plain regular expressions
def comment(): return [re.compile(r"//.*"), re.compile(r"/\*.*?\*/", re.S)]
def literal(): return re.compile(r'\d*\.\d*|\d+|".*?"')
def symbol(): return re.compile(r"\w+")
def operator(): return re.compile(r"\+|-|\*|/|==")

# Non-terminals: tuples are sequences, lists are choices
def operation(): return symbol, operator, [literal, functioncall]
def expression(): return [literal, operation, functioncall]
def expressionlist(): return expression, -1, (",", expression)
def returnstatement(): return keyword("return"), expression
def ifstatement(): return keyword("if"), "(", expression, ")", block, keyword("else"), block
def statement(): return [ifstatement, returnstatement], ";"
def block(): return "{", -2, statement, "}"
def parameterlist(): return "(", symbol, -1, (",", symbol), ")"
def functioncall(): return symbol, "(", expressionlist, ")"
def function(): return keyword("function"), symbol, parameterlist, block
def simpleLanguage(): return function
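
The terminal rules of this grammar are ordinary Python regular expressions, so they can be tried out on their own with the standard re module before pyPEG is involved at all. The sketch below writes the patterns with explicit escapes (\d, \w, \*, \+) and checks them against small pieces of the sample language. The helper first_match is only there for this illustration and is not part of pyPEG.

```python
import re

# Terminal patterns of the sample grammar, written with explicit escapes.
line_comment = re.compile(r"//.*")
block_comment = re.compile(r"/\*.*?\*/", re.S)   # re.S lets "." span newlines
literal = re.compile(r'\d*\.\d*|\d+|".*?"')
symbol = re.compile(r"\w+")
operator = re.compile(r"\+|-|\*|/|==")

def first_match(pattern, text):
    """Return the text matched at the start of `text`, or None."""
    match = pattern.match(text)
    return match.group(0) if match else None

print(first_match(literal, "3.14 * r"))        # -> 3.14
print(first_match(literal, '"hello" + x'))     # -> "hello"
print(first_match(symbol, "fak(n - 1)"))       # -> fak
print(first_match(operator, "== 0"))           # -> ==
print(first_match(line_comment, "// 0! is 1 by definition"))
print(first_match(block_comment, "/* two\nlines */ return 1;"))
```

Without the escapes, a pattern such as "+|-|*|/|==" is rejected by re.compile, because + and * are repetition operators with nothing to repeat.
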
Last updated on June 3rd, 2012
