Softpedia
 



    MWParserFromHell 0.1.1

    Developer: Ben Kurtovic
    License / Price: MIT/X Consortium Lic... / FREE
    Last Updated: September 24th, 2012, 00:38 GMT
    Category: ROOT / Text Editing&Processing / Markup


    MWParserFromHell description

    A parser for MediaWiki wikicode

    MWParserFromHell is a Python package that provides an easy-to-use and outrageously powerful parser for MediaWiki wikicode. It supports Python 2 and Python 3.

    Developed by Earwig with help from Σ.

    Installation

    The easiest way to install the parser is through the Python Package Index, so you can install the latest release with pip install mwparserfromhell. Alternatively, get the latest development version:

    git clone git://github.com/earwig/mwparserfromhell.git
    cd mwparserfromhell
    python setup.py install


    You can run the comprehensive unit testing suite with python setup.py test.

    Usage

    Normal usage is rather straightforward (where text is page text):

    >>> import mwparserfromhell
    >>> wikicode = mwparserfromhell.parse(text)


    wikicode is a mwparserfromhell.wikicode.Wikicode object, which acts like an ordinary unicode object (or str in Python 3) with some extra methods. For example:

    >>> text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?"
    >>> wikicode = mwparserfromhell.parse(text)
    >>> print wikicode
    I has a template! {{foo|bar|baz|eggs=spam}} See it?
    >>> templates = wikicode.filter_templates()
    >>> print templates
    ['{{foo|bar|baz|eggs=spam}}']
    >>> template = templates[0]
    >>> print template.name
    foo
    >>> print template.params
    ['bar', 'baz', 'eggs=spam']
    >>> print template.get(1).value
    bar
    >>> print template.get("eggs").value
    spam


    Since every node you reach is also a Wikicode object, it's trivial to get nested templates:

    >>> code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}")
    >>> print code.filter_templates()
    ['{{foo|this {{includes a|template}}}}']
    >>> foo = code.filter_templates()[0]
    >>> print foo.get(1).value
    this {{includes a|template}}
    >>> print foo.get(1).value.filter_templates()[0]
    {{includes a|template}}
    >>> print foo.get(1).value.filter_templates()[0].get(1).value
    template


    Additionally, you can include nested templates in filter_templates() by passing recursive=True:

    >>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}"
    >>> mwparserfromhell.parse(text).filter_templates(recursive=True)
    ['{{foo|{{bar}}={{baz|{{spam}}}}}}', '{{bar}}', '{{baz|{{spam}}}}', '{{spam}}']
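To make the ordering of the recursive result concrete, here is a toy brace scanner that reproduces it for simple input. This is only an illustration of the nesting, not how mwparserfromhell tokenizes wikicode; the function name find_templates is hypothetical, and the sketch ignores triple-brace parameters, links, comments, and nowiki sections.

```python
def find_templates(text):
    """Return every {{...}} span in text, outermost first, nested ones included."""
    stack = []  # start offsets of templates that are still open
    spans = []  # (start, end) offsets of templates that have been closed
    i = 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            stack.append(i)
            i += 2
        elif pair == "}}" and stack:
            # The most recently opened template is the one being closed.
            spans.append((stack.pop(), i + 2))
            i += 2
        else:
            i += 1
    # Sorting by start offset lists each template before the ones nested in it.
    spans.sort()
    return [text[start:end] for start, end in spans]
```

Calling find_templates on the text from the example above yields the same four strings in the same order as filter_templates(recursive=True).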


    Templates can be easily modified to add, remove, or alter params. Wikicode can also be treated like a list with append(), insert(), remove(), replace(), and more:

    >>> text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}"
    >>> code = mwparserfromhell.parse(text)
    >>> for template in code.filter_templates():
    ...     if template.name == "cleanup" and not template.has_param("date"):
    ...         template.add("date", "July 2012")
    ...
    >>> print code
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{uncategorized}}
    >>> code.replace("{{uncategorized}}", "{{bar-stub}}")
    >>> print code
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
    >>> print code.filter_templates()
    ['{{cleanup|date=July 2012}}', '{{bar-stub}}']


    You can then convert code back into a regular unicode object (for saving the page!) by calling unicode() on it:

    >>> text = unicode(code)
    >>> print text
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
    >>> text == code
    True


    Likewise, use str(code) in Python 3.
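If the same script has to run on both Python 2 and Python 3, the choice between unicode() and str() can be made once at import time. The following is a minimal sketch; the names text_type, to_text, and FakeWikicode are hypothetical, and FakeWikicode is only a stand-in so the snippet runs without the parser installed.

```python
import sys

# Text type for the running interpreter: unicode on Python 2, str on Python 3.
# The name unicode is only looked up when the Python 2 branch is taken.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def to_text(obj):
    """Convert obj (for example, parsed wikicode) to a plain text string."""
    return text_type(obj)


class FakeWikicode(object):
    """Hypothetical stand-in for a parsed page; not part of mwparserfromhell."""

    def __init__(self, text):
        self._text = text

    def __str__(self):
        return self._text

    # Python 2 calls __unicode__ when unicode() is applied to the object.
    __unicode__ = __str__
```

With this helper, to_text(code) replaces the version-specific unicode(code) or str(code) call.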

    Integration

    mwparserfromhell is used by and originally developed for EarwigBot; Page objects have a parse method that essentially calls mwparserfromhell.parse() on page.get().

    If you're using PyWikipedia, your code might look like this:

    import mwparserfromhell
    import wikipedia as pywikibot

    def parse(title):
        site = pywikibot.get_site()
        page = pywikibot.Page(site, title)
        text = page.get()
        return mwparserfromhell.parse(text)


    If you're not using a library, you can parse templates in any page using the following code (via the API):

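    import json
    import urllib
    import mwparserfromhell

    API_URL = "http://en.wikipedia.org/w/api.php"

    def parse(title):
        # Ask the API for the content of the page's latest revision, as JSON.
        data = urllib.urlencode({"action": "query", "prop": "revisions",
                                 "rvlimit": 1, "rvprop": "content",
                                 "format": "json", "titles": title})
        raw = urllib.urlopen(API_URL, data).read()
        res = json.loads(raw)
        text = res["query"]["pages"].values()[0]["revisions"][0]["*"]
        return mwparserfromhell.parse(text)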
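The snippet above targets Python 2, where urllib.urlopen exists and dict.values() returns a list. A rough Python 3 equivalent might look like the sketch below. The helper names build_query, extract_text, and fetch_wikitext are hypothetical, as is the USER_AGENT string; the HTTPS form of the API URL is used, and the response layout is assumed to be the same legacy JSON format the original code indexes into.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
# Placeholder value; substitute a string that identifies your own tool.
USER_AGENT = "example-wikitext-fetcher/0.1"


def build_query(title):
    """Return URL-encoded POST data asking for the latest revision of title."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvlimit": 1,
        "rvprop": "content",
        "format": "json",
        "titles": title,
    }
    return urllib.parse.urlencode(params).encode("utf-8")


def extract_text(response):
    """Pull the wikitext out of a decoded API response."""
    pages = response["query"]["pages"]
    # dict.values() is a view in Python 3, so take the first item via an iterator.
    page = next(iter(pages.values()))
    return page["revisions"][0]["*"]


def fetch_wikitext(title):
    """Fetch the page text over the network and return it as a string."""
    request = urllib.request.Request(
        API_URL, data=build_query(title), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request) as handle:
        return extract_text(json.loads(handle.read().decode("utf-8")))
```

The string returned by fetch_wikitext(title) can then be passed to mwparserfromhell.parse(). Keeping the query building and the response unpacking in separate functions lets both be checked without network access.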




    Requirements:

    · Python

      

