Parse::MediaWikiDump is a Perl module with tools to process MediaWiki dump files.
SYNOPSIS
use Parse::MediaWikiDump;
#the source can be either a file name or an open filehandle
$source = 'dump_filename.ext';
$source = *FILEHANDLE;
$pages = Parse::MediaWikiDump::Pages->new($source);
$links = Parse::MediaWikiDump::Links->new($source);
#get all the records from the dump files, one record at a time
while(defined($page = $pages->next)) {
print "title '", $page->title, "' id ", $page->id, "\n";
}
while(defined($link = $links->next)) {
print "link from ", $link->from, " to ", $link->to, "\n";
}
#information about the page dump file
$pages->sitename;
$pages->base;
$pages->generator;
$pages->case;
$pages->namespaces;
$pages->current_byte;
$pages->size;
#information about a page record
$page->redirect;
$page->categories;
$page->title;
$page->namespace;
$page->id;
$page->revision_id;
$page->timestamp;
$page->username;
$page->userid;
$page->minor;
$page->text;
#information about a link
$link->from;
$link->to;
$link->namespace;
This module provides the tools needed to process the contents of various MediaWiki dump files.
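As a rough illustration of how the calls in the synopsis might be combined, the sketch below reports progress while iterating over a page dump. It assumes that current_byte returns the parser's byte offset into the dump and that size returns the dump's total size in bytes; the synopsis lists both methods but does not describe them. The percent_done helper and the 1000-page reporting interval are hypothetical and not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Parse::MediaWikiDump): percentage of the
# dump consumed so far, given a byte offset and the total size in bytes.
sub percent_done {
    my ($current_byte, $size) = @_;
    return 0 unless $size;    # avoid dividing by zero on an empty or unknown size
    return 100 * $current_byte / $size;
}

# The loop only runs when a dump file name is passed on the command line.
if (@ARGV) {
    require Parse::MediaWikiDump;
    my $pages = Parse::MediaWikiDump::Pages->new($ARGV[0]);
    my $count = 0;
    while (defined(my $page = $pages->next)) {
        # Report every 1000 pages; the interval is arbitrary.
        next if ++$count % 1000;
        # Assumption: current_byte and size are both measured in bytes.
        printf STDERR "%d pages, %.1f%% done\n", $count,
            percent_done($pages->current_byte, $pages->size);
    }
}
```

Run it with the dump file name as the only argument. Progress goes to STDERR so it does not mix with any page output written to STDOUT.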
Requirements:
· Perl