MKSearch beta 1

MKSearch provides a Web metadata spider and search engine.

MKSearch is a metadata search engine that indexes structured metadata in Web documents instead of free text in the document body.

The data acquisition system conforms to the Dublin Core metadata in HTML recommendations, and supports other application profiles, such as the UK e-Government Metadata Standard.
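As an illustration of what the Dublin Core in HTML convention involves, the sketch below resolves a prefixed meta name such as "DC.title" to a full property URI, using the namespace declared by a link element of the form rel="schema.DC". The class and method names are hypothetical and are not taken from MKSearch; this is a simplified reading of the convention, not the project's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of MKSearch: maps Dublin Core style
// meta names ("DC.title") to property URIs using declared schema prefixes.
final class DcNameResolver {
    private static final String SCHEMA_PREFIX = "schema.";
    private final Map<String, String> namespaces = new HashMap<>();

    // Records a declaration such as
    // <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />.
    // Link elements whose rel value does not start with "schema." are ignored.
    void declareSchema(String rel, String href) {
        if (rel != null && href != null && rel.startsWith(SCHEMA_PREFIX)) {
            namespaces.put(rel.substring(SCHEMA_PREFIX.length()), href);
        }
    }

    // Resolves "DC.title" to namespace + "title".
    // Returns null when the name has no prefix or the prefix was never declared.
    String resolve(String metaName) {
        if (metaName == null) {
            return null;
        }
        int dot = metaName.indexOf('.');
        if (dot <= 0 || dot == metaName.length() - 1) {
            return null;
        }
        String namespace = namespaces.get(metaName.substring(0, dot));
        return namespace == null ? null : namespace + metaName.substring(dot + 1);
    }

    public static void main(String[] args) {
        DcNameResolver resolver = new DcNameResolver();
        resolver.declareSchema("schema.DC", "http://purl.org/dc/elements/1.1/");
        System.out.println(resolver.resolve("DC.title"));
    }
}
```

Names with an undeclared prefix resolve to null here, so a caller could skip them instead of guessing a namespace. Prefix matching is case-sensitive in this sketch, which is a simplification.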

It also indexes native RDF formats, including RSS 1.0.

The system has five major components: a Web crawler, an HTML document validator and formatter, a set of custom indexers, an RDF storage and query system, and a public query interface, provided through a standard servlet container.

System composition

The MKSearch system is composed of several other free software components. Further details are provided in the MKSearch development plans.

JSpider is a Java Web crawler engine that has pluggable interfaces that can be used to add custom processing and content handling. MKSearch uses custom SAX-based content handlers for extracting metadata from Web documents.
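A SAX-based content handler of the kind described above might look like the following minimal sketch. It uses only the standard JDK SAX API, collects name/content pairs from meta elements, and assumes the input is already well-formed XHTML. The class name MetaTagHandler and its extract method are hypothetical; the sketch does not reproduce MKSearch's own handlers or the JSpider plug-in interfaces.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, not MKSearch code: gathers <meta name="..." content="...">
// pairs from a well-formed XHTML document as the parser streams through it.
final class MetaTagHandler extends DefaultHandler {
    // LinkedHashMap keeps the metadata in document order.
    private final Map<String, String> metadata = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // The default parser factory is not namespace-aware, so the element
        // name arrives in qName.
        if (!"meta".equals(qName)) {
            return;
        }
        String name = attrs.getValue("name");
        String content = attrs.getValue("content");
        if (name != null && content != null) {
            metadata.put(name, content);
        }
    }

    Map<String, String> getMetadata() {
        return metadata;
    }

    // Parses an XHTML string and returns the meta name/content pairs found.
    static Map<String, String> extract(String xhtml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        MetaTagHandler handler = new MetaTagHandler();
        parser.parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.getMetadata();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><head>"
                + "<meta name=\"DC.title\" content=\"Example page\" />"
                + "<meta name=\"DC.creator\" content=\"A. N. Author\" />"
                + "</head><body><p>Body text is ignored.</p></body></html>";
        System.out.println(extract(page));
    }
}
```

Because SAX is event-driven, the handler never builds a document tree, which suits a crawler that reads many pages and needs only a few elements from each. The sample input omits a DOCTYPE so that the parser does not try to fetch an external DTD.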

Sesame is a set of RDF processing and storage APIs and applications that includes RDF data query facilities. MKSearch uses Sesame to store indexed metadata in RDF format and to search the repository via the public query interface.

JTidy is a utility for correcting common HTML markup errors and is used to convert HTML documents to XHTML so they can be processed using SAX.

Last updated on: February 16th, 2007, 13:05 GMT
License type: GPL (GNU General Public License)
Developed by: Philip Shaw