WebGraph 3.0.2

WebGraph is a framework to study the web graph. WebGraph provides simple ways to manage very large graphs, exploiting modern compression techniques. More precisely, it is currently made of:

1. A set of flat codes, called ζ codes, which are particularly suitable for storing web graphs (or, in general, integers with power-law distribution in a certain exponent range). The fact that these codes work well can be easily tested empirically, but we also try to provide a detailed mathematical analysis.
2. Algorithms for compressing web graphs that exploit gap compression and referentiation (à la LINK), intervalisation and ζ codes to provide a high compression ratio: for instance, the WebBase graph (2001 crawl) is compressed at 3.08 bits per link, and a snapshot of about 18,500,000 pages of the .uk domain gathered by UbiCrawler is compressed at 2.22 bits per link (the corresponding figures for the transposed graphs are 2.89 bits per link and 1.98 bits per link). The algorithms are controlled by several parameters, which provide different tradeoffs between access speed and compression ratio.
3. Algorithms for accessing a compressed graph without actually decompressing it, using lazy techniques that delay the decompression until it is actually necessary.
4. A complete, documented implementation of the algorithms above in Java, contained in the package it.unimi.dsi.webgraph. Besides a clearly defined API, the package contains several classes that allow you to modify (e.g., transpose) or recompress a graph, so as to experiment with various settings. The package relies on fastutil for a type-specific, high-performance collections framework, on MG4J for bit-level I/O, on the COLT distribution for ready-to-use, efficient algorithms and on GNU getopt for command-line parsing.
5. Data sets for very large graphs (e.g., a billion links). These are either gathered from public sources (such as WebBase) or produced by UbiCrawler.
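
The gap compression mentioned in item 2 can be illustrated with a small, self-contained Java sketch. It is not WebGraph's actual on-disk format: the class GapSketch and its methods are hypothetical names, the first successor is stored as is rather than relative to the source node, and Elias γ coding is used as a simple stand-in for the codes the framework really uses. The point is only that sorted successor lists with small gaps need far fewer bits than raw 32-bit integers.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of it.unimi.dsi.webgraph.
final class GapSketch {

    /** Turns a strictly increasing successor list into gaps:
     *  the first entry is kept, each later entry stores
     *  (difference from its predecessor) - 1. */
    static int[] toGaps(int[] successors) {
        int[] gaps = new int[successors.length];
        for (int i = 0; i < successors.length; i++) {
            gaps[i] = i == 0 ? successors[0] : successors[i] - successors[i - 1] - 1;
        }
        return gaps;
    }

    /** Inverse of toGaps: rebuilds the successor list. */
    static int[] fromGaps(int[] gaps) {
        int[] successors = new int[gaps.length];
        for (int i = 0; i < gaps.length; i++) {
            successors[i] = i == 0 ? gaps[0] : successors[i - 1] + gaps[i] + 1;
        }
        return successors;
    }

    /** Length in bits of the Elias gamma code of x >= 1: 2 * floor(log2 x) + 1. */
    static int gammaLength(int x) {
        if (x < 1) throw new IllegalArgumentException("gamma codes need x >= 1");
        return 2 * (31 - Integer.numberOfLeadingZeros(x)) + 1;
    }

    /** Total bits needed to gamma-code all gaps (each shifted by one so that 0 is codable). */
    static int encodedBits(int[] gaps) {
        int bits = 0;
        for (int g : gaps) bits += gammaLength(g + 1);
        return bits;
    }

    public static void main(String[] args) {
        int[] successors = {13, 15, 16, 17, 18, 19, 23, 24, 203};
        int[] gaps = toGaps(successors);
        System.out.println(Arrays.toString(gaps)); // [13, 1, 0, 0, 0, 0, 3, 0, 178]
        System.out.println(encodedBits(gaps) + " bits gap-coded vs "
                + 32 * successors.length + " bits raw"); // 35 bits gap-coded vs 288 bits raw
    }
}
```

In this toy list, runs of consecutive successors collapse to gaps of zero, which cost one bit each; referentiation and intervalisation, which the sketch does not attempt, exploit further regularities of the same kind.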

In the end, with WebGraph you can access and analyse a very large web graph, even on a PC with as little as 256 Mbytes of RAM. Using WebGraph is as easy as installing a few jar files and downloading a data set. This makes it very easy to study phenomena such as PageRank, the distribution of graph properties of the web graph, and so on.
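
The idea of lazy access from item 3, which is what keeps memory use low, can be sketched in a few lines of Java. This is an assumption-laden illustration rather than the framework's API: LazySuccessors is a hypothetical class, and an in-memory array of gaps (first entry stored as is, later entries stored as the difference from the predecessor minus one) stands in for a compressed bit stream. Each successor is decoded only when the caller asks for it.

```java
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

// Hypothetical illustration class; not part of it.unimi.dsi.webgraph.
final class LazySuccessors implements PrimitiveIterator.OfInt {
    private final int[] gaps; // stand-in for a compressed stream of gaps
    private int position = 0; // number of gaps decoded so far
    private int previous = -1; // last successor returned (-1 before the first)

    LazySuccessors(int[] gaps) {
        this.gaps = gaps;
    }

    @Override
    public boolean hasNext() {
        return position < gaps.length;
    }

    /** Decodes exactly one more successor; nothing past it is touched. */
    @Override
    public int nextInt() {
        if (!hasNext()) throw new NoSuchElementException();
        previous = previous + gaps[position++] + 1;
        return previous;
    }

    /** How many entries have actually been decoded. */
    int decodedSoFar() {
        return position;
    }

    public static void main(String[] args) {
        LazySuccessors it = new LazySuccessors(new int[] {13, 1, 0, 0, 0, 0, 3, 0, 178});
        System.out.println(it.nextInt());      // 13
        System.out.println(it.nextInt());      // 15
        System.out.println(it.decodedSoFar()); // 2: the remaining seven gaps are still undecoded
    }
}
```

A caller that only needs the first few successors of a node, or that stops as soon as it finds a given target, never pays for decoding the rest of the list.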

last updated on:
February 25th, 2012, 12:40 GMT
license type:
GPL (GNU General Public License) 
developed by:
Sebastiano Vigna

What's New in This Release:
  • This version adds several improvements to HyperANF, and a few bugfixes.
  • WebGraph can now be found on Maven Central.