YACY is a distributed Web crawler and also a caching HTTP/HTTPS proxy. Pages that pass through the proxy are indexed and can be searched using a built-in HTTP server.
YACY peers connect to each other and form a P2P index exchange network based on distributed hash tables. Explicit web crawls can be done locally or collaboratively, forming a global search and distributed indexing engine for the Web.
YACY also provides URL filtering with blacklist sharing among other proxy peers, individual Web and servlet page hosting, a file sharing zone, and a database engine.
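To illustrate how a hash-table-based index exchange can decide which peer holds the entries for a given word, here is a minimal sketch of generic consistent hashing. It is not YaCy's actual partitioning scheme: the SHA-1 hash, the 32-bit ring, the function names, and the peer names are all assumptions made for the example.

```python
import hashlib
from bisect import bisect_left
from typing import List


def ring_position(key: str) -> int:
    """Map a string to a position on a 2**32 ring (hash choice is an assumption)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def responsible_peer(word: str, peers: List[str]) -> str:
    """Return the peer whose ring position is the first at or after the word's position."""
    if not peers:
        raise ValueError("at least one peer is required")
    # Place every peer on the ring, ordered by position.
    ring = sorted((ring_position(peer), peer) for peer in peers)
    positions = [pos for pos, _ in ring]
    # Find the successor; wrap around to the first peer past the end of the ring.
    idx = bisect_left(positions, ring_position(word)) % len(ring)
    return ring[idx][1]


if __name__ == "__main__":
    # Hypothetical peer names, used only for illustration.
    peers = ["peer-a", "peer-b", "peer-c"]
    for word in ("search", "crawler", "proxy"):
        print(word, "->", responsible_peer(word, peers))
```

Because the assignment depends only on the hashes, every peer that knows the same peer list reaches the same answer without consulting a central server.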
Here are some key features of "YACY":
· Search your own or the global index
· Crawl your own pages or start distributed crawling
· Run your peer to support other YaCy crawlers
· Provide information about your peer using the built-in HTTP server, file-sharing zone and wiki
· Built-in caching http proxy
· Indexing benefits from the proxy cache; private information is not stored or indexed
· Using the proxy is not required for web indexing, but it lets you access the new top-level domain '.yacy'
· Filter unwanted content such as adware or spyware; share your web blacklist with other peers
· Easy installation! No additional database required!
· No central server!
· GPL'ed, freeware
What's New in This Release:
· YaCy now has an embedded Solr 4.0.0 with the standard Solr XML search interface integrated.
· This is the primary indexing engine now.
· There is now an enhanced crawler with live link structure visualization.
· This release adds a Host Browser to explore the file structure of crawled hosts.
· It shows loaded pages, pages with errors, and pending files in the same way a file browser would show the contents of a host.
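As a rough illustration of what using the standard Solr XML search interface can look like, the sketch below builds a Solr select URL and parses a Solr XML response into dictionaries. The host, port, and `/solr/select` path are assumptions about a local peer, and the helper names are made up for this example; check your own peer's configuration before relying on them.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import urlencode


def build_select_url(base: str, query: str, rows: int = 10) -> str:
    """Build a Solr select URL; the '/solr/select' path is an assumption."""
    params = urlencode({"q": query, "rows": rows, "wt": "xml"})
    return base.rstrip("/") + "/solr/select?" + params


def parse_solr_xml(xml_text: str) -> List[Dict[str, object]]:
    """Turn each <doc> of a Solr XML response into a dict of its named fields."""
    root = ET.fromstring(xml_text)
    docs = []
    for doc in root.iter("doc"):
        fields: Dict[str, object] = {}
        for child in doc:
            name = child.get("name")
            if name is None:
                continue
            if child.tag == "arr":
                # Multi-valued field: collect the text of every item.
                fields[name] = [item.text or "" for item in child]
            else:
                fields[name] = child.text or ""
        docs.append(fields)
    return docs


def search(base: str, query: str, rows: int = 10) -> List[Dict[str, object]]:
    """Fetch and parse results from a running peer (performs a network request)."""
    with urllib.request.urlopen(build_select_url(base, query, rows)) as resp:
        return parse_solr_xml(resp.read().decode("utf-8"))
```

With a peer running locally, a call such as `search("http://localhost:8090", "yacy")` would return one dictionary per matching document; the address shown is an assumed default, not a value taken from this release note.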