libpuzzle 0.11

A small library that helps you find similar pictures.


What's new in libpuzzle 0.11:

  • This release fixes an incorrect assertion with tiny pictures, and the fix for text processing can now be properly enabled in the puzzle-diff tool.
BSD License 
libpuzzle is a small library that helps you find similar pictures.

The Puzzle library is designed to quickly find visually similar images (GIF, PNG, JPG), even if they have been resized, recompressed, recolored or slightly modified.

The library is free, lightweight yet very fast, configurable, easy to use and it has been designed with security in mind.
It is a C library, but it also comes with a command-line tool and PHP bindings.

Sample applications

· finding duplicate images in photo libraries
· image classification
· image search services
· moderation (pictures sent by users on forums, wikis, blogs, etc.). Pictures similar to ones that were previously banned can be flagged for moderators.

How does it work?

The library is a free implementation of the algorithm described in the paper "An image signature for any kind of image" by H. Chi Wong, Marshall Bern and David Goldberg.

The first step automatically crops featureless borders, then splits the remaining bitmap picture into blocks. The blocks act as a “summary” of the picture.
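
As a rough illustration of this step, here is a minimal Python sketch. It is not libpuzzle's code or API: the function names, the `tolerance` and `grid` parameters, and the "a border line is featureless if its values barely vary" rule are all assumptions made for the example. The picture is taken to be a grayscale image stored as a list of rows.

```python
def crop_featureless_borders(pixels, tolerance=0):
    """Trim outer rows/columns whose values vary by at most `tolerance`.

    Hypothetical cropping rule, chosen only to illustrate the idea.
    """
    def is_flat(values):
        return max(values) - min(values) <= tolerance

    rows = [list(row) for row in pixels]
    while len(rows) > 1 and is_flat(rows[0]):
        rows.pop(0)
    while len(rows) > 1 and is_flat(rows[-1]):
        rows.pop()
    while len(rows[0]) > 1 and is_flat([row[0] for row in rows]):
        rows = [row[1:] for row in rows]
    while len(rows[0]) > 1 and is_flat([row[-1] for row in rows]):
        rows = [row[:-1] for row in rows]
    return rows


def block_means(pixels, grid=3):
    """Split the picture into grid x grid blocks and average each block."""
    height, width = len(pixels), len(pixels[0])
    if height < grid or width < grid:
        raise ValueError("picture is smaller than the block grid")
    means = []
    for by in range(grid):
        y0, y1 = by * height // grid, (by + 1) * height // grid
        row = []
        for bx in range(grid):
            x0, x1 = bx * width // grid, (bx + 1) * width // grid
            cells = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(cells) / len(cells))
        means.append(row)
    return means
```

The resulting grid of averages is the kind of coarse summary that survives resizing and recompression, since small pixel-level changes mostly cancel out inside each block.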

The relationships between adjacent blocks form a vector (PuzzleCvec), which is the signature of the picture.
The similarity between two pictures can be characterized as the normalized distance between two PuzzleCvec vectors.
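
The two ideas above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the library's implementation: the three-level encoding (darker, similar, lighter), the `threshold` parameter, the function names, and the particular normalization (difference norm divided by the sum of the two norms) are choices made for the example.

```python
import math


def signature(means, threshold=2.0):
    """Encode each block relative to its neighbours.

    -1: neighbour is darker, 0: about the same, +1: neighbour is lighter.
    `means` is a grid (list of rows) of per-block average levels.
    """
    grid_h, grid_w = len(means), len(means[0])
    vec = []
    for y in range(grid_h):
        for x in range(grid_w):
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (dy or dx) and 0 <= ny < grid_h and 0 <= nx < grid_w:
                        diff = means[ny][nx] - means[y][x]
                        if abs(diff) <= threshold:
                            vec.append(0)
                        else:
                            vec.append(1 if diff > 0 else -1)
    return vec


def normalized_distance(a, b):
    """Return ||a - b|| / (||a|| + ||b||), a value between 0 and 1."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")

    def norm(v):
        return math.sqrt(sum(c * c for c in v))

    denom = norm(a) + norm(b)
    if denom == 0:
        return 0.0  # two all-zero signatures are treated as identical
    return norm([p - q for p, q in zip(a, b)]) / denom
```

Because only the relationships between neighbouring blocks are stored, uniformly brightening a picture leaves this sketch's signature unchanged, so the distance to the original stays at 0.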

Will it work with a database that has millions of pictures?

A typical image signature requires only 182 bytes when stored with the built-in compression/decompression functions.
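
The text does not say how this figure is reached. As a back-of-the-envelope sketch, the Python below shows one way a size of that order could arise. Everything here is an assumption rather than a description of the library's own compression functions: a 9 x 9 block grid, comparisons with up to 8 neighbours per block, five possible values per comparison (-2 to 2), and base-5 packing of three values per byte.

```python
def neighbour_relations(grid):
    """Count (block, neighbour) pairs in a grid using 8-neighbourhoods."""
    total = 0
    for y in range(grid):
        for x in range(grid):
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if (dy or dx) and 0 <= y + dy < grid and 0 <= x + dx < grid:
                        total += 1
    return total


def pack_signature(values):
    """Pack values in -2..2 three per byte (5 ** 3 = 125 fits in a byte)."""
    out = bytearray()
    for i in range(0, len(values), 3):
        chunk = list(values[i:i + 3])
        if any(v not in range(-2, 3) for v in chunk):
            raise ValueError("values must be between -2 and 2")
        chunk += [0] * (3 - len(chunk))  # pad the last group
        a, b, c = (v + 2 for v in chunk)
        out.append(a + 5 * b + 25 * c)
    return bytes(out)


def unpack_signature(data, length):
    """Inverse of pack_signature; `length` drops the padding."""
    values = []
    for byte in data:
        for _ in range(3):
            values.append(byte % 5 - 2)
            byte //= 5
    return values[:length]


count = neighbour_relations(9)           # 544 values under these assumptions
packed = pack_signature([0] * count)
print(count, len(packed))                # 544 182
```

At this size, a million signatures occupy roughly 182 MB before any index overhead.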

Similar signatures share identical “words”, i.e. identical sequences of values at the same positions. By using compound indexes (word + position), the set of possibly similar vectors is dramatically reduced, and in most cases no vector distance needs to be computed at all.
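
A minimal in-memory sketch of such a compound index is shown below. The class name, the word length of 4, and the `min_shared` parameter are hypothetical choices for the example; a real deployment would more likely use a database table keyed on (position, word).

```python
from collections import defaultdict


def words(signature, word_len=4):
    """Yield (position, word) pairs for every window of the signature."""
    for pos in range(len(signature) - word_len + 1):
        yield pos, tuple(signature[pos:pos + word_len])


class WordIndex:
    """Maps (position, word) to the ids of signatures containing it."""

    def __init__(self, word_len=4):
        self.word_len = word_len
        self.table = defaultdict(set)

    def add(self, image_id, signature):
        for key in words(signature, self.word_len):
            self.table[key].add(image_id)

    def candidates(self, signature, min_shared=1):
        """Return ids sharing at least `min_shared` words with the query."""
        hits = defaultdict(int)
        for key in words(signature, self.word_len):
            for image_id in self.table.get(key, ()):
                hits[image_id] += 1
        return {image_id for image_id, n in hits.items() if n >= min_shared}
```

Only the signatures returned by `candidates` would then need a full distance computation, which is how the lookup avoids comparing the query against every stored vector.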

Indexing through words and positions also makes it easy to split the data into multiple tables and servers.
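
One simple way to do that split, sketched in Python under the assumption that each index entry is keyed by (position, word), is to hash the key and take it modulo the number of shards. The function name and key format are hypothetical. `zlib.crc32` is used instead of the built-in `hash()` because it returns the same value in every process, so all servers agree on where a key lives.

```python
import zlib


def shard_for(position, word, shard_count):
    """Pick a shard (table or server number) for one (position, word) key."""
    key = f"{position}:{','.join(str(v) for v in word)}".encode("ascii")
    return zlib.crc32(key) % shard_count


def group_by_shard(keys, shard_count):
    """Group (position, word) keys by the shard that should store them."""
    shards = {n: [] for n in range(shard_count)}
    for position, word in keys:
        shards[shard_for(position, word, shard_count)].append((position, word))
    return shards
```

A lookup for a query signature then only contacts the shards that own its words, and each shard can be searched independently.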

So yes, the Puzzle library can certainly be used in projects that need to index millions of pictures.

Last updated on March 25th, 2009

