Duke is a cross-platform application written in Java, designed as a flexible and fast deduplication (or record linkage, or entity resolution) engine.
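A deduplication engine of this kind decides, for each pair of records, how likely it is that both describe the same real-world entity. The sketch below is a minimal illustration under stated assumptions, not Duke's own code. Each property comparison (name, address, phone) is assumed to have already produced a match probability. These probabilities are combined with the naive Bayes formula, starting from a neutral 0.5, and the result is checked against a threshold. The class name `MatchSketch`, the sample probabilities and the 0.8 threshold are all hypothetical.

```java
import java.util.Locale;

public class MatchSketch {
    // Combine two independent match probabilities with the naive Bayes formula:
    // p = (p1 * p2) / (p1 * p2 + (1 - p1) * (1 - p2))
    public static double combine(double p1, double p2) {
        double num = p1 * p2;
        return num / (num + (1 - p1) * (1 - p2));
    }

    // Fold all per-property probabilities into one overall probability,
    // starting from a neutral prior of 0.5 (which leaves the first value unchanged).
    public static double overall(double... propertyProbabilities) {
        double p = 0.5;
        for (double q : propertyProbabilities) {
            p = combine(p, q);
        }
        return p;
    }

    public static void main(String[] args) {
        // Hypothetical probabilities for one record pair:
        // names look alike (0.9), addresses look alike (0.8), phones differ somewhat (0.4).
        double p = overall(0.9, 0.8, 0.4);
        double threshold = 0.8; // hypothetical match threshold
        System.out.println(String.format(Locale.ROOT, "probability=%.3f match=%b", p, p >= threshold));
    }
}
```

Values above 0.5 push the overall probability up and values below 0.5 pull it down. A single weak property therefore lowers the score without vetoing a pair that agrees strongly elsewhere.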
Here are some key features of "Duke":
· High performance.
· Highly configurable.
· Support for CSV, JDBC, SPARQL, and NTriples DataSources.
· Many built-in comparators.
· Plug in your own data sources, comparators, and cleaners.
· Command-line client for getting started.
· API for embedding into any kind of application.
· Support for batch processing and continuous processing.
· Can maintain a database of the links it finds, accessed via JNDI/JDBC.
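Pluggable cleaners and comparators can be pictured as two small interfaces. A cleaner normalizes a raw field value. A comparator turns two cleaned values into a similarity between 0.0 and 1.0. The sketch below uses hypothetical interfaces named `ValueCleaner` and `SimilarityComparator`. These are not Duke's actual classes, so check its documentation for the real names and signatures. The comparator shown is based on Levenshtein edit distance, one common choice for fuzzy string matching.

```java
import java.util.Locale;

public class PluggableSketch {
    // Hypothetical interface: normalizes a raw field value before comparison.
    interface ValueCleaner {
        String clean(String value);
    }

    // Hypothetical interface: returns a similarity in the range [0.0, 1.0].
    interface SimilarityComparator {
        double compare(String v1, String v2);
    }

    // Trims, lowercases, and collapses runs of whitespace into single spaces.
    static class LowerCaseCleaner implements ValueCleaner {
        public String clean(String value) {
            return value.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
        }
    }

    // Similarity = 1 - (edit distance / length of the longer string).
    static class EditDistanceComparator implements SimilarityComparator {
        public double compare(String v1, String v2) {
            int maxLen = Math.max(v1.length(), v2.length());
            if (maxLen == 0) return 1.0; // two empty strings are identical
            return 1.0 - (double) distance(v1, v2) / maxLen;
        }

        // Dynamic-programming Levenshtein distance, keeping only two rows.
        static int distance(String a, String b) {
            int[] prev = new int[b.length() + 1];
            int[] curr = new int[b.length() + 1];
            for (int j = 0; j <= b.length(); j++) prev[j] = j;
            for (int i = 1; i <= a.length(); i++) {
                curr[0] = i;
                for (int j = 1; j <= b.length(); j++) {
                    int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                    curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                       prev[j - 1] + cost);
                }
                int[] tmp = prev; prev = curr; curr = tmp; // reuse arrays
            }
            return prev[b.length()];
        }
    }

    public static void main(String[] args) {
        ValueCleaner cleaner = new LowerCaseCleaner();
        SimilarityComparator comparator = new EditDistanceComparator();
        String a = cleaner.clean("  Acme   Corp ");
        String b = cleaner.clean("ACME Corp.");
        System.out.println(a + " | " + b + " -> " + comparator.compare(a, b));
    }
}
```

Keeping cleaning and comparison separate lets the same comparator be reused across fields that need different normalization.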
Requirements:
· Java 2 Standard Edition Runtime Environment
What's New in This Release:
· Support for multi-threading, an upgrade to Lucene 4.0, higher performance, more comparators, more cleaners, major improvements to the command line client, and more.