combineExport is a Perl module that exports records in XML from a Combine database.
SYNOPSIS
combineExport --jobname <name> [--profile alvis|dc|combine --charset utf8|isolatin --number <n> --recordid <n> --md5 <MD5> --incremental --xsltscript ...]
OPTIONS AND ARGUMENTS
--jobname
Used to find the appropriate configuration (mandatory).
--profile
Three profiles are available: alvis, dc, and combine. alvis and combine are similar XML formats.
The 'alvis' profile format is defined by the Alvis enriched document format DTD. It uses charset UTF-8 by default.
'combine' is more compact, with less redundancy.
'dc' is XML-encoded Dublin Core data.
--charset
Selects a specific character set, either UTF-8 or iso-latin-1. Overrides the --profile setting.
--collapseinlinks
Skip inlinks with duplicate anchor texts (i.e. keep just one inlink per unique anchor text).
--nooutlinks
Do not include any outlinks in the exported records.
--ZebraIndex
Sends XML records directly to the Zebra server defined in the Combine configuration variable 'ZebraHost'. It uses the default Zebra configuration (profile=combine, nooutlinks, collapseinlinks) and is compatible with the direct Zebra indexing done during harvesting when 'ZebraHost' is defined in the Combine configuration. Requires that the Zebra server is running.
--SolrIndex
Sends XML records directly to the Solr server defined in the Combine configuration variable 'SolrHost'. It uses the default Solr configuration (profile=combine, nooutlinks, collapseinlinks) and is compatible with the direct Solr indexing done during harvesting when 'SolrHost' is defined in the Combine configuration. Requires that the Solr server is running.
--xsltscript
Generates records in Combine native format and converts them using this XSLT script before output. See the example scripts in /etc/combine/*.xsl.
--number
The maximum number of records to export.
--recordid
Export just the one record with this recordid.
--md5
Export just the one record with this MD5 checksum.
--pipehost, --pipeport
Specifies the server name and port to connect to when exporting data through the Alvis Pipeline. Exports incrementally, i.e. all changes since the last call to combineExport with the same pipehost and pipeport.
--incremental
Exports incrementally, i.e. all changes since the last call to combineExport using --incremental.
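EXAMPLES
The following is a minimal sketch of how the options might be combined, not an excerpt from the tool's own documentation. The job name myjob, the record id 42, and the helper function export_records are placeholders introduced for illustration. The helper only prints each command line (a dry run) unless COMBINE_RUN=1 is set and combineExport is found on the PATH, so the script can be read and tried without a Combine installation.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper (not part of Combine): prints the command
# line instead of running it unless COMBINE_RUN=1 and combineExport exists.
export_records() {
    if [ "${COMBINE_RUN:-0}" = "1" ] && command -v combineExport >/dev/null 2>&1; then
        combineExport "$@"
    else
        printf 'combineExport'
        printf ' %s' "$@"
        printf '\n'
    fi
}

JOB=myjob   # placeholder job name; use the name of your own Combine job

# Export at most 100 records as XML-encoded Dublin Core.
export_records --jobname "$JOB" --profile dc --number 100

# Export in the compact 'combine' format, without outlinks and with
# one inlink per unique anchor text.
export_records --jobname "$JOB" --profile combine --nooutlinks --collapseinlinks

# Export a single record by its record id (42 is a placeholder).
export_records --jobname "$JOB" --recordid 42

# Export only the changes since the previous --incremental run.
export_records --jobname "$JOB" --incremental
```

Setting COMBINE_RUN=1 in the environment turns the same script into real invocations, which keeps the preview and the actual export identical.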
Requirements:
· Perl
What's New in This Release:
· Fixed some tests
· Added support for exceptions to GeoIP
· Better handling of special characters
· Added support for new URL scheduling algorithms (including score based)
· Improved HTML -> text extraction