NCBI C++ Toolkit Changelog

New in version 7.0.0

June 13th, 2011
  • HIGHLIGHTS:
  • Added LDS2 (Local Data Storage v.2) which is based on SQLite3, has new features and better performance. Also implemented LDS2 data loader to use LDS2 from the Object Manager.
  • XmlWrapp -- this convenient XML handling API is now mostly finished (and even polished).
  • Implemented tunneling and authorization of HTTP connections, and tunneling of secure sockets, through HTTP proxies.
  • CFormatGuess now distinguishes between GTF, GFF3, and GFF2. This is a possibly breaking change; see UTILITIES below for details.
  • Implemented major parts of CFeatTree, the class to organize features defined on a biological sequence into a hierarchy that reflects their parent-child relationships (based on the feature subtypes).
  • CORELIB:
  • Implemented locale-independent conversion of string to double and back; changed core libraries to use it.
  • NStr::Justify() -- for formatting paragraphs of text.
  • CNcbiApplication -- made FindProgramExecutablePath static and more robust; added a static, higher-level GetAppName method. Global configuration files are now looked for in more cases.
  • CMetaRegistry::FindRegistry -- new method exposing the logic determining which file (if any) to load.
  • CEnvironmentCleaner -- new class to discard unwanted environment variables.
  • CFileIO -- reverted to the original behavior: do not close the file handle if it was assigned via SetFileHandle().
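Locale-independent conversion means that "3.25" always parses as three and a quarter, even when the process-wide locale uses ',' as the decimal separator. The sketch below shows the idea in standard C++ by imbuing the classic "C" locale on the string streams. It is not the toolkit's NStr implementation, and the names ParseDoubleClassic and FormatDoubleClassic are hypothetical.

```cpp
#include <cassert>
#include <limits>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: parse a double using the classic "C" locale, so '.'
// is always the decimal separator whatever the global locale is.
// Returns false unless the whole string (ignoring trailing spaces) is a number.
bool ParseDoubleClassic(const std::string& text, double& out)
{
    std::istringstream in(text);
    in.imbue(std::locale::classic());
    double value = 0.0;
    in >> value;
    if (in.fail())
        return false;
    in >> std::ws;        // skip trailing whitespace
    if (!in.eof())
        return false;     // reject trailing garbage such as ",25"
    out = value;
    return true;
}

// Hypothetical helper: format a double with the classic locale; max_digits10
// significant digits are enough for the text to convert back to the same value.
std::string FormatDoubleClassic(double value)
{
    std::ostringstream out;
    out.imbue(std::locale::classic());
    out.precision(std::numeric_limits<double>::max_digits10);
    out << value;
    return out.str();
}
```

With these helpers, formatting a value and parsing it back gives the same double on every locale setting.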
  • SERIAL:
  • Serialization of AnyContent data objects -- fixed to recognize and properly process attributes in their values.
  • Corrected reading of XML data so that an element with no content is assigned its default value.
  • Added support for sequences of elements, where the element has a default value.
  • DATATOOL:
  • Corrected code generation of:
  • CHOICE data objects;
  • binary data types with attributes.
  • Corrected conversion of double type values to preserve more significant digits.
  • CONNECT:
  • Added keepalive socket option (fSOCK_KeepAlive).
  • Added NCBI connectivity test (CConnTest).
  • UTILITIES:
  • g_FindDataFile -- New function for locating data files in (configurable) standard locations.
  • CChecksumStreamWriter -- new class to compute a checksum of the data written to a stream.
  • g_GZip_ScanForChunks() -- new API to query compressed stream positions. Added an implementation for getting the positions of the separate gzip files inside a concatenated gzip file.
  • Added compression/decompression stream manipulators (include/util/compress/stream_util.hpp).
  • CFormatGuess (util/format_guess.{h/c}pp) has been updated, with a possibly breaking change, so that it can distinguish between GTF, GFF3, and GFF2. Previously it lumped all of those formats into a single 'eGtf' value. The old 'eGtf' value (3) has been replaced with 'eGtf_POISONED', which is no longer returned. The new value for 'eGtf' (21) means a file that should be read with CGtfReader (objtools/readers/gtf_reader.hpp). The new value 'eGff3' (22) is for files meant to be read with CGff3Reader (objtools/readers/gff3_reader.hpp), and 'eGff2' (24) is for files meant to be read with CGff2Reader (include/objtools/readers/gff2_reader.hpp).
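The changelog does not say which checksum CChecksumStreamWriter computes or how it is built. The sketch below shows one common approach: a std::streambuf that forwards every byte to another buffer while updating an Adler-32 checksum (the algorithm used by zlib). Adler32ForwardingBuf is a hypothetical name, not the toolkit class.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical stream buffer: forwards writes to a target buffer and keeps
// an Adler-32 checksum of every byte that was successfully forwarded.
class Adler32ForwardingBuf : public std::streambuf {
public:
    explicit Adler32ForwardingBuf(std::streambuf* target) : target_(target) {}

    std::uint32_t Checksum() const { return (b_ << 16) | a_; }

protected:
    // Called for single characters (no put area is set up).
    int_type overflow(int_type ch) override
    {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);
        char c = traits_type::to_char_type(ch);
        int_type result = target_->sputc(c);
        if (!traits_type::eq_int_type(result, traits_type::eof()))
            Update(static_cast<unsigned char>(c));
        return result;
    }

    // Called for blocks of characters; checksum only what was written.
    std::streamsize xsputn(const char* s, std::streamsize n) override
    {
        std::streamsize written = target_->sputn(s, n);
        for (std::streamsize i = 0; i < written; ++i)
            Update(static_cast<unsigned char>(s[i]));
        return written;
    }

    int sync() override { return target_->pubsync(); }

private:
    static constexpr std::uint32_t kMod = 65521;  // largest prime below 2^16

    void Update(unsigned char byte)
    {
        a_ = (a_ + byte) % kMod;
        b_ = (b_ + a_) % kMod;
    }

    std::streambuf* target_;
    std::uint32_t a_ = 1;
    std::uint32_t b_ = 0;
};
```

For example, wrapping a std::ostringstream's buffer and writing "Wikipedia" leaves the text unchanged in the target and yields the well-known Adler-32 value 0x11E60398.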
  • BIO-OBJECTS:
  • CBioseq::GetNonLocalId -- New method to help place sequences imported from FASTA files with range specifications in more context; wrapped by CBioseq_Handle::GetNonLocalIdOrNull (likewise new).
  • CSeq_id::IdentifyAccession -- Implement or improve recognition for more prefixes (GA, HH, HI, HO-HU, JA-JO, EAAA-EZZZ, and IAA-IZZ, some of which correspond to the new possibility of DDBJ TPA WGS data) and mixed-in TPA protein accessions (mostly from EMBL, but some from GenBank too).
  • Distinguish WGS master accessions by a new flag bit. Relax over-strict PDB recognition logic.
  • CSeq_id::IsValidLocalID, CSeq_id::ParseIDs -- New functionality for working with plain-text sequence identifiers, factored out of CFastaReader and generalized somewhat.
  • SSeqIdRange -- New type (complete with parser and on-the-fly "iterator") for working with Seq-id ranges, as present in some FASTA defline source modifiers.
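The changelog does not spell out the range syntax SSeqIdRange accepts. This sketch assumes a range written as two accessions that share a letter prefix and digit width, joined by '-' (for example "AB000001-AB000003"), and expands it into individual accessions. SplitAccession and ExpandAccessionRange are hypothetical names. Unlike the on-the-fly iterator described above, this version builds the whole list at once, which is simpler but uses memory proportional to the range size.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical decomposition of an accession such as "AB000123".
struct AccParts {
    std::string prefix;    // leading letters, e.g. "AB"
    unsigned long number;  // numeric part, e.g. 123
    std::size_t width;     // digit count including leading zeros, e.g. 6
};

std::optional<AccParts> SplitAccession(const std::string& acc)
{
    std::size_t i = 0;
    while (i < acc.size() && std::isalpha(static_cast<unsigned char>(acc[i])))
        ++i;
    if (i == 0 || i == acc.size())
        return std::nullopt;  // need both letters and digits
    for (std::size_t j = i; j < acc.size(); ++j)
        if (!std::isdigit(static_cast<unsigned char>(acc[j])))
            return std::nullopt;
    return AccParts{acc.substr(0, i), std::stoul(acc.substr(i)), acc.size() - i};
}

// Expand "AB000001-AB000003" into AB000001, AB000002, AB000003.
// Returns an empty vector for malformed or inconsistent ranges.
std::vector<std::string> ExpandAccessionRange(const std::string& range)
{
    std::vector<std::string> ids;
    std::size_t dash = range.find('-');
    if (dash == std::string::npos)
        return ids;
    auto first = SplitAccession(range.substr(0, dash));
    auto last = SplitAccession(range.substr(dash + 1));
    if (!first || !last || first->prefix != last->prefix ||
        first->width != last->width || first->number > last->number)
        return ids;
    for (unsigned long n = first->number; n <= last->number; ++n) {
        std::string digits = std::to_string(n);
        digits.insert(0, first->width - digits.size(), '0');  // restore zero padding
        ids.push_back(first->prefix + digits);
    }
    return ids;
}
```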
  • BIO-TOOLS:
  • CFastaOstream -- Optionally accept custom titles for single sequences. Tag negative-strand ranges with leading 'c's.
  • CFastaReader -- Support negative-strand ranges and Sequin's compact defline-style gap syntax (">?N" where N is a number; or ">?unk100").
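As a rough illustration of the compact gap syntax, the sketch below recognizes ">?N" and ">?unkN" lines. It assumes that ">?N" denotes a gap of known length N and that ">?unkN" denotes a gap of unknown length represented by N residues. GapSpec and ParseGapLine are hypothetical names, not part of CFastaReader's interface.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical result of parsing a compact gap line.
struct GapSpec {
    unsigned long length;  // number of residues the gap occupies
    bool unknown;          // true for ">?unkN" (unknown-length gap)
};

// Returns the gap described by a ">?N" or ">?unkN" line, or nullopt for
// any other line (for example an ordinary ">id title" defline).
std::optional<GapSpec> ParseGapLine(const std::string& line)
{
    if (line.compare(0, 2, ">?") != 0)
        return std::nullopt;
    std::string rest = line.substr(2);
    bool unknown = false;
    if (rest.compare(0, 3, "unk") == 0) {
        unknown = true;
        rest.erase(0, 3);
    }
    if (rest.empty())
        return std::nullopt;
    for (char c : rest)
        if (!std::isdigit(static_cast<unsigned char>(c)))
            return std::nullopt;
    return GapSpec{std::stoul(rest), unknown};
}
```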
  • COBALT:
  • Added command-line option -num_domain_hits, which limits the number of conserved domains per sequence used in computing alignment constraints.
  • Phylogenetic trees:
  • Added a higher-level interface for computing a phylogenetic tree from sequence alignments (for example, BLAST and COBALT results). Class CPhyTreeCalc computes the phylogenetic tree, and CPhyTreeFormater prints the tree in Newick and Nexus formats.
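To show what Newick output looks like, here is a minimal, hypothetical serializer for a rooted tree in standard C++. TreeNode, MakeLeaf and ToNewick are illustrative names, not toolkit classes, and this is not how CPhyTreeFormater is implemented. Children are written in parentheses, each non-root node is followed by ":branch_length", and the tree ends with ';'. Labels containing Newick metacharacters (spaces, parentheses, commas, colons, semicolons) would need quoting, which this sketch omits.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node: leaves carry a label; every non-root node has a branch length.
struct TreeNode {
    std::string label;
    double branch_length = 0.0;
    std::vector<std::unique_ptr<TreeNode>> children;
};

std::unique_ptr<TreeNode> MakeLeaf(const std::string& label, double length)
{
    auto node = std::make_unique<TreeNode>();
    node->label = label;
    node->branch_length = length;
    return node;
}

// Recursively write "(child,child)label:length".
void WriteNewickNode(const TreeNode& node, std::ostream& out, bool is_root)
{
    if (!node.children.empty()) {
        out << '(';
        for (std::size_t i = 0; i < node.children.size(); ++i) {
            if (i > 0)
                out << ',';
            WriteNewickNode(*node.children[i], out, false);
        }
        out << ')';
    }
    out << node.label;
    if (!is_root)
        out << ':' << node.branch_length;
}

std::string ToNewick(const TreeNode& root)
{
    std::ostringstream out;
    out.imbue(std::locale::classic());  // always use '.' in branch lengths
    WriteNewickNode(root, out, true);
    out << ';';
    return out.str();
}
```

For a root with children ((A, B), C) and branch lengths 0.1, 0.2, 0.3 (inner node) and 0.4, ToNewick produces "((A:0.1,B:0.2):0.3,C:0.4);".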
  • BIO-OBJECT LIBRARIES:
  • Implemented CheckNumRows() and other methods for sparse alignments.
  • To reduce memory footprint: added read-hooks to reduce memory used by alignments after deserialization; Na-strand now uses one byte of memory where possible; Score.value choice is now embedded in CScore.
  • Capitalize accession in CSeq_id::GetLabel().
  • BIO-OBJECT MANAGER:
  • Added getter methods for boolean fields in CTableFieldHandle.
  • Added GetBestGeneForFeat() based on CFeatTree.
  • Implemented GetBestOverlappingFeat() on CFeatTree.
  • Added fast CScope::GetTaxid().
  • Implemented bulk loading for acc/ver, gi, label, and taxid.
  • Added a check for zero-length gaps to CSeqMap and CSeqVector.
  • Implemented GetLength() and GetCoverage() for bond locations.
  • Improvements:
  • Added helper method to fill CFeatTree on location.
  • Sped up mapping of simple CSeq_loc_mix locations in CFeat_CI.
  • Stricter sorting of features in CFeat_CI to avoid ambiguities.
  • CSeq_feat_Handle getters now work with Seq-table features too.
  • Seq-table features now support multi-level user fields.
  • Non-Seq-feat Seq-tables are now recognized even if located in a split chunk.
  • Sped up CBioseq_Handle::AddId().
  • Optimized CScope::AttachXxx().
  • Support splitting of named annotations.
  • CSeqVector and CSeqVector_CI's CanGetRange() now return false instead of throwing an exception.
  • ResetHistory() now allows specifying how to deal with existing handles.
  • Optimized re-parenting if more features are added to CFeatTree.
  • Added the ability to debug CScope creation/deletion.
  • Many changes to the C++ cleanup functionality to imitate the cleanup functionality that already exists in C. There is still more work to be done on BasicCleanup, but significant progress has been made. Little work has been done on ExtendedCleanup so far.
  • CSeq_loc_Mapper can now be initialized with a GC-Assembly.
  • Bug fixes:
  • Fixed mapping of mix locations on minus strand in CFeat_CI.
  • Many fixes in the way CFeatTree links features.
  • Several thread-safety fixes.
  • Fixed a typo that prevented adding aligns and graphs to CSeq_annot_EditHandle.
  • Safeguard against exceptions when sorting features in CFeat_CI.
  • GENBANK DATA LOADER:
  • Registered HPRD external annotations.
  • Added optional exclude_wgs_master param in pubseqos/pubseqos2 readers.
  • Implemented bulk loading for acc/ver, gi, label, and taxid.
  • Added CGBDataLoader::CloseCache().
  • Improvements:
  • Use bulk loading requests in CScope::GetBioseqHandles().
  • Separate reader statistics by type of loaded blobs.
  • Added timestamp to GenBank debug messages.
  • Use IConnValidator for opening PubSeqOS connections.
  • Added split-version to chunk requests and chunk subkeys in GenBank cache to avoid using wrong chunks when blob split state is changed in ID.
  • Added secondary, less confusing parameter names for the open timeout.
  • Do not multiply retry count by number of connections.
  • OBJECT MANAGER TEST AND DEMO APPLICATIONS:
  • id2_fetch_simple -- added -id options for arbitrary Seq-ids.
  • test_bulkinfo -- new test application.
  • FASTA:
  • The C++ feature table functionality has been extended, for example to support part of the BankIt project.
  • asn2flat utility:
  • A large number of changes to the flatfile formatter bring it much closer to a release-ready state (it may be release-ready now, although some relatively minor issues remain).
  • XMLWRAPP:
  • Fixed a segmentation fault when taking a reference to the results of running an XPath expression.
  • Added helpers to get public ID, system ID and DTD name for external and internal subsets.
  • Added methods to lookup node attributes.
  • Fixed execution of XPath expression: it now starts from the given node.
  • Fixed searching attributes (including default) when a namespace is provided.
  • Added the ability to run an XPath expression without having to register namespaces explicitly.
  • Added ability to provide containers for collecting errors and warnings while parsing documents.
  • Added ability to modify values and namespaces of node’s default attributes.
  • Added ability to test if an attribute is default.
  • Added ability to insert or remove attributes while taking into account their namespaces.
  • Added ability to strip XML declaration when a document is saved.
  • WindowMasker:
  • Added a new input format, "seqids"; with this input format, the input is a file containing a sequence id on each line, and the algorithm uses the Bio-Object Manager to look up the sequences.
  • Added a new class CWinMaskConfig, for storing all the WindowMasker configuration parameters. The class can be used to add the needed command-line arguments to CArgDescriptions, and then get the configuration parameters from the command-line arguments.
  • BUILD FRAMEWORK (UNIX):
  • Interpret command-line specifications of APP_PROJ or LIB_PROJ as a cue to clear out other *_PROJ settings not also provided there. (Requires GNU Make; builds with Sun make continue to work as before.)
  • Supply more targets in subdirectories: *_f (using local flat makefiles produced on demand, ignoring dependencies on other parts of the tree), *_fd (wrapping the top-level Makefile.flat), clean_sources and purge_sources.
  • Configure and its convenience scripts (compilers/unix/*.sh):
  • Noteworthy new flag: --without-3psw, to avoid using any third-party software.
  • Added a check for GLEW.
  • Improved checks for Boost and OpenGL.
  • Support specifying run paths on Darwin (Mac) systems with modern toolchains.
  • BLAST:
  • On Darwin (Mac OS X), build only for Intel processors even in otherwise universal builds due to a PowerPC toolchain limitation.
  • Added support for retrieving NCBI Taxonomy IDs for which WindowMasker support is available.
  • Allow the specification of a query sequence along with multiple sequence alignment file in psiblast.
  • Added database hard-masking support.
  • Added database soft-masking for translated searches.
  • Added support for btop (BLAST traceback operations) and query and subject length in the tabular report.
  • Command-line applications -- psiblast can now search multiple queries; added an optional -input_type argument to makeblastdb.
  • Allow use of best hit and XML in blast2sequences mode.
  • Improved formatting performance for remote searches.
  • makembindex can now build masked MegaBLAST index directly from a BLAST nucleotide database using the masking information stored in the BLAST database. This is accomplished by new command line option -db_mask to makembindex. The option accepts the integer id of the filtering algorithm supported by the BLAST database. The option can only be applied in conjunction with -iformat blastdb.
  • To help users find the numeric ids of the filtering algorithms supported by a BLAST database, the flag -show_filters has been introduced. When this flag is used with -iformat blastdb and a BLAST database as input, makembindex outputs a list of the available filtering algorithms and exits.
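The btop column mentioned above is commonly documented as follows: a number is a run of identical residues, and each other position is a pair of characters, query residue first and subject residue second, with '-' marking a gap. For example, "7AG39" is 7 identities, one A/G mismatch, then 39 identities. The sketch below assumes exactly that encoding and summarizes a btop string. BtopSummary and SummarizeBtop are hypothetical names, not BLAST+ APIs.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical summary of one btop string.
struct BtopSummary {
    unsigned long identities = 0;
    unsigned long mismatches = 0;
    unsigned long gaps = 0;  // gap positions in either query or subject
    unsigned long length() const { return identities + mismatches + gaps; }
};

BtopSummary SummarizeBtop(const std::string& btop)
{
    BtopSummary s;
    std::size_t i = 0;
    while (i < btop.size()) {
        if (std::isdigit(static_cast<unsigned char>(btop[i]))) {
            // A run of digits: that many identical residues.
            std::size_t j = i;
            while (j < btop.size() && std::isdigit(static_cast<unsigned char>(btop[j])))
                ++j;
            s.identities += std::stoul(btop.substr(i, j - i));
            i = j;
        } else {
            // A character pair: query residue, then subject residue.
            if (i + 1 >= btop.size())
                throw std::invalid_argument("dangling btop character");
            if (btop[i] == '-' || btop[i + 1] == '-')
                ++s.gaps;
            else
                ++s.mismatches;
            i += 2;
        }
    }
    return s;
}
```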
  • APPLICATIONS NETCACHE:
  • NetCache is reworked to include the following features:
  • better management of disk space;
  • lock-less work with blobs, versioning is used instead;
  • multi-port listening and per-client settings.
  • NetCache and ICache APIs:
  • Use Uint8 everywhere for blob size.
  • Allow partial blob retrieval.
  • Introduced blob password protection; empty passwords are treated as no password.
  • Worker node APIs:
  • New parameter for terminating the worker node if its memory consumption exceeds the specified limit (parameter "total_memory_limit").
  • New parameter for terminating the worker node if its run time exceeds the specified limit (parameter "total_time_limit").
  • GRID APPLICATIONS:
  • netscheduled
  • Fixed a bug that caused no reply to the queue deletion command.
  • remote_app
  • New configuration parameter ("tmp_dir") to control how the temporary directory name is generated, in order to reduce its length.
  • Log blob writing errors.
  • netcache_control
  • Allow partial blob retrieval.
  • New command -remove to delete blobs by their ids.
  • New parameter -auth to specify authentication string to use.
  • New commands -reconf and -reinit for use by NetCache administrators.
  • netschedule_control
  • Enabled compatibility mode to make netschedule_control work with older worker nodes.
  • cgi2rcgi.cgi
  • Do not create an empty NetCache blob as a placeholder for the progress message.
  • Log Grid errors that are reported to the user.
  • Allow spaces in the job ID parameter.
  • Support output of the job status information in JSON format.
  • Allow custom HTML templates to be defined for GRID errors and other events.
  • Added no-cache HTTP headers to avoid caching of intermediate results.
  • ncfetch.cgi
  • New parameter to access password-protected blobs.
  • Interpret extra parameter "filename" as a file name for the downloaded file.

New in version Dec 31 2008 (January 1st, 2009)

  • This release adds a method to compute column-specific pseudocounts in PSI-BLAST.
  • It refactors the grid services library.
  • It adds a unit test framework and error logging for all File API classes.
  • It fixes pthread support on IRIX. It enhances support for XML serialization.
  • It fixes support for Sybase.
  • It adds support for smaller lookup tables for small queries.
  • It adds an API to retrieve GenBank loader statistics.
  • It has assorted other enhancements, speedups, and bugfixes.