The dtRdr::doc::Book::whitespace Perl module documents issues with whitespace.
Weird things happen when whitespace doesn't count, but sort of counts.
Annotations rely on a stable character position, which can differ greatly from the byte offset because of character encoding and whitespace collapsing. We therefore need whitespace conventions that can be applied consistently in all of these situations.
All your spaces are belong to one position.
The general rule is that any amount of whitespace, whether spanning a tag or not, is treated as a single space character.
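To make the difference between byte offset and collapsed character position concrete, here is a minimal Python sketch. It is an illustration only, not dtRdr code (which is Perl): the helper name `collapse_whitespace` and the sample string are made up, and it handles plain text only, ignoring tags.

```python
import re

def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace with a single space (hypothetical helper)."""
    return re.sub(r"\s+", " ", text)

# Sample text with a non-ASCII character and several whitespace runs.
raw = "na\u00efve   text\n\twith  gaps"
collapsed = collapse_whitespace(raw)

# Offset of "gaps" in the stored UTF-8 bytes, before any collapsing.
byte_offset = raw.encode("utf-8").index(b"gaps")
# Position of "gaps" counted in characters, after collapsing.
char_position = collapsed.index("gaps")

print(collapsed)
print(byte_offset, char_position)
```

For this string the byte offset is 21 and the collapsed character position is 16, so the two cannot be used interchangeably.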
This becomes a little difficult with book formats that contain (rendered) nested content nodes. For these books, a position needs to map between local and global coordinates, so that the position in a parent can be calculated from the position in a child. See dtRdr::doc::Book::annotree for "math is fun."
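The parent-from-child calculation can be sketched as follows. This is a minimal Python illustration under a simplifying assumption: each child's text appears as one contiguous run inside its parent, so a child only needs to record where it starts. The `Node` class, its `offset` field, and the `to_ancestor`/`from_ancestor` helpers are hypothetical names, not part of dtRdr.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    name: str
    offset: int = 0                  # start of this node's text within its parent's text
    parent: Optional["Node"] = None

def to_ancestor(node: Node, pos: int, ancestor: Node) -> int:
    """Translate a position local to `node` into `ancestor`'s coordinates."""
    while node is not ancestor:
        if node.parent is None:
            raise ValueError(f"{ancestor.name} is not an ancestor")
        pos += node.offset           # shift by where this node starts in its parent
        node = node.parent
    return pos

def from_ancestor(node: Node, pos: int, ancestor: Node) -> int:
    """Translate a position in `ancestor`'s coordinates into `node`'s local ones."""
    return pos - to_ancestor(node, 0, ancestor)

# Assumed layout: the root renders as "a b c d ", with nodes b, c, and d nested in it.
root = Node("root")
b = Node("b", offset=2, parent=root)   # "b " starts at 2 in "a b c d "
c = Node("c", offset=4, parent=root)   # "c d " starts at 4 in "a b c d "
d = Node("d", offset=2, parent=c)      # "d " starts at 2 in "c d "

print(to_ancestor(d, 0, root))
print(from_ancestor(d, 6, root))
```

Here position 0 in node d is position 2 in c and position 6 in the root. The real offset table is likely more involved than a single start offset per node.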
As for whitespace, we need a convention for where a space at the beginning or end of a node belongs. In these examples, I'll use square brackets to represent the opening and closing of a node's XML tags.
[ a [b][c[d]]]
[ a[ b][c[d] ] ]
The above are not necessarily equivalent; they are just representative situations.
Because line breaks and indentation from manual editing or conversion tools are so common, the situation in reality almost always looks like this:
[ a [ b ] [ c [ d ] ] ]
This should basically reduce into the following:
[a [b ][c [d ]]]
In the reduced form, no node starts with a space, and there are no consecutive spaces, regardless of tag boundaries.
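The two rules can be checked mechanically against the bracket notation used in these examples. The following Python sketch is only an illustration: the `violations` function is a hypothetical helper, not part of dtRdr, and it treats `[` and `]` as tags and everything else as text.

```python
def violations(marked: str) -> list:
    """Return (index, problem) pairs where the bracket notation breaks a rule."""
    problems = []
    prev_text = None        # last text character seen, ignoring tags
    at_node_start = False   # True right after "[" until the first text character
    for i, ch in enumerate(marked):
        if ch == "[":
            at_node_start = True
        elif ch == "]":
            at_node_start = False
        else:
            if ch.isspace():
                if at_node_start:
                    problems.append((i, "node starts with a space"))
                if prev_text is not None and prev_text.isspace():
                    problems.append((i, "consecutive spaces"))
            at_node_start = False
            prev_text = ch
    return problems

print(violations("[a [b ][c [d ]]]"))
print(violations("[ a [ b ] [ c [ d ] ] ]")[:3])
```

The reduced form produces an empty list, while the hand-edited form is flagged from its very first space. Note that `prev_text` carries across tags, which is what "regardless of tag boundaries" means here.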
This convention is important because it needs to be shared between the book base class (which does the annotation-insertion XML munging) and the individual book plugins (which build the annotation offset table to allow for position math).
I still need to prove it, but I believe that even the following should be equivalent to the canonical example above:
[ a[ b][ c[ d] ]]
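As a sanity check rather than a proof, here is a minimal Python sketch of one possible reduction rule. The rule is an assumption, not the actual dtRdr implementation: every run of whitespace becomes a single space attached directly after the preceding text character, ahead of any intervening tags, and whitespace with no preceding text is dropped. The name `reduce_whitespace` is hypothetical.

```python
def reduce_whitespace(marked: str) -> str:
    """Reduce bracket notation, where "[" and "]" stand for tags (assumed rule)."""
    out = []
    last_text = None        # index in `out` of the last non-space text character
    space_attached = False  # True once the current whitespace run has its one space
    for ch in marked:
        if ch in "[]":
            out.append(ch)
        elif ch.isspace():
            if last_text is not None and not space_attached:
                # Attach the space right after the last text character,
                # before any tags that were emitted since then.
                out.insert(last_text + 1, " ")
                space_attached = True
        else:
            out.append(ch)
            last_text = len(out) - 1
            space_attached = False
    return "".join(out)

print(reduce_whitespace("[ a [ b ] [ c [ d ] ] ]"))
print(reduce_whitespace("[ a[ b][ c[ d] ]]"))
```

Under this rule both inputs come out as `[a [b ][c [d ]]]`, which supports the belief above for this one example but says nothing about other reduction rules.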
And, to be pragmatic, this is not really worth chasing, since nested content nodes that are accessible both individually and from within the parent are a concept that is impossible to resolve into a pagewise reader.
To install this distribution, run the usual Module::Build commands:

    perl Build.PL
    ./Build
    ./Build test
    ./Build install
If the Build.PL step complains about missing dependencies, they can be satisfied from your favorite CPAN mirror.