1.0 GPL (GNU General Public License)    



htmlcat is a script that combines a number of HTML files into one.

The beginning of the first file (up to and including <body ...>) is used for the whole output, since only the bodies of the input files are concatenated. An optional divider, followed by the label of the file, is placed between files.
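The concatenation described above can be sketched as follows. This is a minimal illustration in Python, not htmlcat's own Perl code: the names split_html and concat_html are hypothetical, the label is assumed to be the filename, and the sketch assumes (as the notes below require) that the opening and closing body tags each sit on a line of their own.

```python
import re

# Both tags are assumed to occupy a whole line, as the notes below require.
BODY_OPEN = re.compile(r"^\s*<body\b[^>]*>\s*$", re.IGNORECASE)
BODY_CLOSE = re.compile(r"^\s*</body\s*>\s*$", re.IGNORECASE)


def split_html(text):
    """Return (head_lines, body_lines) for one HTML document.

    head_lines runs up to and including the <body ...> line;
    body_lines is everything between that line and the </body> line.
    Anything after </body> is dropped.
    """
    lines = text.splitlines()
    start = next(i for i, line in enumerate(lines) if BODY_OPEN.match(line))
    end = next(i for i, line in enumerate(lines) if BODY_CLOSE.match(line))
    return lines[: start + 1], lines[start + 1 : end]


def concat_html(docs, divider=None):
    """Concatenate (label, html_text) pairs into one HTML string."""
    out = []
    for n, (label, text) in enumerate(docs):
        head, body = split_html(text)
        if n == 0:
            out.extend(head)      # only the first file's beginning is kept
        elif divider is not None:
            out.append(divider)   # optional divider between files...
            out.append(label)     # ...followed by the label (assumed: filename)
        out.extend(body)
    out.extend(["</body>", "</html>"])
    return "\n".join(out) + "\n"
```

Calling concat_html([("a.html", a_text), ("b.html", b_text)], divider="<hr>") would keep the head of a.html and place the divider and the label b.html between the two bodies.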

Note that:

* the code relies on the calling shell to expand wildcard filenames like '*.html'; this is automatic in a Unix shell, but does not happen at a DOS prompt
* the original files must conform to HTML conventions; if necessary use htmlfix first to correct major problems
* <body ...> and </body> must each be on a line of their own; any other information on these lines will be lost
* in anchors, href="..." and name="..." must not be split across lines
* any material after </body> (such as HTML comments) will be lost
* the script might get confused by a directory index file that is a symbolic link, or by references to files in remote directories (though it does its best)
* if you move the concatenated HTML file, remember to move any other local files (e.g. images) to the same relative location (e.g. the same directory)
* for use with a frame-based collection of files, exclude the frameset definition file from the list of inputs and probably start with a contents file
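The first note above means that, where the shell does not expand wildcards, something else has to do it before the file list reaches the script. As a minimal sketch in Python (not part of htmlcat; the helper name expand_patterns is hypothetical), the expansion could be done like this:

```python
import glob


def expand_patterns(args):
    """Expand wildcard patterns that the calling shell left untouched."""
    files = []
    for arg in args:
        matches = sorted(glob.glob(arg))
        # Keep the argument unchanged when nothing matches, so that a
        # plain (possibly missing) filename is still passed through.
        files.extend(matches if matches else [arg])
    return files
```

A Unix shell performs the equivalent step automatically, which is why the note only matters at a DOS prompt.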


The command line options are:

-d
print divider between concatenated files
print usage as help
-o file
name the output file (if this file also appears in the input list, e.g. because *.html was given, it is ignored as an input)
-s
sort input files into case-insensitive alphabetical order (putting the index file first if necessary and, if the index file is a symbolic link, removing the file it points to from the inputs)
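The handling of the input list described above (skipping the output file, then sorting with the index file first) can be sketched as follows. This is an illustration in Python rather than htmlcat's own Perl code: the function name order_inputs and the default index filename index.html are assumptions, the two steps are combined here for brevity, and the symbolic-link handling is left out.

```python
def order_inputs(files, output=None, index_name="index.html"):
    """Drop the output file from the inputs, then sort the rest.

    The sort is case-insensitive and alphabetical, except that the
    directory index file (assumed name: index.html) always comes first.
    """
    inputs = [f for f in files if f != output]
    # False sorts before True, so the index file leads the list.
    return sorted(inputs, key=lambda f: (f != index_name, f.lower()))
```

For example, order_inputs(["Zeta.html", "all.html", "index.html", "alpha.html"], output="all.html") returns ["index.html", "alpha.html", "Zeta.html"].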


Run the script on one or more HTML files. Warning messages are sent to standard error. Examples of usage are:

htmlcat -o some.html def.html res.html
concatenate def.html and res.html to some.html
htmlcat -d -o all.html *.html
concatenate all HTML files to all.html with dividers between them
htmlcat -s -o out.html *.html
sort then concatenate all HTML files to out.html
htmlcat *.html > /tmp/all.html
concatenate all HTML files to standard output (here redirected to /tmp/all.html); when using this method, do not write the concatenated file into the same directory as the inputs, or the script will run indefinitely on its own output!

The only things likely to need changing for installation are the directory index filename and the nature of the file divider (see the customise subroutine in the code).

Change the first line of the script according to where Perl is located. Although it was tested with Perl 5, the script may work with Perl 4 after only minor changes.
Last updated on September 27th, 2005



