Site Map Generator 1.0
The Site Map Generator application is a platform-independent site map generator.
To run the generator, you do not need shell access to your web server. The script is implemented as a simple crawler that can run from any computer that has Python installed. The crawler follows only local links and skips links to external sites. If your web server returns web pages with the 'Last-Modified' time stamp, the generator writes sitemap records with <lastmod> dates. If the crawler encounters an error while downloading or parsing a page, it tries to continue with another page.
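The two rules above (follow only local links, derive <lastmod> from 'Last-Modified') can be sketched as follows. This is a minimal illustration and not the script's actual code: it is written for Python 3 rather than the Python 2.4 the script targets, the function names resolve_local_link and lastmod_from_header are hypothetical, and it assumes that "local" means "same host as the page that contains the link".

```python
from datetime import timezone
from email.utils import parsedate_to_datetime
from urllib.parse import urljoin, urlparse


def resolve_local_link(page_url, href):
    """Return the absolute URL for href if it stays on page_url's host, else None.

    Assumption: a link is "local" when its host matches the host of the page.
    """
    absolute = urljoin(page_url, href)
    target = urlparse(absolute)
    if target.scheme not in ("http", "https"):
        return None  # e.g. mailto: or javascript: links
    if target.netloc.lower() != urlparse(page_url).netloc.lower():
        return None  # link to an external site, so it is skipped
    return absolute.split("#", 1)[0]  # drop the fragment, it is the same page


def lastmod_from_header(last_modified):
    """Convert a Last-Modified header value to a YYYY-MM-DD date, or None."""
    if not last_modified:
        return None  # the server did not send the header
    try:
        parsed = parsedate_to_datetime(last_modified)
    except (TypeError, ValueError):
        return None  # unparseable header, so no <lastmod> is written
    if parsed.tzinfo is not None:
        parsed = parsed.astimezone(timezone.utc)
    return parsed.strftime("%Y-%m-%d")
```

Returning None in both helpers lets the caller skip the link or omit the <lastmod> element without special error handling.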
To run the script, you will need Python version 2.4 or higher. (At the time of writing, the current Python version is 2.5; you can download it from Python's official site.) The script needs no installation: simply copy it to a suitable directory and run it from there.
The script is mainly useful for small and medium-sized sites. It generates only a single sitemap file, so it maxes out at 50,000 URLs (this is Google's limit for sitemap files). The script's default limit is 1,000 URLs, but you can change it with the -m option.
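How a URL limit might bound the crawl can be sketched like this. It is an illustration under stated assumptions, not the script's own implementation: the code is Python 3, the names crawl and fetch_links are hypothetical, the crawl order is assumed to be breadth-first, and pages that fail to download are assumed to be left out of the result.

```python
from collections import deque

SITEMAP_URL_LIMIT = 50000  # per-file limit mentioned above
DEFAULT_MAX_URLS = 1000    # the script's default limit


def crawl(start_url, fetch_links, max_urls=DEFAULT_MAX_URLS):
    """Breadth-first crawl that stops once max_urls pages have been collected.

    fetch_links is a hypothetical callable: it takes a URL and returns the
    local links found on that page, raising an exception on failure.
    """
    limit = min(max_urls, SITEMAP_URL_LIMIT)
    queue = deque([start_url])
    seen = {start_url}
    collected = []
    while queue and len(collected) < limit:
        url = queue.popleft()
        try:
            links = fetch_links(url)
        except Exception:
            continue  # skip the failing page and carry on with the next one
        collected.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return collected
```

Passing the page fetcher in as a function keeps the limit logic separate from the network code, so it can be tried out on a small in-memory site.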
The script's command line syntax is as follows:
python sitemap_gen.py <options> <starting URL>
The options are as follows:
-h, --help    Print the help text and exit.
-b <ext>, --block <ext>    Exclude URLs with the given extension, which must be given without the leading dot. The comparison is case-insensitive, so for example DOC and doc are treated the same. You can use this option several times to block several extensions.
-c <value>, --changefreq <value>    Set the change frequency. The given value is used in all sitemap entries (a future version of this script may change that). The allowed values are: always, hourly, daily, weekly, monthly, yearly, never.
-p <prio>, --priority <prio>    Set the priority. The value must be between 0.0 and 1.0. The value is used in all sitemap entries.
-m <value>, --max-urls <value>    Set the maximum number of URLs to be crawled. The default value is 1000 and the largest value that you can set is 50000 (the script generates only a single sitemap file).
-o <file>, --output-file <file>    Set the name of the generated sitemap file. The default file name is sitemap.xml.
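To show how the change frequency and priority values end up in every sitemap entry, here is a minimal sketch of rendering one record. It is not taken from the script: the code is Python 3, the name render_url_entry is hypothetical, and the element names follow the public sitemap protocol. Whether the script rejects invalid values in exactly this way is an assumption.

```python
from xml.sax.saxutils import escape

CHANGEFREQ_VALUES = ("always", "hourly", "daily", "weekly",
                     "monthly", "yearly", "never")


def render_url_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Render one <url> record; optional elements are omitted when not set."""
    if changefreq is not None and changefreq not in CHANGEFREQ_VALUES:
        raise ValueError("invalid change frequency: %r" % (changefreq,))
    if priority is not None and not 0.0 <= priority <= 1.0:
        raise ValueError("priority must be between 0.0 and 1.0")
    lines = ["<url>", "  <loc>%s</loc>" % escape(loc)]  # escape & < > in the URL
    if lastmod:
        lines.append("  <lastmod>%s</lastmod>" % lastmod)
    if changefreq:
        lines.append("  <changefreq>%s</changefreq>" % changefreq)
    if priority is not None:
        lines.append("  <priority>%.1f</priority>" % priority)
    lines.append("</url>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_url_entry("http://www.your-site-name.com/index.html",
                           lastmod="2007-01-15", changefreq="weekly", priority=0.5))
```

Because the same -c and -p values apply to the whole run, a caller would pass identical changefreq and priority arguments for every URL and vary only loc and lastmod.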
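For example, the following command crawls the site starting at index.html, skips URLs with the doc and bmp extensions, and writes the sitemap to test_sitemap.xml:
python sitemap_gen.py -b doc -b bmp -o test_sitemap.xml http://www.your-site-name.com/index.html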
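The extension matching behind -b doc -b bmp in the command above could look roughly like this. Again a sketch only: it is Python 3, the name is_blocked is hypothetical, and it assumes the extension is taken from the URL path, so that a query string does not affect the match.

```python
import posixpath
from urllib.parse import urlparse


def is_blocked(url, blocked_extensions):
    """Return True if the URL path ends with one of the blocked extensions.

    blocked_extensions holds extensions without the leading dot, e.g. ["doc", "bmp"].
    """
    blocked = {ext.lower() for ext in blocked_extensions}
    path = urlparse(url).path                # ignores the query string and fragment
    extension = posixpath.splitext(path)[1]  # e.g. ".DOC", or "" if there is none
    return extension[1:].lower() in blocked  # case-insensitive comparison
```

Lowercasing both sides is what makes DOC and doc equivalent, and stripping the dot from the URL's extension matches the rule that the option value is given without it.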