oss-license-extract is a program that generates a comprehensive copyright and license notice for a given set of source files. It is particularly useful when redistributing open source software or auditing existing licenses.
This program is a Perl script that recursively scans a list of files and directories for program files and attempts to extract license and copyright information from them automatically. Copyright statements are linked with their accompanying licenses, and each license is compared with the licenses from the other files to eliminate duplicates. The output is a single list, covering all files scanned, that contains one copy of each unique license together with its associated copyright holders. This list is printed to standard output by default; the -o option writes it to a named file instead.
Files and directories to scan can be given on the command line, listed in a text file passed with the -f option, or both. A program file is currently defined as one whose name matches *.c, *.h, *.sh, *.pl, or *.py (the program actually calls find(1) with those globs to determine which files to read).
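The file selection can be approximated in Python. This is a minimal sketch, not the script's code: it assumes the globs are matched case-sensitively against each file's basename, as the -name test of find(1) does, and it does not read a -f list file. The names PROGRAM_GLOBS, is_program_file and collect_program_files are hypothetical.

```python
import fnmatch
import os

# The globs listed above; find(1)'s -name test matches them against the basename.
PROGRAM_GLOBS = ("*.c", "*.h", "*.sh", "*.pl", "*.py")


def is_program_file(path):
    """Return True if the basename of path matches one of PROGRAM_GLOBS."""
    name = os.path.basename(path)
    # fnmatchcase is case-sensitive on every platform, like find -name.
    return any(fnmatch.fnmatchcase(name, glob) for glob in PROGRAM_GLOBS)


def collect_program_files(paths):
    """Expand the given files and directories into a sorted list of program files."""
    found = []
    for path in paths:
        if os.path.isdir(path):
            # Recurse into directories, as find(1) does.
            for root, _dirs, files in os.walk(path):
                for name in files:
                    full = os.path.join(root, name)
                    if is_program_file(full):
                        found.append(full)
        elif is_program_file(path):
            # Plain file arguments are filtered by the same globs.
            found.append(path)
    return sorted(found)
```

Sorting the result is a choice made here for reproducible output; the README does not say in what order the script reads files.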
The program detects licenses by looking for blocks of text that start with either "Redistribution" or "Permission". It assumes that copyright notices, if present, appear before the license they belong to. Multiple licenses in the same file are supported. In licenses that contain one of several common forms of the advertising clause and/or warranty disclaimer, the authors' names are replaced with "copyright holder". This prevents the many licenses that differ only in the author's name from each being reported as unique.
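The detection step can be sketched in Python as follows. This is not the script's Perl code, and several details are assumptions: comment leaders (such as " * " or "#") are taken to be already stripped; a copyright line is taken to be "Copyright" followed by "(c)", "©" or a year, or an "All rights reserved" line; and a license is taken to run until the next copyright line or the end of the input. Non-blank text between the copyright lines and the license start is kept separately as preamble text. The three name-bearing clause patterns are hypothetical examples of the advertising clause and warranty disclaimer, not the script's actual list.

```python
import re

# Per the README: a license block starts with "Redistribution" or "Permission".
LICENSE_START = re.compile(r"^\s*(?:Redistribution|Permission)")
# Assumption: a copyright line is "Copyright" plus "(c)", "©" or a year,
# or an "All rights reserved" line.
COPYRIGHT_LINE = re.compile(
    r"^\s*(?:Copyright\s+(?:\(c\)|©|\d{4})|All\s+rights\s+reserved)", re.I)
# Hypothetical examples of name-bearing clauses: (prefix, name, suffix).
NAME_CLAUSES = [
    # Advertising clause: "...software developed by <name>."
    re.compile(r"(software\s+developed\s+by\s+)(.+?)(\.)", re.I | re.S),
    # Warranty disclaimer: "...PROVIDED BY <name> ``AS IS''..."
    re.compile(r"(provided\s+by\s+)(.+?)(\s+[`'\"]+as\s+is)", re.I | re.S),
    # Warranty disclaimer: "IN NO EVENT SHALL <name> BE LIABLE..."
    re.compile(r"(in\s+no\s+event\s+shall\s+)(.+?)(\s+be\s+liable)", re.I | re.S),
]


def normalize_names(text):
    """Replace the author's name in each matching clause with 'copyright holder'."""
    for pattern in NAME_CLAUSES:
        text = pattern.sub(lambda m: m.group(1) + "copyright holder" + m.group(3), text)
    return text


def extract_licenses(lines):
    """Split comment lines into (copyright_lines, preamble_lines, license_text) records."""
    records = []
    copyrights, preamble, body = [], [], None
    for line in lines:
        text = line.strip()
        if COPYRIGHT_LINE.match(text):
            if body is not None:
                # A new copyright notice ends the current license.
                records.append((copyrights, preamble, normalize_names("\n".join(body))))
                copyrights, preamble, body = [], [], None
            copyrights.append(text)
        elif body is None and LICENSE_START.match(text):
            body = [text]  # the license itself begins here
        elif body is not None:
            body.append(text)
        elif copyrights and text:
            # Text between the copyright notice and the license start.
            preamble.append(text)
    if body is not None:
        records.append((copyrights, preamble, normalize_names("\n".join(body))))
    return records
```

With these assumptions, two licenses in a row with no copyright line between them would be merged into one record; the real script may split them differently.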
Some licenses include additional text between the copyright notice and the start of the license itself. The program will include this text at the end of all other license and copyright information, after checking for and removing duplicates. This text can be excluded using the '-x' option.
Duplicate licenses are detected by converting the text to lower case, stripping all punctuation and whitespace, and comparing the remainder. As a consequence, licenses that differ only by transposed characters, typos, or other small changes are treated as completely different licenses. Improvements to this algorithm are welcome and encouraged.
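The comparison key and the merging of duplicates might look like the following Python sketch. It assumes each file has already been reduced to (copyright lines, extra text, license text) records. The function names license_key and merge_licenses and the output layout (copyright lines, a blank line, then the license, with the extra text last unless exclude_extra is set, mirroring -x) are hypothetical.

```python
import re


def license_key(text):
    """Lower-case the text and drop everything except letters and digits."""
    return re.sub(r"[\W_]+", "", text.lower())


def merge_licenses(records, exclude_extra=False):
    """Merge (copyright_lines, extra_lines, license_text) records.

    Each unique license is emitted once, preceded by every distinct copyright
    line seen with any copy of it; extra text follows at the end unless
    exclude_extra is True (mirroring -x).
    """
    groups = {}  # key -> (copyright lines, first-seen license text); insertion-ordered
    extra = []
    for copyrights, extra_lines, text in records:
        holders, _ = groups.setdefault(license_key(text), ([], text))
        for line in copyrights:
            if line not in holders:
                holders.append(line)
        for line in extra_lines:
            if line not in extra:
                extra.append(line)
    parts = ["\n".join(owners + ["", lic]) for owners, lic in groups.values()]
    if extra and not exclude_extra:
        parts.append("\n".join(extra))
    return "\n\n".join(parts) + "\n"
```

Because the key keeps only letters and digits, reflowed or re-punctuated copies of a license collapse to one entry, while a single misspelled word still yields a distinct key.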
Usage: oss-license-extract.pl [-h] [-f listfile] [-x] [-o outfile] file|dir[...]