Halcyon

Generates Kolkata fingerprints for web application identification.
Halcyon is a repository crawler that computes checksums for the static files found in a given git repository. After performing a change-frequency analysis, it records checksums starting with the most frequently updated static files and works its way down from there. From the checksum data, it generates well-formed version fingerprint signatures in YAML format, ready to feed into Kolkata. Each signature also includes the revision ID, so it may be possible to identify the exact commit of the application instance in question.
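
The change-frequency ranking and checksumming described above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not Halcyon's actual code: the function names `rank_by_change_frequency` and `checksum` are hypothetical, the input is assumed to be the text printed by `git log --name-only --pretty=format:`, and MD5 is an assumed hash choice.

```python
import hashlib
from collections import Counter


def rank_by_change_frequency(log_output, top=0):
    """Rank file paths by how many commits touched them.

    `log_output` is assumed to be the text printed by
    `git log --name-only --pretty=format:` (one path per line,
    blank lines between commits). `top=0` means unlimited.
    """
    counts = Counter(
        line.strip() for line in log_output.splitlines() if line.strip()
    )
    # most_common() sorts by descending count.
    ranked = [path for path, _ in counts.most_common()]
    return ranked if top == 0 else ranked[:top]


def checksum(data):
    """Hex digest of a file's bytes (MD5 is an assumption here)."""
    return hashlib.md5(data).hexdigest()


# Hypothetical log output covering three commits.
sample_log = "js/app.js\ncss/site.css\n\njs/app.js\n\njs/app.js\nreadme.html\n"
print(rank_by_change_frequency(sample_log, top=1))
```

Keeping the ranking separate from the hashing means the most frequently edited files can be checksummed first, which mirrors the ordering described above and the behavior of the -t option.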

Dependencies:
Git version control software

Usage:
A run can be time-intensive, depending on how many files need to be checksummed and how many revisions they have.

    usage: halcyon.py [-h] [-c] -u URL -f FILE -m MATCH
                      [--omit-directory OMIT_DIRECTORY] [-t TOP]

    optional arguments:
      -h, --help            show this help message and exit
      -c, --clone           Clone the repo first.
      -u URL, -p URL, --url URL, --path URL
                            Path or URL to the repository.
      -f FILE, --file FILE  File to search for version information
      -m MATCH, --match MATCH
                            Regex to match line with version number (ie:
                            '^\\\$wp_version = \x27([^']+)\x27;$')
      --omit-directory OMIT_DIRECTORY
                            Comma separated list of directories to omit.
                            (Helpful for removing install directories from
                            signature generation)
      -t TOP, --top TOP     Top 'n' most-frequently-edited files to use. (0 for
                            unlimited)

Example:
    python2 halcyon.py -u https://github.com/WordPress/WordPress.git -c -f wp-version.php -m "^\\\$wp_version = \x27([^']+)\x27;$" -t 1
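
To illustrate what the -m pattern in the example matches once the shell has removed its own escaping (inside double quotes, `\\\$` becomes `\$`), here is a small sketch. The helper name `find_version` and the sample file contents are hypothetical; only the regex is taken from the example, and how Halcyon itself applies the pattern is not shown here.

```python
import re

# The regex from the example after shell unescaping; \x27 is a single quote.
VERSION_RE = re.compile(r"^\$wp_version = \x27([^']+)\x27;$")


def find_version(text):
    """Return the first captured version string in `text`, or None."""
    for line in text.splitlines():
        match = VERSION_RE.match(line.strip())
        if match:
            return match.group(1)
    return None


# Hypothetical contents of a version file.
sample = "<?php\n$wp_version = '4.9.8';\n$wp_db_version = 38590;\n"
print(find_version(sample))
```

The single capture group is what isolates the version number from the rest of the line, so a pattern passed to -m presumably needs one group around the version string.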