whatweb

WhatWeb identifies websites. Its goal is to answer the question, “What is that Website?”.
WhatWeb recognises web technologies including content management systems (CMS), blogging platforms, statistic/analytics packages, JavaScript libraries, web servers, and embedded devices.
WhatWeb can be stealthy and fast, or thorough but slow.
WhatWeb supports an aggression level to control the trade-off between speed and reliability.
When you visit a website in your browser, the transaction includes many hints of what web technologies are powering that website.
Sometimes a single webpage visit contains enough information to identify a website but when it does not, WhatWeb can interrogate the website further.
The default level of aggression, called ‘passive’, is the fastest and requires only one HTTP request of a website.
This is suitable for scanning public websites. More aggressive modes were developed for use in penetration tests.
Most WhatWeb plugins are thorough and recognise a range of cues from subtle to obvious.
For example, most WordPress websites can be identified by the generator meta HTML tag. A minority of WordPress websites remove this identifying tag, but this does not thwart WhatWeb.
The WordPress WhatWeb plugin has over 15 tests, which include checking the favicon, default installation files, login pages, and checking for “/wp-content/” within relative links.
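The idea of combining several cues can be sketched in a few lines of Python. This is only an illustration, not the code of the WordPress plugin (WhatWeb plugins are written in Ruby). The cue table, the function name detect_wordpress and the certainty weights are assumptions made for this example, and only two of the cues mentioned above are covered.

```python
import re

# Hypothetical cue table: (cue name, pattern, certainty weight).
# The weights are illustrative assumptions, not values from WhatWeb.
WORDPRESS_CUES = [
    ("meta generator tag",
     re.compile(r'<meta[^>]+name=["\']generator["\'][^>]+'
                r'content=["\']WordPress ?([\d.]*)', re.I),
     100),
    ("relative /wp-content/ link",
     re.compile(r'(?:href|src)=["\']/?wp-content/', re.I),
     75),
]


def detect_wordpress(html):
    """Return (certainty, version, matched cue names) for one HTML page."""
    certainty, version, matched = 0, None, []
    for name, pattern, weight in WORDPRESS_CUES:
        found = pattern.search(html)
        if found is None:
            continue
        matched.append(name)
        # Keep the strongest cue seen so far.
        certainty = max(certainty, weight)
        # Only the generator cue captures a version string.
        if found.groups() and found.group(1):
            version = found.group(1)
    return certainty, version, matched


if __name__ == "__main__":
    tagged = '<meta name="generator" content="WordPress 3.0.1" />'
    stripped = '<img src="/wp-content/themes/x/logo.png">'
    print(detect_wordpress(tagged))
    print(detect_wordpress(stripped))
```

A page with the generator tag removed still matches on the relative link, which is the behaviour described above: losing one cue lowers the certainty but does not hide the site.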

Example Usage
whatweb [options] <URLs>
The following runs WhatWeb against a website. Standard WhatWeb output is in colour.
backbox@backbox:~$ whatweb google.it
http://google.it [301] X-XSS-Protection[1; mode=block], HTTPServer[gws],
RedirectLocation[1], UncommonHeaders[x-xss-protection], IP[74.125.39.103],
Title[301 Moved], Country[UNITED STATES][US]
http://www.google.it/ [200] X-XSS-Protection[1; mode=block], HTTPServer[gws], UncommonHeaders[x-xss-protection], HTML5, IP[74.125.39.99],
Cookies[NID,PREF], Title[Google], Country[UNITED STATES][US]

Verbose Output
backbox@backbox:~$ whatweb -v www.morningstarsecurity.com
www.morningstarsecurity.com/ [200]

http://www.morningstarsecurity.com [200] WordPress[3.0.1],
Google-API[ajax/libs/jquery/1.3.2/jquery.min.js ], Google-Analytics[GA][791888],
HTTPServer[Apache], UncommonHeaders[x-pingback], JQuery[1.4.2],
Title[MorningStar Security], MetaGenerator[WordPress 3.0.1], RSSFeed[2],
MD5[59f20aef7452702787fff7ec46733501], Tag-Hash[2e45809b1f8a1ecf782757d8dbafbb08],
Header-Hash[dba021c0aa225c8eede02c7dcc45b0d8], Footer-Hash[d0efcc9da7c8c45eb1e2ac5b8d5b354e]

Footer-Hash => hash (string: d0efcc9da7c8c45eb1e2ac5b8d5b354e)
Google-API => google javascript API (version: ajax/libs/jquery/1.3.2/jquery.min.js )
Google-Analytics => pageTracker = ...UA-123-1231 (string: GA,accounts: 791888)
HTTPServer => server string (string: Apache)
Header-Hash => hash (string: dba021c0aa225c8eede02c7dcc45b0d8)
JQuery => script (version: 1.4.2)
MD5 => md5 hash of html (string: 59f20aef7452702787fff7ec46733501)
MetaGenerator => meta generator tag (string: WordPress 3.0.1)
RSSFeed => rss link type, rss link (string: http://www.morningstarsecurity.com/wp-content/themes/pyrmont-v2-white/st...)
Tag-Hash => tag pattern hash (string: 2e45809b1f8a1ecf782757d8dbafbb08)
Title => page title (string: MorningStar Security)
UncommonHeaders => headers (string: x-pingback)
WordPress => wp-content (certainty: 75), meta generator tag (version: 3.0.1), Relative /wp-content/ link

Log Output
There are currently 6 types of log output. They are:
--log-brief=FILE         Log brief, one-line output. Default output.
--log-full=FILE          Log verbose output (might be removed in future)
--log-xml=FILE           Log XML format
--log-json=FILE          Log JSON format
--log-json-verbose=FILE  Log JSON Verbose format
--log-errors=FILE        Log errors. Errors are usually printed to the screen in red.

You can output to multiple logs simultaneously by specifying multiple command line logging options.

Brief Logging
backbox@backbox:~$ whatweb --log-brief=b.log digg.com
http://digg.com [200] X-Powered-By[PHP/5.2.9-digg8], Cookies[1337,PHPSESSID,ccc], UncommonHeaders[keep-alive], Title[Digg - The Latest News
Headlines, Videos and Images], HTTPServer[Apache], Mailto, Header-Hash[2df7eaaa4480f28013aaf48ae9266b84], MD5[24bc43e698e5d1388e836f5eee094fbe],
Footer-Hash[ca2ffbc939969a2246cde196f0fc4841], Div-Span-Structure[828d809947c3c760d41c720c9203993b]

Each line records one connection, so the log is searchable with grep.
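Because each connection is a single line, the brief log is also easy to process with a short script. The sketch below is a minimal Python example, assuming the line layout seen in the sample output in this document (URL, status in square brackets, then comma-separated plugin results) and a log without colour codes. The function names are hypothetical, and a title that contains unbalanced square brackets would confuse it.

```python
import re


def split_top_level(text):
    """Split on commas that are not inside [...] brackets."""
    items, depth, current = [], 0, []
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    items.append("".join(current).strip())
    return [item for item in items if item]


def parse_brief_line(line):
    """Return (url, status, {plugin name: [values]}) for one log line."""
    header = re.match(r"(\S+) \[(\d+)\]\s*(.*)", line)
    if header is None:
        raise ValueError("not a brief-log line: %r" % line)
    url, status, rest = header.group(1), int(header.group(2)), header.group(3)
    plugins = {}
    for item in split_top_level(rest):
        name = item.split("[", 1)[0]
        # A plugin may carry zero, one or several bracketed values.
        plugins[name] = re.findall(r"\[([^\]]*)\]", item)
    return url, status, plugins


if __name__ == "__main__":
    sample = ("http://www.google.it/ [200] HTTPServer[gws], HTML5, "
              "Cookies[NID,PREF], Title[Google], Country[UNITED STATES][US]")
    url, status, plugins = parse_brief_line(sample)
    print(url, status)
    for name, values in plugins.items():
        print(name, values)
```

Commas inside brackets, as in Cookies[NID,PREF], are kept as part of the value rather than treated as separators.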

XML Logging
The XML logging is currently naive and may change. Please contact me if you have suggestions.
backbox@backbox:~$ whatweb --log-xml=x.log digg.com

Plugins
Matches are made with:
Text strings (case sensitive)
Regular expressions
Google Hack Database queries (limited set of keywords)
MD5 hashes
URL recognition
HTML tag patterns
Custom Ruby code for passive and aggressive operations
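Four of the match types listed above (text strings, regular expressions, MD5 hashes and URL recognition) can be illustrated with a small rule matcher. This is a hedged sketch in Python, not WhatWeb's plugin format, which is Ruby. The rule dictionaries, the product name ExampleCMS, the path /examplecms/admin.php and the function names are all invented for this example.

```python
import hashlib
import re

# Hypothetical default page whose exact content is fingerprinted by MD5.
DEFAULT_PAGE = "<html>ExampleCMS default install page</html>"

# Hypothetical rules, one per match type.
RULES = [
    {"name": "text string", "text": "Powered by ExampleCMS"},
    {"name": "regular expression", "regexp": re.compile(r"ExampleCMS ([\d.]+)")},
    {"name": "MD5 hash",
     "md5": hashlib.md5(DEFAULT_PAGE.encode("utf-8")).hexdigest()},
    {"name": "URL", "url": "/examplecms/admin.php"},
]


def match_rule(rule, url, body):
    """Return True if a single rule matches the fetched page."""
    if "text" in rule:
        # Plain substring test, case sensitive.
        return rule["text"] in body
    if "regexp" in rule:
        return rule["regexp"].search(body) is not None
    if "md5" in rule:
        # Matches only when the whole body is byte-for-byte identical.
        return hashlib.md5(body.encode("utf-8")).hexdigest() == rule["md5"]
    if "url" in rule:
        return url.endswith(rule["url"])
    return False


def matching_rules(url, body, rules=RULES):
    """Return the names of all rules that match, in rule order."""
    return [rule["name"] for rule in rules if match_rule(rule, url, body)]


if __name__ == "__main__":
    print(matching_rules("http://example.test/",
                         "<p>Powered by ExampleCMS 2.1</p>"))
    print(matching_rules("http://example.test/examplecms/admin.php",
                         DEFAULT_PAGE))
```

Text and regular expression rules tolerate changes elsewhere on the page, while an MD5 rule matches only an unmodified file, which is why it suits default installation files.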

Show the plugin list:
backbox@backbox:~$ whatweb -l

To view more detail about a plugin or plugins:
backbox@backbox:~$ whatweb -I phpBB