Crawler

For collecting information from the web

FOCA

FOCA (Fingerprinting Organizations with Collected Archives) is a tool used mainly to find metadata and hidden information in the documents it scans. These documents may be found on web pages, and can be downloaded and analyzed with FOCA.

It is capable of analyzing a wide variety of documents, the most common being Microsoft Office, OpenOffice, and PDF files, although it also analyzes Adobe InDesign and SVG files, for instance.

These documents are located using three search engines: Google, Bing, and Exalead. Combining the results from the three engines yields a larger set of documents. It is also possible to add local files in order to extract EXIF information from graphic files, and the information discoverable from the URL alone is analyzed even before the file is downloaded.

With the data extracted from all files, FOCA cross-references the information in an attempt to identify which documents were created by the same team and which servers and client software can be inferred from them.
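
The kind of metadata FOCA correlates can be illustrated with a short script. The sketch below is not part of FOCA; it simply reads the information dictionary of a downloaded PDF using the third-party PyPDF2 package (an assumption for illustration) to show the author, creator, and producer fields from which usernames, client software, and printer drivers can be inferred.

# Illustration only: read the kind of metadata FOCA looks at from a PDF.
# Assumes the third-party PyPDF2 package is installed (pip install PyPDF2).
import sys
from PyPDF2 import PdfReader

def pdf_metadata(path):
    info = PdfReader(path).metadata  # may be None if the PDF has no info dictionary
    if info is None:
        return {}
    return {
        "author": info.author,      # often a username or a real name
        "creator": info.creator,    # client application, e.g. a word processor
        "producer": info.producer,  # PDF library or printer driver
    }

if __name__ == "__main__":
    for field, value in pdf_metadata(sys.argv[1]).items():
        print(field + ": " + str(value))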

SpiderFoot

SpiderFoot is an open source intelligence automation tool. Its goal is to automate the process of gathering intelligence about a given target.

Purpose
There are three main areas where SpiderFoot can be useful:

If you are a pen-tester, SpiderFoot will automate the reconnaissance stage of the test, giving you a rich set of data to help you pinpoint areas of focus for the test.

Understand what your network/organization is openly exposing to the outside world. Such information in the wrong hands could be a significant risk.

SpiderFoot can also be used to gather threat intelligence about suspected malicious IPs you might be seeing in your logs or have obtained via threat intelligence data feeds.
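
Exact invocation varies by version, but SpiderFoot typically runs as a local web application; a common way to start it and then drive scans from the browser is:

python sf.py

Once it is running, browse to http://127.0.0.1:5001 (the default listen address in many versions) to set the target, select modules, and launch the scan.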

Mini MySqlat0r

Mini MySqlat0r is a multi-platform application used to audit web sites in order to discover and exploit SQL injection vulnerabilities. It is written in Java and is used through a user-friendly GUI that contains three distinct modules.

The Crawler module allows the user to view the website structure and gather all tamperable parameters. These parameters are then sent to the Tester module, which tests them all for SQL injection vulnerabilities. Any that are found are then sent to the Exploiter module, which can exploit the injections to extract data from the database.
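
Mini MySqlat0r itself is a Java GUI, but the idea behind the Tester module can be sketched in a few lines: resend each tamperable parameter with a quote appended and look for a database error in the response. The following Python snippet is only an illustration of that error-based check, not code from the tool; the URL and parameter names are hypothetical.

# Illustration of an error-based SQL injection check (not part of Mini MySqlat0r).
# Assumes the third-party 'requests' package; the URL and parameter are hypothetical.
import requests

SQL_ERRORS = ("you have an error in your sql syntax", "mysql_fetch", "sql syntax")

def looks_injectable(url, param, value="1"):
    # Request the page with a single quote appended to the parameter value and
    # check whether a typical MySQL error message appears in the response body.
    resp = requests.get(url, params={param: value + "'"}, timeout=10)
    body = resp.text.lower()
    return any(err in body for err in SQL_ERRORS)

if __name__ == "__main__":
    print(looks_injectable("http://target.example/item.php", "id"))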

Mini MySqlat0r can be used on any platform running the Java environment and is distributed under the GPL licence.

REQUIREMENTS:

The Java Runtime Environment (JRE) is necessary to use Mini MySqlat0r.

SPartan

Overview:
SPartan is a FrontPage and SharePoint fingerprinting and attack tool.

Features:

SharePoint and FrontPage fingerprinting (see the sketch after this list)
Management of Friendly 404s
Default SharePoint and FrontPage file and folder enumeration
Active Directory account enumeration
Download interesting files and documents, including detection of uninterpreted ASP and ASPX
Search for keywords in identified pages
Saves state from previous scans
Site crawling
Accepts NTLM credentials and session cookies for authenticated scans
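
As an illustration of the fingerprinting step listed above, the sketch below probes a few well-known default FrontPage and SharePoint paths and reports which ones answer. It is not SPartan code; the target URL is hypothetical and the path list is deliberately short.

# Illustration of default-file fingerprinting (not SPartan itself).
# Assumes the third-party 'requests' package; the target URL is hypothetical.
import requests

PROBES = {
    "/_vti_inf.html": "FrontPage server extensions info page",
    "/_vti_bin/shtml.exe": "FrontPage CGI handler",
    "/_vti_bin/lists.asmx": "SharePoint Lists web service",
    "/_layouts/settings.aspx": "SharePoint site settings page",
}

def fingerprint(base_url):
    for path, meaning in PROBES.items():
        try:
            status = requests.get(base_url + path, timeout=10).status_code
        except requests.RequestException:
            continue
        if status in (200, 401):  # 401 still indicates the resource exists
            print("[+] %s (%d) -> %s" % (path, status, meaning))

if __name__ == "__main__":
    fingerprint("http://target.example")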

WhatWeb

WhatWeb identifies websites. Its goal is to answer the question, “What is that Website?”. WhatWeb recognises web technologies including content management systems (CMS), blogging platforms, statistic/analytics packages, JavaScript libraries, web servers, and embedded devices.

WhatWeb can be stealthy and fast, or thorough but slow. WhatWeb supports an aggression level to control the trade-off between speed and reliability. When you visit a website in your browser, the transaction includes many hints about which web technologies are powering that website. Sometimes a single webpage visit contains enough information to identify a website, but when it does not, WhatWeb can interrogate the website further. The default level of aggression, called ‘passive’, is the fastest and requires only one HTTP request of a website. This is suitable for scanning public websites. More aggressive modes were developed for use in penetration tests.

Most WhatWeb plugins are thorough and recognise a range of cues from subtle to obvious. For example, most WordPress websites can be identified by the ‘generator’ meta HTML tag, and although a minority of WordPress websites remove this identifying tag, this does not thwart WhatWeb. The WordPress WhatWeb plugin has over 15 tests, which include checking the favicon, default installation files, login pages, and checking for “/wp-content/” within relative links.

Example Usage
whatweb [options] <URLs>
The example below runs WhatWeb against google.it; standard WhatWeb output is in colour.
backbox@backbox:~$ whatweb google.it
http://google.it [301] X-XSS-Protection[1; mode=block], HTTPServer[gws],
RedirectLocation[1], UncommonHeaders[x-xss-protection], IP[74.125.39.103],
Title[301 Moved], Country[UNITED STATES][US]
http://www.google.it/ [200] X-XSS-Protection[1; mode=block], HTTPServer[gws], UncommonHeaders[x-xss-protection], HTML5, IP[74.125.39.99],
Cookies[NID,PREF], Title[Google], Country[UNITED STATES][US]
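
The aggression level described above is chosen with the -a/--aggression option; for example, a more aggressive scan of the same site could be run as follows (level numbering may differ slightly between versions):
backbox@backbox:~$ whatweb -a 3 google.it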

Verbose Output
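
Verbose output prints a description of each plugin that matched; it is enabled with the -v option, for example:
backbox@backbox:~$ whatweb -v google.it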

SearchDiggity

SearchDiggity 3.1 is the primary attack tool of the Google Hacking Diggity Project. It is Stach & Liu’s MS Windows GUI application that serves as a front-end to the most recent versions of their Diggity tools: GoogleDiggity, BingDiggity, Bing LinkFromDomainDiggity, CodeSearchDiggity, DLPDiggity, FlashDiggity, MalwareDiggity, PortScanDiggity, SHODANDiggity, BingBinaryMalwareSearch, and NotInMyBackYard Diggity.

GScrape

GScrape is a small Perl script that uses Google's AJAX API (Google::Search) to find vulnerable websites.

GScrape is a simple tool: it looks for a user-specified file containing a list of search terms, queries Google with those terms, and retrieves an array of websites, which are then tested for Local File Inclusion and SQL injection vulnerabilities. If any are found, they are logged to the output file specified by the user.

Example:
perl gscrape.pl -f dork.lst -o gscrape.log

Note:
GScrape will not return any results unless your input file actually contains a list of search terms.
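
GScrape itself is Perl, but the local file inclusion check it applies to each search result can be sketched conceptually in a few lines of Python; the URL and parameter below are hypothetical.

# Illustration of a local file inclusion (LFI) check (not GScrape's Perl code).
# Assumes the third-party 'requests' package; the URL and parameter are hypothetical.
import requests

def looks_lfi_vulnerable(url, param):
    payload = "../" * 8 + "etc/passwd"
    resp = requests.get(url, params={param: payload}, timeout=10)
    # A readable /etc/passwd in the response strongly suggests file inclusion.
    return "root:x:0:0:" in resp.text

if __name__ == "__main__":
    print(looks_lfi_vulnerable("http://target.example/page.php", "file"))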

Halcyon

Generates Kolkata fingerprints for web application identification.
Halcyon is a repository crawler that runs checksums for static files found within a given git repository. After performing a change-frequency analysis, it records checksums starting with the static files that are updated most often and works its way down from there. Using the checksum data, the application then generates well-formed version fingerprint signatures in YML format, for easy feeding into kolkata. Additionally, the signature output includes the revision ID, so it may be possible to find the exact commit for the instance of the application in question.
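
The two building blocks of that process, change-frequency analysis and static-file checksumming, can be sketched as follows. This is not Halcyon's code; it is an illustration that uses the git command line and SHA-1 hashes on an already-cloned repository, and the clone path is hypothetical.

# Illustration of Halcyon's two building blocks (not Halcyon itself):
# rank files by how often they change, then checksum the most-edited ones.
# Assumes the 'git' command is on PATH and repo points at a local clone.
import collections, hashlib, os, subprocess

def most_edited_files(repo, top=10):
    # 'git log --name-only' lists the files touched by every commit.
    out = subprocess.run(
        ["git", "-C", repo, "log", "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True).stdout
    counts = collections.Counter(line for line in out.splitlines() if line)
    return [path for path, _ in counts.most_common(top)]

def checksum(repo, rel_path):
    with open(os.path.join(repo, rel_path), "rb") as fh:
        return hashlib.sha1(fh.read()).hexdigest()

if __name__ == "__main__":
    repo = "/tmp/WordPress"  # hypothetical location of a cloned repository
    for rel in most_edited_files(repo):
        if os.path.isfile(os.path.join(repo, rel)):
            print(checksum(repo, rel), rel)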

Dependencies:
git repository software

Usage:
The application may be time-intensive, depending on the volume of files that need to be checksummed and the number of revisions that they may have.

usage: halcyon.py [-h] [-c] -u URL -f FILE -m MATCH
[--omit-directory OMIT_DIRECTORY] [-t TOP]

optional arguments:
-h, --help show this help message and exit
-c, --clone Clone the repo first.
-u URL, -p URL, --url URL, --path URL
Path or URL to the repository.
-f FILE, --file FILE File to search for version information
-m MATCH, --match MATCH
Regex to match line with version number (ie: '^\\\$wp_version = \x27([^']+)\x27;$')
--omit-directory OMIT_DIRECTORY
Comma separated list of directories to omit. (Helpful for removing install directories from signature generation)
-t TOP, --top TOP Top 'n' most-frequently-edited files to use. (0 for unlimited)

Example:
python2 halcyon.py -u https://github.com/WordPress/WordPress.git -c -f wp-version.php -m "^\\\$wp_version = \x27([^']+)\x27;$" -t 1

Vanguard

Vanguard is an extensible utility with module support built for testing different types of web exploitation on a given domain.
Features

Main application features:
Fully Configurable
Web crawlers crawl all open HTTP and HTTPS ports reported by nmap
LibWhisker2 for HTTP IDS evasion (same options as Nikto)
Tests via GET, POST, and COOKIE

Web penetration tests:
SQL injection (this test is signature-free)
LDAP Injection
XSS
File inclusion
Command Injection

Usage:
perl scan.pl -h [hostname] -e [evasion option]

Application Dependencies:

Notice: You must run this application as root.
You must have nmap from http://nmap.org installed to run this application correctly.
Protip: You can undo the root requirement by removing the check for root and modifying the nmap configuration.

Perl Dependencies:
LibWhisker2 requires Net::SSLeay. You may need to get this from cpan, compile it in, or install it from your distribution's package manager.
YAML
Clone
Notice: You can install these libraries with cpan.

WebVulScan

WebVulScan is a web application vulnerability scanner. It is itself a web application, written in PHP, and can be used to test remote or local web applications for security vulnerabilities. As a scan is running, details of the scan are dynamically updated for the user. These details include the status of the scan, the number of URLs found on the web application, the number of vulnerabilities found, and details of the vulnerabilities found.

After a scan is complete, a detailed PDF report is emailed to the user. The report includes descriptions of the vulnerabilities found, recommendations and details of where and how each vulnerability was exploited.

The vulnerabilities tested by WebVulScan are:

Reflected Cross-Site Scripting (see the sketch after this list)
Stored Cross-Site Scripting
Standard SQL Injection
Broken Authentication using SQL Injection
Autocomplete Enabled on Password Fields
Potentially Insecure Direct Object References
Directory Listing Enabled
HTTP Banner Disclosure
SSL Certificate not Trusted
Unvalidated Redirects
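
As an illustration of the first check in the list above, a reflected XSS test typically submits a unique marker in a parameter and looks for it coming back unencoded. The sketch below shows the idea only; WebVulScan itself is written in PHP, and the URL and parameter are hypothetical.

# Illustration of a reflected XSS check (conceptual, not WebVulScan's PHP code).
# Assumes the third-party 'requests' package; the URL and parameter are hypothetical.
import requests

def reflects_unencoded(url, param):
    marker = "<script>wvs_probe_1337</script>"
    resp = requests.get(url, params={param: marker}, timeout=10)
    # If the marker comes back without HTML encoding, the input is reflected unsafely.
    return marker in resp.text

if __name__ == "__main__":
    print(reflects_unencoded("http://target.example/search.php", "q"))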

Features:

Crawler: Crawls a website to identify and display all URLs belonging to the website.
Scanner: Crawls a website and scans all URLs found for vulnerabilities.
Scan History: Allows a user to view or download PDF reports of previous scans that they performed.
Register: Allows a user to register with the web application.
Login: Allows a user to login to the web application.
Options: Allows a user to select which vulnerabilities they wish to test for (all are enabled by default).
PDF Generation: Dynamically generates a detailed PDF report.
Report Delivery: The PDF report is emailed to the user as an attachment.
