Malheur Malware Analyzer

Malheur is a tool for the automatic analysis of malware behavior (program behavior recorded from malicious software in a sandbox environment). It has been designed to support the regular analysis of malicious software and the development of detection and defense measures. Malheur identifies novel classes of malware with similar behavior and assigns unknown malware to the discovered classes.

Malheur builds on the concept of dynamic analysis: malware binaries are collected in the wild and executed in a sandbox, where their behavior is monitored at run-time. The execution of each malware binary results in a report of recorded behavior. Malheur analyzes these reports using machine learning to discover and discriminate malware classes.

Malheur can be applied to recorded behavior in various formats, as long as the monitored events are separated by delimiter symbols. This is the case, for example, in reports generated by the popular malware sandboxes CWSandbox, Anubis, Norman Sandbox and Joebox.
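
To make the delimiter requirement concrete, here is a minimal Python sketch of how a delimiter-separated report could be split into events and summarized as counts of event n-grams. The delimiter set, the helper names tokenize_report and ngram_counts, the use of n-grams, and the sample report are all assumptions made for illustration; they are not taken from Malheur's code or configuration.

```python
import re
from collections import Counter

def tokenize_report(text, delimiters=" \t\r\n;"):
    """Split a behavior report into events at any run of delimiter characters."""
    pattern = "[" + re.escape(delimiters) + "]+"
    return [tok for tok in re.split(pattern, text) if tok]

def ngram_counts(events, n=2):
    """Count contiguous runs of n events (event n-grams)."""
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))

# Hypothetical report: four monitored events separated by ';'
report = "open_file;read_file;read_file;close_file"
events = tokenize_report(report)
features = ngram_counts(events, n=2)
```

Any report format works with this kind of approach as long as the delimiter set separates one monitored event from the next.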

Malheur supports four basic actions for analysis, which can be applied to reports of recorded behavior:

Extraction of prototypes. From a given set of reports, Malheur identifies a subset of prototypes representative of the full data set. The prototypes provide a quick overview of recorded behavior and can be used to guide manual inspection.
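
As an illustration of what selecting representative prototypes can look like, the following Python sketch uses a greedy farthest-first heuristic: it keeps adding the report farthest from all current prototypes until every report lies within a chosen distance of some prototype. The heuristic, the function name extract_prototypes, the max_dist parameter, and the two-dimensional sample vectors are assumptions for this sketch; the text above does not state which method Malheur uses.

```python
import math

def extract_prototypes(vectors, max_dist):
    """Greedy farthest-first selection over a non-empty list of vectors.

    Returns indices of prototypes such that every vector lies within
    max_dist of at least one prototype.
    """
    prototypes = [0]  # start with the first vector (arbitrary choice)
    # Distance from each vector to its nearest prototype so far
    nearest = [math.dist(v, vectors[0]) for v in vectors]
    while True:
        far = max(range(len(vectors)), key=nearest.__getitem__)
        if nearest[far] <= max_dist:
            break  # every vector is covered by some prototype
        prototypes.append(far)
        nearest = [min(d, math.dist(v, vectors[far]))
                   for d, v in zip(nearest, vectors)]
    return prototypes

# Hypothetical feature vectors forming two well-separated groups
vectors = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.2)]
prototypes = extract_prototypes(vectors, max_dist=1.0)
```

With the sample data, one prototype is picked from each group, so an analyst would only need to inspect two reports instead of five.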

Clustering of behavior. Malheur automatically identifies groups (clusters) of reports containing similar behavior. Clustering allows for discovering novel classes of malware and provides the basis for crafting specific detection and defense mechanisms, such as anti-virus signatures.
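
The idea of grouping reports with similar behavior can be sketched with a simple single-linkage scheme, in which any two reports within a distance threshold end up in the same cluster. This is only an assumed stand-in for illustration: the function name cluster, the max_dist threshold, and the sample vectors are hypothetical, and Malheur's own clustering may work differently.

```python
import math

def cluster(vectors, max_dist):
    """Single-linkage grouping: merge clusters whenever two of their
    members lie within max_dist of each other. Returns one label per vector."""
    labels = list(range(len(vectors)))  # each vector starts in its own cluster
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            close = math.dist(vectors[i], vectors[j]) <= max_dist
            if close and labels[i] != labels[j]:
                old, new = labels[j], labels[i]
                # Relabel the whole cluster of j so merges are transitive
                labels = [new if lab == old else lab for lab in labels]
    return labels

# Hypothetical feature vectors forming two well-separated groups
vectors = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.2)]
labels = cluster(vectors, max_dist=1.0)
```

Reports sharing a label form one candidate malware class; a cluster that matches no known family would be a candidate for a novel class.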

Classification of behavior. Based on a set of previously clustered reports, Malheur is able to assign unknown behavior to known groups of malware. Classification enables identifying novel variants of malware and can be used to filter program behavior prior to manual inspection.
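
Assigning unknown behavior to known groups can be illustrated with a nearest-prototype rule that rejects reports too far from every known group. The sketch below is an assumption for illustration only: the function name classify, the max_dist rejection threshold, and the labels family_a and family_b are hypothetical and do not come from Malheur.

```python
import math

def classify(vector, prototypes, max_dist):
    """Return the label of the nearest labeled prototype, or None if
    no prototype lies within max_dist (the behavior is treated as unknown)."""
    best_label, best_dist = None, float("inf")
    for proto, label in prototypes:
        d = math.dist(vector, proto)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= max_dist else None

# Hypothetical labeled prototypes from an earlier clustering run
prototypes = [((0.0, 0.0), "family_a"), ((5.0, 5.0), "family_b")]
known = classify((0.2, 0.1), prototypes, max_dist=1.0)
unknown = classify((20.0, 20.0), prototypes, max_dist=1.0)
```

The rejection case is what makes this useful as a filter: reports that match a known group need less attention, while rejected reports are the ones worth inspecting manually.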

Incremental analysis. Malheur can be applied incrementally for the analysis of large data sets. Processing reports in chunks significantly reduces run-time and memory requirements. This makes long-term application feasible, for example the daily analysis of incoming malware.
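
The benefit of chunked processing can be sketched as follows: a small set of prototypes is carried from chunk to chunk, so each new report is compared only against those prototypes rather than against every report seen so far. This is a simplified, assumed scheme for illustration; the function name process_chunk, the max_dist threshold, and the sample chunks are hypothetical and do not describe Malheur's internal state handling.

```python
import math

def process_chunk(chunk, prototypes, max_dist):
    """Assign each vector in the chunk to a known prototype if one lies
    within max_dist; otherwise promote the vector to a new prototype.
    The prototypes list is updated in place and persists across chunks."""
    assignments = []
    for v in chunk:
        dists = [math.dist(v, p) for p in prototypes]
        if dists and min(dists) <= max_dist:
            assignments.append(dists.index(min(dists)))
        else:
            prototypes.append(v)
            assignments.append(len(prototypes) - 1)
    return assignments

prototypes = []  # persistent state carried from one chunk to the next
# Hypothetical daily chunks of feature vectors
chunks = [[(0.0, 0.0), (0.1, 0.1)], [(5.0, 5.0), (0.0, 0.1)], [(5.1, 5.0)]]
results = [process_chunk(c, prototypes, max_dist=1.0) for c in chunks]
```

Only the prototype list has to be kept between runs, which is why memory use stays bounded by the number of distinct behaviors rather than the number of reports.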

Examples:
Distances of program behavior
malheur -o out.txt -v distance dataset.zip

Extraction of prototypes
malheur -o out.txt -v prototype dataset.zip

Clustering and classification
malheur -o out1.txt -v cluster dataset1.zip
malheur -o out2.txt -v classify dataset2.zip

Incremental analysis
malheur -o out1.txt -v -r increment dataset1.zip
malheur -o out2.txt -v increment dataset2.zip
malheur -o out3.txt -v increment dataset3.zip

Debugging
malheur -o /dev/null -vvv prototype dataset.zip