Malware Analysis

Malware (malicious software such as viruses, bots, Trojan horses, and so on) is a growing problem. The volume of new malware is growing at an exponential rate, fueled by easy-to-use malware toolkits that automatically create hundreds of unique variants. At the same time, malware authors are deploying new techniques such as packers, encryption, and virtual machines that make analyzing a single malware instance much harder.

Our malware research currently has two main thrusts:

  1. Malware triage. Currently over 4,000 new malware variants are discovered on average per day. Given 4,000 new malware samples, which one should you look at first? Which new samples are really new, and which are simply syntactic morphs of existing malware? We are investigating malware triage techniques for helping responders to deal with the onslaught of new malware samples.
  2. Malware program analysis. Malware authors will do everything they can to fool analysis. Our research covers both techniques for attackers (how can analysis be fooled more effectively?) and techniques for defenders (how can obfuscation be defeated?).

The SplitScreen Project: Malware Scanning at 2x the speed and 1/2 the memory.

As the amount of malware grows, so does the number of signatures used by anti-malware products (also called antivirus software) to detect known malware. In 2008, Symantec created over 1.6 million new signatures, versus a still-boggling six hundred thousand new signatures in 2007. The explosion of signatures poses significant scaling challenges to malware defenses running on traditional computers, as well as to defenses for emerging weaker computational devices such as smartphones, iPads, and netbooks.

The SplitScreen project is designed to make existing pattern-based anti-malware solutions faster, less memory hungry, and applicable to weaker computational devices. At a high level, SplitScreen divides scanning into two steps. First, all files are scanned using a small, cache-optimized data structure we call a feed-forward Bloom filter (FFBF). The FFBF implements an approximate pattern-matching algorithm that has one-sided error: it will properly identify all malicious files, but may also identify some safe files as malicious. The FFBF outputs: (1) a set of suspect matched files, and (2) the subset of signatures from the signature database needed to confirm that the suspect files are indeed malicious. Second, SplitScreen rescans the suspect files against that subset of signatures using an exact pattern-matching algorithm.
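The two steps above can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the SplitScreen or ClamAV implementation: signatures are treated as plain byte strings of at least FRAGMENT_LEN bytes, only the first FRAGMENT_LEN bytes of each signature go into the filter, and all names and constants (FRAGMENT_LEN, NUM_BITS, NUM_HASHES, first_pass, second_pass, split_scan) are hypothetical. The second bit vector, which records the filter bits actually hit by file content, is our simplified reading of how a first pass can also produce the signature subset.

```python
import hashlib

# All constants below are illustrative assumptions, not values from SplitScreen.
FRAGMENT_LEN = 4      # bytes of each signature placed in the filter
NUM_BITS = 1 << 12    # size of each bit vector
NUM_HASHES = 3        # hash functions per fragment


def bit_positions(fragment: bytes) -> list[int]:
    """Map a fragment to NUM_HASHES positions (SHA-256 used only for clarity)."""
    positions = []
    for seed in range(NUM_HASHES):
        digest = hashlib.sha256(bytes([seed]) + fragment).digest()
        positions.append(int.from_bytes(digest[:8], "big") % NUM_BITS)
    return positions


def first_pass(files: dict[str, bytes], signatures: list[bytes]):
    """Approximate scan: returns (suspect file names, candidate signatures)."""
    # One byte per bit keeps the sketch readable; a real filter would pack bits.
    pattern_bits = bytearray(NUM_BITS)
    for sig in signatures:
        for pos in bit_positions(sig[:FRAGMENT_LEN]):
            pattern_bits[pos] = 1

    hit_bits = bytearray(NUM_BITS)  # filter bits actually hit by file content
    suspects = set()
    for name, data in files.items():
        for i in range(len(data) - FRAGMENT_LEN + 1):
            positions = bit_positions(data[i:i + FRAGMENT_LEN])
            if all(pattern_bits[p] for p in positions):
                suspects.add(name)
                for p in positions:
                    hit_bits[p] = 1

    # Keep only signatures whose fragment could have produced a hit.
    candidates = [
        sig for sig in signatures
        if all(hit_bits[p] for p in bit_positions(sig[:FRAGMENT_LEN]))
    ]
    return suspects, candidates


def second_pass(files, suspects, candidates):
    """Exact scan of suspect files against the reduced signature set."""
    return {
        name for name in suspects
        if any(sig in files[name] for sig in candidates)
    }


def split_scan(files, signatures):
    suspects, candidates = first_pass(files, signatures)
    return second_pass(files, suspects, candidates)


files = {
    "a.bin": b"hello EVILPAYLOAD world",
    "b.txt": b"benign content",
}
signatures = [b"EVILPAYLOAD", b"TROJANCODE"]
print(sorted(split_scan(files, signatures)))  # ['a.bin']
```

In this sketch the error is one-sided for the same reason given above: a file that contains a signature also contains that signature's first fragment, so the first pass always flags the file and keeps the signature. Bloom filter collisions can only add extra suspects or extra candidate signatures, and the exact second pass discards them.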

The SplitScreen architecture naturally leads to a demand-driven, network-based architecture where clients download the larger exact signatures only when needed in step 2 (SplitScreen still accelerates traditional single-host scanning when running the client and the server on the same host):

[Figure: SplitScreen Architecture]

Source Code and Paper

Our paper on SplitScreen was published at NSDI 2010. It is available here.

We implemented SplitScreen by modifying ClamAV. We make our source code available below in the interest of scholarly dissemination. We recommend installing the client before installing the server application. We do not offer any support for the code. (Though we are interested in working with people on extensions, and will be happy to help such people out with any difficulties.)

If you would like the source code, please email David Brumley. We are happy to give it to academic researchers with a .edu email and webpage. Corporations that are members of CyLab can also get access. (This is one of the advantages of being a CyLab member.)

Credits

SplitScreen is a collaboration among Sang Kil Cha, Iulian Moraru, David Brumley, and David Andersen.