ByteWeight is an effective tool for recognizing functions in binaries, especially binaries without symbols. Given a binary, ByteWeight can output function information in any of the following three forms:

  • Function Start. This is the lowest address of each function.
  • Function Boundary. This is a pair of addresses (s, e), where s is the lowest address of the function and e is the highest.
  • Function Instruction. This is a list of addresses, each of which is the start of a disassembled instruction.

    ByteWeight is a simple and accurate function recognition tool. We evaluated it against three well-known tools that perform function identification: IDA, BAP, and Dyninst. Our data set consists of 2,200 binaries built with three different compilers at four optimization levels on two operating systems. Across these binaries, ByteWeight generally outperformed all three tools. In particular, ByteWeight missed 44,621 functions, compared with the 266,672 functions missed by the industry-leading tool IDA. Furthermore, while IDA misidentified 459,247 functions, ByteWeight misidentified only 43,992. For more information, please refer to our publication (see Reference).
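
    Read as standard classification counts, a "missed" function is a false negative and a "misidentified" function is a false positive. The following is a minimal Python sketch, not ByteWeight's own evaluation code, of how these counts, together with precision, recall, and F1, can be computed by comparing predicted functions against ground truth. It assumes exact matching, and works for function starts (integers) as well as function boundaries ((s, e) tuples). The `score` helper and the example addresses are hypothetical.

```python
from collections.abc import Hashable, Iterable


def score(truth: Iterable[Hashable], predicted: Iterable[Hashable]) -> dict:
    """Compare ground-truth functions with predicted functions.

    Items may be function starts (ints) or function boundaries ((s, e)
    tuples); an item counts as correct only if it matches exactly.
    """
    truth_set, pred_set = set(truth), set(predicted)
    true_pos = len(truth_set & pred_set)
    missed = len(truth_set - pred_set)          # false negatives
    misidentified = len(pred_set - truth_set)   # false positives
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(truth_set) if truth_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "missed": missed,
        "misidentified": misidentified,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical function starts; the addresses are made up for illustration.
truth = [0x400000, 0x400040, 0x4000A0, 0x400100]
predicted = [0x400000, 0x400040, 0x400080]
print(score(truth, predicted))
```

    Here two of the four true starts are found, so two are missed and one prediction (0x400080) is misidentified.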


    We offer a virtual machine that includes the source code as well as the experiment data. All prerequisite packages are already installed in the virtual machine, so you should be able to compile and run ByteWeight after logging in (see Quick Start). We will integrate ByteWeight into the upcoming version of BAP (Binary Analysis Platform).



    We evaluated ByteWeight using 10-fold cross-validation. For each architecture and target, we randomly split the binaries into 10 folds, then tested each fold using the signature trained on the other 9 folds.
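
    The split described above can be sketched in Python as follows, applied separately to each architecture/target group. This is a minimal illustration under stated assumptions: the text only says the binaries were split randomly, so the fixed seed and the round-robin assignment of shuffled binaries to folds are choices made here, and `ten_fold_splits` is a hypothetical helper. Training and testing are left as placeholders.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def ten_fold_splits(binaries: Sequence[str], k: int = 10, seed: int = 0
                    ) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train, test) pairs for k-fold cross-validation.

    The binaries are shuffled once and dealt round-robin into k folds;
    each fold is used once as the test set while the remaining k - 1
    folds form the training set.
    """
    shuffled = list(binaries)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [b for j, fold in enumerate(folds) if j != i for b in fold]
        yield train, test


# Hypothetical binaries belonging to one architecture/target group.
binaries = [f"bin_{n:03d}" for n in range(25)]
for fold_no, (train, test) in enumerate(ten_fold_splits(binaries), start=1):
    # A real pipeline would train a signature on `train`, then evaluate on `test`.
    print(fold_no, len(train), len(test))
```

    Because each binary lands in exactly one fold, every binary is tested exactly once, and never by a signature trained on itself.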

    The experiment data is available at the links below. Each link leads to folders named 1 through 10. In each folder, the file "signature" is the trained signature file, and the folder "binary" contains the testing binaries.

    ELF (Linux) 64-bit

    ELF (Linux) 32-bit

    PE (Windows) 64-bit

    PE (Windows) 32-bit
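
    Once a data set has been downloaded, its fold folders can be walked with a short script. The sketch below assumes only the layout described above (folders 1 to 10, each holding a "signature" file and a "binary" folder of testing binaries); the `iter_folds` helper and the local directory name in the usage section are hypothetical, and the script does not run ByteWeight itself.

```python
from pathlib import Path
from typing import Iterator, List, Tuple


def iter_folds(dataset_dir: Path) -> Iterator[Tuple[int, Path, List[Path]]]:
    """Yield (fold number, signature path, testing binaries) for folds 1..10.

    Assumes the layout dataset_dir/<n>/signature and dataset_dir/<n>/binary/.
    """
    for n in range(1, 11):
        fold_dir = dataset_dir / str(n)
        signature = fold_dir / "signature"
        binaries = sorted(p for p in (fold_dir / "binary").iterdir() if p.is_file())
        yield n, signature, binaries


if __name__ == "__main__":
    # Hypothetical location of one downloaded data set; adjust to your copy.
    root = Path("experiment/elf_x86-64")
    if root.is_dir():
        for n, signature, binaries in iter_folds(root):
            print(f"fold {n}: {signature} -> {len(binaries)} testing binaries")
```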

    Quick Start

    You can log into the virtual machine with the user name byteweight and the password password. The ByteWeight directory is in the user's home directory and contains two folders: code and experiment. To run the code, please read the file ~/ByteWeight/code/README.


    Tiffany Bao, Jonathan Burket, Maverick Woo, Rafael Turner, and David Brumley. ByteWeight: Learning to Recognize Functions in Binary Code. In Proceedings of the 23rd USENIX Security Symposium, 2014, pp. 845-860. [pdf]


    If you have any questions, please contact Tiffany Bao. Comments and suggestions are also welcome.