Read more of this story at Slashdot.
Read more of this story at Slashdot.
by Aditi Gupta, Carnegie Mellon University
As a summer intern at Trail of Bits, I’ve been working on building Fennec, a tool to automatically replace function calls in compiled binaries that’s built on top of McSema, a binary lifter developed by Trail of Bits.
Let’s say you have a compiled binary, but you don’t have access to the original source code. Now, imagine you find something wrong with your program, or something you’d like to change. You could try to fix it directly in the binary—for example, by patching the file in a hex editor—but that becomes tedious very quickly. Instead, being able to write a C function and swap it in would massively speed up the process.
I spent my summer developing a tool that allows you to do so easily. Knowing the name of the function you want to replace, you can write another C function that you want to use instead, compile it, and feed it into Fennec, which will automatically create a new and improved binary.
To demonstrate what Fennec can do, let’s look at a fairly common cryptographic vulnerability that has shown up in the real world: the use of a static initialization vector in the CBC mode of AES encryption. In the very first step of CBC encryption, the plaintext block is XOR’ed with an initialization vector (IV). An IV is a block of 128 bits (the same as the block size) that is used a single time in any encryption to prevent repetition in ciphertexts. Once encrypted, this ciphertext plays the role of the IV for the next block of plaintext and is XOR’ed with this plaintext block.
This process can become insecure when an initialization vector is constant throughout plaintexts. Under a fixed IV, if every message begins with the same block of plaintext, they will all correspond to the same ciphertext. In other words, a static IV can allow an attacker to analyze multiple ciphertexts as a group rather than as individual messages. Below is an example of an IV being generated statically.
unsigned char *generate_iv() { return (unsigned char *)"0123456789012345"; }
Sometimes, developers will use cryptography libraries like OpenSSL to do the actual encryption, but write their own functions to generate IVs. This can be dangerous, since non-random IVs can make AES insecure. I built Fennec to help fix issues like this one—it checks whether IV generation was random or static and replaces the function with a new, secure IV if necessary.
The end goal was to lift executable binaries to LLVM bitcode with McSema and combine it with some LLVM manipulation to replace any function automatically. I started by understanding my cryptographic example and exploring different ways of patching binaries as a bit of background before getting started.
My first step was to work through a couple of the Matasano Cryptopals Challenges to learn about AES and how it can be used and broken. This stage of the project gave me working encryption and decryption programs in C, both of which called OpenSSL, as well as a few Python scripts to attack my implementations. My encryption program used a static IV generation function, which I was hoping to replace automatically later.
I kept using these C binaries throughout the summer. Then, I started looking at binary patching. I spent some time looking into both LD_PRELOAD and the Witchcraft Compiler Collection, which would work if my IV generation function was dynamically linked into the program. The goal of my project, however, was to replace function calls within binaries, not just dynamically loaded functions.
I didn’t want to complicate everything with lifted bitcode yet, so I started by using clean bitcode that generated directly from source code. I wanted to run an LLVM pass on this bitcode to change the functionality of part of my program—namely, the part that generated an IV.
I started by trying to change the function’s bitcode directly in my pass, but soon moved to writing a new function in C and making my original program call that function instead. Every call to the old function would be replaced with a call to my new function.
After some experiments, I created an LLVM pass that would replace all calls to my old function with calls to a new one. Before moving to lifted bitcode, I added code to make sure I would still be able to call the original function if I wanted to. In my cryptographic example, this meant being able to check whether the original function was generating a static IV and, if so, replace it with the code below, as opposed to assuming it was insecure and replacing it no matter what.
// a stub function that represents the function in original binary unsigned char *generate_iv_original() { unsigned char *result = (unsigned char *)""; // the contents of this function do not matter return result; } unsigned char *random_iv() { unsigned char *iv = malloc(sizeof(int) * 16); RAND_bytes(iv, 16); // an OpenSSL call return iv; } unsigned char *replacement() { unsigned char *original = generate_iv_original(); for (int i = 0; i < 10; i++) { unsigned char *iv = generate_iv_original(); if (iv == original) { . // if the IV is static return random_iv(); } } return original; }
With my tool working on clean bitcode, it was time to start looking at lifted bitcode. I familiarized myself with how McSema worked by lifting and recompiling binaries and looking through the intermediate representation. Because McSema changes the way functions are called, it took some extra effort to make my tool work on lifted bitcode in the same way that it had on clean bitcode. I had to lift both the original binary and the replacement with McSema. Additional effort was required because the replacement function in a non-lifted binary doesn’t follow McSema’s calling conventions, so it couldn’t be swapped in trivially.
Function names and types are more complex through McSema, but I eventually made a working procedure. Like the tool for clean bitcode, the original function could be kept for use in the replacement.
The last step was to generalize my process and wrap everything into a command line tool that others could use. So I tested it on a variety of targets (including stripped binaries and dynamically-loaded functions), added tests, and tested my installation process.
The complete process consists of three primary steps: 1) lifting the binaries to bitcode with McSema, 2) using an LLVM pass to carry out the function replacement within the bitcode, and 3) recompiling a new binary. The LLVM pass is the core of this tool, as it actually replaces the functions. The pass works by iterating through each instruction in the program and checking whether it is a call to the function we want to replace. In the following code, each instruction is checked for calls to the function to replace.
for (auto &B : F) { for (auto &I : B) { // check if instruction is call to function to be replaced if (auto *op = dyn_cast(&I)) { auto function = op->getCalledFunction(); if (function != NULL) { auto name = function->getName(); if (name == OriginalFunction) { ...
Then, we find the replacement function by looking for a new function with the specified name and same type as the original.
Type *retType = function->getReturnType(); FunctionType *newFunctionType = FunctionType::get(retType, function->getFunctionType()->params(), false); // create new function newFunction = (Function *)(F.getParent()->getOrInsertFunction(ReplacementFunction, newFunctionType));
The next step is to pass the original function’s arguments to the new call.
CallSite CS(&I); // get args to original function to be passed to replacement std::vector arguments; for (unsigned int i = 0; i < CS.arg_size(); i++) { Value *arg = CS.getArgument(i); arguments.push_back(arg); } ArrayRev argArray = ArrayRef(arguments);
Finally, we create a call to the new replacement function and delete any calls to the original function.
IRBuilder builder(op); // create call to replacement function Value* newCall = builder.CreateCall(newFunction, argArray); // replace all calls to old function with calls to new function for (auto& U : op->uses()) { User* user = U.getUser(); user->setOperand(U.getOperandNo(), newCall); }
Although the LLVM pass does the work of replacing a given function, it is wrapped with the other steps in a bash script that implements the full process. First, we disassemble and lift both input binaries using McSema.
Next, we analyze and tweak the bitcode to find the names of the functions as McSema represents them. This section of code includes support for both dynamically-loaded functions and stripped binaries, which affect the names of functions. We need to know these names so that we can pass them as arguments to the LLVM pass when we actually do the replacement. If we were to look for the names from the original binary, the LLVM pass wouldn’t be able to find any matching functions, since we’re using lifted bitcode.
Finally, we run the pass. If we don’t need access to the original function, we only need to run the pass on the original binary. If, however, we want to call the original function from the replacement, we run the pass on both the original binary and the replacement binary. In this second case, we are replacing the original function with the replacement function, and the stub function with the original function. Lastly, we recompile everything to a new working binary.
Fennec uses binary lifting and recompilation to make a difficult problem relatively manageable. It’s especially useful for fixing security bugs in legacy software, where you might not have access to source code.
Using this tool, it becomes possible to automatically fix a cryptographic IV vulnerability. As seen below, the original binary encrypts a message identically each time using a static IV. After running Fennec, however, the newly created binary uses a different IV, thereby producing a unique ciphertext each time it is run, even on the same plaintext (blue).
# Original binary aditi@nessie:~/ToB-Summer19$ ./encrypt "" MDEyMzQ1Njc4OTAxMjM0NQ==/reJh+5rktBatDpyuJNQEBo++0pyIRGZiNsmZkN09HTPIOBVqQ9ov6CrxPXO7dC4cUJGYzBEsejHuTQyjVQh+XsLCHyDkURmfCuJ+a97raPY+o8pKKt8yf/xTmYMtyq2zf7EQxqPxv2bXKdP+6K+h9KyuO3q4+3JbuJFTesNLy8Np1m9ShJ9UAHvAdO6LCZvQ N91kz0ytIH+s7LgajIWyises+yz26UBQwOzZLeLcQp4= 176 aditi@nessie:~/ToB-Summer19$ ./encrypt "" MDEyMzQ1Njc4OTAxMjM0NQ==/reJh+5rktBatDpyuJNQEBo++0pyIRGZiNsmZkN09HTPIOBVqQ9ov6CrxPXO7dC4cUJGYzBEsejHuTQyjVQh+XsLCHyDkURmfCuJ+a97raPY+o8pKKt8yf/xTmYMtyq2zf7EQxqPxv2bXKdP+6K+h9KyuO3q4+3JbuJFTesNLy8Np1m9ShJ9UAHvAdO6LCZvQ N91kz0ytIH+s7LgajIWyises+yz26UBQwOzZLeLcQp4= 176 aditi@nessie:~/ToB-Summer19$ ./encrypt "" MDEyMzQ1Njc4OTAxMjM0NQ==/reJh+5rktBatDpyuJNQEBo++0pyIRGZiNsmZkN09HTPIOBVqQ9ov6CrxPXO7dC4cUJGYzBEsejHuTQyjVQh+XsLCHyDkURmfCuJ+a97raPY+o8pKKt8yf/xTmYMtyq2zf7EQxqPxv2bXKdP+6K+h9KyuO3q4+3JbuJFTesNLy8Np1m9ShJ9UAHvAdO6LCZvQ N91kz0ytIH+s7LgajIWyises+yz26UBQwOzZLeLcQp4= 176 aditi@nessie:~/ToB-Summer19$ bash run.sh 2 ../mcsema-2.0.0-ve/remill-2.0.0/remill-build-2/ /home/aditi/ToB-Summer19/ida-6.9/idal64 encrypt replaceIV generate_iv replacement generate_iv_original -lcrypto # Fennec's modified binary aditi@nessie:~/ToB-Summer19$ ./encrypt.new "" L+PYRFiOKMcu18hSqdGQEw==/aK2hYm/GXHwA2tqZxPmoNccQwW+Zhj7E0PQUSRF+lOLJiEMwOc7yv+/Z2AA0pEJjP7Jq4lHMpq2eIVl73lvav0pJiVlOcmfnFwQ9cu0MW0EWqUdgl2FCsWKtO/TAfGhcQPopJyvP8KD/LHlru4QIfZiym7//tt0V9vvabFCLNiSTRG350XKO/zoydeuRFfSu 0HmNNQbAcLSQkcUETH424RyQ4SxmcreW3krOw30kfJY= 176 aditi@nessie:~/ToB-Summer19$ ./encrypt.new "" hYnowxN2Z3QyPIzwNaFzJw==/pzCq+V1q5ipHoqJXZ9MaeDr+nMdV5E1RbeI+YrcQqXjFHcVmDSq4yZboEuIJJjkbNbdO5DG6n3CQnZ1C7CumGdaZsddaYJueORROk7X+PnQZUq5bKqvdN7ZJEhK7qaerjogOF4TAotDV3ryLC6l/EWY01DkhGrf0hlXAkjQnOz28lXF40GNMd6pIjcoIbZze V72v5s5q67fVdKdCzVE3BH76qX8qYS9YnN5JkGLERYA= 176 aditi@nessie:~/ToB-Summer19$ ./encrypt.new "" r3/wMu5nD3rEFn7N88fCjQ==/MisK9RcK8RLsqjV2nrAfprghBYrBmeJS3FbJ4YG6zHBk+uA0CcZ+R4CSDolAaAPlCmkupfxy6bFHNEqyMVv7moPaiJEAkHDDU/FKen8eAJjMvz9+RK+xmQja238jk7xmaS6JbJOdh8teQ2XiMzlHsBYBVpw89UBFrTqOSN8qtlgU3aR4xUVlwZAA1+Pg2GHy 2CIWQI6ioHGDhN3P3po7MaOldJAgHGZO5d2GluroI70= 176
You can download Fennec and find instructions for its use here.
If you have questions or comments about the tool, you can find Aditi on Twitter at @aditi_gupta0!