PyQBDI

PyQBDI provides Python 3 bindings for the QBDI API. That way, you can take advantage of QBDI features directly from your Python scripts without having to write any C/C++. This is particularly handy when you need to build something quickly. However, it comes with some limitations:

PyQBDI cannot be used to instrument a Python process
Performance is lower than with the C/C++ APIs
The Python runtime’s and the target’s architectures must be the same
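You only need the standard library to check the interpreter side of the last limitation. The sketch below reports the machine name and the pointer width of the running Python. python_arch is an illustrative helper, not part of PyQBDI. platform.machine() usually reports the operating system's machine type, so a 32-bit interpreter on a 64-bit system may only show up in the pointer width.

```python
import platform
import struct


def python_arch():
    """Return (machine name, pointer width in bits) for the running interpreter."""
    # struct.calcsize("P") is the size of a native C pointer in this interpreter,
    # which distinguishes a 32-bit Python from a 64-bit one.
    bits = struct.calcsize("P") * 8
    # platform.machine() may be an empty string if the value cannot be determined.
    return platform.machine(), bits


if __name__ == "__main__":
    machine, bits = python_arch()
    print(f"Python runtime: {machine or 'unknown'}, {bits}-bit")
```

Compare the result with the architecture of the target library. On Linux or macOS, for example, the file utility reports it.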

Memory allocation

Unlike with the C/C++ APIs, interacting with the process memory from Python is not straightforward: pure Python offers no direct way to allocate, read or write arbitrary memory regions. Luckily, PyQBDI offers helpers that let users perform these actions.

import pyqbdi

value = b"bytes array"

addr = pyqbdi.allocateMemory(len(value))
pyqbdi.writeMemory(addr, value)
value2 = pyqbdi.readMemory(addr, len(value))
assert value == value2
pyqbdi.freeMemory(addr)
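As shown above, pyqbdi.writeMemory() and pyqbdi.readMemory() work on raw bytes, so typed values have to be converted by hand. Here is a minimal sketch using the standard struct module. It assumes a little-endian target such as X86_64, and the record layout and helper names are purely illustrative.

```python
import struct

# "<" forces little-endian byte order with no padding; "Q" is an unsigned
# 64-bit integer and "d" an IEEE 754 double. Adapt the format to the target.
RECORD_FORMAT = "<Qd"


def encode_record(counter, ratio):
    """Pack a (uint64, double) pair into bytes suitable for writeMemory()."""
    return struct.pack(RECORD_FORMAT, counter, ratio)


def decode_record(raw):
    """Unpack bytes obtained from readMemory() back into (uint64, double)."""
    return struct.unpack(RECORD_FORMAT, raw)


if __name__ == "__main__":
    payload = encode_record(42, 0.5)
    print(len(payload), decode_record(payload))  # prints: 16 (42, 0.5)
```

With PyQBDI, you would write payload with pyqbdi.writeMemory(addr, payload) into a region of struct.calcsize(RECORD_FORMAT) bytes and read it back with pyqbdi.readMemory(addr, struct.calcsize(RECORD_FORMAT)).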

Load the target code

In this tutorial, we aim to execute the function foo, which lies in a shared library named mylib.so, in the context of QBDI. PyQBDI, together with the standard ctypes module, will give us a hand.

import pyqbdi
import ctypes

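# Load the library; a path such as "./mylib.so" may be needed if it is
# not in the dynamic loader's search path
mylib = ctypes.cdll.LoadLibrary("mylib.so")
# Get the address of foo as a plain integer
funcPtr = ctypes.cast(mylib.foo, ctypes.c_void_p).value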
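The same ctypes technique works for any exported function. The sketch below resolves sin from the C math library instead of the hypothetical mylib.so, so you can try it as-is. It assumes Linux or macOS, and function_address and load_libm are illustrative helpers. Calling the function natively through ctypes also gives you a reference value to compare with an instrumented run.

```python
import ctypes
import ctypes.util
import math


def function_address(func):
    """Return the raw address of a ctypes foreign function as an int."""
    # Casting the function object to c_void_p exposes the pointer it wraps;
    # .value is a plain Python int (None would mean a NULL pointer).
    return ctypes.cast(func, ctypes.c_void_p).value


def load_libm():
    """Load the C math library (assumes Linux or macOS)."""
    # find_library() maps "m" to a loader-friendly name such as "libm.so.6";
    # fall back to the usual glibc soname if the lookup tools are missing.
    return ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")


if __name__ == "__main__":
    libm = load_libm()
    sin_addr = function_address(libm.sin)
    # Declaring the prototype lets ctypes call sin natively with a double.
    libm.sin.restype = ctypes.c_double
    libm.sin.argtypes = [ctypes.c_double]
    print(hex(sin_addr), libm.sin(1.0), math.sin(1.0))
```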

Note that if you want to instrument a whole binary, PyQBDIPreload should be preferred (see PyQBDIPreload).

Initialise the virtual machine

First off, we need to initialise the virtual machine (VM) itself. Calling pyqbdi.VM() creates a new instance.

vm = pyqbdi.VM()

Allocate a virtual stack

The virtual machine does not use the regular stack of your process; instead, QBDI needs its own stack. Therefore, we have to ask for a virtual stack using pyqbdi.allocateVirtualStack(). This function allocates an aligned memory region, sets the stack pointer register of the given GPR state accordingly and returns the address of this brand-new region, which must later be released with pyqbdi.alignedFree().

state = vm.getGPRState()
# Allocate a 1 MiB (0x100000 bytes) virtual stack
fakestack = pyqbdi.allocateVirtualStack(state, 0x100000)
assert fakestack is not None
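How the returned address relates to the stack pointer can be modelled as follows. This sketch assumes the behaviour of QBDI's C++ implementation, where the returned address is the start of the region (the value later passed to alignedFree()) and the stack pointer is set to its end. It also assumes stacks grow towards lower addresses, as they do on the architectures QBDI supports. VirtualStack is a hypothetical helper, not a PyQBDI class, and the base address in the example is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualStack:
    """Bookkeeping for a stack region [base, base + size) (illustrative helper)."""

    base: int  # address returned by allocateVirtualStack(), later given to alignedFree()
    size: int  # size requested from allocateVirtualStack()

    @property
    def initial_sp(self):
        # The stack grows downwards, so the stack pointer starts at the end of the region.
        return self.base + self.size

    def contains(self, address):
        # Handy, e.g., to tell stack accesses apart from other memory accesses.
        return self.base <= address < self.base + self.size


if __name__ == "__main__":
    stack = VirtualStack(base=0x7F0000000000, size=0x100000)
    print(hex(stack.initial_sp), stack.contains(stack.initial_sp - 8))  # 0x7f0000100000 True
```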

Write our first callback function

Now that the virtual machine has been set up, we can start playing with QBDI core features.

To trace the execution, we need a callback that retrieves the address and the disassembly of the current instruction and prints them.

Since the callback is triggered on instructions, it must follow the InstCallback type: it receives the VM, the GPR state, the FPR state and the user data, and returns an action. Inside the callback, pyqbdi.VM.getInstAnalysis() provides an InstAnalysis of the current instruction. To get the address and the disassembly, the analysis must be requested with pyqbdi.ANALYSIS_INSTRUCTION (for the address) and pyqbdi.ANALYSIS_DISASSEMBLY (for the disassembly). These two pyqbdi.AnalysisType values are the default parameter of pyqbdi.VM.getInstAnalysis() and can therefore be omitted.

def showInstruction(vm, gpr, fpr, data):
    # Obtain an analysis of the instruction from the VM
    instAnalysis = vm.getInstAnalysis()

    # Printing disassembly
    print("0x{:x}: {}".format(instAnalysis.address, instAnalysis.disassembly))

    return pyqbdi.CONTINUE

An InstCallback must always return an action (pyqbdi.VMAction) telling the VM whether execution should continue or stop. In most cases, pyqbdi.CONTINUE is the action to return.

Register a callback

The callback must be registered in the VM. The function pyqbdi.VM.addCodeCB() allows registering a callback for every instruction. The callback can be called before the instruction (pyqbdi.PREINST) or after the instruction (pyqbdi.POSTINST).

cid = vm.addCodeCB(pyqbdi.PREINST, showInstruction, None)
assert cid != pyqbdi.INVALID_EVENTID

The function returns a callback ID, or the special ID pyqbdi.INVALID_EVENTID if the registration fails. Keep the callback ID if you want to remove the callback later with pyqbdi.VM.deleteInstrumentation().

Set instrumented ranges

QBDI needs to know which address ranges should be instrumented. If execution leaves these ranges, QBDI tries to continue it natively, without instrumentation.

In our example, the function foo must be part of the instrumented range. The method pyqbdi.VM.addInstrumentedModuleFromAddr() adds a whole module (binary or library) to the instrumented range, given any single address inside that module.

# Keep the call outside the assert so it still runs under python -O
ok = vm.addInstrumentedModuleFromAddr(funcPtr)
assert ok

Run the instrumentation

We can finally run the instrumentation using pyqbdi.VM.call(). It aligns the stack, sets the arguments (if any) and a fake return address, then calls the target function through QBDI. Execution stops when the instrumented code returns to the fake address.

# arg1 and arg2 stand for the integer arguments expected by foo
asrun, retval = vm.call(funcPtr, [arg1, arg2])
assert asrun

pyqbdi.VM.call() returns once the function has completely run in the context of QBDI. It returns a pair: asrun tells whether the call succeeded, and retval holds the value of the return register (e.g. RAX on X86_64).

The target function may not follow the calling convention that pyqbdi.VM.call() assumes. In that case, you must set up the proper context and stack yourself and then call pyqbdi.VM.run().
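For X86_64 targets, here is a minimal sketch of filling integer argument registers by hand. This would happen before pushing the return address with pyqbdi.simulateCall() and starting execution with pyqbdi.VM.run(), the same pattern the full example uses for a floating-point argument. The sketch assumes pyqbdi.GPRState exposes registers as lowercase attributes (rdi, rsi, ...). set_register_args is a hypothetical helper, and a SimpleNamespace stands in for the real state so you can try the logic without QBDI. Arguments that do not fit in registers would have to go on the virtual stack, which the sketch does not handle. It also ignores the 32 bytes of shadow space the Microsoft x64 convention expects the caller to reserve.

```python
from types import SimpleNamespace

# Integer/pointer argument registers, in order, for the System V AMD64 ABI
# (Linux, macOS) and the Microsoft x64 ABI (Windows).
ARG_REGISTERS = {
    "sysv": ("rdi", "rsi", "rdx", "rcx", "r8", "r9"),
    "win64": ("rcx", "rdx", "r8", "r9"),
}


def set_register_args(gpr, args, convention="sysv"):
    """Store integer arguments into the argument registers of a GPR state."""
    registers = ARG_REGISTERS[convention]
    if len(args) > len(registers):
        raise ValueError(f"{convention} passes at most {len(registers)} args in registers")
    for name, value in zip(registers, args):
        # Mask to 64 bits so negative Python ints become their two's-complement form.
        setattr(gpr, name, value & 0xFFFFFFFFFFFFFFFF)


if __name__ == "__main__":
    # Stand-in for vm.getGPRState() so the sketch runs without QBDI.
    gpr = SimpleNamespace()
    set_register_args(gpr, [1, -1])
    print(hex(gpr.rdi), hex(gpr.rsi))  # 0x1 0xffffffffffffffff
```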

Terminate the execution properly

Finally, before exiting, we need to free the virtual stack we allocated by calling pyqbdi.alignedFree().

pyqbdi.alignedFree(fakestack)

Full example

Putting together everything we have learnt in this tutorial, we can now tackle real problems. For instance, the following PyQBDI script generates an execution trace of the sin function. Because it passes the floating-point argument through the XMM0 register, it requires an X86_64 target.

#!/usr/bin/env python3

import sys
import math
import ctypes
import pyqbdi
import struct

def insnCB(vm, gpr, fpr, data):
    instAnalysis = vm.getInstAnalysis()
    print("0x{:x}: {}".format(instAnalysis.address, instAnalysis.disassembly))
    return pyqbdi.CONTINUE


def run():
    # get sin function ptr
    if sys.platform == 'darwin':
        libmname = 'libSystem.dylib'
    elif sys.platform == 'win32':
        libmname = 'api-ms-win-crt-math-l1-1-0.dll'
    else:
        libmname = 'libm.so.6'
    libm = ctypes.cdll.LoadLibrary(libmname)
    funcPtr = ctypes.cast(libm.sin, ctypes.c_void_p).value

    # init VM
    vm = pyqbdi.VM()

    # create stack
    state = vm.getGPRState()
    addr = pyqbdi.allocateVirtualStack(state, 0x100000)
    assert addr is not None

    # instrument library and register memory access
    ok = vm.addInstrumentedModuleFromAddr(funcPtr)
    assert ok
    vm.recordMemoryAccess(pyqbdi.MEMORY_READ_WRITE)

    # add callbacks on instructions
    vm.addCodeCB(pyqbdi.PREINST, insnCB, None)

    # Pack the double argument into its raw IEEE 754 bytes and place it
    # in XMM0, where X86_64 calling conventions pass the first double
    arg = 1.0
    carg = struct.pack('<d', arg)
    fpr = vm.getFPRState()
    fpr.xmm0 = carg

    # call sin(1.0): push a fake return address, then run until it is reached
    pyqbdi.simulateCall(state, 0x42424242)
    success = vm.run(funcPtr, 0x42424242)
    assert success

    # Retrieve output FPR state
    fpr = vm.getFPRState()
    # The double result is returned in the low 8 bytes of XMM0
    res = struct.unpack('<d', fpr.xmm0[:8])[0]
    print("%f (python) vs %f (qbdi)" % (math.sin(arg), res))

    # cleanup
    pyqbdi.alignedFree(addr)

if __name__ == "__main__":
    run()
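In run(), the struct calls only deal with the low 8 bytes of XMM0, where both the System V and the Microsoft x64 conventions place the first double argument and the double return value. Pulling the conversion into two small helpers lets you test it on its own, without QBDI. double_to_xmm and xmm_to_double are illustrative names, and the decode side can replace the struct.unpack line directly. Passing the full 16-byte value to fpr.xmm0 assumes the binding accepts a value of that size.

```python
import math
import struct


def double_to_xmm(value):
    """Build the 16 bytes of an XMM register whose low 8 bytes hold `value`."""
    # Low lane: little-endian IEEE 754 double; upper 8 bytes left as zero.
    return struct.pack("<d", value) + bytes(8)


def xmm_to_double(raw):
    """Read the double stored in the low 8 bytes of an XMM register value."""
    return struct.unpack("<d", bytes(raw[:8]))[0]


if __name__ == "__main__":
    reg = double_to_xmm(math.sin(1.0))
    print(len(reg), xmm_to_double(reg) == math.sin(1.0))  # 16 True
```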