PyQBDI
PyQBDI provides Python 3 bindings for the QBDI API, so you can use QBDI features directly from your Python scripts without writing any C/C++. This is handy when you need to build something quickly. However, it comes with some limitations:
PyQBDI cannot be used to instrument a Python process
Performance is lower than with the C/C++ APIs
The Python runtime’s and the target’s architectures must be the same
Memory allocation
In Python, interacting with the process’ memory is harder than with the C/C++ APIs: memory regions cannot be allocated, read or written directly. PyQBDI therefore offers helpers that let users perform these actions.
import pyqbdi
value = b"bytes array"
addr = pyqbdi.allocateMemory(len(value))
pyqbdi.writeMemory(addr, value)
value2 = pyqbdi.readMemory(addr, len(value))
assert value == value2
pyqbdi.freeMemory(addr)
Load the target code
In this tutorial, we aim to execute the foo function, which lies in a shared library named mylib.so, in the context of QBDI.
PyQBDI will give us a hand doing so.
import pyqbdi
import ctypes
mylib = ctypes.cdll.LoadLibrary("mylib.so")
funcPtr = ctypes.cast(mylib.foo, ctypes.c_void_p).value
Note that if you want to instrument a whole binary, PyQBDIPreload should be preferred (see PyQBDIPreload).
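Before handing an address to QBDI, it can help to check that ctypes actually resolved the symbol. The following is a minimal sketch, assuming a POSIX system where ctypes.CDLL(None) exposes the symbols already loaded in the current process. Since mylib.so and foo are only placeholders, strlen from the C library stands in for the target function here.

```python
import ctypes

# Assumption: POSIX. CDLL(None) gives access to the symbols already loaded
# in the current process (the C library included), so no path is needed.
libc = ctypes.CDLL(None)

# Same cast as in the tutorial: function object -> raw address as an int.
func_ptr = ctypes.cast(libc.strlen, ctypes.c_void_p).value

# Declare the prototype so the native call below converts values correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print("strlen is at 0x{:x}".format(func_ptr))
print("native call returns", libc.strlen(b"qbdi"))
```

A non-zero integer address is what the later steps expect as funcPtr.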
Initialise the virtual machine
First off, we need to initialise the virtual machine (VM) itself.
Calling pyqbdi.VM() creates a new instance.
vm = pyqbdi.VM()
Allocate a virtual stack
The virtual machine does not work with the regular stack that your process uses – instead, QBDI needs its own stack.
Therefore, we have to ask for a virtual stack using pyqbdi.allocateVirtualStack(). This function is responsible for allocating an aligned memory space, setting the stack pointer register accordingly and returning the top address of this brand-new memory region.
state = vm.getGPRState()
fakestack = pyqbdi.allocateVirtualStack(state, 0x100000)
assert fakestack is not None
Write our first callback function
Now that the virtual machine has been set up, we can start playing with QBDI core features.
To trace the execution, we need a callback that retrieves the address and the disassembly of the current instruction and prints them.
As the callback is called on an instruction, it must follow the InstCallback type. Inside the callback, we can get an InstAnalysis of the current instruction with pyqbdi.VM.getInstAnalysis().
To provide the address and the disassembly, the InstAnalysis needs the types pyqbdi.ANALYSIS_INSTRUCTION (for the address) and pyqbdi.ANALYSIS_DISASSEMBLY (for the disassembly). These two pyqbdi.AnalysisType values are the default parameter of pyqbdi.VM.getInstAnalysis() and can be omitted.
def showInstruction(vm, gpr, fpr, data):
    # Obtain an analysis of the instruction from the VM
    instAnalysis = vm.getInstAnalysis()
    # Printing disassembly
    print("0x{:x}: {}".format(instAnalysis.address, instAnalysis.disassembly))
    return pyqbdi.CONTINUE
An InstCallback must always return an action (pyqbdi.VMAction) to the VM to specify if the execution should continue or stop. In most cases CONTINUE should be returned to continue the execution.
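A callback does not have to print: it can also collect the trace for later processing. The sketch below keeps the same four-argument shape as showInstruction but appends to a list. The make_trace_callback factory and the fake VM are hypothetical names introduced for illustration only. The fake VM mimics getInstAnalysis() so the callback can be exercised outside QBDI, and the action is passed in as a parameter so the snippet runs without PyQBDI installed.

```python
from types import SimpleNamespace


def make_trace_callback(trace, action):
    """Build an InstCallback-shaped function that appends to `trace`.

    `action` is the value handed back to the VM on every call; with
    PyQBDI this would be pyqbdi.CONTINUE.
    """
    def callback(vm, gpr, fpr, data):
        inst = vm.getInstAnalysis()
        trace.append((inst.address, inst.disassembly))
        return action
    return callback


# Hypothetical stand-in for a real VM: it only mimics getInstAnalysis().
fake_vm = SimpleNamespace(
    getInstAnalysis=lambda: SimpleNamespace(address=0x1000, disassembly="nop")
)

trace = []
callback = make_trace_callback(trace, action="CONTINUE")  # placeholder action
result = callback(fake_vm, None, None, None)

for address, disassembly in trace:
    print("0x{:x}: {}".format(address, disassembly))
```

With a real VM, the factory would be called with pyqbdi.CONTINUE as the action, and the list would fill up as the instrumented code runs.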
Register a callback
The callback must be registered in the VM. The function pyqbdi.VM.addCodeCB() allows registering a callback for every instruction. The callback can be called before the instruction (pyqbdi.PREINST) or after the instruction (pyqbdi.POSTINST).
cid = vm.addCodeCB(pyqbdi.PREINST, showInstruction, None)
assert cid != pyqbdi.INVALID_EVENTID
The function returns a callback ID or the special ID pyqbdi.INVALID_EVENTID if the registration fails. The callback ID can be kept if you want to unregister the callback later.
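One way to make sure a callback is unregistered is to tie its lifetime to a with block. The sketch below is a hypothetical helper, not part of PyQBDI. It assumes the VM exposes deleteInstrumentation() taking the ID returned by addCodeCB(), and it receives the invalid ID as a parameter (pyqbdi.INVALID_EVENTID in real use). FakeVM is a stand-in that only records registrations, so the snippet runs without QBDI.

```python
from contextlib import contextmanager


@contextmanager
def temporary_code_callback(vm, position, callback, invalid_id, data=None):
    """Register `callback` for the duration of a `with` block.

    Hypothetical helper: `invalid_id` is expected to be
    pyqbdi.INVALID_EVENTID and `position` pyqbdi.PREINST or pyqbdi.POSTINST.
    """
    cid = vm.addCodeCB(position, callback, data)
    if cid == invalid_id:
        raise RuntimeError("callback registration failed")
    try:
        yield cid
    finally:
        # Unregister even if the instrumented run raised an exception.
        vm.deleteInstrumentation(cid)


class FakeVM:
    """Stand-in that records registrations; it only exercises the helper."""

    def __init__(self):
        self.active = set()
        self._next_id = 0

    def addCodeCB(self, position, callback, data):
        self._next_id += 1
        self.active.add(self._next_id)
        return self._next_id

    def deleteInstrumentation(self, cid):
        self.active.discard(cid)
        return True


fake_vm = FakeVM()
with temporary_code_callback(fake_vm, "PREINST", print, invalid_id=-1) as cid:
    registered_inside = cid in fake_vm.active
registered_after = cid in fake_vm.active
print(registered_inside, registered_after)
```

The try/finally keeps the VM clean when the same instance is reused for several runs with different callbacks.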
Set instrumented ranges
QBDI needs a range of addresses where the code should be instrumented. If the execution goes out of this scope, QBDI will try to restore an uninstrumented execution.
In our example, we need to include the function in the instrumented range. The method pyqbdi.VM.addInstrumentedModuleFromAddr() adds a whole module (binary or library) to the instrumented range, given a single address inside this module.
assert vm.addInstrumentedModuleFromAddr(funcPtr)
Run the instrumentation
We can finally run the instrumentation using the pyqbdi.VM.call() function.
It aligns the stack, sets the argument(s) (if needed) and a fake return address and calls the target function through QBDI. The execution stops when the instrumented code returns to the fake address.
asrun, retval = vm.call(funcPtr, [args1, args2])
assert asrun
pyqbdi.VM.call() returns a pair. The first element tells whether the function has completely run in the context of QBDI; the second holds the value of the return register (e.g. RAX for X86_64).
It may turn out that the function does not expect the calling convention pyqbdi.VM.call() uses.
In this case, you must set up the proper context and the stack yourself and call pyqbdi.VM.run() afterwards.
Terminate the execution properly
Finally, before exiting, we need to free the virtual stack we allocated by calling pyqbdi.alignedFree().
pyqbdi.alignedFree(fakestack)
Full example
Merging everything we have learnt throughout this tutorial, we are now able to solve real problems.
For instance, the following example shows how one can generate an execution trace of the sin function by using a PyQBDI script:
#!/usr/bin/env python3

import sys
import math
import ctypes
import pyqbdi
import struct

def insnCB(vm, gpr, fpr, data):
    instAnalysis = vm.getInstAnalysis()
    print("0x{:x}: {}".format(instAnalysis.address, instAnalysis.disassembly))
    return pyqbdi.CONTINUE

def run():
    # get sin function ptr
    if sys.platform == 'darwin':
        libmname = 'libSystem.dylib'
    elif sys.platform == 'win32':
        libmname = 'api-ms-win-crt-math-l1-1-0.dll'
    else:
        libmname = 'libm.so.6'
    libm = ctypes.cdll.LoadLibrary(libmname)
    funcPtr = ctypes.cast(libm.sin, ctypes.c_void_p).value

    # init VM
    vm = pyqbdi.VM()

    # create stack
    state = vm.getGPRState()
    addr = pyqbdi.allocateVirtualStack(state, 0x100000)
    assert addr is not None

    # instrument library and register memory access
    vm.addInstrumentedModuleFromAddr(funcPtr)
    vm.recordMemoryAccess(pyqbdi.MEMORY_READ_WRITE)

    # add callbacks on instructions
    vm.addCodeCB(pyqbdi.PREINST, insnCB, None)

    # Pack the double argument into raw bytes and store it in xmm0
    arg = 1.0
    carg = struct.pack('<d', arg)
    fpr = vm.getFPRState()
    fpr.xmm0 = carg

    # call sin(1.0)
    pyqbdi.simulateCall(state, 0x42424242)
    success = vm.run(funcPtr, 0x42424242)

    # Retrieve output FPR state
    fpr = vm.getFPRState()
    # Unpack the low 8 bytes of xmm0 as a double
    res = struct.unpack('<d', fpr.xmm0[:8])[0]
    print("%f (python) vs %f (qbdi)" % (math.sin(arg), res))

    # cleanup
    pyqbdi.alignedFree(addr)

if __name__ == "__main__":
    run()
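The example above passes the double through xmm0 as raw bytes and reads the result back from the low 8 bytes of the register. That round trip can be checked on its own with the standard library. In the sketch below, double_to_bytes, simulated_xmm and xmm_to_double are hypothetical helper names, and the 16-byte buffer only imitates what a register read would look like. It does not touch QBDI.

```python
import struct

XMM_SIZE = 16  # an XMM register is 128 bits wide


def double_to_bytes(value):
    """Encode a double as 8 little-endian bytes, as done for the argument."""
    return struct.pack("<d", value)


def simulated_xmm(low_bytes):
    """Imitate a full register read: low bytes followed by zero padding."""
    return low_bytes.ljust(XMM_SIZE, b"\x00")


def xmm_to_double(raw):
    """Decode the low 8 bytes of a register-sized buffer as a double."""
    return struct.unpack("<d", raw[:8])[0]


encoded = double_to_bytes(1.0)
register = simulated_xmm(encoded)
decoded = xmm_to_double(register)

print(len(encoded), len(register), decoded)
```

Slicing with [:8] matters because a scalar double only occupies the lower half of the register.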