ExecBlock

Introduction

The ExecBlock is a design that simplifies context switching between the host and the guest.

The main problem behind context switching is referencing a memory location owned by the host while in the context of the guest. Loading the memory location into a register destroys a guest value, which would therefore need to be saved somewhere. One could use the guest stack to save values, but this design has two major drawbacks. Firstly, although the memory is supposedly unused, this modifies guest memory and could have side effects if the program uses uninitialized stack values or in the case of unforeseen optimizations. Secondly, it assumes that the stack registers always point to the stack, or that a stack exists at all, which might not be the case for more exotic languages or assembly code. The only mechanism left is to use loads and stores with relative addressing. While x86 and x86-64 allow 32-bit offsets, ARM only allows 12-bit offsets [1]. The ExecBlock design thus only requires the ability to load and store general purpose registers with an address offset of up to 4096 bytes.

Note

By default, AVX registers are also saved. When this is not necessary, it can be disabled to improve performance by using the environment flag QBDI_FORCE_DISABLE_AVX.

Some modern operating systems do not allow the allocation of memory pages with read, write and execute permissions (RWX) for security reasons, as such pages greatly facilitate remote code execution exploits. It is thus necessary to allocate two separate pages, with different permissions, for code and data. Because most architectures use memory pages of 4096 bytes [2], allocating a data memory page right after a code memory page allows even the first instruction of the code page to address at least the first address of the data memory page.

[1] Thumb supports even less, but this is not a problem as, with the exception of embedded ARM architectures (the Cortex-M series), one can always switch between ARM mode and Thumb mode.
[2] This is at least true for x86, x86_64, ARM, ARM64 and PowerPC.
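
As a rough illustration of the two-page arrangement described above, the following POSIX sketch reserves two contiguous pages, keeps both writable while code is being emitted, then switches the first page to read and execute. The names PagePair, allocatePagePair, sealCodePage and freePagePair are hypothetical and chosen for this example only; they are not part of QBDI, and this is not a description of how QBDI itself allocates its pages.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper type, not part of the QBDI API.
struct PagePair {
  std::uint8_t *code = nullptr; // first page, meant to end up RX
  std::uint8_t *data = nullptr; // second page, stays RW
  std::size_t pageSize = 0;
};

// Reserve two contiguous pages. Both start as RW so that code can be
// written into the first one before it is made executable.
inline PagePair allocatePagePair() {
  PagePair p;
  long sz = sysconf(_SC_PAGESIZE);
  assert(sz > 0);
  p.pageSize = static_cast<std::size_t>(sz);
  void *base = mmap(nullptr, 2 * p.pageSize, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (base == MAP_FAILED) {
    return PagePair{};
  }
  p.code = static_cast<std::uint8_t *>(base);
  p.data = p.code + p.pageSize; // data page immediately follows the code page
  return p;
}

// Drop write permission on the code page and make it executable (RX).
// The data page is left untouched, so no page is ever RWX.
inline bool sealCodePage(const PagePair &p) {
  return mprotect(p.code, p.pageSize, PROT_READ | PROT_EXEC) == 0;
}

// Release both pages.
inline void freePagePair(PagePair &p) {
  if (p.code != nullptr) {
    munmap(p.code, 2 * p.pageSize);
  }
  p = PagePair{};
}
```

Since the data page starts exactly one page after the code page, any instruction in the code page is less than two pages away from every byte of the data page, and the start of the data page is at most one page away from any instruction.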

Memory Layout

An ExecBlock is thus composed of two contiguous memory pages: the first one is called the code block and has read and execute permissions (RX), and the second one is called the data block and has read and write permissions (RW). The code block contains a prologue, responsible for the host to guest context switching, followed by the code of the translated basic block and finally the epilogue, responsible for the guest to host switching. Both the prologue and the epilogue use the beginning of the data block to store the context data. The context is split into three parts: the GPR context, the FPR context and the host context. The GPR and FPR contexts are straightforward and documented in the API itself as the GPRState and FPRState (see the State Management part of the API). The host context is used to store host data and a memory pointer called the selector. The selector is used by the prologue to determine which basic block to jump to next. The remaining space in the data block is used for shadows, which can store any data needed by the patching or instrumentation process.

[Figure: ExecBlock memory layout (_images/execblock_v3.svg)]
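
The data block layout described in this section can be sketched as a plain struct followed by an array of shadows. Everything below is an assumption made for illustration: the type names ending in Sketch, the register counts, the field sizes, the field order and the 64-bit rword are placeholders, and the real GPRState and FPRState are the ones defined by the API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using rword = std::uint64_t; // assumption: 64-bit target

// Placeholder stand-ins with made-up sizes; not the real QBDI structures.
struct GPRStateSketch {
  rword regs[18];
};
struct FPRStateSketch {
  alignas(16) std::uint8_t bytes[512];
};
struct HostStateSketch {
  rword hostData[4]; // host values saved across the context switch
  rword selector;    // tells the prologue where to jump next
};

// The context sits at the very beginning of the data block.
struct ContextSketch {
  GPRStateSketch gpr;
  FPRStateSketch fpr;
  HostStateSketch host;
};

constexpr std::size_t kDataBlockSize = 4096; // one memory page

// Shadows occupy whatever is left after the context.
constexpr std::size_t shadowAreaOffset() { return sizeof(ContextSketch); }
constexpr std::size_t shadowCapacity() {
  return (kDataBlockSize - shadowAreaOffset()) / sizeof(rword);
}

static_assert(sizeof(ContextSketch) < kDataBlockSize,
              "the context must leave room for shadows in the data block");
```

With this arrangement the context fields sit at small, fixed offsets from the start of the data block, which keeps them within reach of the relative loads and stores used by the prologue and epilogue.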

Reference

class QBDI::ExecBlock

Manages the concept of an exec block made of two contiguous memory blocks (one for the code, the other for the data) used to store and execute instrumented basic blocks.

Public Functions

ExecBlock(Assembly &assembly, VMInstanceRef vminstance = nullptr)

Construct a new ExecBlock

Parameters
  • assembly: Assembly used to assemble instructions in the ExecBlock.
  • vminstance: Pointer to public engine interface

void show() const

Display the content of an exec block to stderr.

VMAction execute()

Execute the sequence currently programmed in the selector of the exec block. Also takes care of callback handling.

SeqWriteResult writeSequence(std::vector<Patch>::const_iterator seqStart, std::vector<Patch>::const_iterator seqEnd, SeqType seqType)

Write a new sequence in the exec block. This function does not guarantee that the sequence will be written in its entirety and might stop before the end using an architecture specific terminator. Return 0 if the exec block was full and no instruction was written.

Return
A structure detailing the write operation result.
Parameters
  • seqStart: [in] Iterator to the start of a list of patches.
  • seqEnd: [in] Iterator to the end of a list of patches.
  • seqType: [in] Type of the sequence.

uint16_t splitSequence(uint16_t instID)

Split an existing sequence at instruction instID to create a new sequence.

Return
The new sequence ID.
Parameters
  • instID: [in] ID of the instruction at which to split the sequence.

rword getDataBlockOffset() const

Compute the offset between the current code stream position and the start of the data block. Used for PC-relative memory access to the data block.

Return
The computed offset.

rword getEpilogueOffset() const

Compute the offset between the current code stream position and the start of the exec block epilogue code. Used to compute the remaining code space or to jump to the exec block epilogue at the end of a sequence.

Return
The computed offset.
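
The two offsets above can be pictured with a small arithmetic model. This is a sketch under assumptions, not the QBDI implementation: CodeBlockModel and its fields are hypothetical, the epilogue is assumed to sit at the very end of the code block, the data block is assumed to start right after it, and any architecture-specific PC bias is ignored.

```cpp
#include <cassert>
#include <cstdint>

using rword = std::uint64_t; // assumption: 64-bit target

// Hypothetical bookkeeping for a code block; all sizes are in bytes.
struct CodeBlockModel {
  rword codeBlockSize; // the data block is assumed to start right after
  rword epilogueSize;  // bytes reserved for the epilogue at the end
  rword position;      // current write position from the code block start

  // Distance from the current position to the start of the data block.
  rword dataBlockOffset() const {
    assert(position <= codeBlockSize);
    return codeBlockSize - position;
  }

  // Distance from the current position to the start of the epilogue,
  // which in this model is also the code space still available.
  rword epilogueOffset() const {
    assert(position + epilogueSize <= codeBlockSize);
    return codeBlockSize - epilogueSize - position;
  }
};
```

In this model, a writer would stop emitting a sequence once the next patch no longer fits in epilogueOffset() bytes.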

rword getCurrentPC() const

Obtain the value of the PC where the ExecBlock is currently writing instructions.

Return
The PC value.

uint16_t getNextInstID() const

Obtain the next instruction ID.

Return
The next instruction ID.

uint16_t getInstID(rword address) const

Obtain the instruction ID for a specific address (the address must exactly match the start of the instruction).

Return
The instruction ID or NOT_FOUND.
Parameters
  • address: The address of the start of the instruction.

uint16_t getCurrentInstID() const

Obtain the current instruction ID.

Return
The ID of the current instruction.

const InstMetadata *getInstMetadata(uint16_t instID) const

Obtain the instruction metadata for a specific instruction ID.

Return
The metadata of the instruction.
Parameters
  • instID: The instruction ID.

rword getInstAddress(uint16_t instID) const

Obtain the instruction address for a specific instruction ID.

Return
The address of the instruction.
Parameters
  • instID: The instruction ID.

const llvm::MCInst *getOriginalMCInst(uint16_t instID) const

Obtain the original MCInst for a specific instruction ID.

Return
The original MCInst of the instruction.
Parameters
  • instID: The instruction ID.

uint16_t getNextSeqID() const

Obtain the next sequence ID.

Return
The next sequence ID.

uint16_t getSeqID(rword address) const

Obtain the sequence ID for a specific address (the address must exactly match the start of the sequence).

Return
The sequence ID or NOT_FOUND.
Parameters
  • address: The address of the start of the sequence.

uint16_t getSeqID(uint16_t instID) const

Obtain the sequence ID containing a specific instruction ID.

Return
The sequence ID or NOT_FOUND.
Parameters
  • instID: The instruction ID.

uint16_t getCurrentSeqID() const

Obtain the current sequence ID.

Return
The ID of the current sequence.

uint16_t getSeqStart(uint16_t seqID) const

Obtain the sequence start address for a specific sequence ID.

Return
The start address of the sequence.
Parameters
  • seqID: The sequence ID.

uint16_t getSeqEnd(uint16_t seqID) const

Obtain the instruction ID of the sequence end address for a specific sequence ID.

Return
The end address of the sequence.
Parameters
  • seqID: The sequence ID.
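
To make the relationship between instruction IDs and sequence IDs more concrete, here is a toy registry that mimics the lookup and split functions documented above. It is only a mental model: SeqRegistryModel, SeqRange, addSequence and kNotFound are hypothetical, sequences are modeled as inclusive ranges of instruction IDs, and the choice to leave the original sequence untouched when splitting is an assumption rather than documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for NOT_FOUND; the real value is an assumption.
constexpr std::uint16_t kNotFound = 0xFFFF;

// A sequence modeled as an inclusive range of instruction IDs.
struct SeqRange {
  std::uint16_t startInstID;
  std::uint16_t endInstID;
};

class SeqRegistryModel {
public:
  // Register a sequence covering [start, end] and return its ID.
  std::uint16_t addSequence(std::uint16_t start, std::uint16_t end) {
    assert(start <= end);
    seqs_.push_back(SeqRange{start, end});
    return static_cast<std::uint16_t>(seqs_.size() - 1);
  }

  // ID that the next registered sequence would receive.
  std::uint16_t getNextSeqID() const {
    return static_cast<std::uint16_t>(seqs_.size());
  }

  // First sequence containing instID, or kNotFound.
  std::uint16_t getSeqID(std::uint16_t instID) const {
    for (std::size_t i = 0; i < seqs_.size(); ++i) {
      if (seqs_[i].startInstID <= instID && instID <= seqs_[i].endInstID) {
        return static_cast<std::uint16_t>(i);
      }
    }
    return kNotFound;
  }

  std::uint16_t getSeqStart(std::uint16_t seqID) const {
    return seqs_.at(seqID).startInstID;
  }
  std::uint16_t getSeqEnd(std::uint16_t seqID) const {
    return seqs_.at(seqID).endInstID;
  }

  // Create a new sequence running from instID to the end of the sequence
  // that contains it. The original sequence is left as-is (assumption).
  std::uint16_t splitSequence(std::uint16_t instID) {
    std::uint16_t parent = getSeqID(instID);
    if (parent == kNotFound) {
      return kNotFound;
    }
    return addSequence(instID, seqs_[parent].endInstID);
  }

private:
  std::vector<SeqRange> seqs_;
};
```

Splitting in this way gives a new entry point in the middle of code that has already been written, without writing it a second time.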

void selectSeq(uint16_t seqID)

Set the selector of the exec block to a specific sequence offset. Used to program the execution of a specific sequence within the exec block.

Parameters
  • seqID: [in] Basic block ID within the exec block.

Context *getContext() const

Get a pointer to the context structure stored in the data block.

Return
The context pointer.

uint16_t newShadow(uint16_t tag = NO_REGISTRATION)

Allocate a new shadow within the data block. Used by relocation to load or store data from the instrumented code.

Return
The shadow id (which is its index within the shadow array).
Parameters
  • tag: The tag associated with the registration; 0xFFFF is reserved for unregistered shadows.

void setShadow(uint16_t id, rword v)

Set the value of a shadow.

Parameters
  • id: [in] ID of the shadow to set.
  • v: [in] Value to assign to the shadow.

rword getShadow(uint16_t id) const

Get the value of a shadow.

Return
Value of the shadow.
Parameters
  • id: [in] ID of the shadow.

rword getShadowOffset(uint16_t id) const

Get the offset of a shadow within the data block.

Return
Offset of the shadow.
Parameters
  • id: [in] ID of the shadow.
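
The shadow functions above behave like accessors on an array of rword slots placed after the context in the data block. The following toy model shows that reading; ShadowAreaModel and kNoRegistration are hypothetical names, the 64-bit rword and the sizes passed to the constructor are assumptions, and the real ExecBlock stores the values in the data block itself rather than in a std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using rword = std::uint64_t; // assumption: 64-bit target

// Mirrors the reserved tag value mentioned in the newShadow documentation.
constexpr std::uint16_t kNoRegistration = 0xFFFF;

class ShadowAreaModel {
public:
  // contextSize: bytes used by the context at the start of the data block.
  // dataBlockSize: total size of the data block in bytes.
  ShadowAreaModel(std::size_t contextSize, std::size_t dataBlockSize)
      : base_(contextSize),
        capacity_((dataBlockSize - contextSize) / sizeof(rword)) {
    assert(contextSize <= dataBlockSize);
  }

  // Allocate the next free slot; its ID is its index in the shadow array.
  std::uint16_t newShadow(std::uint16_t tag = kNoRegistration) {
    assert(values_.size() < capacity_);
    values_.push_back(0);
    tags_.push_back(tag);
    return static_cast<std::uint16_t>(values_.size() - 1);
  }

  void setShadow(std::uint16_t id, rword v) { values_.at(id) = v; }
  rword getShadow(std::uint16_t id) const { return values_.at(id); }
  std::uint16_t getTag(std::uint16_t id) const { return tags_.at(id); }

  // Byte offset of the slot from the start of the data block.
  rword getShadowOffset(std::uint16_t id) const {
    assert(id < values_.size());
    return base_ + static_cast<rword>(id) * sizeof(rword);
  }

private:
  std::size_t base_;
  std::size_t capacity_;
  std::vector<rword> values_;
  std::vector<std::uint16_t> tags_;
};
```

Instrumented code would then reach a slot by adding getShadowOffset(id) to the address of the data block.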