ExecBlock

Introduction

The ExecBlock is a design that simplifies context switching between the host and the guest.

The main problem in context switching is referencing a memory location owned by the host while in the context of the guest. Loading the memory location into a register destroys a guest value, which would then need to be saved somewhere. One could use the guest stack to save values, but this design has two major drawbacks. Firstly, although the memory is supposedly unused, this modifies guest memory and could have side effects if the program uses uninitialized stack values or in the case of unforeseen optimizations. Secondly, it assumes that the stack registers always point to the stack, or that a stack exists at all, which might not be the case for more exotic languages or assembly code. The only mechanism left is relative addressing for loads and stores. While x86 and x86-64 allow 32-bit offsets, ARM only allows 11-bit offsets [1]. The ExecBlock design thus only requires the ability to load and store general purpose registers with an address offset of up to 4096 bytes.

Note

By default, AVX registers are also saved. When this is not necessary, it can be disabled with the environment flag QBDI_FORCE_DISABLE_AVX, which improves performance.

Some modern operating systems do not allow the allocation of memory pages with read, write and execute permissions (RWX) for security reasons as this greatly facilitates remote code execution exploits. It is thus necessary to allocate two separate pages, with different permissions, for code and data. By exploiting the fact that most architectures use memory pages of 4096 bytes [2], allocating a data memory page next to a code memory page would allow the first instruction to address at least the first address of the data memory page.
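The two-page allocation described above can be sketched with POSIX calls. This is a minimal illustration, not QBDI's allocator: the TwoPageRegion type and both helper functions are hypothetical names, and the sketch assumes a POSIX system where mmap and mprotect are available. It reserves two contiguous pages as read/write, then switches the first page to read/execute so that no page is ever writable and executable at the same time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper type (not part of QBDI): a code page followed by a data page.
struct TwoPageRegion {
  std::uint8_t *code;    // first page, ends up read + execute (RX)
  std::uint8_t *data;    // second page, stays read + write (RW)
  std::size_t pageSize;  // page size reported by the system
};

// Reserve two contiguous pages as RW, then switch the first one to RX.
// Returns a region with null pointers on failure.
inline TwoPageRegion allocateTwoPages() {
  TwoPageRegion region = {nullptr, nullptr, 0};
  const long ps = sysconf(_SC_PAGESIZE);
  if (ps <= 0) return region;
  const std::size_t pageSize = static_cast<std::size_t>(ps);

  void *base = mmap(nullptr, 2 * pageSize, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (base == MAP_FAILED) return region;

  // A real engine would emit code into the first page before this point.
  if (mprotect(base, pageSize, PROT_READ | PROT_EXEC) != 0) {
    munmap(base, 2 * pageSize);
    return region;
  }

  region.code = static_cast<std::uint8_t *>(base);
  region.data = region.code + pageSize;  // data page directly follows the code page
  region.pageSize = pageSize;
  return region;
}

// Unmap both pages and reset the descriptor.
inline void releaseTwoPages(TwoPageRegion &region) {
  if (region.code != nullptr) munmap(region.code, 2 * region.pageSize);
  region.code = nullptr;
  region.data = nullptr;
  region.pageSize = 0;
}
```

Because the data page starts exactly one page after the code page, any instruction in the code page is at most one page away from the start of the data page.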

Memory Layout

An ExecBlock is thus composed of two contiguous memory pages: the first one is called the code block and has read and execute permissions (RX), and the second one is called the data block and has read and write permissions (RW). The code block contains a prologue, responsible for the host to guest context switch, followed by the code of the translated basic block, and finally the epilogue, responsible for the guest to host switch. Both the prologue and the epilogue use the beginning of the data block to store the context data. The context is split into three parts: the GPR context, the FPR context and the host context. The GPR and FPR contexts are straightforward and documented in the API itself as the GPRState and FPRState (see the State Management part of the API). The host context is used to store host data and a memory pointer called the selector. The selector is used by the prologue to determine which basic block to jump to next. The remaining space in the data block is used for shadows, which can store any data needed by the patching or instrumentation process.

_images/execblock_v3.svg
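The data block layout can be pictured as a context structure followed by an array of shadows. The sketch below is illustrative only: every Toy* type, the field sizes, the field order and the 64-bit rword are assumptions chosen for the example, and the real GPRState, FPRState and host context are architecture specific.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using rword = std::uint64_t;  // assumption: 64-bit guest word

// Placeholder stand-ins with arbitrary sizes (not the real QBDI structures).
struct ToyGPRState { rword regs[18]; };
struct ToyFPRState { std::uint8_t bytes[512]; };
struct ToyHostState {
  rword hostStack;  // example of saved host data
  rword selector;   // address of the sequence to jump to next
};

// Context stored at the beginning of the data block.
// The field order here is illustrative and not claimed to match QBDI.
struct ToyContext {
  ToyGPRState gpr;
  ToyFPRState fpr;
  ToyHostState host;
};

constexpr std::size_t kDataBlockSize = 4096;  // one page, as assumed in the text

// Offset of the first shadow: shadows start right after the context.
constexpr std::size_t shadowAreaOffset() { return sizeof(ToyContext); }

// Number of rword-sized shadows that fit in the remaining space.
constexpr std::size_t shadowCapacity() {
  return (kDataBlockSize - sizeof(ToyContext)) / sizeof(rword);
}

static_assert(sizeof(ToyContext) <= kDataBlockSize,
              "the context must fit in the data block");
```

With this picture, a shadow is addressed as the data block base plus shadowAreaOffset() plus its index times sizeof(rword).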

Reference

class ExecBlock

Manages the concept of an exec block made of two contiguous memory blocks (one for the code, the other for the data) used to store and execute instrumented basic blocks.

Public Functions

ExecBlock(const LLVMCPUs &llvmCPUs, VMInstanceRef vminstance, const std::vector<std::unique_ptr<RelocatableInst>> *execBlockPrologue = nullptr, const std::vector<std::unique_ptr<RelocatableInst>> *execBlockEpilogue = nullptr, uint32_t epilogueSize = 0)

Construct a new ExecBlock

Parameters:
  • llvmCPUs – [in] LLVMCPU used to assemble instructions in the ExecBlock.

  • vminstance – [in] Pointer to the public engine interface.

  • execBlockPrologue – [in] Cached prologue of the ExecManager.

  • execBlockEpilogue – [in] Cached epilogue of the ExecManager.

  • epilogueSize – [in] Size in bytes of the epilogue (0 if not known).

void changeVMInstanceRef(VMInstanceRef vminstance)

Change the vminstance when the VM object is moved.

void show() const

Display the content of an exec block to stderr.

VMAction execute()

Execute the sequence currently programmed in the selector of the exec block. Takes care of callback handling.

SeqWriteResult writeSequence(std::vector<Patch>::const_iterator seqStart, std::vector<Patch>::const_iterator seqEnd)

Write a new sequence in the exec block. This function does not guarantee that the sequence will be written in its entirety and might stop before the end using an architecture-specific terminator. Return 0 if the exec block was full and no instruction was written.

Parameters:
  • seqStart – [in] Iterator to the start of a list of patches.

  • seqEnd – [in] Iterator to the end of a list of patches.

Returns:

A structure detailing the write operation result.

uint16_t splitSequence(uint16_t instID)

Split an existing sequence at instruction instID to create a new sequence.

Parameters:

instID – [in] ID of the instruction at which to split the sequence.

Returns:

The new sequence ID.
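To make the idea of splitting concrete, here is a toy model of a sequence registry. It is a sketch under assumptions, not the ExecBlock implementation: ToySeq, ToySeqRegistry and kToyNotFound are hypothetical names, sequences are modelled as inclusive ranges of instruction IDs, and the sketch assumes that a split registers a new sequence running from instID to the end of the containing sequence while leaving the original sequence untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sentinel for "no such sequence" (value chosen for the example).
constexpr std::uint16_t kToyNotFound = 0xFFFF;

// A sequence modelled as an inclusive range of instruction IDs.
struct ToySeq {
  std::uint16_t startInstID;
  std::uint16_t endInstID;
};

class ToySeqRegistry {
public:
  // Register a sequence and return its ID (its index in the registry).
  std::uint16_t addSequence(std::uint16_t start, std::uint16_t end) {
    assert(start <= end);
    seqs_.push_back(ToySeq{start, end});
    return static_cast<std::uint16_t>(seqs_.size() - 1);
  }

  // Return the first registered sequence containing instID, or kToyNotFound.
  std::uint16_t getSeqID(std::uint16_t instID) const {
    for (std::size_t i = 0; i < seqs_.size(); ++i) {
      if (seqs_[i].startInstID <= instID && instID <= seqs_[i].endInstID) {
        return static_cast<std::uint16_t>(i);
      }
    }
    return kToyNotFound;
  }

  // Assumption: splitting adds a new sequence [instID, end of containing
  // sequence] and does not shorten the original one.
  std::uint16_t splitSequence(std::uint16_t instID) {
    const std::uint16_t seqID = getSeqID(instID);
    if (seqID == kToyNotFound) return kToyNotFound;
    return addSequence(instID, seqs_[seqID].endInstID);
  }

  const ToySeq &get(std::uint16_t seqID) const { return seqs_.at(seqID); }

private:
  std::vector<ToySeq> seqs_;
};
```

In this model, splitting a sequence covering instructions 0 to 9 at instruction 4 yields a second entry point covering 4 to 9, so execution can start in the middle of already translated code without translating it again.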

inline rword getDataBlockBase() const

Get the address of the DataBlock

Returns:

The DataBlock base address.

inline rword getDataBlockOffset() const

Compute the offset between the current code stream position and the start of the data block. Used for PC-relative memory access to the data block.

Returns:

The computed offset.
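The offset computation can be illustrated with plain address arithmetic. The sketch below is a simplified model: ToyBlock and its fields are hypothetical, rword is assumed to be 64 bits, and architecture-specific PC biases (for example a PC that reads ahead of the current instruction) are deliberately ignored.

```cpp
#include <cassert>
#include <cstdint>

using rword = std::uint64_t;  // assumption: 64-bit word

// Hypothetical model of a code page immediately followed by a data page.
struct ToyBlock {
  rword codeBase;  // address of the first byte of the code page
  rword pageSize;  // size of the code page in bytes
  rword codePos;   // number of code bytes already written

  // Address where the next instruction would be written.
  rword currentPC() const { return codeBase + codePos; }

  // The data page starts right after the code page.
  rword dataBlockBase() const { return codeBase + pageSize; }

  // Distance from the current code position to the start of the data page.
  rword dataBlockOffset() const {
    assert(codePos <= pageSize);
    return dataBlockBase() - currentPC();
  }
};
```

The offset shrinks as code is written: it equals the page size for the first instruction and approaches zero at the end of the code page, so it never exceeds one page.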

inline rword getEpilogueOffset() const

Compute the offset between the current code stream position and the start of the exec block epilogue code. Used for computing the remaining code space left or jumping to the exec block epilogue at the end of a sequence.

Returns:

The computed offset.

inline uint32_t getEpilogueSize() const

Get the size of the epilogue

Returns:

The size of the epilogue.

inline rword getCurrentPC() const

Obtain the value of the PC where the ExecBlock is currently writing instructions.

Returns:

The PC value.

inline uint16_t getNextInstID() const

Obtain the next instruction ID.

Returns:

The next instruction ID.

uint16_t getInstID(rword address, CPUMode cpumode) const

Obtain the instruction ID for a specific address (the address must exactly match the start of the instruction).

Parameters:
  • address – The address of the start of the instruction.

  • cpumode – The mode of the instruction.

Returns:

The instruction ID or NOT_FOUND.

inline uint16_t getCurrentInstID() const

Obtain the current instruction ID.

Returns:

The ID of the current instruction.

const InstMetadata &getInstMetadata(uint16_t instID) const

Obtain the instruction metadata for a specific instruction ID.

Parameters:

instID – The instruction ID.

Returns:

The metadata of the instruction.

rword getInstAddress(uint16_t instID) const

Obtain the instruction address for a specific instruction ID.

Parameters:

instID – The instruction ID.

Returns:

The real address of the instruction.

rword getInstInstrumentedAddress(uint16_t instID) const

Obtain the instrumented address for a specific instruction ID.

Parameters:

instID – The instruction ID.

Returns:

The address in the BasicBlock of the instruction.

const llvm::MCInst &getOriginalMCInst(uint16_t instID) const

Obtain the original MCInst for a specific instruction ID.

Parameters:

instID – The instruction ID.

Returns:

The original MCInst of the instruction.

const InstAnalysis *getInstAnalysis(uint16_t instID, AnalysisType type) const

Obtain the analysis of an instruction. Analysis results are cached in the InstAnalysis. The validity of the returned pointer is only guaranteed until the end of the callback; if the result is needed beyond that point, a deep copy of the structure is required.

Parameters:
  • instID – [in] The ID of the instruction to analyse.

  • type – [in] Properties to retrieve during analysis.

Returns:

An InstAnalysis structure containing the analysis result.
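The lifetime rule for the returned pointer can be illustrated with a toy cache. Nothing below is the real API: ToyAnalysis, ToyAnalysisCache and ToyRecorder are hypothetical stand-ins, and the cache is assumed to reuse a single slot so that each new analysis overwrites the previous one. The point of the sketch is only that a callback which wants to keep a result must copy it by value.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for an analysis result (not the real InstAnalysis).
struct ToyAnalysis {
  std::uint64_t address;
  std::string disassembly;
};

// Toy cache that reuses one slot: the returned pointer is only meaningful
// until the next call to analyse().
class ToyAnalysisCache {
public:
  const ToyAnalysis *analyse(std::uint64_t address) {
    cached_.address = address;
    cached_.disassembly = "inst@" + std::to_string(address);
    return &cached_;
  }

private:
  ToyAnalysis cached_{};
};

// A callback-like object that keeps a copy instead of the pointer.
struct ToyRecorder {
  ToyAnalysis saved{};

  void onInstruction(ToyAnalysisCache &cache, std::uint64_t address) {
    const ToyAnalysis *analysis = cache.analyse(address);
    saved = *analysis;  // copy by value; the std::string member is duplicated
  }
};
```

A structure holding raw pointers would need each pointed-to buffer duplicated as well; the std::string member here makes the plain assignment a deep copy.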

inline uint16_t getNextSeqID() const

Obtain the next sequence ID.

Returns:

The next sequence ID.

uint16_t getSeqID(rword address, CPUMode cpumode) const

Obtain the sequence ID for a specific address (the address must exactly match the start of the sequence).

Parameters:

address – The address of the start of the sequence.

Returns:

The sequence ID or NOT_FOUND.

uint16_t getSeqID(uint16_t instID) const

Obtain the sequence ID containing a specific instruction ID.

Parameters:

instID – The instruction ID.

Returns:

The sequence ID or NOT_FOUND.

inline uint16_t getCurrentSeqID() const

Obtain the current sequence ID.

Returns:

The ID of the current sequence.

uint16_t getSeqStart(uint16_t seqID) const

Obtain the instruction ID of the sequence start for a specific sequence ID.

Parameters:

seqID – The sequence ID.

Returns:

The instruction ID of the start of the sequence.

uint16_t getSeqEnd(uint16_t seqID) const

Obtain the instruction ID of the sequence end for a specific sequence ID.

Parameters:

seqID – The sequence ID.

Returns:

The instruction ID of the end of the sequence.

void selectSeq(uint16_t seqID)

Set the selector of the exec block to a specific sequence offset. Used to program the execution of a specific sequence within the exec block.

Parameters:

seqID – [in] Basic block ID within the exec block.

inline Context *getContext() const

Get a pointer to the context structure stored in the data block.

Returns:

The context pointer.

uint16_t newShadow(uint16_t tag = ShadowReservedTag::Untagged)

Allocate a new shadow within the data block. Used by relocation to load or store data from the instrumented code.

Parameters:

tag – The tag associated with the registration; 0xFFFF is reserved for unregistered shadows.

Returns:

The shadow id (which is its index within the shadow array).

uint16_t getLastShadow(uint16_t tag)

Search for the last shadow with the given tag for the current instruction. Used by relocation to load or store data from the instrumented code.

Parameters:

tag – The tag associated with the registration.

Returns:

The shadow id (which is its index within the shadow array).

void setShadow(uint16_t id, rword v)

Set the value of a shadow.

Parameters:
  • id – [in] ID of the shadow to set.

  • v – [in] Value to assign to the shadow.

rword getShadow(uint16_t id) const

Get the value of a shadow.

Parameters:

id – [in] ID of the shadow.

Returns:

Value of the shadow.

rword getShadowOffset(uint16_t id) const

Get the offset of a shadow within the data block.

Parameters:

id – [in] ID of the shadow.

Returns:

Offset of the shadow.
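The shadow functions above can be pictured as operations on a tagged array of words living in the data block. The following is a toy model, not the ExecBlock code: ToyShadowArea, kToyUntagged and kToyNoShadow are hypothetical names, rword is assumed to be 64 bits, and the per-instruction scoping of getLastShadow is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using rword = std::uint64_t;  // assumption: 64-bit word

constexpr std::uint16_t kToyUntagged = 0xFFFF;  // tag reserved for unregistered shadows
constexpr std::uint16_t kToyNoShadow = 0xFFFF;  // hypothetical "not found" result

class ToyShadowArea {
public:
  // baseOffset: offset of the first shadow within the data block.
  // capacity: maximum number of shadows that fit after the context.
  ToyShadowArea(std::size_t baseOffset, std::size_t capacity)
      : baseOffset_(baseOffset), values_(capacity, 0) {}

  // Allocate the next free slot; the ID is the index in the shadow array.
  std::uint16_t newShadow(std::uint16_t tag = kToyUntagged) {
    assert(tags_.size() < values_.size());
    tags_.push_back(tag);
    return static_cast<std::uint16_t>(tags_.size() - 1);
  }

  // Most recently allocated shadow carrying the given tag, if any.
  std::uint16_t getLastShadow(std::uint16_t tag) const {
    for (std::size_t i = tags_.size(); i > 0; --i) {
      if (tags_[i - 1] == tag) return static_cast<std::uint16_t>(i - 1);
    }
    return kToyNoShadow;
  }

  void setShadow(std::uint16_t id, rword v) {
    assert(id < tags_.size());
    values_[id] = v;
  }

  rword getShadow(std::uint16_t id) const {
    assert(id < tags_.size());
    return values_[id];
  }

  // Offset of a shadow relative to the start of the data block.
  std::size_t getShadowOffset(std::uint16_t id) const {
    return baseOffset_ + static_cast<std::size_t>(id) * sizeof(rword);
  }

private:
  std::size_t baseOffset_;
  std::vector<rword> values_;
  std::vector<std::uint16_t> tags_;
};
```

In this model, instrumented code would reach a shadow by adding getShadowOffset(id) to the data block base, which is why the ID and the offset are both exposed.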