PatchDSL 

Language Concepts 

Type System 

The PatchDSL nomenclature classifies data according to where it belongs, using two dichotomies:

A physical dichotomy is made between data stored in the machine registers and data stored in memory.
A conceptual dichotomy is made between data belonging to or generated by the target program and data belonging to or generated by QBDI.

This creates four main categories of data manipulated by PatchDSL: Reg, Temp, Context and Shadows. These categories, together with the Metadata that QBDI can generate, are described below.

Reg:: They represent machine registers storing data created and used by the target program.
Temp:: They represent machine registers storing data used by the instrumentation. These are temporary scratch registers, allocated by saving a Reg into the context and later deallocated by restoring the Reg value from the context.
Context:: The context stores in memory the processor state associated with the target program. It is mostly used for context switching between the target program and the instrumentation process and also for allocating temporary registers.
Shadows:: They represent shadow data associated with a patch and an instrumented instruction. They can be used by QBDI to store constants or Tagged Shadows.
Metadata:: Any data regarding the execution which can be generated by QBDI, for example instruction operand values or memory access addresses.
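The two dichotomies form a two-by-two grid. The following C++ sketch makes that grid explicit; the `Storage` and `Owner` enums and the `categoryOf` function are hypothetical illustrations, not QBDI types. Metadata is left out, since the four categories only describe where data is stored.

```cpp
#include <cassert>
#include <string>

// Hypothetical types illustrating the PatchDSL nomenclature; not part of QBDI.
enum class Storage { Register, Memory };
enum class Owner { TargetProgram, QBDI };

// Map each (storage, owner) pair to its PatchDSL category name.
std::string categoryOf(Storage storage, Owner owner) {
    if (storage == Storage::Register) {
        // Registers: program data is a Reg, instrumentation scratch data is a Temp.
        return owner == Owner::TargetProgram ? "Reg" : "Temp";
    }
    // Memory: the saved program state is the Context, QBDI-owned data are Shadows.
    return owner == Owner::TargetProgram ? "Context" : "Shadows";
}
```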

The main objective of this strict naming convention is to make the distinction between pure operations (without side effects) and operations affecting the program state as clear as possible. To make this distinction even more apparent, the DSL is strongly typed: variables must be declared by constructing one of the typed structures below. Because some of these structures simply wrap an integer value, it can be tempting to use the integer directly and let the compiler perform the implicit type conversion. However, the point of these structures is to give context to what those integer constants represent, and the best practice is to use them wherever possible.
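A minimal sketch of why naming the structure matters, using simplified, hypothetical `Reg` and `Temp` stand-ins rather than QBDI's actual definitions. Both wrap an `unsigned int` and, like the constructors documented below, convert implicitly from it:

```cpp
#include <cassert>
#include <string>

// Simplified stand-ins for the PatchDSL Reg and Temp structures (not QBDI's code).
// Their constructors are implicit, so a raw integer converts silently to either.
struct Reg {
    unsigned int id;
    Reg(unsigned int id) : id(id) {}
};
struct Temp {
    unsigned int id;
    Temp(unsigned int id) : id(id) {}
};

// Overloads resolve on the wrapper type, so the same integer keeps its meaning.
std::string describe(Reg r) { return "Reg " + std::to_string(r.id); }
std::string describe(Temp t) { return "Temp " + std::to_string(t.id); }

// describe(0u) would not compile: both implicit conversions are equally good,
// so the call is ambiguous. Naming the structure removes the ambiguity.
```

With the structures spelled out, a register id and a temp id stay distinct; with a bare integer, the intent (and here even the overload) is lost.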

struct Reg

Structure representing a register variable in PatchDSL.

Public Functions

inline Reg(unsigned int id)

Create a new register variable.

Parameters:: id – [in] The id of the register to represent.

inline unsigned int getID() const

Get back the id of the register in GPRState.

Returns:: GPRState register id.

inline operator RegLLVM() const

Convert this structure to an LLVM register id.

Returns:: LLVM register id.

inline unsigned int getValue() const

Get the LLVM value of the register.

Returns:: LLVM register id.

inline rword offset() const

Return the offset of this register storage in the context part of the data block.

Returns:: The offset.

inline bool operator<(const Reg &o) const: Needed to create a std::set

struct Temp

Structure representing a temporary register variable in PatchDSL.

Public Functions

inline Temp(unsigned int id)

Represent a temporary register variable identified by a unique ID. Inside a patch rule or an instrumentation rule, Temp objects with identical ids refer to the same physical register. The id 0xFFFFFFFF is reserved for internal use. The mapping from id to physical register is determined at generation time, and the allocation and deallocation instructions are automatically added to the patch.

Parameters:: id – [in] The id of the temp to represent.

inline operator unsigned int() const

Convert this Temp to its id.

Returns:: This Temp id.
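The id-to-register mapping can be pictured as a small allocator. In this hypothetical sketch (`TempAllocator` is not a QBDI class), the candidate registers are assumed to be registers the instrumented instruction does not use. The first request for a temp id binds a register, later requests with the same id return that same register, and the reserved id is rejected. The registers reported by `usedRegisters()` are those whose program values would have to be saved to and restored from the context:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical sketch of temp allocation; not QBDI's allocator.
class TempAllocator {
public:
    // candidates: physical register ids assumed to be unused by the instruction.
    explicit TempAllocator(std::vector<unsigned int> candidates)
        : free_(std::move(candidates)) {}

    // Return the register bound to a temp id, binding a new one on first use.
    unsigned int getRegForTemp(unsigned int tempId) {
        if (tempId == 0xFFFFFFFFu)
            throw std::invalid_argument("temp id 0xFFFFFFFF is reserved");
        auto it = bound_.find(tempId);
        if (it != bound_.end())
            return it->second; // identical ids share the same register
        if (next_ >= free_.size())
            throw std::runtime_error("no register left for a new temp");
        unsigned int reg = free_[next_++];
        bound_[tempId] = reg;
        return reg;
    }

    // Registers whose program values must be saved before the patch and restored after it.
    std::vector<unsigned int> usedRegisters() const {
        return std::vector<unsigned int>(
            free_.begin(), free_.begin() + static_cast<std::ptrdiff_t>(next_));
    }

private:
    std::vector<unsigned int> free_;
    std::size_t next_ = 0;
    std::map<unsigned int, unsigned int> bound_;
};
```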

struct Shadow

Structure representing a shadow variable in PatchDSL.

Public Functions

inline Shadow(uint16_t tag)

Allocate a new shadow variable in the data block with the corresponding tag.

Parameters:: tag – [in] The tag of the new shadow variable.

inline rword getTag() const

Return the tag associated with this shadow variable.

Returns:: The tag of the shadow variable.
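Tagged shadows can be thought of as slots in the data block labelled with a 16-bit tag, presumably so that a value stored by a patch can later be found by its tag. The `DataBlock` and `ShadowSlot` types below are a hypothetical model, not QBDI's implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using rword = std::uint64_t;

// Hypothetical model of tagged shadow slots in a data block (not QBDI's layout).
struct ShadowSlot {
    std::uint16_t tag; // tag given when the shadow is allocated
    rword value;       // data stored by the patch
};

class DataBlock {
public:
    // Allocate a new shadow with the given tag and return its index.
    std::size_t allocateShadow(std::uint16_t tag) {
        shadows_.push_back(ShadowSlot{tag, 0});
        return shadows_.size() - 1;
    }
    // Store a value into an allocated shadow.
    void write(std::size_t idx, rword value) { shadows_.at(idx).value = value; }
    // Return the tag of a shadow, in the spirit of Shadow::getTag().
    std::uint16_t tagOf(std::size_t idx) const { return shadows_.at(idx).tag; }
    // Collect the values of every shadow carrying the given tag.
    std::vector<rword> valuesWithTag(std::uint16_t tag) const {
        std::vector<rword> out;
        for (const auto &s : shadows_)
            if (s.tag == tag) out.push_back(s.value);
        return out;
    }

private:
    std::vector<ShadowSlot> shadows_;
};
```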

struct Constant

Structure representing a constant value in PatchDSL.

Public Functions

inline Constant(rword v)

Represent a constant value.

Parameters:: v – [in] The represented value.

inline operator rword() const

Convert this structure to its value.

Returns:: This constant value.

struct Offset

Structure representing a memory offset variable in PatchDSL.

Public Functions

inline Offset(int64_t offset)

Allocate a new offset variable with its offset value.

Parameters:: offset – [in] The offset value.

inline Offset(Reg reg)

Allocate a new offset variable with the offset in the context of a specific register.

Parameters:: reg – [in] The register whose offset to represent.

inline operator int64_t() const

Convert this structure to its value.

Returns:: This offset value.

struct Operand

Structure representing an instruction operand variable in PatchDSL.

Public Functions

inline Operand(unsigned int idx)

Represent an instruction operand identified by its index in the LLVM MCInst representation of the instruction.

Parameters:: idx – [in] The operand index.

inline operator unsigned int() const

Convert this Operand to its idx.

Returns:: This Operand idx.

Statements 

There are three main categories of statements composing PatchDSL, each characterized by a different virtual base class. The specializations of those base classes are the PatchDSL statements.

QBDI::PatchCondition: They are used to match specific instructions. They take the instruction and its context as an input and return a boolean.
QBDI::PatchGenerator: They represent operations generating new instructions. They take the instruction and its context as an input and return a list of QBDI::RelocatableInst constituting the patch. In some exceptional cases no output is generated.
QBDI::InstTransform: They represent operations transforming an instruction. They only manipulate an instruction and need to be used with a QBDI::PatchGenerator to output a QBDI::RelocatableInst.

These statements are all evaluated on an implicit context. For QBDI::InstTransform, the context is the instruction to modify, which is determined by the QBDI::PatchGenerator using it. For QBDI::PatchCondition and QBDI::PatchGenerator, this context is made of:

the current instruction
the current instruction size
the current address

The output of each statement thus depends on the statement parameters and this implicit context.
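The three statement categories and their implicit context can be sketched with simplified, hypothetical base classes (not QBDI's actual headers; strings stand in for QBDI::RelocatableInst). Note that `InstTransform` only receives the instruction, while conditions and generators receive the whole context:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using rword = std::uint64_t;

// Hypothetical simplified instruction: only its opcode name.
struct Inst { std::string opcode; };

// Implicit context of PatchCondition and PatchGenerator statements.
struct InstContext {
    Inst inst;     // the current instruction
    rword size;    // the current instruction size
    rword address; // the current address
};

// Simplified stand-ins for the three virtual base classes.
struct PatchCondition {
    virtual ~PatchCondition() = default;
    virtual bool test(const InstContext &ctx) const = 0;
};
struct PatchGenerator {
    virtual ~PatchGenerator() = default;
    // Strings stand in for the generated QBDI::RelocatableInst list.
    virtual std::vector<std::string> generate(const InstContext &ctx) const = 0;
};
struct InstTransform {
    virtual ~InstTransform() = default;
    // A transform only sees the instruction it modifies.
    virtual void transform(Inst &inst) const = 0;
};

// Hypothetical condition: true when the opcode matches.
struct OpIs : PatchCondition {
    std::string op;
    explicit OpIs(std::string op) : op(std::move(op)) {}
    bool test(const InstContext &ctx) const override { return ctx.inst.opcode == op; }
};

// Hypothetical generator whose output depends on the address and size in the context.
struct EmitNextAddress : PatchGenerator {
    std::vector<std::string> generate(const InstContext &ctx) const override {
        return {"next=" + std::to_string(ctx.address + ctx.size)};
    }
};
```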

Rules 

PatchDSL is used to write short sequences of statements called rules. There exist two variants of rules, patching rules (QBDI::PatchRule) and instrumentation rules (QBDI::InstrRule), but they both rely on the same principle. A rule is composed of two parts:

Condition:: A QBDI::PatchCondition statement which expresses the condition under which the rule should be applied. Multiple statements can be combined in a boolean expression using QBDI::Or and QBDI::And. If the evaluation of this expression returns true then the generation part of the rule is evaluated.
Generation:: A list of QBDI::PatchGenerator statements which will generate the patch. Each statement can output one or several QBDI::RelocatableInst, the resulting patch being the aggregation of all those statement outputs.
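The condition/generation split can be modelled with `std::function`. In this hypothetical sketch, `Or`, `And` and `OpIs` mimic the PatchDSL statements of the same name but operate on opcode strings, and `Rule::apply` concatenates the output of every generator into one patch:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical function-based model of a rule (not QBDI's PatchRule API).
// Conditions and generators only look at an opcode name here.
using Condition = std::function<bool(const std::string &)>;
using Generator = std::function<std::vector<std::string>(const std::string &)>;

Condition OpIs(std::string expected) {
    return [expected](const std::string &op) { return op == expected; };
}
// True when at least one sub-condition holds, like QBDI::Or.
Condition Or(std::vector<Condition> conds) {
    return [conds](const std::string &op) {
        for (const auto &c : conds)
            if (c(op)) return true;
        return false;
    };
}
// True when every sub-condition holds, like QBDI::And.
Condition And(std::vector<Condition> conds) {
    return [conds](const std::string &op) {
        for (const auto &c : conds)
            if (!c(op)) return false;
        return true;
    };
}

struct Rule {
    Condition condition;               // condition part
    std::vector<Generator> generators; // generation part

    // The resulting patch aggregates the outputs of all generators, in order.
    std::vector<std::string> apply(const std::string &opcode) const {
        std::vector<std::string> patch;
        for (const auto &g : generators) {
            std::vector<std::string> out = g(opcode);
            patch.insert(patch.end(), out.begin(), out.end());
        }
        return patch;
    }
};
```

As with PatchRule::canBeApplied and PatchRule::apply, the caller in this model is expected to check `condition` before calling `apply`.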

class PatchRule

A patch rule written in PatchDSL.

Public Functions

PatchRule(std::unique_ptr<PatchCondition> &&condition, std::vector<std::unique_ptr<PatchGenerator>> &&generators)

Allocate a new patch rule with a condition and a list of generators.

Parameters:

condition – [in] A PatchCondition which determines whether or not this PatchRule applies.
generators – [in] A vector of PatchGenerator which will produce the patch instructions.

bool canBeApplied(const Patch &patch, const LLVMCPU &llvmcpu) const

Determine whether this rule applies by evaluating this rule's condition on the current context.

Parameters:

patch – [in] The Patch to check
llvmcpu – [in] LLVMCPU object

Returns:

True if this patch condition evaluates to true on this context.

void apply(Patch &patch, const LLVMCPU &llvmcpu) const

Generate this rule's output patch by evaluating its generators on the current context. Also handles the temporary register management for this patch.

Parameters:

patch – [in] The Patch on which to apply the rule
llvmcpu – [in] LLVMCPU object

class InstrRule

An instrumentation rule written in PatchDSL.

Subclassed by QBDI::AutoUnique<InstrRule, InstrRuleBasicCBK>, QBDI::AutoUnique<InstrRule, InstrRuleDynamic>, QBDI::AutoUnique<InstrRule, InstrRuleUser>

Public Functions

virtual bool tryInstrument(Patch &patch, const LLVMCPU &llvmcpu) const = 0

Determine whether this rule has to be applied on this Patch, and instrument it if needed.

Parameters:

patch – [in] The current patch to instrument.
llvmcpu – [in] LLVMCPU object

void instrument(Patch &patch, const PatchGeneratorUniquePtrVec &patchGen, bool breakToHost, InstPosition position, int priority, RelocatableInstTag tag) const

Instrument a patch by evaluating its generators on the current context. Also handles the temporary register management for this patch.

Parameters:

patch – [in] The current patch to instrument.
patchGen – [in] The list of patchGenerator to apply
breakToHost – [in] Whether a break to the VM needs to be added after the patch
position – [in] Whether to add the patch before or after the instruction
priority – [in] The priority of this patch
tag – [in] The tag for this patch

Transforms 

Transform statements, with the QBDI::InstTransform virtual base class, are a bit more subtle than other statements.

Currently, their use is limited to the QBDI::ModifyInstruction generator, which always operates on the instruction of the implicit context of a patch or instrumentation rule. However, their usage could be extended in the future.

Their purpose is to allow writing more generic rules, through modifications which can operate on a whole class of instructions. Using instruction transforms requires understanding the underlying LLVM MCInst representation of an instruction, and llvm-mc -show-inst is a helpful tool for this task.
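A rough model of how a list of transforms rewrites an instruction, using a hypothetical, heavily simplified `Inst` in place of llvm::MCInst. Registers are plain strings, and temps are assumed to be already mapped to physical registers (for example `R10` and `R11`). `ModifyInstruction` applies the transforms in order to a copy of the current instruction:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for llvm::MCInst.
struct Operand {
    bool isReg;
    std::string reg; // register name when isReg is true
    long long imm;   // immediate value when isReg is false
};
struct Inst {
    std::string opcode;
    std::vector<Operand> operands;
};

// An instruction transform only manipulates the instruction.
using InstTransform = std::function<void(Inst &)>;

InstTransform SetOpcode(std::string opcode) {
    return [opcode](Inst &inst) { inst.opcode = opcode; };
}
InstTransform SetOperand(std::size_t idx, long long imm) {
    return [idx, imm](Inst &inst) { inst.operands.at(idx) = Operand{false, "", imm}; };
}
InstTransform AddOperand(std::size_t idx, std::string reg) {
    return [idx, reg](Inst &inst) {
        inst.operands.insert(inst.operands.begin() + static_cast<std::ptrdiff_t>(idx),
                             Operand{true, reg, 0});
    };
}
// Replace every use of a register by the register assumed to back a temp.
InstTransform SubstituteWithTemp(std::string reg, std::string tempReg) {
    return [reg, tempReg](Inst &inst) {
        for (Operand &op : inst.operands)
            if (op.isReg && op.reg == reg) op.reg = tempReg;
    };
}

// Apply the transforms in order to a copy of the instruction.
Inst ModifyInstruction(Inst inst, const std::vector<InstTransform> &transforms) {
    for (const InstTransform &t : transforms) t(inst);
    return inst;
}
```

For instance, modelling a CALL64m whose memory operand follows LLVM's X86 order (base, scale, index, displacement, segment) with a RIP base, the sequence SubstituteWithTemp, SetOpcode(MOV64rm), AddOperand(0, ...) yields a six-operand MOV64rm whose first operand is the inserted destination register.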

PatchDSL Examples 

Some real examples of patch and instrumentation rules are shown below.

Basic Patching 

Generic PC Substitution Patch Rule

Instructions using the Program Counter (PC) in their computations are problematic because QBDI reassembles and executes the code at an address other than the original code location, and thus the value of the PC changes. This kind of computation using the PC is often found in relative memory addressing.

Some cases can be more difficult to handle, but most of these instructions can be patched using a very simple generic rule performing the following steps:

1. Allocate a scratch register by saving a register value in the context part of the data block.
2. Load the value the PC register should have into the scratch register.
3. Perform the original instruction, but with PC replaced by the scratch register.
4. Deallocate the scratch register by restoring the register value from the context part of the data block.

The PatchDSL QBDI::PatchRule handles steps 1 and 4 automatically for us. Expressing steps 2 and 3 is relatively simple:

PatchRule(
    // Condition: Applies on every instruction using the register REG_PC
    UseReg(Reg(REG_PC)),
    // Generators: list of statements generating the patch
    {
        // Compute PC + 0 and store it in a new temp with id 0
        GetPCOffset(Temp(0), Constant(0)),
        // Modify the instruction by substituting REG_PC with the temp having id 0
        ModifyInstruction({
            SubstituteWithTemp(Reg(REG_PC), Temp(0))
        })
    }
)

This rule is generic and works under X86_64 as well as ARM. Some more complex cases of instructions using PC need to be handled another way though.
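Why the scratch register must hold the original PC can be shown with plain address arithmetic. On x86-64, a RIP-relative operand `[RIP + disp]` is resolved against the address of the next instruction. The sketch below assumes that the value loaded into the temp is the PC value the instruction would see at its original location; both function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

using rword = std::uint64_t;

// On x86-64, [RIP + disp] is resolved against the address of the next
// instruction, i.e. the instruction address plus its size.
rword ripRelativeTarget(rword instAddress, rword instSize, std::int64_t disp) {
    return instAddress + instSize + static_cast<rword>(disp);
}

// Value a scratch register must hold so that [scratch + disp] still reaches
// the target the instruction had at its original address (hypothetical helper).
rword pcValueAtOriginalLocation(rword originalAddress, rword instSize) {
    return originalAddress + instSize;
}
```

For a 7-byte instruction at 0x401000 with a displacement of 0x2000, the intended target is 0x403007. Executing the same bytes at a relocated address yields a different target, whereas `[temp + 0x2000]` with the temp holding 0x401007 still reaches 0x403007.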

Simple Branching Instruction Patch

Branching instructions are another simple case which needs to be handled using a patch rule. They cannot be executed as-is, because the DBI process would then lose control of the execution. Instead of executing the branch operation, the branch target is computed and used to overwrite the value of the PC in the context part of the data block. This is followed by a context switch back to the VM, which uses this target as the address at which to continue execution.

The simplest cases are the “branch to an address stored in a register” instructions. Again the temporary register allocation is automatically taken care of by the QBDI::PatchRule and we only need to write the patching logic:

PatchRule(
    // Condition: only on BX or BX_pred LLVM MCInst
    Or({
        OpIs(llvm::ARM::BX),
        OpIs(llvm::ARM::BX_pred)
    }),
    // Generators
    {
        // Obtain the value of the operand with index 0 and store it in a new temp with id 0
        GetOperand(Temp(0), Operand(0)),
        // Write the temp with id 0 at the offset in the data block of the context value of REG_PC.
        WriteTemp(Temp(0), Offset(Reg(REG_PC)))
    }
)

Two things are important to notice here. First, we use QBDI::Or to combine multiple QBDI::PatchCondition. Second, the fact that we need to stop the execution here and switch back to the context of the VM is not expressed in the patch. Indeed, the patching engine simply notices that this patch overwrites the value of the PC and thus needs to end the basic block after it.
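The end-of-block detection described above can be pictured as a scan over the generators of a patch. This is a hypothetical sketch: `GenDesc` and `endsBasicBlock` are illustrative names, the PC offset value is made up, and QBDI's engine does not necessarily work this way. A patch that writes to the saved PC slot of the context is flagged as terminating the basic block:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of a generator (not QBDI's representation).
enum class GenKind { GetOperand, GetPCOffset, WriteTemp };
struct GenDesc {
    GenKind kind;
    std::int64_t offset; // destination offset for WriteTemp, unused otherwise
};

// A patch writing to the saved PC slot of the context must end the basic block,
// so that the VM regains control and resumes at the written address.
bool endsBasicBlock(const std::vector<GenDesc> &patch, std::int64_t pcOffset) {
    for (const GenDesc &g : patch)
        if (g.kind == GenKind::WriteTemp && g.offset == pcOffset) return true;
    return false;
}
```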

Advanced Patching 

Conditional Branching Instruction Patch

The previous section dealt with simple patching cases where the rule does not need to be very complex. Conditional instructions can add a significant amount of complexity to the writing of patch rules and require some tricks. Below is the patch for the ARM conditional branching instruction:

PatchRule(
    // Condition: every Bcc instruction (e.g. BNE, BEQ, etc.)
    OpIs(llvm::ARM::Bcc),
    // Generators
    {
        // Compute the Bcc target (which is PC relative) and store it in a new temp with id 0
        GetPCOffset(Temp(0), Operand(0)),
        // Modify the jump target so that it potentially skips the next generator
        ModifyInstruction({
            SetOperand(Operand(0), Constant(0))
        }),
        // Compute the next instruction address and store it in the temp with id 0
        GetPCOffset(Temp(0), Constant(-4)),
        // At this point either:
        //  * The jump was not taken and Temp(0) stores the next instruction address.
        //  * The jump was taken and Temp(0) stores the Bcc target
        // We thus write Temp(0) which has the correct next address to execute in the REG_PC
        // value in the context part of the data block.
        WriteTemp(Temp(0), Offset(Reg(REG_PC)))
    }
)

As we can see, this code reuses the original conditional branching instruction to create a conditional move. While this is a trick, it is an architecture-independent trick which is also used under X86_64. Some details should be noted, though. First, the next instruction address is PC - 4, which is an ARM specificity: in ARM state, reading PC returns the address of the current instruction plus 8. Second, the constant used to overwrite the jump target needs to be determined by hand, as QBDI cannot compute it automatically.
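The net effect of this conditional-move trick can be emulated in plain C++. The sketch below assumes that GetPCOffset yields the PC value seen by the original instruction (its address plus 8 in ARM state) plus the given offset, and that operand 0 of Bcc is a byte offset relative to that PC. It models only the value finally written to the PC slot, not the generated instructions; `bccPatchResult` is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>

using rword = std::uint32_t; // 32-bit ARM

// In ARM state, reading PC yields the current instruction address + 8.
constexpr rword armPCRead(rword address) { return address + 8; }

// Value written to the context PC slot by the Bcc patch.
rword bccPatchResult(rword bccAddress, std::int32_t offset, bool taken) {
    // GetPCOffset(Temp(0), Operand(0)): the original branch target.
    rword temp0 = armPCRead(bccAddress) + static_cast<rword>(offset);
    // The rewritten Bcc skips the next generator only when the branch is taken.
    if (!taken) {
        // GetPCOffset(Temp(0), Constant(-4)): PC - 4, the next instruction address.
        temp0 = armPCRead(bccAddress) - 4;
    }
    // WriteTemp(Temp(0), Offset(Reg(REG_PC))).
    return temp0;
}
```

For a Bcc at 0x1000 with an offset of 0x100, the PC slot receives 0x1108 when the branch is taken and 0x1004 otherwise.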

Complex InstTransform

The patch below is used to patch instructions which load their branching target from a memory address under X86_64. It exploits QBDI::InstTransform to convert the instruction into a load from memory to obtain this branching target:

PatchRule(
    // Condition: applies on CALL where the target is at a relative memory location (thus uses REG_PC)
    And({
        OpIs(llvm::X86::CALL64m),
        UseReg(Reg(REG_PC))
    }),
    // Generators
    {
        // First compute PC + 0 and store it into a new temp with id 0
        GetPCOffset(Temp(0), Constant(0)),
        // Transforms the CALL *[RIP + ...] into MOV Temp(1), *[Temp(0) + ...]
        ModifyInstruction({
            // RIP is replaced with Temp(0)
            SubstituteWithTemp(Reg(REG_PC), Temp(0)),
            // The opcode is changed to a 64-bit MOV from memory to a register
            SetOpcode(llvm::X86::MOV64rm),
            // We insert the destination register, a new temp with id 1, at the beginning of
            // the operand list
            AddOperand(Operand(0), Temp(1))
        }),
        // Temp(1) thus contains the CALL target.
        // We use the X86_64 specific SimulateCall with this target.
        SimulateCall(Temp(1))
    }
)

A few things need to be noted. First, the sequence of QBDI::InstTransform is complex because it both substitutes RIP and mutates the CALL into a MOV. Second, new QBDI::Temp can be instantiated and used anywhere in the rule. Lastly, some complex architecture-specific mechanisms, such as QBDI::SimulateCall, have been abstracted into a single QBDI::PatchGenerator.
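A CALL pushes the return address and then jumps to the target, so simulating it has to reproduce both effects. The following hypothetical sketch models that net effect on a simplified guest state; the `GuestState` type and its sparse memory map are assumptions made for illustration, not QBDI structures:

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using rword = std::uint64_t;

// Hypothetical model of the guest state touched by a simulated CALL on x86-64.
struct GuestState {
    rword rip;                     // saved PC in the context
    rword rsp;                     // stack pointer
    std::map<rword, rword> memory; // sparse 8-byte memory model
};

// Net effect of CALL target: push the return address, then jump to target.
// returnAddress is the address of the instruction following the original CALL.
void simulateCall(GuestState &s, rword target, rword returnAddress) {
    s.rsp -= sizeof(rword);
    s.memory[s.rsp] = returnAddress;
    s.rip = target;
}
```

For a 6-byte CALL64m at 0x401000 targeting 0x402000 with RSP at 0x7ffc0000, this leaves RSP at 0x7ffbfff8, stores 0x401006 there and sets RIP to 0x402000.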

Instrumentation Callbacks 

QBDI::InstrRule allows inserting inline instrumentation inside the patch with a concept similar to the rules shown previously. Callbacks to host code are triggered by a break to host with specific variables set correctly in the host state part of the context:

the hostState.callback should be set to the callback function address to call.
the hostState.data should be set to the callback function data parameter.
the hostState.origin should be set to the ID of the current instruction (see QBDI::GetInstId).
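A hypothetical sketch of the host side of this protocol: after the break, the host reads the three fields from the host state and calls the function stored in `callback`. `SimpleCallback` and `dispatchCallback` are simplified names chosen for illustration; QBDI's real instruction callbacks take additional arguments such as the VM and the register states:

```cpp
#include <cassert>
#include <cstdint>

using rword = std::uint64_t;

// Hypothetical simplified callback signature (QBDI's InstCallback differs).
using SimpleCallback = void (*)(void *data, rword origin);

// The host-state fields a patch sets before breaking to the host.
struct HostState {
    rword callback; // address of the callback function to call
    rword data;     // callback data parameter
    rword origin;   // ID of the instruction that triggered the break
};

// Host side after the break: decode the fields and invoke the callback.
void dispatchCallback(const HostState &hs) {
    SimpleCallback cbk = reinterpret_cast<SimpleCallback>(hs.callback);
    cbk(reinterpret_cast<void *>(hs.data), hs.origin);
}
```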

In practice, there exists a function which can generate the PatchGenerator list needed to set up those variables correctly:

PatchGenerator::UniquePtrVec QBDI::getCallbackGenerator(InstCallback cbk, void *data)

Output a list of PatchGenerator which would set up the host state part of the context for a callback.

Parameters:

cbk – [in] The callback function to call.
data – [in] The data to pass as an argument to the callback function.

Returns:

A list of PatchGenerator to set up this callback call.

Thus, in practice, a QBDI::InstrRule which would set up a callback on every instruction writing data in memory would look like this:

InstrRule(
    // Condition: on every instruction making write access
    DoesWriteAccess(),
    // Generators: set up a callback to someCallbackFunction with someParameter
    getCallbackGenerator(someCallbackFunction, someParameter),
    // Position this instrumentation after the instruction
    InstPosition::POSTINST,
    // Break to the host after the instrumentation (required for the callback to be made)
    true
)

However, the callback generator can also be written directly in PatchDSL for more advanced usages. The instrumentation rule below directly passes the written data as the callback parameter:

InstrRule(
    // Condition: on every instruction making write access
    DoesWriteAccess(),
    // Generators: set up a callback to someCallbackFunction with the written value as data
    {
        // Set hostState.callback to the callback function address
        GetConstant(Temp(0), Constant((rword) someCallbackFunction)),
        WriteTemp(Temp(0), Offset(offsetof(Context, hostState.callback))),
        // Set hostState.data as the written value
        GetWriteValue(Temp(0)),
        WriteTemp(Temp(0), Offset(offsetof(Context, hostState.data))),
        // Set hostState.origin as the current instID
        GetInstId(Temp(0)),
        WriteTemp(Temp(0), Offset(offsetof(Context, hostState.origin)))
    },
    // Position this instrumentation after the instruction
    QBDI::InstPosition::POSTINST,
    // Break to the host after the instrumentation (required for the callback to be made)
    true
)
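The `Offset(offsetof(Context, hostState.data))` arguments above are plain byte offsets into the context. The following sketch, built on a hypothetical simplified `Context` (QBDI's real structure has more fields), shows how such an offset locates a host-state field and what a WriteTemp to that offset amounts to; `writeTemp` is an illustrative helper, not the QBDI generator:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using rword = std::uint64_t;

// Hypothetical simplified context layout.
struct HostState { rword callback, data, origin; };
struct Context {
    HostState hostState;
    rword gpr[16];
};

// Same value as offsetof(Context, hostState.data), computed in two steps.
constexpr std::size_t kDataOffset = offsetof(Context, hostState) + offsetof(HostState, data);

// Simplified WriteTemp: store a register-sized value at a byte offset of the context.
void writeTemp(Context &ctx, std::size_t offset, rword value) {
    std::memcpy(reinterpret_cast<unsigned char *>(&ctx) + offset, &value, sizeof(value));
}
```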