PatchDSL
Language Concepts
Type System
The PatchDSL nomenclature is organized according to where information lives, using two dichotomies:
A physical dichotomy is made between data stored in the machine registers and data stored in memory.
A conceptual dichotomy is made between data belonging to or generated by the target program and data belonging to or generated by QBDI.
This creates four main categories of data being manipulated by the PatchDSL. These categories and the nomenclature for their interaction are represented below.
- Reg:
They represent machine registers storing data created and used by the target program.
- Temp:
They represent machine registers storing data used by the instrumentation. These are temporary scratch registers, allocated by saving a Reg into the context and later deallocated by restoring the Reg value from the context.
- Context:
The context stores in memory the processor state associated with the target program. It is mostly used for context switching between the target program and the instrumentation process and also for allocating temporary registers.
- Shadows:
They represent shadow data associated with a patch and instrumented instruction. They can be used by QBDI to store constants or Tagged Shadows.
- Metadata:
Any data regarding the execution which can be generated by QBDI, for example instruction operand values or memory access addresses.
The main objective of this strict naming convention is to make the distinction between pure (no side effects) operations and operations affecting the program state as clear as possible. To make this distinction even more apparent, the DSL is strongly typed, forcing variables to be declared by allocating one of the typed structures below with its constructor. Because some of those structures simply alias an integer value, it can be tempting to use the integer value directly and let the compiler perform the implicit type conversion. However, the point of those structures is to give context to what those integer constants represent, and the best practice is to use them everywhere possible.
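The benefit of typed wrappers over bare integers can be illustrated with a minimal, self-contained sketch. Everything in the `sketch` namespace is a hypothetical stand-in and not the QBDI definition. In particular, the constructors are marked `explicit` here (an assumption made so that the compiler rejects bare integers), `writeTempTarget` is an invented helper, and the layout of 8-byte context slots indexed by register id is assumed purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Hypothetical stand-ins, not the QBDI definitions. Each wrapper carries
// one integer, but the distinct types keep their meanings apart.
struct Reg {
  explicit Reg(unsigned int id) : id_(id) {}
  unsigned int getID() const { return id_; }

private:
  unsigned int id_;
};

struct Temp {
  explicit Temp(unsigned int id) : id_(id) {}
  unsigned int getID() const { return id_; }

private:
  unsigned int id_;
};

struct Offset {
  explicit Offset(std::int64_t offset) : offset_(offset) {}
  // Assumption for this sketch only: the saved context is an array of
  // 8-byte slots indexed by register id.
  explicit Offset(Reg reg)
      : offset_(static_cast<std::int64_t>(reg.getID()) * 8) {}
  std::int64_t value() const { return offset_; }

private:
  std::int64_t offset_;
};

// Invented helper whose signature states what each integer means.
// It returns the context offset that would receive the temp.
inline std::int64_t writeTempTarget(Temp source, Offset destination) {
  (void)source; // the temp id does not influence the returned offset
  return destination.value();
}

inline void demo() {
  // writeTempTarget(0, 16); // rejected: the constructors are explicit
  assert(writeTempTarget(Temp(0), Offset(Reg(2))) == 16);
  assert(Offset(std::int64_t{-4}).value() == -4);
}

} // namespace sketch
```

With this shape, a call site reads as "temp 0 goes to the context slot of register 2" rather than as two anonymous integers, which is the readability gain the naming convention aims for.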
-
struct Reg
Structure representing a register variable in PatchDSL.
Public Functions
-
inline Reg(unsigned int id)
Create a new register variable.
- Parameters:
id – [in] The id of the register to represent.
-
inline unsigned int getID() const
Get back the id of the register in GPRState
- Returns:
GPRState register id.
-
inline operator RegLLVM() const
Convert this structure to an LLVM register id.
- Returns:
LLVM register id.
-
inline unsigned int getValue() const
Get the LLVM value of the register.
- Returns:
LLVM register id.
-
struct Temp
Structure representing a temporary register variable in PatchDSL.
Public Functions
-
inline Temp(unsigned int id)
Represent a temporary register variable identified by a unique ID. Inside a patch rule or an instrumentation rule, Temp with identical ids point to the same physical register. The id 0xFFFFFFFF is reserved for internal use. The mapping from id to physical register is determined at generation time and the allocation and deallocation instructions are automatically added to the patch.
- Parameters:
id – [in] The id of the temp to represent.
-
struct Shadow
Structure representing a shadow variable in PatchDSL.
Public Functions
-
inline Shadow(uint16_t tag)
Allocate a new shadow variable in the data block with the corresponding tag.
- Parameters:
tag – [in] The tag of the new shadow variable.
-
struct Constant
Structure representing a constant value in PatchDSL.
-
struct Offset
Structure representing a memory offset variable in PatchDSL.
Public Functions
-
inline Offset(int64_t offset)
Allocate a new offset variable with its offset value.
- Parameters:
offset – [in] The offset value
-
inline Offset(Reg reg)
Allocate a new offset variable with the offset in the context of a specific register.
- Parameters:
reg – [in] The register whose offset to represent.
-
inline operator int64_t() const
Convert this structure to its value.
- Returns:
This offset value.
Statements
There are three main categories of statements composing PatchDSL, each characterized by a different virtual base class. The specializations of those base classes are the PatchDSL statements.
QBDI::PatchCondition
They are used to match specific instructions. They take the instruction and its context as an input and return a boolean.
QBDI::PatchGenerator
They represent operations generating new instructions. They take the instruction and its context as an input and return a list of QBDI::RelocatableInst constituting the patch. In some exceptional cases no output is generated.
QBDI::InstTransform
They represent operations transforming an instruction. They only manipulate an instruction and need to be used with a QBDI::PatchGenerator to output a QBDI::RelocatableInst.
Those statements are all evaluated on an implicit context. In the case of QBDI::InstTransform the context is the instruction to modify, which is determined by the QBDI::PatchGenerator which uses it. In the case of QBDI::PatchCondition and QBDI::PatchGenerator this context is made of:
the current instruction
the current instruction size
the current address
The output of each statement thus depends on the statement parameters and this implicit context.
Rules
PatchDSL is used to write short sequences of statements called rules. There exist two variants of rules, patching rules (QBDI::PatchRule) and instrumentation rules (QBDI::InstrRule), but they both rely on the same principle. A rule is composed of two parts:
- Condition:
A QBDI::PatchCondition statement which expresses the condition under which the rule should be applied. Multiple statements can be combined in a boolean expression using QBDI::Or and QBDI::And. If the evaluation of this expression returns true then the generation part of the rule is evaluated.
- Generation:
A list of QBDI::PatchGenerator statements which will generate the patch. Each statement can output one or several QBDI::RelocatableInst, the resulting patch being the aggregation of all those statement outputs.
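The two-part structure of a rule can be sketched with a toy model. This is not QBDI code: the `sketch` namespace, the `Context` fields, the string-based instructions and the `exampleRule` helper are all assumptions made for illustration, and `std::function` is used in place of the virtual base classes to keep the example short. The names `OpIs`, `Or` and `And` only mirror the statements described above.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical implicit context: instruction, size and address.
struct Context {
  std::string opcode;
  unsigned size;
  std::uint64_t address;
};

// std::function replaces the virtual base classes to keep the sketch short.
using Condition = std::function<bool(const Context &)>;
using Generator = std::function<std::vector<std::string>(const Context &)>;

inline Condition OpIs(std::string opcode) {
  return [opcode](const Context &ctx) { return ctx.opcode == opcode; };
}

inline Condition Or(std::vector<Condition> conditions) {
  return [conditions](const Context &ctx) {
    for (const auto &c : conditions) {
      if (c(ctx)) return true;
    }
    return false;
  };
}

inline Condition And(std::vector<Condition> conditions) {
  return [conditions](const Context &ctx) {
    for (const auto &c : conditions) {
      if (!c(ctx)) return false;
    }
    return true;
  };
}

struct Rule {
  Condition condition;
  std::vector<Generator> generators;

  bool canBeApplied(const Context &ctx) const { return condition(ctx); }

  // The patch is the concatenation of every generator output, in order.
  std::vector<std::string> apply(const Context &ctx) const {
    std::vector<std::string> patch;
    for (const auto &g : generators) {
      std::vector<std::string> out = g(ctx);
      patch.insert(patch.end(), out.begin(), out.end());
    }
    return patch;
  }
};

// Invented rule: match two opcodes, emit two placeholder instructions.
inline Rule exampleRule() {
  return Rule{
      Or({OpIs("BX"), OpIs("BX_pred")}),
      {[](const Context &) {
         return std::vector<std::string>{"temp0 = operand 0"};
       },
       [](const Context &) {
         return std::vector<std::string>{"context.pc = temp0"};
       }}};
}

inline void demo() {
  Rule rule = exampleRule();
  Context bx{"BX", 4, 0x1000};
  assert(rule.canBeApplied(bx));
  assert(rule.apply(bx).size() == 2);
  Context add{"ADD", 4, 0x1004};
  assert(!rule.canBeApplied(add));
}

} // namespace sketch
```

A generator returning an empty vector would model the exceptional case where a statement produces no output.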
-
class PatchRule
A patch rule written in PatchDSL.
Public Functions
-
PatchRule(std::unique_ptr<PatchCondition> &&condition, std::vector<std::unique_ptr<PatchGenerator>> &&generators)
Allocate a new patch rule with a condition and a list of generators.
- Parameters:
condition – [in] A PatchCondition which determines whether or not this PatchRule applies.
generators – [in] A vector of PatchGenerator which will produce the patch instructions.
-
bool canBeApplied(const Patch &patch, const LLVMCPU &llvmcpu) const
Determine whether this rule applies by evaluating its condition on the current context.
- Parameters:
patch – [in] The Patch to check
llvmcpu – [in] LLVMCPU object
- Returns:
True if this patch condition evaluates to true on this context.
-
void apply(Patch &patch, const LLVMCPU &llvmcpu) const
Generate this rule output patch by evaluating its generators on the current context. Also handles the temporary register management for this patch.
- Parameters:
patch – [in] The Patch where to apply the rule
llvmcpu – [in] LLVMCPU object
-
class InstrRule
An instrumentation rule written in PatchDSL.
Subclassed by QBDI::AutoUnique< InstrRule, InstrRuleBasicCBK >, QBDI::AutoUnique< InstrRule, InstrRuleDynamic >, QBDI::AutoUnique< InstrRule, InstrRuleUser >
Public Functions
-
virtual bool tryInstrument(Patch &patch, const LLVMCPU &llvmcpu) const = 0
Determine whether this rule has to be applied on this Patch and instrument it if needed.
- Parameters:
patch – [in] The current patch to instrument.
llvmcpu – [in] LLVMCPU object
-
void instrument(Patch &patch, const PatchGeneratorUniquePtrVec &patchGen, bool breakToHost, InstPosition position, int priority, RelocatableInstTag tag) const
Instrument a patch by evaluating its generators on the current context. Also handles the temporary register management for this patch.
- Parameters:
patch – [in] The current patch to instrument.
patchGen – [in] The list of patchGenerator to apply
breakToHost – [in] Whether a break to the VM needs to be added after the patch
position – [in] Whether to add the patch before or after the instruction
priority – [in] The priority of this patch
tag – [in] The tag for this patch
Transforms
Transform statements, with the QBDI::InstTransform virtual base class, are a bit more subtle than other statements. Currently their operation is limited to the QBDI::ModifyInstruction generators, which always operate on the instruction of the implicit context of a patch or instrumentation rule. However their usage could be extended in the future.
Their purpose is to allow writing more generic rules by allowing modifications which can operate on a class of instructions. Using instruction transforms requires understanding the underlying LLVM MCInst representation of an instruction, and llvm-mc -show-inst is a helpful tool for this task.
PatchDSL Examples
Some real examples of patch and instrumentation rules are shown below.
Basic Patching
Generic PC Substitution Patch Rule
Instructions using the Program Counter (PC) in their computations are problematic because QBDI will reassemble and execute the code at another address than the original code location and thus the value of the PC will change. This kind of computation using the PC is often found when using relative memory addressing.
Some cases can be more difficult to handle, but most of these instructions can be patched using a very simple generic rule performing the following steps:
Allocate a scratch register by saving a register value in the context part of the data block.
Load the value the PC register should have, into the scratch register.
Perform the original instruction but with PC replaced by the scratch register.
Deallocate the scratch register by restoring the register value from context part of the data block.
The PatchDSL QBDI::PatchRule handles steps 1 and 4 automatically for us. Expressing steps 2 and 3 is relatively simple:
PatchRule(
// Condition: Applies on every instruction using the register REG_PC
UseReg(Reg(REG_PC)),
// Generators: list of statements generating the patch
{
// Compute PC + 0 and store it in a new temp with id 0
GetPCOffset(Temp(0), Constant(0)),
// Modify the instruction by substituting REG_PC with the temp having id 0
ModifyInstruction({
SubstituteWithTemp(Reg(REG_PC), Temp(0))
})
}
)
This rule is generic and works under X86_64 as well as ARM. Some more complex cases of instructions using PC need to be handled another way though.
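Why the substitution preserves the original behaviour can be checked with a small arithmetic model. This is a sketch under stated assumptions, not QBDI code: the `Relocated` struct, the function names and the example addresses are invented, and the instruction is reduced to a single "base plus displacement" address computation.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Effective address of a PC-relative access: base value plus displacement.
// Unsigned arithmetic wraps, so negative displacements work as expected.
inline std::uint64_t effectiveAddress(std::uint64_t base, std::int64_t disp) {
  return base + static_cast<std::uint64_t>(disp);
}

// Hypothetical pair of PC values for one relocated instruction.
struct Relocated {
  std::uint64_t originalPC;  // value PC would hold at the original location
  std::uint64_t relocatedPC; // value PC holds where the copy actually runs
};

// Unpatched copy: the instruction reads the real PC, which is now wrong.
inline std::uint64_t runUnpatched(const Relocated &r, std::int64_t disp) {
  return effectiveAddress(r.relocatedPC, disp);
}

// Patched copy, mirroring steps 2 and 3 of the generic rule.
inline std::uint64_t runPatched(const Relocated &r, std::int64_t disp) {
  std::uint64_t temp0 = r.originalPC;   // step 2: load the expected PC value
  return effectiveAddress(temp0, disp); // step 3: PC replaced by the temp
}

inline void demo() {
  Relocated r{0x401000ULL, 0x7f0000002000ULL};
  assert(runUnpatched(r, 0x20) != 0x401020ULL);
  assert(runPatched(r, 0x20) == 0x401020ULL);
}

} // namespace sketch
```

The model ignores architecture details such as the offset between an instruction address and the value read from PC, which the real generator has to account for.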
Simple Branching Instruction Patch
Another simple case which needs to be handled using a patch rule is branching instructions. They cannot be executed as-is because the DBI process would then lose control of the execution. Instead of executing the branch operation, the branch target is computed and used to overwrite the value of the PC in the context part of the data block. This is followed by a context switch back to the VM, which will use this target as the address where execution continues.
The simplest cases are the “branch to an address stored in a register” instructions. Again the temporary register allocation is automatically taken care of by the QBDI::PatchRule and we only need to write the patching logic:
PatchRule(
// Condition: only on BX or BX_pred LLVM MCInst
Or({
OpIs(llvm::ARM::BX),
OpIs(llvm::ARM::BX_pred)
}),
// Generators
{
// Obtain the value of the operand with index 0 and store it in a new temp with id 0
GetOperand(Temp(0), Operand(0)),
// Write the temp with id 0 at the offset in the data block of the context value of REG_PC.
WriteTemp(Temp(0), Offset(Reg(REG_PC)))
}
)
Two things are important to notice here. First we use QBDI::Or to combine multiple QBDI::PatchCondition. Second the fact we need to stop the execution here and switch back to the context of the VM is not expressed in the patch. Indeed the patching engine simply notices that this patch overwrites the value of the PC and thus needs to end the basic block after it.
Advanced Patching
Conditional Branching Instruction Patch
The previous section dealt with simple patching cases where the rule does not need to be very complex. Conditional instructions can add a significant amount of complexity to the writing of a patch rule and require some tricks. Below is the patch for the ARM conditional branching instruction:
PatchRule(
// Condition: every Bcc instructions (e.g. BNE, BEQ, etc.)
OpIs(llvm::ARM::Bcc),
// Generators
{
// Compute the Bcc target (which is PC relative) and store it in a new temp with id 0
GetPCOffset(Temp(0), Operand(0)),
// Modify the jump target so that it potentially skips the next generator
ModifyInstruction({
SetOperand(Operand(0), Constant(0))
}),
// Compute the next instruction address and store it in temp with id 0
GetPCOffset(Temp(0), Constant(-4)),
// At this point either:
// * The jump was not taken and Temp(0) stores the next instruction address.
// * The jump was taken and Temp(0) stores the Bcc target
// We thus write Temp(0) which has the correct next address to execute in the REG_PC
// value in the context part of the data block.
WriteTemp(Temp(0), Offset(Reg(REG_PC)))
}
)
As we can see, this code reuses the original conditional branching instruction to create a conditional move. While this is a trick, it is an architecture independent trick which is also used under X86_64. Some details can be noted though. First the next instruction address is PC - 4 which is an ARM specificity. Secondly, the constant used to overwrite the jump target needs to be determined by hand as QBDI does not have the capacity to compute it automatically.
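The control flow this patch relies on can be traced with a minimal model. It is only a sketch: the `nextPC` function and its parameters are invented for illustration, and the ARM-specific PC offsets are replaced by two precomputed addresses.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Returns the address that ends up in the PC slot of the context.
// conditionHolds models the condition code of the original Bcc.
inline std::uint64_t nextPC(bool conditionHolds, std::uint64_t branchTarget,
                            std::uint64_t fallThrough) {
  // First generator: temp0 receives the branch target.
  std::uint64_t temp0 = branchTarget;
  // The rewritten Bcc jumps over the next generator when it is taken.
  if (!conditionHolds) {
    // Second generator: temp0 is overwritten with the next instruction.
    temp0 = fallThrough;
  }
  // Final generator: temp0 is written to the saved PC.
  return temp0;
}

inline void demo() {
  assert(nextPC(true, 0x2000, 0x1004) == 0x2000);
  assert(nextPC(false, 0x2000, 0x1004) == 0x1004);
}

} // namespace sketch
```

The original branch thus behaves like a conditional move: it decides which of the two candidate values survives in the temp.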
Complex InstTransform
The patch below is used to patch instructions which load their branching target from a memory address under X86_64. It exploits QBDI::InstTransform to convert the instruction into a load from memory to obtain this branching target:
PatchRule(
// Condition: applies on CALL where the target is at a relative memory location (thus uses REG_PC)
And({
OpIs(llvm::X86::CALL64m),
UseReg(Reg(REG_PC))
}),
// Generators
{
// First compute PC + 0 and store it into a new temp with id 0
GetPCOffset(Temp(0), Constant(0)),
// Transforms the CALL *[RIP + ...] into MOV Temp(1), *[Temp(0) + ...]
ModifyInstruction({
// RIP is replaced with Temp(0)
SubstituteWithTemp(Reg(REG_PC), Temp(0)),
// The opcode is changed to a 64 bits MOV from memory to a register
SetOpcode(llvm::X86::MOV64rm),
// We insert the destination register, a new temp with id 1, at the beginning of
// the operand list
AddOperand(Operand(0), Temp(1))
}),
// Temp(1) thus contains the CALL target.
// We use the X86_64 specific SimulateCall with this target.
SimulateCall(Temp(1))
}
)
A few things need to be noted. First the sequence of QBDI::InstTransform is complex because it substitutes RIP and it mutates the CALL into a MOV. Secondly, new QBDI::Temp can be instantiated and used anywhere in the program. Lastly, some complex architecture specific mechanisms have been abstracted in single QBDI::PatchGenerator, like QBDI::SimulateCall.
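The effect of the three transforms on the operand list can be followed on a toy instruction. This is a sketch and not the LLVM or QBDI data model: the `Inst` struct, the string operands and the helper functions are invented. The five-operand memory layout (base, scale, index, displacement, segment) is an assumption about how such an operand is represented, which `llvm-mc -show-inst` can confirm for a real instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

// Toy instruction: an opcode name and a flat list of operands.
struct Inst {
  std::string opcode;
  std::vector<std::string> operands;
};

// Replace every occurrence of one register operand by another.
inline void substitute(Inst &inst, const std::string &from,
                       const std::string &to) {
  for (auto &op : inst.operands) {
    if (op == from) op = to;
  }
}

inline void setOpcode(Inst &inst, const std::string &opcode) {
  inst.opcode = opcode;
}

// Insert an operand so that it ends up at position index.
inline void addOperand(Inst &inst, std::size_t index,
                       const std::string &operand) {
  inst.operands.insert(
      inst.operands.begin() + static_cast<std::ptrdiff_t>(index), operand);
}

// Same order as the transforms in the rule above.
inline Inst transformCall(Inst inst) {
  substitute(inst, "RIP", "temp0"); // RIP is replaced with temp 0
  setOpcode(inst, "MOV64rm");       // the CALL becomes a load
  addOperand(inst, 0, "temp1");     // destination register comes first
  return inst;
}

inline void demo() {
  // Assumed layout: base, scale, index, displacement, segment.
  Inst call{"CALL64m", {"RIP", "1", "noreg", "0x2000", "noreg"}};
  Inst mov = transformCall(call);
  assert(mov.opcode == "MOV64rm");
  assert(mov.operands.size() == 6);
  assert(mov.operands[0] == "temp1");
  assert(mov.operands[1] == "temp0");
}

} // namespace sketch
```

The order matters little in this toy, but inserting the destination at index 0 shifts every memory operand by one position, which is why the transform needs the exact operand layout of the target opcode.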
Instrumentation Callbacks
QBDI::InstrRule allows inserting inline instrumentation inside the patch with a concept similar to the rules shown previously. Callbacks to host code are triggered by a break to host with specific variables set correctly in the host state part of the context:
the hostState.callback should be set to the callback function address to call.
the hostState.data should be set to the callback function data parameter.
the hostState.origin should be set to the ID of the current instruction (see QBDI::GetInstId).
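The role of these three variables can be pictured with a toy host-side dispatcher. This is a sketch under loud assumptions and not the QBDI implementation: the `HostState` layout, the simplified `Callback` signature, the `dispatch` function and the choice to clear the callback after the call are all invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Hypothetical, simplified callback signature.
using Callback = void (*)(std::uint16_t origin, void *data);

// Hypothetical host state filled in by the instrumentation.
struct HostState {
  Callback callback = nullptr;
  void *data = nullptr;
  std::uint16_t origin = 0;
};

// Models what the host could do on a break to host: call the requested
// callback once, then clear the request (an assumption of this sketch).
inline bool dispatch(HostState &hs) {
  if (hs.callback == nullptr) return false;
  hs.callback(hs.origin, hs.data);
  hs.callback = nullptr;
  return true;
}

// Example callback: counts how many times it was invoked.
inline void countCall(std::uint16_t /*origin*/, void *data) {
  ++*static_cast<int *>(data);
}

inline void demo() {
  int calls = 0;
  HostState hs;
  hs.callback = &countCall; // what the generated code would store
  hs.data = &calls;
  hs.origin = 42;
  bool first = dispatch(hs);
  bool second = dispatch(hs);
  assert(first && !second);
  assert(calls == 1);
  (void)first;
  (void)second;
}

} // namespace sketch
```

The instrumentation side only writes the three fields; everything else happens in host code after the break to host.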
In practice, there exists a function which can generate the PatchGenerator list needed to set up those variables correctly:
-
PatchGenerator::UniquePtrVec QBDI::getCallbackGenerator(InstCallback cbk, void *data)
Output a list of PatchGenerator which would set up the host state part of the context for a callback.
- Parameters:
cbk – [in] The callback function to call.
data – [in] The data to pass as an argument to the callback function.
- Returns:
A list of PatchGenerator to set up this callback call.
Thus, in practice, a QBDI::InstrRule which would set up a callback on every instruction writing data in memory would look like this:
InstrRule(
// Condition: on every instruction making write access
DoesWriteAccess(),
// Generators: set up a callback to someCallbackFunction with someParameter
getCallbackGenerator(someCallbackFunction, someParameter),
// Position this instrumentation after the instruction
InstPosition::POSTINST,
// Break to the host after the instrumentation (required for the callback to be made)
true
)
However, the callback generator can also be written directly in PatchDSL for more advanced uses. The instrumentation rule below passes the written data directly as the callback parameter:
InstrRule(
// Condition: on every instruction making write access
DoesWriteAccess(),
// Generators: set up a callback to someCallbackFunction with the written value as data
{
// Set hostState.callback to the callback function address
GetConstant(Temp(0), Constant((rword) someCallbackFunction)),
WriteTemp(Temp(0), Offset(offsetof(Context, hostState.callback))),
// Set hostState.data as the written value
GetWriteValue(Temp(0)),
WriteTemp(Temp(0), Offset(offsetof(Context, hostState.data))),
// Set hostState.origin as the current instID
GetInstId(Temp(0)),
WriteTemp(Temp(0), Offset(offsetof(Context, hostState.origin)))
},
// Position this instrumentation after the instruction
QBDI::InstPosition::POSTINST,
// Break to the host after the instrumentation (required for the callback to be made)
true
)