libTriton  version 0.6 build 1389
Taint Engine

Table of Contents

[internal] All information about the taint engine.


The purpose of dynamic taint analysis is to track the information flow from the sources (usually user inputs) to the targets (such as control-flow value). It is thus capable of analyzing which region of the memory and registers are controllable by user inputs, which enables a further inspection of security properties such as whether the control-flow values can be infected by user inputs to allow control-flow hijacking.

A taint policy typically consists of the following three parts:

1) taint introduction rules 2) taint propagation rules 3) taint checking rules

The libtriton's TaintEngine implements an architecture-independent tag propagation rules while providing APIs for the taint introduction and taint checking, enabling users to define their own taint policy. We currently implement the semantics of x86 ISA only, but the core propagation rules can be extended to other ISAs as well.

An Over-Approximation

When implementing the taint propagation rules, there are three implementation choices, each with accuracy-performance trade-offs.

Triton currently implements an over-approximation, which has the following advantages over precise-approximation:

An over-approximation is also more suitable in the bit-level granularity as illustrated in the following scenario:

e.g: Imagine this 16-bits register [x-x-x---x-xx-x-x] where x are bits that the user can control and - bits that the user cannot control. This state of register is setup like that due to arithmetic operations but may be something else with a different input value. In this case, it's not useful to know what bits are controllable by the user because they will probably change with another input value. In this case, using a perfect-approximation or an under-approximation is not useful. What we want is only knowing what values can hold this register according to the input.

Precision with the Cost in Performance

An over-approximation may sacrifice the precision for the sake of simplicity and performance. Let's look at the following ASM code.

mov ax, 0x1122 ; RAX is untainted
mov al, byte ptr [user_input] ; RAX is tainted
cmp ah, 0x99 ; can we control this comparison?

Under our current over-approximation, the taint engine will raise a false positive (that says the ah is tainted while it actually is not) for this particular example, while only the seven bits of RAX (RAX[7..0]) is tainted while the other bits (RAX[63..8]) are not.

This imprecision may raise excessively extraneous false positive on a big problem and make the tool totally useless in solving real problems. Let's consider a scenario when an attacker is developing an exploit of an executable. In this scenario, what the attacker wants to know is if a register at certain program location can be controlled by himself, and furthermore, to what values the register can be manipulated to hold. An over-approximation method including the dynamic taint analysis is not capable of giving him/her such elaborate information there are so many instructions which can manipulate a register at any given program location.

In such situations, you can harness the power of symbolic execution by querying a model at a program point. This is a much pricier operation to perform than over-approximating dynamic taint analysis, but one can gain the precision while paying the cost of performance.

This is why Triton uses symbolic execution for precision and an over-approximated tainting to know if we can ask a model to the SMT solver - by asking a model, we can query the solver and check if the symbolic variables are controllable by the user input.