libTriton  version 0.4 build 1353
Taint Engine

Table of Contents

[internal] All information about the taint engine.

Description


Taint analysis is used to know at each program point what part of memory and register are controllable by the user input. There is three kinds of taint granularity but an infinite number of ways to implement this analysis:

Triton uses an over-approximation and we will describe why.

An Over-Approximation


With an over-approximation, we lose precision and we may provide false positives. Example:

mov ax, 0x1122 ; RAX is untainted
mov al, byte ptr [user_input] ; RAX is tainted
cmp ah, 0x99 ; can we control this comparison?

If we ask to the taint engine if we can control the comparison, it will say YES because RAX is tagged has tainted even if it's false. Actually, RAX[63..8] is not tainted but RAX[7..0] is.

The only advantages of an over-approximation are:

This method is destructive on a big program, and so, totally useless for an analyst. An analyst wants precisions even if this is not all possibilities. Then, why an analyst may want to know if a register is tainted?

In exploit development, what the user wants in reality is knowing if a register is controllable by himself and knowing what values can hold this register at specific program point. Taint analysis (any over-approximation you choose) cannot give you this kind of information. A lots of instructions have an influence on the value that can hold a register. (Path conditions, arithmetic operations, ...)

The big question is: How can we gain time without losing precision?

Applying a symbolic execution and asking a model at each program point to know if a register is controllable or not is pretty expensive. Therefore, we use an over-approximation to fix the loss of time and if a register is tainted, we ask a model for the precision.

e.g: Imagine this 16-bits register [x-x-x---x-xx-x-x] where x are bits that the user can control and - bits that the user cannot control. This state of register is setup like that due to arithmetic operations but may be something else with a different input value. In this case, it's not useful to know what bits are controllable by the user because they will probably change with another input value. In this case, using a perfect-approximation or an under-approximation is not useful. What we want is only knowing what values can hold this register according to the input.

That's why Triton uses symbolic execution for precision and an over-approximated tainting to know if we can ask a model to the SMT solver - Asking a model means that the symbolic variables are controllable by the user input.