libTriton  version 0.4 build 1355
AST Representations

Table of Contents

[python api] All information about the ast python module.

Description


Triton converts the x86 and the x86-64 instruction set semantics into AST representations which allows you to perform precise analysis and allows you to build and to modify your own symbolic expressions.

>>> from triton import ast
>>> ast
<module 'triton.ast' (built-in)>

AST Form


Instruction: add rax, rdx
Expression: ref!41 = (bvadd ((_ extract 63 0) ref!40) ((_ extract 63 0) ref!39))

As all Triton's expressions are on SSA form, the ref!41 is the new expression of the RAX register, the ref!40 is the previous expression of the RAX register and the ref!39 is the previous expression of the RDX register. An Instruction may contain several expressions (SymbolicExpression). For example, the previous add rax, rdx instruction contains 7 expressions: 1 ADD semantics and 6 flags (AF, CF, OF, PF, SF and ZF) semantics where each flag is stored in a new SymbolicExpression.

Instruction: add rax, rdx
Expressions: ref!41 = (bvadd ((_ extract 63 0) ref!40) ((_ extract 63 0) ref!39))
ref!42 = (ite (= (_ bv16 64) (bvand (_ bv16 64) (bvxor ref!41 (bvxor ((_ extract 63 0) ref!40) ((_ extract 63 0) ref!39))))) (_ bv1 1) (_ bv0 1))
ref!43 = (ite (bvult ref!41 ((_ extract 63 0) ref!40)) (_ bv1 1) (_ bv0 1))
ref!44 = (ite (= ((_ extract 63 63) (bvand (bvxor ((_ extract 63 0) ref!40) (bvnot ((_ extract 63 0) ref!39))) (bvxor ((_ extract 63 0) ref!40) ref!41))) (_ bv1 1)) (_ bv1 1) (_ bv0 1))
ref!45 = (ite (= (parity_flag ((_ extract 7 0) ref!41)) (_ bv0 1)) (_ bv1 1) (_ bv0 1))
ref!46 = (ite (= ((_ extract 63 63) ref!41) (_ bv1 1)) (_ bv1 1) (_ bv0 1))
ref!47 = (ite (= ref!41 (_ bv0 64)) (_ bv1 1) (_ bv0 1))

Triton deals with 64-bits registers (and 128-bits for SSE). It means that it uses the concat and extract functions when operations are performed on subregister.

mov al, 0xff -> ref!193 = (concat ((_ extract 63 8) ref!191) (_ bv255 8))
movsx eax, al -> ref!195 = ((_ zero_extend 32) ((_ sign_extend 24) ((_ extract 7 0) ref!193)))

On the line 1, a new 64bit-vector is created with the concatenation of RAX[63..8] and the concretization of the value 0xff. On the line 2, according to the AMD64 behavior, if a 32-bit register is written, the CPU clears the 32-bit MSB of the corresponding register. So, in this case, we apply a sign extension from al to EAX, then a zero extension from EAX to RAX.

AST representation


An abstract syntax tree (AST) is a representation of a grammar as tree. Triton uses a custom AST for its expressions. As all expressions are built at runtime, an AST is available at each program point. For example, let assume this set of instructions:

mov al, 1
mov cl, 10
mov dl, 20
xor cl, dl
add al, cl

At the line 5, the AST of the AL register looks like this:

This AST represents the semantics of the AL register at the program point 5 from the program point 1. Note that this AST has been simplified for a better comprehension. The real AST contains some concat and extract as mentioned in the previous chapter. According to the API you can build and modify your own AST. Then, you can perform some modifications and simplifications before sending it to the solver.

The AST reference node


To manage more easily the subtree and to keep the SSA form of registers and memory, we have added a REFERENCE node which is a "terminate" node of a tree but contains a reference to another subtree. Below, an example of one "partial" tree linked with two other subtrees.

If you try to go through the full AST you will fail at the first reference node because a reference node does not contains child nodes. The only way to jump from a reference node to the targeted node is to use the triton::engines::symbolic::SymbolicEngine::getFullAst() function.

>>> zfId = getSymbolicRegisterId(REG.ZF)
>>> partialTree = getSymbolicExpressionFromId(zfId).getAst()
>>> print partialTree
(ite (= ref!89 (_ bv0 32)) (_ bv1 1) (_ bv0 1))
>>> fullTree = getFullAst(partialTree)
>>> print fullTree
(ite (= (bvsub ((_ extract 31 0) ((_ zero_extend 32) ((_ extract 31 0) ((_ zero_extend 32) (bvxor ((_ extract 31 0) ((_ zero_extend 32) (bvsub ((_ extract 31 0) ((_ zero_extend 32) ((_ sign_extend 24) ((_ extract 7 0) SymVar_0)))) (_ bv1 32)))) (_ bv85 32)))))) ((_ extract 31 0) ((_ zero_extend 32) ((_ sign_extend 24) ((_ extract 7 0) ((_ zero_extend 32) ((_ zero_extend 24) ((_ extract 7 0) (_ bv49 8))))))))) (_ bv0 32)) (_ bv1 1) (_ bv0 1))

The SMT or Python Syntax


By default, Triton represents semantics into SMT-LIB which is an international initiative aimed at facilitating research and development in Satisfiability Modulo Theories (SMT). However, Triton allows you to display your AST via a Python syntax.

>>> from triton import *
>>> setArchitecture(ARCH.X86_64)
>>> setAstRepresentationMode(AST_REPRESENTATION.PYTHON)
>>> inst = Instruction()
>>> inst.setOpcodes("\x48\x01\xd8") # add rax, rbx
>>> inst.setAddress(0x400000)
>>> inst.updateContext(Register(REG.RAX, 0x1122334455667788))
>>> inst.updateContext(Register(REG.RBX, 0x8877665544332211))
>>> processing(inst)
>>> print inst
400000: add rax, rbx
>>> for expr in inst.getSymbolicExpressions():
... print expr
...
ref_0 = ((0x1122334455667788 + 0x8877665544332211) & 0xFFFFFFFFFFFFFFFF) # ADD operation
ref_1 = (0x1 if (0x10 == (0x10 & (ref_0 ^ (0x1122334455667788 ^ 0x8877665544332211)))) else 0x0) # Adjust flag
ref_2 = ((((0x1122334455667788 & 0x8877665544332211) ^ (((0x1122334455667788 ^ 0x8877665544332211) ^ ref_0) & (0x1122334455667788 ^ 0x8877665544332211))) >> 63) & 0x1) # Carry flag
ref_3 = ((((0x1122334455667788 ^ ~0x8877665544332211) & (0x1122334455667788 ^ ref_0)) >> 63) & 0x1) # Overflow flag
ref_4 = ((((((((0x1 ^ (((ref_0 & 0xFF) >> 0x0) & 0x1)) ^ (((ref_0 & 0xFF) >> 0x1) & 0x1)) ^ (((ref_0 & 0xFF) >> 0x2) & 0x1)) ^ (((ref_0 & 0xFF) >> 0x3) & 0x1)) ^ (((ref_0 & 0xFF) >> 0x4) & 0x1)) ^ (((ref_0 & 0xFF) >> 0x5) & 0x1)) ^ (((ref_0 & 0xFF) >> 0x6) & 0x1)) ^ (((ref_0 & 0xFF) >> 0x7) & 0x1)) # Parity flag
ref_5 = ((ref_0 >> 63) & 0x1) # Sign flag
ref_6 = (0x1 if (ref_0 == 0x0) else 0x0) # Zero flag
ref_7 = 0x400003 # Program Counter

Examples


Get a register's expression and create an assert

# Get the symbolic expression of the ZF flag
zfId = getSymbolicRegisterId(REG.ZF)
zfExpr = getFullAst(getSymbolicExpressionFromId(zfId).getAst())
# (assert (= zf True))
newExpr = ast.assert_(
ast.equal(
zfExpr,
ast.bvtrue()
)
)
# Get a model
models = getModel(newExpr)

Play with the AST

# Node information
>>> node = bvadd(bv(1, 8), bvxor(bv(10, 8), bv(20, 8)))
>>> print type(node)
<type 'AstNode'>
>>> print node
(bvadd (_ bv1 8) (bvxor (_ bv10 8) (_ bv20 8)))
>>> subchild = node.getChilds()[1].getChilds()[0]
>>> print subchild
(_ bv10 8)
>>> print subchild.getChilds()[0].getValue()
10
>>> print subchild.getChilds()[1].getValue()
8
# Node modification
>>> node = bvadd(bv(1, 8), bvxor(bv(10, 8), bv(20, 8)))
>>> print node
(bvadd (_ bv1 8) (bvxor (_ bv10 8) (_ bv20 8)))
>>> node.setChild(0, bv(123, 8))
>>> print node
(bvadd (_ bv123 8) (bvxor (_ bv10 8) (_ bv20 8)))

Python operators

>>> a = bv(1, 8)
>>> b = bv(2, 8)
>>> c = (a & ~b) | (~a & b)
>>> print c
(bvor (bvand (_ bv1 8) (bvnot (_ bv2 8))) (bvand (bvnot (_ bv1 8)) (_ bv2 8)))

As we can't overload all AST's operators only these following operators are overloaded:

Python's Operator e.g: SMT2-Lib format
a + b (bvadd a b)
a - b (bvsub a b)
a * b (bvmul a b)
a / b (bvudiv a b)
a | b (bvor a b)
a & b (bvand a b)
a ^ b (bvxor a b)
a % b (bvsrem a b)
a << b (bvshl a b)
a >> b (bvlshr a b)
~a (bvnot a)
-a (bvneg a)
a == b (= a b)
a != b (not (= a b))
a <= b (bvule a b)
a >= b (bvuge a b)
a < b (bvult a b)
a > b (bvugt a b)

Python API - Methods of the ast module