This transformation turns a function into an interpreter, whose bytecode language is specialized for this function. The transformation has been designed to induce as much diversity as possible, i.e. every decision made is dependent on the randomization seed. The diversity is both static and dynamic, i.e. each interpreter variant differs in the structure of its code as well as in its execution pattern.

A generated interpreter consists of a virtual instruction set, specialized for the input function, a bytecode array, a virtual program counter (VPC), a virtual stack pointer (VSP), a dispatch unit, and a list of instruction handlers, one for each virtual instruction.

For this transformation, Tigress first constructs a type-annotated abstract syntax tree (AST) from the C source, from which it generates control-flow graphs of instruction trees. Tigress then selects a random instruction set architecture (ISA) and, using this ISA, generates a bytecode program specialized for the input function. Finally, Tigress selects a random dispatch method and produces an output program.

Diversity and Stealth

Static diversity. Tigress supports several mechanisms for generating ISAs with a high degree of static diversity:

instruction opcodes can be randomized,
the ISA can have duplicate instructions with the same semantics,
instructions can pass arguments in arbitrary combinations of stack locations and registers,
instructions can be made arbitrarily long (with highly complex semantics) through the use of superoperators.

Dynamic diversity. We ensure that dynamic execution patterns are diversified by merging randomized bogus functions with the "real" function. We can furthermore impede dynamic analysis by making instruction traces artificially long.

Static stealth. Not only diversity but also stealth is important for interpreters. For static stealth, the split transformation can break up the interpreter loop into smaller pieces, and the AddOpaque transformation can make instruction handlers less conspicuous.

Dynamic stealth. For dynamic stealth, Tigress interpreters can be made reentrant, meaning only a few iterations of the dispatch loop are executed at a time, effectively mixing instructions executed from the interpreter with instructions executed by the rest of the program. This is of particular interest when wanting to hide the execution pattern from analysts, and when the exact time that the function executes is not important, as long as it completes eventually.

Dispatch Method Selection

For both static and dynamic diversity, Tigress supports eight different dispatch methods. The following code is generated for the different methods, where Ξ^op1; is the instruction handler for operator op1:

Dispatch	Generated code
switch	switch(prog[pc]) { case op1: Ξ^op1; break; case op2: Ξ^op2; break; }
direct	goto *prog[pc]; op1hdl: Ξ^op1; goto *prog[pc]; op2hdl: Ξ^op2; goto *prog[pc];
indirect	goto *jtab[prog[pc]]; op1hdl: Ξ^op1; goto *jtab[prog[pc]]; op2hdl: Ξ^op2; goto *jtab[prog[pc]];
call	void op1fun(){Ξ^op1} void op2fun(){Ξ^op2} … call *prog[pc]();
ifnest	if (prog[pc]==op1) Ξ^op1 else if (prog[pc]==op2) Ξ^op2 else if …
linear, binary, interpolation	alg = linear|binary|interpolation|… top: goto *(search^alg(map,prog[pc])); op1hdl: Ξ^op1; goto top; op2hdl: Ξ^op2; goto top;
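
To make the table concrete, here is a minimal hand-written sketch, not Tigress output, of the same toy bytecode run under two of the dispatch styles: `switch`, and a `linear` table search. All names (`VM`, `run_switch`, `run_linear`, the opcodes and their numbering) are hypothetical, and the search variant assumes handlers are reached through function pointers rather than through the label addresses shown in the table.

```c
#include <stddef.h>

/* Hypothetical opcodes; the numbering is arbitrary on purpose. */
enum { OP_PUSH = 7, OP_ADD = 42, OP_MUL = 19, OP_HALT = 3 };

typedef struct {
    const unsigned char *prog; /* bytecode array */
    size_t pc;                 /* virtual program counter */
    int stack[16];
    int sp;                    /* virtual stack pointer: index of the top element */
    int running;
} VM;

/* Instruction handlers, one per virtual instruction. */
static void h_push(VM *vm) { vm->stack[++vm->sp] = vm->prog[vm->pc + 1]; vm->pc += 2; }
static void h_add(VM *vm)  { vm->stack[vm->sp - 1] += vm->stack[vm->sp]; vm->sp--; vm->pc += 1; }
static void h_mul(VM *vm)  { vm->stack[vm->sp - 1] *= vm->stack[vm->sp]; vm->sp--; vm->pc += 1; }
static void h_halt(VM *vm) { vm->running = 0; }

static void vm_init(VM *vm, const unsigned char *prog) {
    vm->prog = prog; vm->pc = 0; vm->sp = -1; vm->running = 1;
}

/* switch dispatch: the opcode selects a case directly. */
int run_switch(const unsigned char *prog) {
    VM vm;
    vm_init(&vm, prog);
    while (vm.running) {
        switch (vm.prog[vm.pc]) {
        case OP_PUSH: h_push(&vm); break;
        case OP_ADD:  h_add(&vm);  break;
        case OP_MUL:  h_mul(&vm);  break;
        default:      h_halt(&vm); break;
        }
    }
    return vm.sp >= 0 ? vm.stack[vm.sp] : 0;
}

/* linear dispatch: search a map from opcode to handler on every iteration. */
typedef struct { unsigned char opcode; void (*handler)(VM *); } MapEntry;

static const MapEntry dispatch_map[] = {
    { OP_PUSH, h_push }, { OP_ADD, h_add }, { OP_MUL, h_mul }, { OP_HALT, h_halt }
};

int run_linear(const unsigned char *prog) {
    VM vm;
    vm_init(&vm, prog);
    while (vm.running) {
        void (*handler)(VM *) = h_halt; /* unknown opcodes stop the machine */
        for (size_t i = 0; i < sizeof dispatch_map / sizeof dispatch_map[0]; i++) {
            if (dispatch_map[i].opcode == vm.prog[vm.pc]) {
                handler = dispatch_map[i].handler;
                break;
            }
        }
        handler(&vm);
    }
    return vm.sp >= 0 ? vm.stack[vm.sp] : 0;
}

/* Bytecode for (2 + 3) * 4. */
static const unsigned char example_prog[] = {
    OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT
};
```

Both functions interpret `example_prog` to the same result; only the code that maps an opcode to its handler differs, which is the part Tigress varies when it picks a dispatch method.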

Instruction Set Architecture Generation

Instruction sets can use stacks, registers, or both to pass values between instructions. By default, the following, very simple, instruction set is used:

  labels:         l ∈ Labels 
  functions:      f ∈ Funs 
  variables:      x ∈ Vars 
  strings:        s ∈ Strings 
  temporaries:     t ::= reg^int | stack^int
  binary operators: binop ::= add | sub | …
  unary operators:  unop ::= uminus | neg | …
  types:           τ ::= int | float | … | void *
  literals:        λ ::= intlit | floatlit | …
  instructions: e ::=  
       t ← constant τ λ
     | t ← local  x
     | t ← global  x
     | t ← formal  x
     | t ← string  s
     | t ← binary  τ  binop t t
     | t ← unary  τ  unop t
     | t ← convert  τ τ t
     | t ← ternary  τ t t t
     | t ← load  τ t
     | store τ t t
     | t ← memcpy  t t int
     | call  f
     | x, x, … ← asm  s  t, t, …
     | indirectCall  t
     | return  τ t
     | goto  l
     | t ← addrOfLabel  l
     | indirectGoto  t
     | branchIfTrue  t  l 
     | switch  τ t  λ  λ  l ⟨l, l, …⟩ 
     | merged  ⟨e, e, …⟩

However, a high degree of diversity can be achieved from the way instructions communicate with each other, through values stored on the stack or passed in virtual registers. Tigress can generate instructions that use any combination of registers and stack storage for the inputs they read or the output they produce.
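
As a rough illustration of what "any combination of registers and stack storage" can mean, the following sketch shows three hand-written variants of one add instruction. The `Machine` layout and the function names are assumptions made for this example; Tigress generates its own handler code and naming.

```c
/* Hypothetical machine state: a small register file and an evaluation stack. */
typedef struct {
    int regs[4];
    int stack[16];
    int sp; /* index of the top stack element, -1 when empty */
} Machine;

/* Variant 1: both operands are popped from the stack, the result is pushed. */
void add_stack(Machine *m) {
    int right = m->stack[m->sp--];
    int left = m->stack[m->sp--];
    m->stack[++m->sp] = left + right;
}

/* Variant 2: both operands and the result live in registers. */
void add_regs(Machine *m, int dst, int left, int right) {
    m->regs[dst] = m->regs[left] + m->regs[right];
}

/* Variant 3 (mixed): left operand in a register, right operand popped
   from the stack, result pushed back onto the stack. */
void add_mixed(Machine *m, int left) {
    int right = m->stack[m->sp--];
    m->stack[++m->sp] = m->regs[left] + right;
}
```

All three compute the same sum; an ISA that contains several of them lets the bytecode generator pick a different encoding for each addition in the input function.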

ISA Diversity through Duplicate Operators and Superoperators

Tigress can induce further diversity by merging instructions into superoperators. New, merged instructions can have almost arbitrarily complex semantics, involving multiple arithmetic operations and operations on both the stack and virtual registers. For more information on superoperators, see Optimizing an ANSI C interpreter with superoperators by Todd Proebsting. The complex semantics of instructions generated by superoperators make manual analysis of generated interpreters, such as discussed by Rolles in Unpacking virtualization obfuscators, difficult.

Consider setting --VirtualizeMaxDuplicateOps=2 and --VirtualizeOperands=mixed, resulting in two store-int instructions: one that takes both arguments in registers, and one that takes one argument on the stack and the other in a register. Tigress will choose between them randomly. Here are the corresponding instruction handlers:

case _0__store_int$left_REG_0$right_REG_1: 
   (_0__pc[0]) ++;
   *((int *)_0__regs[0][*((int *)_0__pc[0])]._void_star) = _0__regs[0][*((int *)(_0__pc[0] + 4))]._int;
   _0__pc[0] += 8;
   break;

case _0__store_int$right_STA_0$left_REG_0: 
   (_0__pc[0]) ++;
   *((int *)_0__regs[0][*((int *)_0__pc[0])]._void_star) = _0__stack[0][_0__sp[0] + 0]._int;
   (_0__sp[0]) --;
   _0__pc[0] += 4;
   break;

Consider next setting --VirtualizeSuperOpsRatio=2.0 and --VirtualizeMaxMergeLength=10, which results in virtual instructions with highly complex semantics. Here is the instruction handler for one such instruction, made up by merging 10 primitive instructions:

case _0__local$result_STA_0$value_LIT_0__\
   convert_void_star2void_star$left_STA_0$result_REG_0__\
   load_int$result_REG_0$left_REG_1__\
   local$result_STA_0$value_LIT_0__\
   convert_void_star2void_star$left_STA_0$result_REG_0__\
   store_int$left_REG_0$right_REG_1__\
   local$result_REG_0$value_LIT_1__\
   local$result_STA_0$value_LIT_0__\
   convert_void_star2void_star$left_STA_0$result_REG_0__\
   load_int$result_STA_0$left_REG_0: 
    (_0__pc[0]) ++;
    _0__regs[0][*((int *)(_0__pc[0] + 4))]._void_star = (void *)(_0__locals + *((int *)_0__pc[0]));
    _0__regs[0][*((int *)(_0__pc[0] + 8))]._int = *((int *)_0__regs[0][*((int *)(_0__pc[0] + 12))]._void_star);
    _0__regs[0][*((int *)(_0__pc[0] + 20))]._void_star = (void *)(_0__locals + *((int *)(_0__pc[0] + 16)));
    *((int *)_0__regs[0][*((int *)(_0__pc[0] + 24))]._void_star) = _0__regs[0][*((int *)(_0__pc[0] + 28))]._int;
    _0__regs[0][*((int *)(_0__pc[0] + 32))]._void_star = (void *)(_0__locals + *((int *)(_0__pc[0] + 36)));
    _0__regs[0][*((int *)(_0__pc[0] + 44))]._void_star = (void *)(_0__locals + *((int *)(_0__pc[0] + 40)));
    _0__stack[0][_0__sp[0] + 1]._int = *((int *)_0__regs[0][*((int *)(_0__pc[0] + 48))]._void_star);
    (_0__sp[0]) ++;
    _0__pc[0] += 52;
    break;

Note that the instruction name really is almost 400 characters long; the backslashes are here only for display purposes. Also note that the instruction itself is 53 bytes long, almost as long as the longest VAX instruction (EMODH, 54 bytes) and much longer than the longest x86 instruction (15 bytes).
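
The effect of merging can be seen on a much smaller scale in the sketch below, which is a hand-written toy and not Tigress output. It assumes a hypothetical three-instruction stack machine plus one superoperator, `PUSH_PUSH_ADD`, that fuses the sequence push, push, add; the interpreter also counts how many times the dispatch loop runs.

```c
#include <stddef.h>

/* Hypothetical opcodes; PUSH_PUSH_ADD is the merged superoperator. */
enum { PUSH, ADD, HALT, PUSH_PUSH_ADD };

/* Interpret prog, returning the top of stack and storing the number of
   dispatch-loop iterations in *dispatches. */
int run_counting(const unsigned char *prog, int *dispatches) {
    int stack[16];
    int sp = -1;
    size_t pc = 0;

    *dispatches = 0;
    for (;;) {
        (*dispatches)++;
        switch (prog[pc]) {
        case PUSH:
            stack[++sp] = prog[pc + 1];
            pc += 2;
            break;
        case ADD:
            stack[sp - 1] += stack[sp];
            sp--;
            pc += 1;
            break;
        case PUSH_PUSH_ADD: /* same net effect as PUSH a; PUSH b; ADD */
            stack[++sp] = prog[pc + 1] + prog[pc + 2];
            pc += 3;
            break;
        default:            /* HALT, or any unknown opcode */
            return sp >= 0 ? stack[sp] : 0;
        }
    }
}

static const unsigned char plain_prog[]  = { PUSH, 2, PUSH, 3, ADD, HALT };
static const unsigned char merged_prog[] = { PUSH_PUSH_ADD, 2, 3, HALT };
```

Both programs compute 2 + 3, but the merged version passes through the dispatch loop fewer times and its single handler does more work per iteration, which is the property the 10-way merge above takes to an extreme.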

Instruction Handler Obfuscation

You can split up the generated interpreter by inserting opaque predicates. This is useful to make the instruction handlers and the dispatch logic less conspicuous.

VPC Obfuscation

You can add opaque expressions to the virtual PC to make it more difficult to find.

One possible attack on interpreters is to perform a taint analysis on input-dependent variables, and discard any instructions which are not tainted (input dependent instructions are colored green, and instructions introduced by the virtualizer are colored blue):

To frustrate such analyses we can add implicit flow to the VPC:
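
A textbook way to create implicit flow is to copy a value through control dependencies instead of data dependencies. The sketch below is only an illustration of that idea under the assumption that the analysis tracks data flow but not control flow; the function name is hypothetical, and the implicit-flow kinds Tigress actually inserts are selected with --VirtualizeImplicitFlow.

```c
#include <limits.h>

/* Copy x into result one bit at a time. Every value assigned to result is a
   constant; x only decides which branches are taken. A taint tracker that
   follows data dependencies alone sees no flow from x to result. */
unsigned int copy_via_implicit_flow(unsigned int x) {
    unsigned int result = 0;
    unsigned int width = (unsigned int)(sizeof(unsigned int) * CHAR_BIT);

    for (unsigned int bit = 0; bit < width; bit++) {
        if (x & (1u << bit)) {     /* control dependence on x */
            result |= (1u << bit); /* data dependence on constants only */
        }
    }
    return result;
}
```

Applied to the VPC, a copy of this kind would leave the program counter holding the right value while hiding its dependence on the input from such an analysis. An optimizing compiler may simplify a loop like this one, so a real implementation has to take care that the construct survives compilation.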

Bogus Functions

Generate bogus functions that are virtualized along with the "real" function. Instructions from the bogus and real function are executed cyclically and in sequence, i.e. first an instruction from the real function, then one from bogus function number 1, then one from bogus function number 2, etc., and then the process repeats with an instruction from the real function. The purpose is to frustrate dynamic analyses that try to locate the virtual program counter, by providing multiple VPCs, one "real", and one or more that behave as if they were real, but which interpret unused functions:
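
The round-robin schedule described above can be sketched as follows. This is a hand-written model with hypothetical names (`Context`, `run_interleaved`, the `B_` opcodes) and a hard limit of four contexts, not the code Tigress generates: context 0 plays the real function, and the remaining contexts are bogus programs with their own VPCs.

```c
#include <stddef.h>

/* Hypothetical opcodes for a toy stack machine. */
enum { B_PUSH, B_ADD, B_HALT };

/* One interpreter context per function; each has its own VPC and stack. */
typedef struct {
    const unsigned char *prog;
    size_t pc;
    int stack[16];
    int sp;
    int done;
} Context;

/* Execute exactly one virtual instruction in context c. */
static void step(Context *c) {
    switch (c->prog[c->pc]) {
    case B_PUSH: c->stack[++c->sp] = c->prog[c->pc + 1]; c->pc += 2; break;
    case B_ADD:  c->stack[c->sp - 1] += c->stack[c->sp]; c->sp--; c->pc += 1; break;
    default:     c->done = 1; break;
    }
}

/* progs[0] is the real function, progs[1..n-1] are bogus. Execute one
   instruction from each context in turn until the real one halts. */
int run_interleaved(const unsigned char *const *progs, size_t n) {
    Context ctx[4];

    if (n == 0 || n > 4) {
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        ctx[i].prog = progs[i];
        ctx[i].pc = 0;
        ctx[i].sp = -1;
        ctx[i].done = 0;
    }
    while (!ctx[0].done) {
        for (size_t i = 0; i < n; i++) {
            if (!ctx[i].done) {
                step(&ctx[i]);
            }
        }
    }
    return ctx[0].sp >= 0 ? ctx[0].stack[ctx[0].sp] : 0;
}

static const unsigned char real_prog[]   = { B_PUSH, 2, B_PUSH, 3, B_ADD, B_HALT };
static const unsigned char bogus_prog1[] = { B_PUSH, 9, B_PUSH, 9, B_ADD, B_HALT };
static const unsigned char bogus_prog2[] = { B_PUSH, 1, B_HALT };
static const unsigned char *const example_progs[] = { real_prog, bogus_prog1, bogus_prog2 };
```

In a trace of `run_interleaved`, every program counter advances in the same way, so an analysis that looks for "the" VPC finds several equally plausible candidates.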

Bogus Loops

Add random computations to every iteration of the dispatch loop. Use this to frustrate dynamic analysis by

inserting bogus instructions between consecutive iterations of the dispatch loop, thereby making the dispatch harder to recognize;
making traces longer and thereby harder to store and analyze.

Reentrant Interpreters

Make interpreters that can execute a few instructions, return, and later resume to execute a few more instructions, until, eventually, they terminate. This is particularly useful when it is not important exactly when a piece of code executes, as long as it executes eventually, and where the stealthiness of the computations is paramount.

The result is an instruction trace that intermixes instructions from the original program and the interpreted function:

You must prepare your code in the following ways:

The function you want to virtualize must have an argument int* operation. It can occur anywhere among the formal parameters:
```
void foo(int* operation, int n, int* result) {…}
```
The first time foo gets called, operation must be <0, and you must pass actual arguments to foo that it will use throughout the computation:
```
int operation = -10; 
foo(&operation,n,&result);
```
"-10" here means to initialize foo and execute 10 instructions.
Sprinkle calls to foo throughout your program, making sure that operation>0:
```
operation = 10;
foo(&operation,bogus1,&bogus2);  
```
Here you can pass whatever arguments you want to foo, they won't be used. Rather, the ones that were passed in the first call will be used throughout. "10" here means to resume foo and execute 10 instructions.

You can check if foo has terminated by testing the value of operation after the call:

operation = 10;
foo(&operation,bogus1,&bogus2);  
if (operation > 0) {
   /* we're done! */
} else if (operation < 0) {
   /* more work to do! */
}

If you want to make sure that foo has terminated --- because you really want its result at a particular point --- set operation to a large enough value:
```
operation = 1000;
foo(&operation,bogus1,&bogus2);  
```
Additional calls to foo once termination has been reached are safe; no additional instructions will be executed.
If you want to call foo to compute a new value, call it again with operation<0:
```
   int operation = -10; 
   foo(&operation,n,&result);
```

To ensure termination you can

experiment yourself with how many iterations are necessary to finish the computation;
make sure that the last call to foo is passed a huge value to 'operation';

put the last call to foo in a loop

   foo(&operation);   
   while (operation < 0) {
      /* some other computation here */
      operation = 10;
      foo(&operation);   
   } 
   /* result is available here */
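
To see the calling protocol end to end, it can help to model it by hand. The function below is such a model, not something Tigress generates: `sum_to` is a hypothetical stand-in for a virtualized function that adds the integers 1..n, one loop step per "instruction". It assumes that the function reports "done" by setting operation to 1 and "more work" by setting it to -1; the text above only fixes the signs.

```c
/* Hand-written model of a reentrant function. State is kept in static
   variables so the computation survives across calls. */
void sum_to(int *operation, int n, int *result) {
    static int i, limit, acc;
    static int finished = 1; /* nothing to resume before the first init */
    static int *out;
    int budget;

    if (*operation < 0) {
        /* Initializing call: capture the actual arguments once. */
        i = 1;
        limit = n;
        acc = 0;
        out = result;
        finished = 0;
        budget = -*operation;
    } else {
        /* Resuming call: n and result are ignored. */
        budget = *operation;
    }

    while (!finished && budget > 0) {
        if (i > limit) {
            *out = acc; /* deliver the result through the captured pointer */
            finished = 1;
        } else {
            acc += i;
            i++;
        }
        budget--;
    }

    *operation = finished ? 1 : -1; /* assumed encoding: >0 done, <0 more work */
}
```

A caller starts the computation with a negative operation, then sprinkles resuming calls with bogus arguments through its own code and checks the sign of operation after each one. Once the model has finished, further resuming calls return immediately without touching the result.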

It is a good idea to combine reentrant interpreters with superoperators. Superoperators produce long instructions that perform more work during each iteration, and as a result the number of dispatches (i.e. loop iterations) is reduced. In other words, if you want to frustrate dynamic analysis that looks for evidence of the dispatch loop in the instruction trace, superoperators combined with reentrant interpreters will reduce the presence of such artifacts.

Encoding the Program Array

Setting --VirtualizeEncodeByteArray=true results in each program instruction being XORed with a constant value, thus ensuring that it is not in cleartext until it is run:

unsigned char _5_obf3_$array[1][141]  = 
     {{
        (unsigned char)formal$result_STA_0$value_LIT_0 ^ (unsigned char)16,
        (unsigned char)1 ^ (unsigned char)16,
        (unsigned char)0 ^ (unsigned char)16,
        (unsigned char)0 ^ (unsigned char)16,
        (unsigned char)0 ^ (unsigned char)16,
        (unsigned char)load_int$left_STA_0$result_STA_0 ^ (unsigned char)16,
        (unsigned char)formal$result_STA_0$value_LIT_0 ^ (unsigned char)16,
        (unsigned char)0 ^ (unsigned char)16,
        (unsigned char)0 ^ (unsigned char)16,
        (unsigned char)0 ^ (unsigned char)16,
        (unsigned char)0 ^ (unsigned char)16,
        ...
     }}

This does not work for --VirtualizeDispatch=direct.
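
The encoding itself is a plain byte-wise XOR, so the same routine both encodes and decodes. A minimal sketch, with a hypothetical helper name and the key 16 taken from the listing above:

```c
#include <stddef.h>

/* XOR every byte of src with key and write it to dst. Applying the function
   twice with the same key restores the original bytes. */
void xor_bytes(unsigned char *dst, const unsigned char *src, size_t n, unsigned char key) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = (unsigned char)(src[i] ^ key);
    }
}
```

At obfuscation time the bytecode would be passed through such a routine once; at run time the interpreter applies it again to a local copy before setting the VPC to the decoded array.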

Additionally, setting --VirtualizeObfuscateDecodeByteArray=true and --VirtualizeOpaqueStructs=input,env ensures that the decoded bytecode array depends on input. The decoding procedure looks like this:

  strcmp_result17 = (int)strlen(*(argv + (argc - 1)));
  decodeVar16 = (strcmp_result17 - 1 < 0) + (strcmp_result17 - 10 > 0) ? currentOp : (unsigned char)16;
  copyIndex15 = 0;
  while (copyIndex15 < 141) {
    localArrayCopy11[0][copyIndex15] = array[0][copyIndex15] ^ decodeVar16;
    copyIndex15 ++;
  }
  $pc[0] = localArrayCopy11[0];

Here's an example script:

tigress --Seed=0 \
   --Transform=InitImplicitFlow \
   --Transform=InitEntropy \
   --Transform=InitOpaque --Functions=main --InitOpaqueStructs=input \
   --Transform=UpdateEntropy --Functions=main --UpdateEntropyVar=argv,argc \
   --Inputs="+1:int:42,-1:length:1?10" \
   --Transform=Virtualize --InitOpaqueStructs=input,env \
   --VirtualizeDispatch=interpolation --Functions=obf3 \
   --VirtualizeEncodeByteArray=true \
   --VirtualizeObfuscateDecodeByteArray=true \
   --VirtualizeOpaqueStructs=input,env \
      arith.c --out=arith_out.c

--Inputs="+1:int:42,-1:length:1?10" is a specification of invariants over the command line. It specifies:

the first argument on the command line should be the integer 42
the last argument should have a length between 1 and 10

--VirtualizeOpaqueStructs=input,env ensures that the program array will be encoded based on these command line invariants. If the program is invoked with a set of command line arguments that violate the invariants it is likely to crash.
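
The dependence on the length invariant can be sketched in isolation. The code below is a simplified, hand-written reading of the decoding procedure shown earlier, with the conditional expression parenthesized explicitly; `derive_key`, `decode_with_input`, and the wrong-key value 0xA5 are hypothetical names and values chosen for this example.

```c
#include <stddef.h>
#include <string.h>

/* Return the correct key (16) only when the last command-line argument has a
   length between 1 and 10; otherwise return wrong_key. */
unsigned char derive_key(const char *last_arg, unsigned char wrong_key) {
    int len = (int)strlen(last_arg);
    int violations = (len - 1 < 0) + (len - 10 > 0); /* 0 when the invariant holds */
    return violations ? wrong_key : (unsigned char)16;
}

/* Decode the bytecode with a key that depends on the input. */
void decode_with_input(unsigned char *dst, const unsigned char *encoded,
                       size_t n, const char *last_arg) {
    unsigned char key = derive_key(last_arg, (unsigned char)0xA5);
    for (size_t i = 0; i < n; i++) {
        dst[i] = (unsigned char)(encoded[i] ^ key);
    }
}
```

With an argument that satisfies the invariant, the decoded bytes match the original bytecode. With an argument that violates it, the interpreter would start executing garbage, which is why a crash is the likely outcome.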

Dynamic Program Arrays

Setting --VirtualizeDynamicBytecode=true results in the program array being constantly modified at runtime, much in the same way that happens to jitted code in the JitDynamic transformation. In fact, exactly the same mechanisms are used for both transformations, and they share all the same options.

Here's an example script:

tigress --Seed=0 \
   --Transform=Virtualize \
      --Functions=foo \
      --VirtualizeDispatch=switch \
      --VirtualizeDynamicBytecode=true \
      --VirtualizeDynamicCodecs=xtea \
      --VirtualizeDynamicKeyTypes=data \
      --VirtualizeDynamicBlockFraction=%100 \
      --VirtualizeDynamicReEncode=true \
      --VirtualizeDynamicRandomizeBlocks=false \
      --VirtualizeDynamicDumpCFG=false \
      --VirtualizeDynamicAnnotateTree=false \
      --VirtualizeDynamicDumpTree=false \
      --VirtualizeDynamicDumpIntermediate=false \
      --VirtualizeDynamicTrace=0 \
      --VirtualizeDynamicTraceExec=false \
      --VirtualizeDynamicTraceBlock=false \
      --VirtualizeDynamicCompileCommand="gcc -o %o %i -lm" \
      arith.c --out=arith_out.c

Options

Option	Arguments	Description
`--Transform`	`Virtualize`	Turn a function into an interpreter.
`--VirtualizeShortIdents`	`bool`	Generate shorter identifiers to produce interpreters suitable for publication. Default=false.
`--VirtualizeIsWindows`	`bool`	Set this to true if you're on Windows rather than a Unix system. Currently only relevant when generating bogus functions.
`--VirtualizeDispatch`	`switch, direct, indirect, call, ifnest, linear, binary, interpolation, ?`	Select the interpreter's dispatch method. The argument should be a comma-separated list of dispatch kinds. One of these will be picked at random. Default=switch. `switch` = dispatch by while(){switch(next){...}} `direct` = dispatch by direct threading `indirect` = dispatch by indirect threading `call` = dispatch by call threading `ifnest` = dispatch by nested if-statements `linear` = dispatch by searching a table using linear search `binary` = dispatch by searching a table using binary search `interpolation` = dispatch by searching a table using interpolation search `?` = Pick a random dispatch method
`--VirtualizeOperands`	`stack, registers, mixed, ?`	Comma-separated list of the types of operands allowed in the ISA. Default=stack. `stack` = use stack arguments to instructions `registers` = use register arguments to instructions `mixed` = same as stack,registers
`--VirtualizeMaxDuplicateOps`	`INTSPEC`	Number of ADD instructions, for example, with different signatures. Default=0.
`--VirtualizeRandomOps`	`bool`	Should opcodes be randomized, or go from 0..n? Default=true.
`--VirtualizeSuperOpsRatio`	`Float>0.0`	Desired number of super operators. Default=0.0.
`--VirtualizeMaxMergeLength`	`INTSPEC`	Longest sequence of instructions to be merged into one. Default=0.
`--VirtualizeInstructionHandlerSplitCount`	`INTSPEC`	Number of opaques to add to each instruction handler. Default=0.
`--VirtualizeAddOpaqueToVPC`	`BOOLSPEC`	Whether to add opaques to the virtual program counter. Default=false.
`--VirtualizeAddOpaqueToBogusFuns`	`BOOLSPEC`	Whether to add opaque expressions to the generated bogus function. Default=false.
`--VirtualizeNumberOfBogusFuns`	`INTSPEC`	Weave the execution of random functions into the execution of the original program. This makes certain kinds of pattern-based dynamic analysis more difficult. Default=0.
`--VirtualizeBogusFunsGenerateOutput`	`BOOLSPEC`	Make the bogus function produce output (typically by writing to /dev/null), to prevent it from appearing to have no effect. Default=true.
`--VirtualizeBogusFunKinds`	`trivial, arithSeq, collatz, *`	The kind of bogus function to generate. Comma-separated list. Default=arithSeq,collatz. `trivial` = insert a trivial computation `arithSeq` = insert a simple arithmetic loop `collatz` = insert a computation of the Collatz sequence `*` = select all options
`--VirtualizeBogusLoopKinds`	`trivial, arithSeq, collatz, *`	Insert a bogus loop for each instruction list. This will extend the length of the trace, making dynamic analysis more difficult. `trivial` = insert a trivial computation `arithSeq` = insert a simple arithmetic loop `collatz` = insert a computation of the Collatz sequence `*` = select all options
`--VirtualizeBogusLoopIterations`	`INTSPEC`	Adjust this value to balance performance and trace length. Default=0.
`--VirtualizePerformance`	`IndexedStack, PointerStack, AddressSizeShort, AddressSizeInt, AddressSizeLong, CacheTop`	Tweak performance. A comma-separated list of the options below. DEFAULT PointerStack `IndexedStack` = Use array indexing to access stack elements. `PointerStack` = Use pointer operations to access stack elements. `AddressSizeShort` = Assume addresses for accessing instruction handlers fit in a short. `AddressSizeInt` = Assume addresses for accessing instruction handlers fit in an int. `AddressSizeLong` = Assume addresses for accessing instruction handlers fit in a long. `CacheTop` = Store the top of stack in a register.
`--VirtualizeReentrant`		Make the function reentrant. Default=false.
`--VirtualizeOptimizeBody`	`BOOLSPEC`	Clean up after superoperator generation by optimizing the body of the generated function. Default=false.
`--VirtualizeOptimizeTreeCode`	`BOOLSPEC`	Do constant folding etc. prior to interpreter generation. Default=false.
`--VirtualizeTrace`	`instr, args, stack, regs, *`	Insert tracing code to show the stack and the virtual instructions executing. Default=print nothing. `instr` = print instruction names `args` = print instruction names and arguments `stack` = print stack contents (currently only works if you set --VirtualizePerformance=IndexedStack) `regs` = print register contents (not implemented) `*` = select all options
`--VirtualizeStackSize`	`INTSPEC`	Number of elements in the evaluation stack. Default=32.
`--VirtualizeComment`	`bool`	Insert comments in the generated interpreter. Default=false.
`--VirtualizeDump`	`tree, ISA, instrs, types, vars, strings, SuperOps, calls, bytes, array, stack, *`	Dump internal data structures used by the virtualizer. Comma-separated list. Default=dump nothing. `tree` = dump the expression trees generated from the CIL representation `ISA` = dump the Instruction Set Architecture `instrs` = dump the generated virtual instructions `types` = dump the types found `vars` = dump the local variables found `strings` = dump the strings found `SuperOps` = dump the super operator instructions `calls` = dump the function calls found `bytes` = dump the bytecode array `array` = dump the instruction array `stack` = dump the evaluation stack `*` = select all options
`--VirtualizeImplicitFlowPC`	`PCInit, PCUpdate, *`	Insert implicit flow between the virtual program counter and instruction dispatcher. Default=none. `PCInit` = insert implicit flow between the computation of the VPC address and the first load `PCUpdate` = insert implicit flow for each VPC load (potentially very slow) `*` = select all options
`--VirtualizeImplicitFlow`	`S-Expression`	The type of implicit flow to insert. See --AntiTaintAnalysisImplicitFlow for a description. Default=none.
`--VirtualizeConditionalKinds`	`branch, compute, flag`	Ways to transform the one conditional branch that occurs in instruction handlers. Default=branch. `branch` = Use normal branches, such as if (a>b) VPC=L1 else VPC=L2 `compute` = Compute the branch, such as x=(a>b); VPC=(expression over x). Not yet implemented. `flag` = Compute the branch from the values of the flag register, such as asm("cmp a b;pushf;pop"); VPC=(expression over flag register)
`--VirtualizeOpaqueStructs`	`list, array, input, env, *`	Default=list,array. `list` = Generate opaque expressions using linked lists `array` = Generate opaque expressions using arrays `input` = Generate opaque expressions that depend on input. Requires --Inputs to set invariants over input. `env` = Generate opaque expressions from entropy. Requires --InitEntropy. `*` = Same as list,array,input,env
`--VirtualizeEncodeByteArray`	`bool`	Encode the bytecode array. It is decoded before the interpreter is entered. Doesn't work for direct dispatch. Requires opaque expressions. Default=false.
`--VirtualizeObfuscateDecodeByteArray`	`bool`	Obfuscate the program array decoder with opaque expressions. --VirtualizeOpaqueStructs=input,env are the preferable opaque kinds, since they make the bytecode array depend on input. Default=false.
`--VirtualizeDynamicBytecode`	`BOOL`	Similar to the JitDynamic transform, make the virtualized bytecode self-modifying. Default=false.
`--VirtualizeDynamicEncoding`	`hard, soft`	How the jitted instructions are encoded. Default=hard. `hard` = The jitted instructions are encoded as code. `soft` = The jitted instructions are encoded as data (not implemented).
`--VirtualizeDynamicOptimize`	`BOOLSPEC`	Clean up the generated code by removing jumps-to-jumps. Default=true.
`--VirtualizeDynamicTrace`	`INTSPEC`	Insert runtime tracing of instructions. Set to 1 to turn it on. Default=0.
`--VirtualizeDynamicTraceExec`	`BOOLSPEC`	Annotate each instruction, showing from where it was generated, and the results of execution. Default=false.
`--VirtualizeDynamicDumpTree`	`BOOLSPEC`	Print the tree representation of the function, prior to generating the jitting code. Default=false.
`--VirtualizeDynamicAnnotateTree`	`BOOLSPEC`	Annotate the generated code with the corresponding intermediate tree code instructions. Default=false.
`--VirtualizeDynamicCodecs`	`none, ident, ident_loop, xor_transfer, xor_byte_loop, xor_word_loop, xor_qword_loop, xor_call, xor_call_trace, xtea, xtea_trace, stolen_byte, stolen_short, stolen_word`	How blocks should be encoded/decoded. Default=*. `none` = No encoding `ident` = The identity encoding using a single copy JIT instruction `ident_loop` = The identity encoding using a copy loop of primitive JIT instructions `xor_transfer` = An xor encoding using a single xor JIT instruction `xor_byte_loop` = An xor encoding using a copy loop of byte-size primitive JIT instructions `xor_word_loop` = An xor encoding using a copy loop of word-size primitive JIT instructions `xor_qword_loop` = An xor encoding using a copy loop of qword-size primitive JIT instructions `xor_call` = An xor encoding calling a xor function `xor_call_trace` = An xor encoding calling a xor function with tracing turned on (for debugging) `xtea` = An xtea encryption `xtea_trace` = An xtea encryption with tracing turned on (for debugging) `stolen_byte` = A byte-sized stolen bytes encoding `stolen_short` = A short-sized stolen bytes encoding `stolen_word` = A word-sized stolen bytes encoding
`--VirtualizeDynamicKeyTypes`	`data, code`	Where the encoding/decoding key is stored (for xor and xtea encodings) Default=data. `data` = In the data segment `code` = In the code segment (not implemented)
`--VirtualizeDynamicBlockFraction`	`FRACSPEC`	Fraction of the basic blocks in a function to encode Default=all.
`--VirtualizeDynamicRandomizeBlocks`	`BOOLSPEC`	Randomize the order of basic blocks Default=true.
`--VirtualizeDynamicReEncode`	`BOOLSPEC`	If true, blocks will be re-encoded after being executed. If false, blocks will be decoded once, and stay in cleartext. ('False' is not implemented; this option is always set to 'true'.) Default=true.
`--VirtualizeDynamicDumpCFG`	`BOOLSPEC`	Print the jitter's Control Flow Graph. This requires graphviz to be installed (the dot command is used). A number of pdf files get generated that shows the CFG at various stages of processing: CFGAfterInsertingAnnotations.pdf, CFGAfterSimplifyingJumps.pdf, CFGAfterTranslatingAnnotations.pdf, CFGBeforeInsertingAnnotations.pdf, CFGDumpingFunctionFinal.pdf, CFGDumpingFunctionInitial.pdf, CFGFixupIndirecJumps.pdf, CFGReplaceWithCompiledBlock.pdf, CFGSplitOutBranches.pdf, CFGSplitOutDataReferences.pdf, OriginalCFG.pdf Default=false.
`--VirtualizeDynamicTraceBlock`	`BOOLSPEC`	Print out a message before each block is executed. Default=false.
`--VirtualizeDynamicTraceBlock`	`STRING`	Print out a message before each block is executed. (Not currently implemented.) Default="".
`--VirtualizeDynamicCompileCommand`	`STRING`	A string of the form "gcc -std=c99 -o %o %i", where "%i" will be replaced with the name of the input file and "%o" with the name of the output file. For example, if your program uses the math library, you should set --VirtualizeDynamicCompileCommand="gcc -std=c99 -o %o %i -lm". Default="gcc -std=c99 -o %o %i".

Examples

As you are reading the code, there are a couple of interesting things to note:

Much of the symbolic information present in the transformed source files (such as types, enumerations, and structured control flow) that help make the code easy to read and understand, disappears once the source has been compiled, linked, and stripped. A successful attack will (at least partially) have to recover this information.
The code after two levels of virtualization looks very similar to the code after one level of virtualization. This is because the dispatch loop of the first virtualization gets coded into the bytecode program of the second. It's an interesting question to ask to what extent this hinders de-virtualization.
The direct and call dispatch methods result in much larger bytecode programs than the other methods. This is particularly evident on 64-bit machines where every opcode gets encoded in 8 bytes, in contrast with a single byte for the other methods. For this reason, if you are contemplating using two levels of interpretation, it's a good idea to make the second level not use direct or call dispatch, to keep the size of the program down. Future versions of Tigress will use more compact encodings for these types of dispatch.

Virtualize

Virtualize fib in test1.c using each of the dispatch methods.

tigress --Environment=x86_64:Darwin:Clang:5.1 --Verbosity=1  \
   --Transform=Virtualize --Functions=fib --VirtualizeDispatch=dispatch \
   --Transform=CleanUp --CleanUpKinds=annotations \
   --out=gen/... test1.c

sw	if	di	id	ca	li	bi	ip
sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c

Virtualize ⇒ Virtualize

Virtualize fib in test1.c using two levels of interpretation.

tigress --Environment=x86_64:Darwin:Clang:5.1 --Verbosity=1  \
   --Transform=Virtualize --Functions=fib --VirtualizeDispatch=dispatch1 \
   --Transform=Virtualize --Functions=fib --VirtualizeDispatch=dispatch2 \
   --Transform=CleanUp --CleanUpKinds=annotations \
   --out=gen/... test1.c

	sw	if	di	id	ca	li	bi	ip
sw	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c
if	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c
di	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c
id	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c
ca	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c
li	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c
bi	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c
ip	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c	sh ⇒ out.c

Virtualize

Virtualize fib using a switch dispatch, mixed register and stack arguments, and at most two instruction variants of each kind (i.e., no more than 2 ADD instructions, etc.).

tigress --Environment=x86_64:Darwin:Clang:5.1 --Verbosity=1  \
   --Transform=Virtualize --Functions=fib --VirtualizeDispatch=switch \
   --VirtualizeMaxDuplicateOps=2 --VirtualizeOperands=mixed \
   --Transform=CleanUp --CleanUpKinds=annotations \
   --out=gen/virtualize_mixed.c test1.c

test1.c ⇒ virtualize_mixed.sh.txt ⇒ virtualize_mixed.c

Virtualize

Virtualize fib using a switch dispatch, register and stack arguments, at most two instruction variants of each kind, and superoperators of length no more than 10.

tigress --Environment=x86_64:Darwin:Clang:5.1 --Verbosity=1  \
   --Transform=Virtualize --Functions=fib --VirtualizeDispatch=switch \
   --VirtualizeMaxDuplicateOps=2 --VirtualizeOperands=mixed \
   --VirtualizeSuperOpsRatio=2.0 --VirtualizeMaxMergeLength=10 \
   --VirtualizeOptimizeBody=true \
   --Transform=CleanUp --CleanUpKinds=annotations \
   --out=gen/virtualize_super.c test1.c

test1.c ⇒ virtualize_super.sh.txt ⇒ virtualize_super.c
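
The effect of superoperators can be sketched with a toy stack machine that evaluates `(x + 3) * 2` twice: once with simple instructions and once with a single fused instruction. This is an illustrative assumption, not Tigress output; the opcode names, `plain_code`, `super_code`, and `run` are hypothetical. The point is only that merging several instructions into one handler reduces the number of trips through the dispatch.

```c
#include <assert.h>

/* Hypothetical opcodes. OP_SUPER_ADDC_MULC fuses the sequence
   PUSHC c1; ADD; PUSHC c2; MUL into a single instruction. */
enum { OP_PUSHARG, OP_PUSHC, OP_ADD, OP_MUL, OP_RET, OP_SUPER_ADDC_MULC };

enum { STACK_SIZE = 8 };

/* (x + 3) * 2 with one simple instruction per operation. */
static const int plain_code[] = {
    OP_PUSHARG, OP_PUSHC, 3, OP_ADD, OP_PUSHC, 2, OP_MUL, OP_RET
};

/* The same computation using the fused superoperator. */
static const int super_code[] = {
    OP_PUSHARG, OP_SUPER_ADDC_MULC, 3, 2, OP_RET
};

/* Runs a program on argument x and counts how often the dispatch executes. */
int run(const int *code, int x, int *dispatches) {
    int stack[STACK_SIZE];
    int vsp = 0;  /* virtual stack pointer */
    int vpc = 0;  /* virtual program counter */

    *dispatches = 0;
    for (;;) {
        ++*dispatches;
        switch (code[vpc]) {
        case OP_PUSHARG:
            assert(vsp < STACK_SIZE);
            stack[vsp++] = x;
            vpc += 1;
            break;
        case OP_PUSHC:
            assert(vsp < STACK_SIZE);
            stack[vsp++] = code[vpc + 1];
            vpc += 2;
            break;
        case OP_ADD:
            vsp--;
            stack[vsp - 1] += stack[vsp];
            vpc += 1;
            break;
        case OP_MUL:
            vsp--;
            stack[vsp - 1] *= stack[vsp];
            vpc += 1;
            break;
        case OP_SUPER_ADDC_MULC:
            /* One handler does the work of four simple instructions. */
            stack[vsp - 1] = (stack[vsp - 1] + code[vpc + 1]) * code[vpc + 2];
            vpc += 3;
            break;
        case OP_RET:
            return stack[vsp - 1];
        default:
            return -1; /* unknown opcode */
        }
    }
}
```

Both programs return the same value for the same `x`, but the fused version dispatches fewer times, and its handler has more complex semantics for an analyst to recover.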

Virtualize

Virtualize fib using a switch dispatch, register and stack arguments, at most two instruction variants of each kind, and superoperators of length no more than 10; add opaque expressions to the dispatch, and split up instruction handlers using opaque predicates.

tigress --Environment=x86_64:Darwin:Clang:5.1 --Verbosity=1  \
   --Transform=InitOpaque --Functions=main \
   --Transform=Virtualize --Functions=fib --VirtualizeDispatch=switch \
   --VirtualizeMaxDuplicateOps=2 --VirtualizeOperands=mixed \
   --VirtualizeSuperOpsRatio=2.0 --VirtualizeMaxMergeLength=10 \
   --VirtualizeOptimizeBody=true \
   --VirtualizeMaxOpaque=5 \
   --Transform=CleanUp --CleanUpKinds=annotations \
   --out=gen/virtualize_obfuscate.c test1.c

test1.c ⇒ virtualize_obfuscate.sh.txt ⇒ virtualize_obfuscate.c

Virtualize

Virtualize fib using an interpolation dispatch, running a bogus function in parallel (to thwart virtual PC pattern matching attempts), and inserting bogus computation between instruction executions (to increase the length of instruction traces).

tigress --Environment=x86_64:Darwin:Clang:5.1 --Verbosity=1  \
   --Transform=InitEntropy --Functions=main \
   --Transform=UpdateEntropy --Functions=fac --UpdateEntropyVar=n \
   --Transform=Virtualize --Functions=fib --VirtualizeDispatch=interpolation \
   --VirtualizeNumberOfBogusFuns=1 --VirtualizeBogusFunKinds=collatz \
   --VirtualizeBogusLoopIterations=10 --VirtualizeBogusLoopKinds=collatz \
   --Transform=CleanUp --CleanUpKinds=annotations \
   --out=gen/virtualize_bogus.c test1.c

test1.c ⇒ virtualize_bogus.sh.txt ⇒ virtualize_bogus.c

Virtualize

Virtualize `fib` using an ifnest dispatch, and make it reentrant, i.e. call `fib` from multiple places in the program, executing a few instructions at a time, to make the trace less conspicuous. Make the superoperators as long as possible, to further reduce the number of times the dispatch loop executes.

tigress --Environment=x86_64:Darwin:Clang:5.1 --Verbosity=1  \
   --Transform=Virtualize --Functions=fib --VirtualizeDispatch=ifnest \
   --VirtualizeSuperOpsRatio=2.0 --VirtualizeMaxMergeLength=20 \
   --VirtualizeReentrant=true \
   --Transform=CleanUp --CleanUpKinds=annotations \
   --out=gen/virtualize_reentrant.c test2.c

test2.c ⇒ virtualize_reentrant.sh.txt ⇒ virtualize_reentrant.c

Virtualize ⇒ Virtualize

Virtualize fib twice, calling Tigress twice from the command line. Use the --FilePrefix option to avoid name clashes.

tigress --Environment=x86_64:Darwin:Clang:5.1 --Verbosity=1 --FilePrefix=x \
   --FilePrefix=v1 \
   --Transform=Virtualize --Functions=fib --VirtualizeDispatch=ifnest \
   --Transform=CleanUp --CleanUpKinds=annotations \
   --out=gen/v1.c test1.c

tigress --Environment=x86_64:Darwin:Clang:5.1 --Verbosity=1 --FilePrefix=x \
   --FilePrefix=v2 \
   --Transform=Virtualize --Functions=fib --VirtualizeDispatch=ifnest \
   --Transform=CleanUp --CleanUpKinds=annotations \
   --out=gen/virtualize-virtualize-prefix.c gen/v1.c

test1.c ⇒ virtualize-virtualize-prefix.sh.txt ⇒ virtualize-virtualize-prefix.c

Virtualize ⇒ Split

Virtualize fib, and split up the resulting function in order to make the dispatch loop more statically stealthy.

tigress --Environment=x86_64:Darwin:Clang:5.1 --Verbosity=1  \
   --Transform=Virtualize --Functions=fib --VirtualizeDispatch=switch \
   --VirtualizeMaxDuplicateOps=2 --VirtualizeOperands=mixed \
   --VirtualizeSuperOpsRatio=2.0 --VirtualizeMaxMergeLength=10 \
   --VirtualizeOptimizeBody=true \
   --Transform=Split --Seed=0 --SplitKinds=deep,block,top --SplitCount=100 --Functions=fib \
   --Transform=CleanUp --CleanUpKinds=annotations \
   --out=gen/virtualize-split.c test1.c

test1.c ⇒ .virtualize-split.sh.txt ⇒ virtualize-split.c

Virtualize ⇒ Flatten

Virtualize fib using an ifnest dispatch and flatten the resulting function using a goto dispatch.

tigress --Environment=x86_64:Darwin:Clang:5.1 --Verbosity=1  \
   --Transform=InitEntropy --Functions=main \
   --Transform=InitOpaque --Functions=main --InitOpaqueStructs=array \
   --Transform=Virtualize --Functions=fib --VirtualizeDispatch=ifnest \
   --Transform=Flatten --Functions=fib   --FlattenObfuscateNext=true --FlattenDispatch=goto \
   --Transform=CleanUp --CleanUpKinds=annotations \
   --out=gen/virtualize-flatten.c test1.c

test1.c ⇒ .virtualize-flatten.sh.txt ⇒ virtualize-flatten.c

Issues

Several dispatch methods rely on the labels-as-values extension supported by gcc and clang. With other compilers, use only the switch and ifnest dispatch methods.
--VirtualizeEncodeByteArray=true does not work for the direct dispatch method.
Our current implementation of reentrant interpreters doesn't handle function results, so make sure your function is void and returns its result through a global or a formal parameter.
The --VirtualizeConditionalKinds=flag option seems to have multiple issues on macOS/LLVM. Presumably this is due to a compiler problem related to inline assembly.
Consider this example taken from gcc's comp-goto-1.c torture test:
```
goto *(base_addr + insn.f1.offset);
```
This kind of arithmetic on the program counter is going to fail for transformations that completely restructure the code, such as virtualization.
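
For the restriction on reentrant interpreters noted above, one way to prepare a function is to change a value-returning signature into a void one that writes its result through a pointer parameter. The sketch below is an assumption about how such a rewrite might look; the `fib` shown here is a hypothetical stand-in, since the actual source of test2.c is not reproduced in this section.

```c
#include <assert.h>
#include <stddef.h>

/* Before: int fib(int n) { ... return a; }
   A value-returning function like this is what the reentrant
   interpreter is documented not to handle. */

/* After: the function is void and delivers its result through
   the formal parameter `result` (hypothetical name). */
void fib(int n, int *result) {
    int a = 0;
    int b = 1;

    assert(result != NULL);
    for (int i = 0; i < n; i++) {
        int t = a + b;
        a = b;
        b = t;
    }
    *result = a;
}
```

A caller then declares a local, passes its address, and reads the value afterwards, for example `int r; fib(10, &r);`. Returning the result in a global variable follows the same pattern with the pointer parameter replaced by an assignment to the global.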
