Who?
- Sebastian Millius
- Software engineer at Google (but I did this in my spare time); malware analysis
- MSc Computer Science ETHZ - focus on theoretical computer science
- Solution received: Tue, 14 Apr 2015 08:48:03 -0700 (PDT)
How?
I wrote a disassembler & "decompiler" for the VM (decompiler.zip). I manually (using IDA Pro) mapped out the opcodes and extracted the embedded program. The disassembler really just outputs, e.g.:
0x9: push(0); 0x12: push(8); 0x1b: push(0); 0x1c: a1 = pop(); a0 = pop(); push(a1 * a0); 0x1d: a1 = pop(); a0 = pop(); push(a1 + a0); 0x22: push(r_16); 0x23: a1 = pop(); a0 = pop(); push(a0 + a1); 0x2c: push(8); 0x35: push(0);
In order to get closer to the original code, I just did constant propagation. This produces some "pseudo" C, that I manually fixed to get the source code, e.g (challenge 3):
r_16[0] = (1103356944384 + ((in << 1) | (in >> 63))); r_16[1] = ((in >> (((7 & (r_16[0] >> 4)) | 1) & 0xff)) | 1021570277); r_16[2] = ((in >> 1) | (in << 63)); r_16[3] = (191560872 + in); r_16[0] = (r_16[2] * r_16[2]); r_16[2] = (r_16[2] + r_16[2]); out[0] = (((r_16[0] - r_16[1]) << (((15 & ((r_16[2] << (((7 & (((r_16[3] >> 3) | (r_16[3] << 61)) >> 1)) | 1) & 0xff)) >> 1)) | 1) & 0xff)) | ((r_16[0] - r_16[1]) >> ((64 - ((15 & ((r_16[2] << (((7 & (((r_16[3] >> 3) | (r_16[3] << 61)) >> 1)) | 1) & 0xff)) >> 1)) | 1)) & 0xff)));
I didn't cleanup the generated codes too much (this looks close enough from what I can see comparing to the examples posted on your website), but did manually transform the jumps into loops - doing some graph/flow analysis should be easily possible, but I didn't implement this for these simple challenges in this set. I attached all the scripts that produce the pseudo C output for all 5 challenges (decompiler_0.py, decompiler_1.py, decompiler_2.py, decompiler_3.py, decompiler_3.py). I wrote the code mostly in one sitting and I didn't clean-up too much (my appologies for that). Maybe I try to improve it, if I find some time looking at the other harder sets.
Tools used
- IDA Pro - manually map the opcodes
- Python - "decompiler" script
- Debugger/gdb - debugging
Time spent
- Roughly 8 hours
- 4 hours to the basic "decompiler" script working.
- ~ 2 hours getting it to work for all 5 challenges, clean-up and improvements.
- ~ 2 hours debugging (it's easy to get the parameter ordering in the wrong way), fixing, write-up
Comments
This was an interesting challenge; more from a code analysis point of view - the reversing was pretty basic.