Just came back from AthCon2013, where the organizers were generous enough to give me a free ticket for solving this year's reverse-engineering challenge.
In all fairness, I was not the first to solve the challenge, but a mere 3rd. I'll try to find the names of all the people who solved the challenge and post them here.
The challenge was written by Kyriakos Economou & Nikolaos Tsapakis (check out their blog "The A.R.F. Project"), great job guys!
For anyone who's interested in following the writeup with the challenge, or just giving it a go yourself, it's available here.
As the challenge was a fairly interesting one, I thought I'd post a writeup.
SPOILER ALERT!!!
On running the EXE, you are greeted with a console window. After closing it, you are presented with a message-box containing the "bad-boy" message.
Inside the EXE, at the entry point (0x40107B) we find two function calls:
- func@402CBC - This is the function that opens the console window and displays the greeting. Not very interesting.
- func@401086 - Triggers a jump to 0x407730, where the real program starts.
    00407730 pushf
    00407731 push esi
    00407732 call $+5
    00407737 pop esi
    00407738 sub esi, 2Fh
    0040773B push eax
    0040773C mov eax, ebx
    0040773E pop dword ptr [esi]
    00407740 add eax, 24h
    00407743 push ebx
    00407744 mov ebx, ecx
    00407746 pop dword ptr [esi+4]
    00407749 add ebx, 20h
    0040774C push ecx
    0040774D mov ecx, edx
    0040774F pop dword ptr [esi+8]
    00407752 add ecx, 8
    00407755 push edx
    00407756 mov edx, esi
    00407758 pop dword ptr [esi+0Ch]
    0040775B add edx, 4
    0040775E mov eax, esi
    00407760 push eax
    00407761 pop ebx
    00407762 pop dword ptr [eax+10h]
    00407765 add ebx, 0Ch
    00407768 pop dword ptr [esi+20h]
    0040776B push edi
    0040776C add edi, eax
    0040776E pop dword ptr [eax+14h]
    00407771 add edi, 10h
    00407774 push ebp
    00407775 add ebp, edi
    00407777 pop dword ptr [eax+18h]
    0040777A add edi, 18h
    0040777D push esp
    0040777E add edi, 4
    00407781 pop dword ptr [eax+1Ch]
    00407784 add esi, 18h
    00407787 call $+5
    0040778C pop edi
    0040778D sub edi, 18Ch
    00407793 mov esi, [edi]
    00407795 xor edi, edi
    00407797 mov [eax+24h], esi

This (slightly obfuscated) code appears to save all the registers to 0x407708 through 0x40772C. Let's name these addresses:
- 0x407708: _eax
- 0x40770C: _ebx
- 0x407710: _ecx
- 0x407714: _edx
- 0x407718: _esi
- 0x40771C: _edi
- 0x407720: _ebp
- 0x407724: _esp
- 0x407728: _eflags
- 0x40772C: _blob_ptr, contains the value 0x40CADE, which is an address to some binary data (tried disassembling - not code) within the EXE
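To keep these names handy while reading the rest of the listing, here is a minimal Python sketch of the layout above. The addresses and names come straight from the list; the helper `name_of` is my own illustrative addition and not part of the challenge.

```python
# Saved-register slots, as named in the list above.
VM_STATE_BASE = 0x407708

VM_FIELDS = ["_eax", "_ebx", "_ecx", "_edx", "_esi",
             "_edi", "_ebp", "_esp", "_eflags", "_blob_ptr"]

# Each slot is one 32-bit dword, laid out back to back.
VM_STATE = {name: VM_STATE_BASE + 4 * i for i, name in enumerate(VM_FIELDS)}


def name_of(addr):
    """Hypothetical helper: map an absolute address back to its slot name."""
    for name, slot_addr in VM_STATE.items():
        if slot_addr == addr:
            return name
    raise KeyError(hex(addr))
```

With this, an operand such as 0x40772C can be translated back to `_blob_ptr` mechanically instead of by eye.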
    0040779A sub esp, 800h
    004077A0 call $+5
    004077A5 pop edi
    004077A6 sub edi, 79h
    004077A9 jmp short @0x4077AC

This just loads edi with &_blob_ptr and jumps to 0x4077AC.
    004077AC mov edi, [edi]
    004077AE call $+5
    004077B3 pop esi
    004077B4 add esi, 28C1h
    004077BA xor ebx, ebx
    004077BC mov ebp, esp
    004077BE mov ecx, edi
    004077C0 add ecx, ebx
    004077C2 mov esp, ecx
    004077C4 pop edx
    004077C5 mov esp, ebp
    004077C7 mov ebp, esp
    004077C9 mov ecx, esi
    004077CB add ecx, ebx
    004077CD mov esp, ecx
    004077CF pop ecx
    004077D0 mov cl, dl
    004077D2 push ecx
    004077D3 mov esp, ebp
    004077D5 inc ebx
    004077D6 push ebx
    004077D7 xor ebx, 0Fh
    004077DA pop ebx
    004077DB jz short @0x4077DF
    004077DD jmp short @0x4077BC
    004077DF ...

This is basically a loop (again, obfuscated, and I'll stop mentioning it now, as all the code is obfuscated) which copies 15 bytes from _blob_ptr to 0x40A074: ebx counts up from 0, and the loop exits as soon as it reaches 0x0F. Also, remember for later that esi ends up being set to 0x40A074, and this register will remain unchanged throughout the code.
    004077DF mov ebp, esp
    004077E1 mov edi, esi
    004077E3 mov esp, edi
    004077E5 pop ebx
    004077E6 mov esp, ebp
    004077E8 mov cl, bl
    004077EA mov ebp, esp
    004077EC mov edi, esi
    004077EE inc edi
    004077EF mov esp, edi
    004077F1 pop ebx
    004077F2 mov esp, ebp
    004077F4 mov bh, bl
    004077F6 mov bl, cl
    004077F8 call $+5
    004077FD pop edi
    004077FE sub edi, 0D1h
    00407804 xor bl, 8Ah

This loads the byte at 0x40A074 into bl and XORs it with 0x8A. The byte right after it (at 0x40A075) ends up, untouched, in bh.
Also, note that edi now points to _blob_ptr.
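All the addresses so far were recovered from the same `call $+5` / `pop reg` / `add` or `sub` pattern, so it's worth writing the arithmetic down once. The following is a small sketch of that calculation; the function name `call_pop_value` is mine, while the addresses and constants are taken from the listings above.

```python
CALL_REL32_LEN = 5  # `call $+5` is an E8 rel32 call, 5 bytes long


def call_pop_value(call_addr, adjust=0):
    """Register value after `call $+5; pop reg; add reg, adjust`.

    The call pushes the address of the next instruction, the pop
    retrieves it, and the add/sub turns it into a fixed address
    (pass a negative `adjust` for a sub).
    """
    return (call_addr + CALL_REL32_LEN + adjust) & 0xFFFFFFFF


state_base = call_pop_value(0x407732, -0x2F)    # esi at the entry code
blob_slot_a = call_pop_value(0x4077A0, -0x79)   # edi before the copy loop
staging = call_pop_value(0x4077AE, +0x28C1)     # esi in the copy loop
blob_slot_b = call_pop_value(0x4077F8, -0xD1)   # edi after the XOR
```

Here `state_base` comes out as 0x407708 (the slot we named _eax), `staging` as 0x40A074, and both `blob_slot_a` and `blob_slot_b` as 0x40772C, i.e. &_blob_ptr.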
OK, let's see what comes next:
    00407807 push ebx
    00407808 pop ecx
    00407809 push ecx
    0040780A xor cl, 1Fh
    0040780D pushf
    0040780E add al, 4
    00407810 xor al, 18h
    00407812 pop edx
    00407813 pop ebx
    00407814 and edx, 40h
    00407817 jnz short @0x407822

This is just a fancy way to check whether bl - i.e. the byte at 0x40A074 - i.e. the byte at _blob_ptr, XORed with 0x8A is equal to 0x1F. Let's follow that path:
    00407822 push ebx
    00407823 pop ecx
    00407824 push ecx
    00407825 xor ch, 0E0h
    00407828 pushf
    00407829 add al, 4
    0040782B xor al, 18h
    0040782D pop edx
    0040782E pop ebx
    0040782F and edx, 40h
    00407832 jnz @0x40A7FF

Now ch - i.e. bh - i.e. the second byte at 0x40A074 - i.e. the byte at _blob_ptr+1, is compared to 0xE0. Again, we swallow the bait:
    0040A7FF mov edx, [edi-24h]
    0040A802 push eax
    0040A803 push ebx
    0040A804 push ecx
    0040A805 pop ecx
    0040A806 pop ebx
    0040A807 pop eax
    0040A808 mov [edi], edx
    0040A80A jmp @0x40A0CD

Ah, finally something interesting. Remember edi points to _blob_ptr? Well, that means edi-0x24 points to _eax! And this piece of code copies _eax to _blob_ptr.
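The two comparisons we just followed use the same trick: XOR with a constant, capture the flags with pushf, and test bit 6 (0x40), which is the zero flag in EFLAGS. Here's a minimal Python model of that idiom, assuming only ZF matters (which is all the code checks). The function names are made up for illustration.

```python
ZF_MASK = 0x40  # bit 6 of EFLAGS is the zero flag


def zf_after_xor8(a, b):
    """Model only the ZF produced by an 8-bit `xor a, b`."""
    return ZF_MASK if ((a ^ b) & 0xFF) == 0 else 0


def branch_taken(value, constant):
    """xor cl, constant / pushf / pop edx / and edx, 40h / jnz target.

    The `add al, 4` and `xor al, 18h` in between are noise: the flags
    were already pushed, so they cannot change what ends up in edx.
    """
    edx = zf_after_xor8(value, constant) & ZF_MASK
    return edx != 0  # jnz is taken when ZF was set, i.e. value == constant


def first_check(blob_byte):
    """The check at 0x40780A: (blob byte ^ 0x8A) == 0x1F."""
    return branch_taken(blob_byte ^ 0x8A, 0x1F)
```

So `first_check` succeeds only for the bytecode byte 0x95 (that is, 0x1F ^ 0x8A), and `branch_taken(x, 0xE0)` is the plain equality test applied to the second byte.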
The jump takes us here:
    0040A0CD mov ebx, eax
    0040A0CF sub ecx, 10h
    0040A0D2 sub edx, 14h
    0040A0D5 add esp, 400h
    0040A0DB sub esi, 18h
    0040A0DE add esp, 400h
    0040A0E4 sub edi, 1Ch
    0040A0E7 jmp @0x40779A

This code basically does nothing, because if we follow the jump, we loop back to where we started, at 0x40779A.
Interesting...let's try to take a look at another piece of code, supposing we took a different branch on the byte at _blob_ptr:
    00407819 xor cl, al
    0040781B pushf
    0040781C pop edx
    0040781D and dl, 40h
    00407820 jmp 0x40783F
    ...
    0040783F push ebx
    00407840 pop ecx
    00407841 push ecx
    00407842 xor cl, 6Dh
    00407845 pushf
    00407846 add al, 4
    00407848 xor al, 18h
    0040784A pop edx
    0040784B pop ebx
    0040784C and edx, 40h
    0040784F jnz 0x40785A
    ...
    0040785A push ebx
    0040785B pop ecx
    0040785C push ecx
    0040785D xor ch, 1
    00407860 pushf
    00407861 add al, 4
    00407863 xor al, 18h
    00407865 pop edx
    00407866 pop ebx
    00407867 and edx, 40h
    0040786A jnz 0x40A80F
    ...
    0040A80F mov eax, [edi-4]
    0040A812 mov ebx, [edi]
    0040A814 add ebx, 2
    0040A817 mov cl, [ebx]
    0040A819 mov edx, [edi-24h]
    0040A81C push eax
    0040A81D popf
    0040A81E shr edx, cl
    0040A820 pushf
    0040A821 pop edx
    0040A822 mov [edi-4], edx
    0040A825 mov [edi-24h], edx
    0040A828 mov eax, [edi]
    0040A82A add eax, 3
    0040A82D mov [edi], eax
    0040A82F jmp 0x40A0CD (go_to_start)

So, just to be on the same page, the path I took was:
- _blob_ptr[0] ^ 0x8A != 0x1F
- _blob_ptr[0] ^ 0x8A == 0x6D
- _blob_ptr[1] == 0x01
Which boils down to:

    eflags = _eflags;
    edx = _eax;
    edx >>= _blob_ptr[2];
    _eflags = eflags;
    _eax = edx;
    _blob_ptr += 3;

Ahhh, so basically, _eax is SHRed by the byte at _blob_ptr[2], and then _blob_ptr is advanced by 3, which is exactly the number of bytes we just processed. This looks exactly as if those 3 bytes just defined a SHR instruction.
This is our moment of clarity.
All the underscored registers are actually a VM's register state, where _blob_ptr is the VM's eip, the BLOB is the bytecode, and the entire code we saw so far, is a single instruction cycle.
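To make the "single instruction cycle" idea concrete, here is a rough Python sketch of how the one handler we decoded so far could be emulated. It assumes the encoding we just traced (first byte XOR 0x8A equals 0x6D, second byte 0x01, third byte the shift count) and deliberately leaves the flags update out. The function name and the dict-based VM state are my own simplification, not the challenge's code.

```python
XOR_KEY = 0x8A


def step_shr(vm, bytecode, base):
    """Try to execute a 3-byte `SHR _eax, imm8` at vm["_blob_ptr"].

    `bytecode` is the BLOB and `base` the address it is loaded at.
    Returns True if the bytes matched and the VM state was updated.
    Simplification: _eflags is not updated here.
    """
    off = vm["_blob_ptr"] - base
    b0, b1, count = bytecode[off], bytecode[off + 1], bytecode[off + 2]
    if (b0 ^ XOR_KEY) != 0x6D or b1 != 0x01:
        return False
    # x86 masks the shift count to 5 bits for 32-bit operands.
    vm["_eax"] = (vm["_eax"] & 0xFFFFFFFF) >> (count & 0x1F)
    vm["_blob_ptr"] += 3  # the instruction is 3 bytes long
    return True


# Made-up bytecode for the demo: 0xE7 is 0x6D ^ 0x8A.
demo_blob = bytes([0xE7, 0x01, 0x04])
demo_vm = {"_eax": 0x100, "_blob_ptr": 0x40CADE}
demo_ok = step_shr(demo_vm, demo_blob, 0x40CADE)
```

A full disassembler or emulator is just many of these handlers tried in the same order as the switch.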
Now that we know the name and role of all the variables and locations, we can decode all the instructions (there are quite a few of them and it takes a lot of patience; you can find the complete list in my GitHub repository).
However, even if we do that, there's a small catch. Let's take a look at this little condition right there in the middle of the VM's switch:
    0040803B jnz 0x40BD7B ; PUSH IMM8 (with sign extend)
    00408041 xor cl, al
    00408043 pushf
    00408044 pop edx
    00408045 and dl, 40h
    00408048 push ebx
    00408049 mov ebx, [edi]
    0040804B add ebx, 1200h
    00408051 mov bl, [ebx]
    00408053 cmp bl, 1
    00408056 jnz 0x408145

This extra condition looks at *(vm_eip+0x1200). If it's 1, then we just continue with the switch. However, if it's not (i.e. it's 0), then we go to some special handling. One important thing to note before we continue is that the bytecode is probably 0x1200 bytes long. Also, the jump target doubles as the "default" handler for the switch statement.
    00408145 pop ebx
    00408146 xor bl, 8Ah
    00408149 call $+5
    0040814E pop eax
    0040814F sub eax, 0B46h
    00408154 xor ecx, ecx
    loop:
    00408156 mov ebp, esp
    00408158 mov edi, eax
    0040815A mov esp, edi
    0040815C pop edx
    0040815D mov esp, ebp
    0040815F cmp bl, dl
    00408161 jz 0x408167
    00408163 inc ecx
    00408164 inc eax
    00408165 jmp 0x408156 (loop)

So we take vm_eip[0]^0x8A, and XOR it with 0x8A again, so we are left with vm_eip[0] in bl.
Next, we scan what appears to be a 256-byte table at 0x407608 for vm_eip[0], and store the index in ecx.
    00408167 mov ebp, esp
    00408169 mov eax, esi
    0040816B mov esp, eax
    0040816D pop eax
    0040816E mov al, cl
    00408170 push eax
    00408171 mov esp, ebp

If we recall that esi points to the area to which those 15 bytes were copied from the bytecode, we see that the first byte is replaced by the index found in the previous loop. In effect, the first byte of the current opcode is passed through a map.
Let's just call this area with the 15 bytes of bytecode the staging area.
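The scan-and-replace above is easy to express in a few lines of Python. This is only a sketch: I haven't reproduced the real 256-byte table at 0x407608 here, so `DEMO_TABLE` is a made-up permutation standing in for it, and `translate_opcode` is my own name for the step.

```python
# Placeholder for the 256-byte table at 0x407608 (NOT the real contents).
# Multiplying by an odd number mod 256 gives a permutation of 0..255.
DEMO_TABLE = bytes((i * 7 + 3) % 256 for i in range(256))


def translate_opcode(staging, table):
    """Replace staging[0] with the index at which that byte occurs in table.

    Mirrors the scan loop at 0x408156 and the store at 0x40816E. Unlike
    the original loop, which has no bounds check, this raises ValueError
    if the byte is not in the table.
    """
    index = table.index(staging[0])
    staging[0] = index
    return index


# 15-byte staging area whose first byte is the encoded form of 0x90.
demo_staging = bytearray([DEMO_TABLE[0x90]] + [0] * 14)
demo_index = translate_opcode(demo_staging, DEMO_TABLE)
```

Note the direction of the map: the bytecode stores `table[real_opcode]`, and the VM recovers `real_opcode` by searching for it.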
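    00408173 call $+5
    00408178 pop eax
    00408179 add eax, 1E4Bh ; eax = 0x409FC3
    0040817E push 0
    00408180 push esi
    00408181 push eax
    00408182 push ebp
    00408183 sub esp, 27h
    00408186 mov ebp, esp
    00408188 push ecx
    00408189 push edx
    0040818A push esi
    0040818B call 0x409210

The stack is prepared so that when entering the function at 0x409210, the stack would look like this:

    ebp+0x33   0
    ebp+0x2F   staging area pointer (esi)
    ebp+0x2B   code_ptr1 = 0x409FC3
    ebp+0x27   saved ebp
    ebp+0x00   0x27 bytes of scratch space
    ebp-0x04   saved ecx
    ebp-0x08   saved edx
    ebp-0x0C   saved esi
    ebp-0x10   return address (0x408190)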
Now, armed with this visual aid, we can take a look at the function at 0x409210:
    00409210 pop esi
    00409211 push dword ptr [ebp+2Fh]
    00409214 pop dword ptr [ebp+23h]
    00409217 mov byte ptr [ebp+22h], 0
    0040921B mov dword ptr [ebp+2], 20h
    00409222 mov dword ptr [ebp+6], 20h
    00409229 cmp dword ptr [ebp+33h], 40h
    0040922D jnz 0x409236 (label1)
    0040922F mov dword ptr [ebp+6], 40h
    00409236 label1:
    00409236 mov eax, [ebp+23h]
    00409239 movzx ecx, byte ptr [eax]
    0040923C lea eax, [esi+ecx*4]
    0040923F add eax, [eax]
    00409241 add eax, 4
    00409244 call eax
    00409246 cmp eax, 0FFFFFFFFh
    00409249 jz 0x409251 (label2)
    0040924B mov eax, [ebp+23h]
    0040924E sub eax, [ebp+2Fh]
    00409251 label2:
    00409251 pop esi
    00409252 pop edx
    00409253 pop ecx
    00409254 add esp, 27h
    00409257 pop ebp
    00409258 retn 8

That tangle of code actually does something very simple. I'll break it down:
- Note that ebp+0x33 contains 0, which means that the first branch will always be taken.
- esi swallows the function's return address.
- The first push-pop pair puts the staging area's pointer in ebp+0x23.
- The first byte in the staging area then serves as an index to some function table that starts at esi - now the address right after the current function's call.
- On a return value different from -1 (which I can only imagine to be a failure of the looked-up function), eax will contain the difference between the values at ebp+0x23 and ebp+0x2F (Odd...didn't we say they contain the same value?)
- Finally, the stack is unwound in such a way that on executing ret, the flow returns to code_ptr1=0x409FC3.
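The table lookup in the middle (lea / add / add / call) is worth spelling out: each entry is a dword holding an offset relative to the entry itself, not an absolute pointer. Below is a hedged Python sketch of that resolution. The table base is the return address of the 5-byte call at 0x40818B; the dword I plant in the demo is back-computed so that index 4 resolves to 0x409308, it was not read from the binary, and `resolve_handler` is a name I made up.

```python
import struct

TABLE_BASE = 0x40818B + 5  # esi = return address of `call 0x409210`


def resolve_handler(memory, mem_base, table_base, index):
    """lea eax, [esi+ecx*4] / add eax, [eax] / add eax, 4 -> call target."""
    entry = table_base + 4 * index
    (rel,) = struct.unpack_from("<i", memory, entry - mem_base)
    return (entry + rel + 4) & 0xFFFFFFFF


# Demo memory holding a fake 8-entry table; only entry 4 is filled in.
demo_mem = bytearray(4 * 8)
demo_entry = TABLE_BASE + 4 * 4
struct.pack_into("<i", demo_mem, 4 * 4, 0x409308 - demo_entry - 4)
demo_target = resolve_handler(demo_mem, TABLE_BASE, TABLE_BASE, 4)
```

In the real binary you would read `memory` from the EXE image instead; the arithmetic stays the same.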
So we have two questions now, 1) What's at 0x409FC3? and 2) What do the functions in that table do?.
Starting with the second question, I'll pick for example the 5th entry in the table: 0x409308.
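    00409308 add dword ptr [ebp+23h], 2
    0040930C retn

Well, combined with what we know the previous function does, this just results in eax containing 2 when the program's flow continues at 0x409FC3: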
    00409FC3 call $+5
    00409FC8 pop edi
    00409FC9 add edi, 0ACh

Now edi points to the staging area.

    00409FCF xor ecx, ecx
    00409FD1 mov ecx, 0Fh
    00409FD6 sub ecx, eax
    00409FD8 sub ecx, 2
    00409FDB add edi, eax

Advance edi by look_up_func_result, and load ecx with 15-look_up_func_result-2.

    00409FDD push ebx
    00409FDE mov ebp, esp
    00409FE0 mov ebx, edi
    00409FE2 mov esp, ebx
    00409FE4 pop ebx
    00409FE5 mov bl, 0EBh
    00409FE7 push ebx

Store 0xEB at staging_area+look_up_func_result.

    00409FE8 mov esp, ebp
    00409FEA pop ebx
    00409FEB push ebx
    00409FEC mov ebp, esp
    00409FEE mov ebx, edi
    00409FF0 inc ebx
    00409FF1 mov esp, ebx
    00409FF3 pop ebx
    00409FF4 mov bl, cl
    00409FF6 push ebx

And store 15-look_up_func_result-2 at staging_area+look_up_func_result+1.
Very weird, let's see where this is going.
    00409FF7 mov esp, ebp
    00409FF9 pop ebx
    00409FFA jmp 0x40A00B (label1)
    00409FFC push eax
    00409FFD pop ebx
    00409FFE inc ebx
    00409FFF push ecx
    0040A000 pop edx
    0040A001 inc edx
    0040A002 push esi
    0040A003 pop edi
    0040A004 inc edi
    0040A005 pop esi
    0040A006 pop ebx
    0040A007 pop edx
    0040A008 pop eax
    0040A009 leave
    0040A00A retn
    0040A00B label1:
    0040A00B call $+5
    0040A010 pop esi
    0040A011 sub esi, 14h
    0040A014 add esi, 0Eh
    0040A017 inc edi
    0040A018 add edi, ecx

So now esi=0x40A00A (which is the retn right before label1), and edi=staging_area+look_up_func_result+1+(15-look_up_func_result-2)=staging_area+14, which is the last byte of the staging area.
    0040A01A loop:
    0040A01A push ebx
    0040A01B mov ebp, esp
    0040A01D mov ebx, esi
    0040A01F mov esp, ebx
    0040A021 pop edx
    0040A022 mov esp, ebp
    0040A024 pop ebx
    0040A025 push ebx
    0040A026 mov ebp, esp
    0040A028 mov ebx, edi
    0040A02A mov esp, ebx
    0040A02C pop ebx
    0040A02D mov bl, dl
    0040A02F push ebx
    0040A030 mov esp, ebp
    0040A032 pop ebx
    0040A033 dec esi
    0040A034 dec edi
    0040A035 dec cl
    0040A037 jz 0x40A03B
    0040A039 jmp 0x40A01A (loop)

This just copies backward ecx=15-look_up_func_result-2 bytes from 0x40A00A, to where edi points now in the staging area.
    0040A03B call $+5
    0040A040 pop edi
    0040A041 sub edi, 2914h
    0040A047 add [edi], eax
    0040A049 mov ebx, [edi-24h]
    0040A04C mov eax, [edi-24h]
    0040A04F mov ecx, [edi-20h]
    0040A052 mov ebx, [edi-20h]
    0040A055 mov edx, [edi-14h]
    0040A058 mov ecx, [edi-1Ch]
    0040A05B mov esi, [edi-0Ch]
    0040A05E mov edx, [edi-18h]
    0040A061 mov ebp, [edi-8]
    0040A064 mov esi, [edi-14h]
    0040A067 mov ebp, [edi-0Ch]
    0040A06A mov esp, [edi-8]
    0040A06D push dword ptr [edi-4]
    0040A070 popf
    0040A071 mov edi, [edi-10h]

If we calculate edi, then we'll see it's the address of vm_eip. The add at 0x40A047 advances vm_eip by look_up_func_result, and after that it's easy to see that the rest just loads all the machine registers from the VM's registers.
And now we come, right in the middle of our program flow, to the staging area (?!).
So let's just pause right here, and try to think what the contents of the staging area should be at this point:
- Originally, the staging area had 15 bytes of bytecode.
- Then the first byte was translated via some table.
- Then some value, X, was calculated based on running a function from some other table.
- And then 0xEB and (13-X) were written in the middle.
Since this is now actually the code that gets executed, we can only guess that what we have here is a native instruction which was "hidden" in the bytecode, followed by 0xEB. If we look at the x86 opcode table, we see that 0xEB is actually the opcode for jmp rel8, where the jump would take us exactly beyond the staging area, right here:
    0040A083 push edi
    0040A084 pushf
    0040A085 sub esp, 800h
    0040A08B call $+5
    0040A090 pop edi
    0040A091 sub edi, 2964h
    0040A097 add esp, 800h
    0040A09D pop dword ptr [edi-4]
    0040A0A0 pop dword ptr [edi-10h]
    0040A0A3 mov [edi-24h], eax
    0040A0A6 mov eax, [edi-8]
    0040A0A9 mov [edi-20h], ebx
    0040A0AC mov ebx, [edi-14h]
    0040A0AF mov [edi-1Ch], ecx
    0040A0B2 mov ecx, [edi-0Ch]
    0040A0B5 mov [edi-18h], edx
    0040A0B8 mov edx, ecx
    0040A0BA mov [edi-14h], esi
    0040A0BD mov esi, eax
    0040A0BF mov [edi-0Ch], ebp
    0040A0C2 mov ebp, [edi-24h]
    0040A0C5 mov [edi-8], esp
    0040A0C8 jmp 0x40779A (start_of_vm_cycle)

Which stores the resulting native machine state into the VM's state, and goes on to process the next VM instruction.
Now we understand that the function table is just a crooked way to encode the length of native instructions based on their opcode.
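To convince ourselves that the jmp really lands right after the staging area, here is a small Python sketch that rebuilds the patched staging area for a native instruction of length X. The function names are mine, and the filler byte is an assumption: the real code pads with bytes copied backward from 0x40A00A, but since the jmp skips them, their value doesn't matter.

```python
STAGING_BASE = 0x40A074
STAGING_LEN = 15
JMP_REL8 = 0xEB


def patch_staging(staging, insn_len, filler=0xCC):
    """Keep the first insn_len bytes, then append `jmp rel8` and padding."""
    if not 1 <= insn_len <= STAGING_LEN - 2:
        raise ValueError("instruction does not leave room for the jmp")
    rel = STAGING_LEN - insn_len - 2          # the 15-X-2 from the listing
    patched = bytearray(staging[:insn_len])
    patched += bytes([JMP_REL8, rel])
    patched += bytes([filler]) * rel          # never executed
    return patched


def jmp_target(insn_len):
    """Address the patched `jmp rel8` transfers control to."""
    rel = STAGING_LEN - insn_len - 2
    next_insn = STAGING_BASE + insn_len + 2   # rel8 is relative to this
    return next_insn + rel


# Demo: 83 C0 01 is `add eax, 1`, a 3-byte native instruction.
demo_patched = patch_staging(bytes([0x83, 0xC0, 0x01]) + bytes(12), 3)
```

Whatever X is, `jmp_target` comes out as 0x40A083, which is exactly the `push edi` that starts the state-saving code above.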
So to summarize:
- We have a virtual machine whose state (all the standard registers) is stored at 0x407708 for eax, through 0x40772C for eip.
- The VM's bytecode is 0x1200 bytes long and is at 0x40CADE.
- In each cycle, 15 bytes of bytecode are copied to a staging area at 0x40A074.
- The first byte of the current instruction is XORed with 0x8A, and serves as a switch parameter to the instruction's handler.
- At a certain point in the switch, if no match has been found yet, a mask corresponding to the current instruction (current VM eip + 0x1200) is tested to decide whether to continue descending the switch, or to fall to default.
- The default handler is native execution of the bytecode, but the opcode must first be decoded by looking it up in the table at 0x407608, and the corresponding instruction length is calculated using the function table that follows the call at 0x40818B, before the flow continues at 0x409FC3.
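Put together, the decision of how to treat each instruction can be sketched roughly like this in Python. It's a simplification under a few assumptions: handlers are matched on the first byte only (the real switch also looks at the second byte), the mask bytes sit exactly 0x1200 bytes after the bytecode, and the set of "late" opcodes in the demo is invented. Only 0x1F and 0x6D are known, from the listings above, to be tested before the mask check.

```python
BYTECODE_LEN = 0x1200
XOR_KEY = 0x8A


def classify(blob, offset, early, late):
    """Return "virtual" or "native" for the instruction at `offset`.

    early: switch keys tested before the mask check
    late:  switch keys tested only when the mask byte is 1
    """
    key = blob[offset] ^ XOR_KEY
    if key in early:
        return "virtual"
    if blob[offset + BYTECODE_LEN] != 1:
        return "native"      # straight to the default handler
    return "virtual" if key in late else "native"


EARLY = {0x1F, 0x6D}   # seen in the listings before the mask check
LATE = {0x42}          # hypothetical, for the demo only

demo_blob = bytearray(2 * BYTECODE_LEN)
demo_blob[0] = 0x6D ^ XOR_KEY   # the SHR handler's key
demo_blob[1] = 0x42 ^ XOR_KEY   # a "late" key, mask still 0
demo_kinds = [classify(demo_blob, 0, EARLY, LATE),
              classify(demo_blob, 1, EARLY, LATE)]
```

In the demo the first byte is classified as virtual and the second as native, because its mask byte is still 0. Flipping that mask byte to 1 would send it to the (hypothetical) late handler instead.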
You can find the disassembler code here. I'm warning you, it ain't pretty, but it gets the job done.
Now we can look at what the machine is trying to do. But that's for next time.