Dany's 'blog: June 2013

Just came back from AthCon2013, where the organizers were generous enough to give me a free ticket for solving this year's reverse-engineering challenge.
In all fairness, I was not the first to solve the challenge, but a mere 3rd. I'll try to find the names of all the people who solved the challenge and post them here.
The challenge was written by Kyriakos Economou & Nikolaos Tsapakis (check out their blog "The A.R.F. Project"), great job guys!
For anyone who's interested in following the writeup with the challenge in hand, or just giving it a go yourself, it's available here.
As the challenge was a fairly interesting one, I thought I'd post a writeup.

SPOILER ALERT!!!

On running the EXE, you are greeted with a console window:

After you close it, you are presented with the following message-box containing the "bad-boy" message:

Inside the EXE, at the entry point (0x40107B) we find two function calls:

func@402CBC - This is the function that opens the console window and displays the greeting. Not very interesting.
func@401086 - Triggers a jump to 0x407730, where the real program starts.

So let's start dissecting that program:

00407730    pushf
00407731    push    esi  
00407732    call    $+5  
00407737    pop     esi  
00407738    sub     esi, 2Fh     
0040773B    push    eax  
0040773C    mov     eax, ebx     
0040773E    pop     dword ptr [esi]
00407740    add     eax, 24h     
00407743    push    ebx  
00407744    mov     ebx, ecx     
00407746    pop     dword ptr [esi+4]
00407749    add     ebx, 20h     
0040774C    push    ecx  
0040774D    mov     ecx, edx     
0040774F    pop     dword ptr [esi+8]
00407752    add     ecx, 8
00407755    push    edx  
00407756    mov     edx, esi     
00407758    pop     dword ptr [esi+0Ch]
0040775B    add     edx, 4
0040775E    mov     eax, esi     
00407760    push    eax  
00407761    pop     ebx  
00407762    pop     dword ptr [eax+10h]
00407765    add     ebx, 0Ch     
00407768    pop     dword ptr [esi+20h]
0040776B    push    edi  
0040776C    add     edi, eax     
0040776E    pop     dword ptr [eax+14h]
00407771    add     edi, 10h     
00407774    push    ebp  
00407775    add     ebp, edi     
00407777    pop     dword ptr [eax+18h]
0040777A    add     edi, 18h     
0040777D    push    esp  
0040777E    add     edi, 4
00407781    pop     dword ptr [eax+1Ch]
00407784    add     esi, 18h     
00407787    call    $+5  
0040778C    pop     edi  
0040778D    sub     edi, 18Ch    
00407793    mov     esi, [edi]   
00407795    xor     edi, edi     
00407797    mov     [eax+24h], esi

This (slightly obfuscated) code appears to save all the registers to 0x407708 through 0x40772C. Let's name these addresses:

0x407708: _eax
0x40770C: _ebx
0x407710: _ecx
0x407714: _edx
0x407718: _esi
0x40771C: _edi
0x407720: _ebp
0x407724: _esp
0x407728: _eflags
0x40772C: _blob_ptr, contains the value 0x40CADE, which is an address to some binary data (tried disassembling - not code) within the EXE
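As a quick cross-check of that layout, here is a minimal Python sketch that derives every slot address from the base 0x407708. It assumes, as the listing suggests, that the slots are ten consecutive dwords. The `slot_for` helper is my own naming, not something from the challenge: it turns a `[reg+offset]` style reference, for a register that holds the address of a given slot, back into a slot name.

```python
SAVE_AREA = 0x407708  # 0x407737 - 0x2F, as computed by the listing above

SLOT_NAMES = ["_eax", "_ebx", "_ecx", "_edx", "_esi",
              "_edi", "_ebp", "_esp", "_eflags", "_blob_ptr"]

# Assumption: ten consecutive 4-byte slots, in the order listed above.
SLOT_ADDR = {name: SAVE_AREA + 4 * i for i, name in enumerate(SLOT_NAMES)}
ADDR_SLOT = {addr: name for name, addr in SLOT_ADDR.items()}


def slot_for(offset, base="_blob_ptr"):
    """Name of the slot at [reg + offset], where reg holds the address of `base`."""
    return ADDR_SLOT[SLOT_ADDR[base] + offset]


for name in SLOT_NAMES:
    print(f"{SLOT_ADDR[name]:#x}: {name}")
```

With a register pointing at `_blob_ptr`, `slot_for(-0x24)` resolves to `_eax` and `slot_for(-4)` to `_eflags`, which is handy when reading the handlers later on.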

0040779A    sub     esp, 800h    
004077A0    call    $+5  
004077A5    pop     edi  
004077A6    sub     edi, 79h     
004077A9    jmp     short @0x4077AC

This just loads edi with &_blob_ptr and jumps to 0x4077AC.
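The `call $+5` / `pop reg` pair that keeps showing up is a position-independent way of getting the current address: the call pushes the address of the next instruction, and the pop picks it up. A small Python sketch of the arithmetic (the `getpc` helper is my own name), checked against the two uses seen so far:

```python
def getpc(call_addr, delta):
    """Register value after `call $+5 ; pop reg ; add/sub reg, |delta|`.

    call_addr is the address of the 5-byte `call $+5` instruction;
    delta is negative for `sub` and positive for `add`.
    """
    return (call_addr + 5 + delta) & 0xFFFFFFFF


# 00407732 call $+5 ; pop esi ; sub esi, 2Fh  -> start of the register save area
save_area = getpc(0x407732, -0x2F)

# 004077A0 call $+5 ; pop edi ; sub edi, 79h  -> address of _blob_ptr
blob_ptr_slot = getpc(0x4077A0, -0x79)

print(hex(save_area), hex(blob_ptr_slot))
```

The same helper works for every other `call $+5` in the listings, so it is a cheap way to double-check the constants quoted in the text.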

004077AC    mov     edi, [edi]      
004077AE    call    $+5  
004077B3    pop     esi             
004077B4    add     esi, 28C1h      
004077BA    xor     ebx, ebx        
004077BC    mov     ebp, esp        
004077BE    mov     ecx, edi        
004077C0    add     ecx, ebx     
004077C2    mov     esp, ecx        
004077C4    pop     edx             
004077C5    mov     esp, ebp        
004077C7    mov     ebp, esp     
004077C9    mov     ecx, esi        
004077CB    add     ecx, ebx     
004077CD    mov     esp, ecx        
004077CF    pop     ecx             
004077D0    mov     cl, dl
004077D2    push    ecx             
004077D3    mov     esp, ebp     
004077D5    inc     ebx  
004077D6    push    ebx  
004077D7    xor     ebx, 0Fh     
004077DA    pop     ebx  
004077DB    jz      short @0x4077DF
004077DD    jmp     short @0x4077BC
004077DF    ...

This is basically a loop (again, obfuscated, and I'll stop mentioning it now, as all the code is obfuscated) which copies 15 bytes from _blob_ptr to 0x40A074 (the counter in ebx starts at 0 and the loop exits once it reaches 0x0F). Also, remember for later that esi ends up being set to 0x40A074, and this register will remain unchanged throughout the code.
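Stripped of the stack-pointer juggling, the loop is a plain byte copy. Here is a rough Python model of it, with memory represented as a dict from address to byte. The bytecode contents below are made up, just so there is something to copy.

```python
SRC = 0x40CADE      # value held in _blob_ptr
STAGING = 0x40A074  # destination address computed into esi


def copy_to_staging(mem, src=SRC, dst=STAGING):
    """Model of the loop at 0x4077BC..0x4077DD; returns the number of bytes copied."""
    ebx = 0                                # xor ebx, ebx
    while True:
        mem[dst + ebx] = mem[src + ebx]    # pop edx / pop ecx / mov cl, dl / push ecx
        ebx += 1                           # inc ebx
        if (ebx ^ 0x0F) == 0:              # xor ebx, 0Fh ; jz (ebx itself is restored by pop)
            return ebx


# Made-up bytecode bytes, only for the demonstration.
mem = {SRC + i: 0x10 + i for i in range(32)}
copied = copy_to_staging(mem)
print(copied, [hex(mem[STAGING + i]) for i in range(copied)])
```

The exit test fires when the incremented counter equals 0x0F, so indices 0 through 14 are copied.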

004077DF    mov     ebp, esp     
004077E1    mov     edi, esi     
004077E3    mov     esp, edi     
004077E5    pop     ebx  
004077E6    mov     esp, ebp     
004077E8    mov     cl, bl
004077EA    mov     ebp, esp     
004077EC    mov     edi, esi     
004077EE    inc     edi  
004077EF    mov     esp, edi     
004077F1    pop     ebx  
004077F2    mov     esp, ebp     
004077F4    mov     bh, bl
004077F6    mov     bl, cl
004077F8    call    $+5  
004077FD    pop     edi  
004077FE    sub     edi, 0D1h    
00407804    xor     bl, 8Ah

This loads the first two bytes at 0x40A074 into bl and bh, and XORs bl with 0x8A.
Also, note that edi now points to _blob_ptr.
OK, let's see what comes next:

00407807    push    ebx
00407808    pop     ecx
00407809    push    ecx
0040780A    xor     cl, 1Fh
0040780D    pushf
0040780E    add     al, 4
00407810    xor     al, 18h
00407812    pop     edx
00407813    pop     ebx
00407814    and     edx, 40h
00407817    jnz     short @0x407822
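The `xor` / `pushf` / `pop edx` / `and edx, 40h` sequence is worth spelling out once, since every case of the switch uses it. 0x40 is bit 6 of EFLAGS, the zero flag, so edx ends up non-zero exactly when the XOR result was zero. A minimal Python sketch (function names are mine):

```python
ZF = 0x40      # bit 6 of EFLAGS: the zero flag
XOR_KEY = 0x8A


def zf_mask_after_xor(value, const):
    """`xor cl, const ; pushf ; pop edx ; and edx, 40h` -> 0x40 iff value == const."""
    result = (value ^ const) & 0xFF
    return ZF if result == 0 else 0


def first_byte_matches(raw_byte, const):
    """raw_byte is the byte as stored in the blob; it is XORed with 0x8A before the test."""
    return zf_mask_after_xor(raw_byte ^ XOR_KEY, const) != 0


# A stored byte of 0x95 decodes to 0x1F, so `jnz` (taken when edx != 0) follows that case.
print(first_byte_matches(0x95, 0x1F), first_byte_matches(0x95, 0x6D))
```

The `add al, 4` / `xor al, 18h` pair in between happens after the `pushf`, so it has no effect on the flags that get tested.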

This is just a fancy way to check whether bl - i.e. the byte at 0x40A074 - i.e. the byte at _blob_ptr, XORed with 0x8A is equal to 0x1F. Let's follow that path:

00407822    push    ebx  
00407823    pop     ecx  
00407824    push    ecx  
00407825    xor     ch, 0E0h     
00407828    pushf
00407829    add     al, 4
0040782B    xor     al, 18h
0040782D    pop     edx  
0040782E    pop     ebx  
0040782F    and     edx, 40h     
00407832    jnz     @0x40A7FF

Now ch - i.e. bh - i.e. the second byte at 0x40A074 - i.e. the byte at _blob_ptr+1, is compared to 0xE0. Again, we swallow the bait:

0040A7FF    mov     edx, [edi-24h]
0040A802    push    eax  
0040A803    push    ebx  
0040A804    push    ecx  
0040A805    pop     ecx  
0040A806    pop     ebx  
0040A807    pop     eax  
0040A808    mov     [edi], edx   
0040A80A    jmp     @0x40A0CD

Ah, finally something interesting. Remember edi points to _blob_ptr? Well, that means edi-0x24 points to _eax! And this piece of code copies _eax to _blob_ptr.
The jump takes us here:

0040A0CD    mov     ebx, eax     
0040A0CF    sub     ecx, 10h     
0040A0D2    sub     edx, 14h     
0040A0D5    add     esp, 400h    
0040A0DB    sub     esi, 18h     
0040A0DE    add     esp, 400h    
0040A0E4    sub     edi, 1Ch     
0040A0E7    jmp     @0x40779A

This code basically does nothing, because if we follow the jump, we loop back to where we started, at 0x40779A.
Interesting... Let's take a look at another piece of code, supposing we took a different branch on the byte at _blob_ptr:

00407819    xor     cl, al
0040781B    pushf
0040781C    pop     edx  
0040781D    and     dl, 40h
00407820    jmp     0x40783F     
            ...
0040783F    push    ebx  
00407840    pop     ecx  
00407841    push    ecx  
00407842    xor     cl, 6Dh
00407845    pushf
00407846    add     al, 4
00407848    xor     al, 18h
0040784A    pop     edx  
0040784B    pop     ebx  
0040784C    and     edx, 40h     
0040784F    jnz     0x40785A     
            ...
0040785A    push    ebx  
0040785B    pop     ecx  
0040785C    push    ecx  
0040785D    xor     ch, 1
00407860    pushf
00407861    add     al, 4
00407863    xor     al, 18h
00407865    pop     edx  
00407866    pop     ebx  
00407867    and     edx, 40h     
0040786A    jnz     0x40A80F
            ...
0040A80F    mov     eax, [edi-4] 
0040A812    mov     ebx, [edi]   
0040A814    add     ebx, 2
0040A817    mov     cl, [ebx]    
0040A819    mov     edx, [edi-24h]
0040A81C    push    eax  
0040A81D    popf
0040A81E    shr     edx, cl
0040A820    pushf
0040A821    pop     edx  
0040A822    mov     [edi-4], edx 
0040A825    mov     [edi-24h], edx
0040A828    mov     eax, [edi]   
0040A82A    add     eax, 3
0040A82D    mov     [edi], eax   
0040A82F    jmp     0x40A0CD (go_to_start)

So, just to be on the same page, the path I took was:

_blob_ptr[0] ^ 0x8A != 0x1F
_blob_ptr[0] ^ 0x8A == 0x6D
_blob_ptr[1] == 0x01

And this brings us to that last piece of code. Again, remember edi = &_blob_ptr, this makes the code equivalent to:

eflags = _eflags;
edx = _eax;
edx >>= _blob_ptr[2];
_eflags = eflags;
_eax = edx;
_blob_ptr += 3;

Ahhh, so basically, _eax is SHRed by the byte at _blob_ptr[2], and then _blob_ptr is advanced by 3, which is exactly the number of bytes we just processed. It looks as if those 3 bytes defined a SHR instruction.
This is our moment of clarity.
All the underscored registers are actually a VM's register state, where _blob_ptr is the VM's eip, the BLOB is the bytecode, and all the code we saw so far is a single instruction cycle.
Now that we know the name and role of all the variables and locations, we can decode all the instructions (there are quite a few of them and it takes a lot of patience; you can find the complete list in my GitHub repository).
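To give a taste of what the decoded handlers look like, here is a rough Python sketch of one VM cycle that covers only the two handlers traced above. The function and dictionary names are mine, flags are not modelled, and the 5-bit masking of the shift count is simply what x86 `shr` does with cl on a 32-bit operand.

```python
BLOB_BASE = 0x40CADE  # initial value of _blob_ptr, i.e. the VM's eip
XOR_KEY = 0x8A


def vm_step(regs, blob, vm_eip):
    """Run one VM instruction and return the new vm_eip (two handlers only)."""
    off = vm_eip - BLOB_BASE
    op, arg = blob[off] ^ XOR_KEY, blob[off + 1]   # only the first byte is XORed
    if (op, arg) == (0x1F, 0xE0):
        # handler at 0x40A7FF: _blob_ptr = _eax
        return regs["eax"] & 0xFFFFFFFF
    if (op, arg) == (0x6D, 0x01):
        # handler at 0x40A80F: _eax >>= imm8, then skip the 3 bytes consumed
        count = blob[off + 2] & 0x1F
        regs["eax"] = (regs["eax"] & 0xFFFFFFFF) >> count
        return vm_eip + 3
    raise NotImplementedError(f"no handler sketched for {op:#x} {arg:#x}")


# Made-up two-instruction program: shift _eax right by 4, then jump to _eax.
blob = bytes([0x6D ^ XOR_KEY, 0x01, 0x04, 0x1F ^ XOR_KEY, 0xE0])
regs = {"eax": BLOB_BASE << 4}
ip1 = vm_step(regs, blob, BLOB_BASE)
ip2 = vm_step(regs, blob, ip1)
print(hex(ip1), hex(regs["eax"]), hex(ip2))
```

In this toy program the shift turns _eax back into the blob's base address, so the second instruction jumps right back to the start.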
However, even if we do that, there's a small catch. Let's take a look at this little condition right there in the middle of the VM's switch:

0040803B    jnz     0x40BD7B ; PUSH IMM8 (with sign extend)
00408041    xor     cl, al
00408043    pushf 
00408044    pop     edx   
00408045    and     dl, 40h
00408048    push    ebx   
00408049    mov     ebx, [edi] 
0040804B    add     ebx, 1200h 
00408051    mov     bl, [ebx] 
00408053    cmp     bl, 1 
00408056    jnz     0x408145

This extra condition looks at *(vm_eip+0x1200). If it's 1, then we just continue with the switch. However, if it's 0, then we go to some special handling. One important thing to note before we continue is that the bytecode is probably 0x1200 bytes long. The jump target is also the "default" handler for the switch statement.
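In other words, there seems to be a per-byte mask sitting right after the bytecode. A tiny Python sketch of the test, assuming the bytecode really is 0x1200 bytes long and the mask follows it immediately (the image contents below are made up):

```python
BYTECODE_LEN = 0x1200  # assumption from the text: bytecode length == offset of the mask


def keeps_searching_vm_handlers(image, vm_eip_off):
    """`cmp bl, 1 ; jnz default`: True means the switch goes on, False means default."""
    return image[vm_eip_off + BYTECODE_LEN] == 1


# Made-up image: bytecode followed by its mask, with a single mask byte set.
image = bytearray(2 * BYTECODE_LEN)
image[BYTECODE_LEN + 7] = 1
print(keeps_searching_vm_handlers(image, 7), keeps_searching_vm_handlers(image, 8))
```

Note that the native code tests for "not 1" rather than "0", so any other mask value would also fall to the default handler.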

00408145    pop     ebx  
00408146    xor     bl, 8Ah
00408149    call    $+5  
0040814E    pop     eax  
0040814F    sub     eax, 0B46h   
00408154    xor     ecx, ecx     
          loop:
00408156    mov     ebp, esp     
00408158    mov     edi, eax     
0040815A    mov     esp, edi     
0040815C    pop     edx  
0040815D    mov     esp, ebp     
0040815F    cmp     bl, dl
00408161    jz      0x408167
00408163    inc     ecx  
00408164    inc     eax  
00408165    jmp     0x408156 (loop)

So we take vm_eip[0]^0x8A, and XOR it with 0x8A again, so we are left with vm_eip[0] in bl.
Next, we scan what appears to be a 256-byte table at 0x407608 for vm_eip[0], and store the index in ecx.

00408167    mov     ebp, esp
00408169    mov     eax, esi
0040816B    mov     esp, eax
0040816D    pop     eax
0040816E    mov     al, cl
00408170    push    eax
00408171    mov     esp, ebp

If we recall that esi points to the area to which those 15 bytes were copied from the bytecode, we see that the first byte is replaced by the index found in the previous loop. In effect, the first byte of the current opcode is passed through a map.
Let's just call this area with the 15 bytes of bytecode the staging area.
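The scan-and-replace step boils down to an inverse table lookup. Here is a Python sketch of it. The table below is a made-up permutation, not the real contents of 0x407608, and the function name is mine.

```python
XOR_KEY = 0x8A


def translate_first_byte(table, raw_byte):
    """Index of the bytecode's first byte in the table (scan loop at 0x408156)."""
    xored = raw_byte ^ XOR_KEY    # done at 0x407804, before the switch
    bl = xored ^ XOR_KEY          # `xor bl, 8Ah` at 0x408146 undoes it
    return table.index(bl)        # the index ends up in ecx


# Made-up table: 7 is odd, so i -> (7*i + 3) mod 256 is a permutation of 0..255.
table = bytes((7 * i + 3) % 256 for i in range(256))

staging = bytearray([0x42] + [0x90] * 14)   # made-up staging area contents
staging[0] = translate_first_byte(table, staging[0])
print(hex(staging[0]))
```

The native loop has no bound check, so it only terminates if every byte value that can show up is present in the table, which fits the guess that it is a full 256-byte permutation.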

00408173    call    $+5
00408178    pop     eax
00408179    add     eax, 1E4Bh ; eax = 0x409FC3
0040817E    push    0
00408180    push    esi
00408181    push    eax
00408182    push    ebp
00408183    sub     esp, 27h
00408186    mov     ebp, esp
00408188    push    ecx
00408189    push    edx
0040818A    push    esi
0040818B    call    0x409210

The stack is prepared so that when entering the function at 0x409210, the stack would look like this:
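The layout can be reconstructed by replaying the pushes from the listing. The following Python sketch does exactly that and prints every slot relative to ebp. The descriptions are my own labels, and the return address is the address right after the `call` at 0x40818B.

```python
def build_frame():
    """Replay 0x40817E..0x40818B and return {offset from ebp: what is stored there}."""
    slots = {}   # offset from the initial esp -> description
    esp = 0

    def push(what):
        nonlocal esp
        esp -= 4
        slots[esp] = what

    push("0")                                   # push 0
    push("staging area pointer")                # push esi
    push("code_ptr1 = 0x409FC3")                # push eax
    push("saved ebp")                           # push ebp
    esp -= 0x27                                 # sub esp, 27h (scratch space)
    ebp = esp                                   # mov ebp, esp
    push("saved ecx")                           # push ecx
    push("saved edx")                           # push edx
    push("saved esi (staging area pointer)")    # push esi
    push("return address")                      # pushed by call 0x409210
    return {off - ebp: what for off, what in slots.items()}


frame = build_frame()
for off in sorted(frame, reverse=True):
    print(f"[ebp{off:+#x}] {frame[off]}")
```

The bytes from ebp+0 up to ebp+0x26 are the 0x27 bytes of scratch space that the function uses for its locals.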

Now, armed with this visual aid, we can take a look at the function at 0x409210:

00409210    pop     esi
00409211    push    dword ptr [ebp+2Fh]
00409214    pop     dword ptr [ebp+23h]
00409217    mov     byte ptr [ebp+22h], 0
0040921B    mov     dword ptr [ebp+2], 20h
00409222    mov     dword ptr [ebp+6], 20h
00409229    cmp     dword ptr [ebp+33h], 40h
0040922D    jnz     0x409236 (label1)
0040922F    mov     dword ptr [ebp+6], 40h
00409236 label1:
00409236    mov     eax, [ebp+23h]
00409239    movzx   ecx, byte ptr [eax]
0040923C    lea     eax, [esi+ecx*4]
0040923F    add     eax, [eax]
00409241    add     eax, 4
00409244    call    eax
00409246    cmp     eax, 0FFFFFFFFh
00409249    jz      0x409251 (label2)
0040924B    mov     eax, [ebp+23h]
0040924E    sub     eax, [ebp+2Fh]
00409251 label2:
00409251    pop     esi
00409252    pop     edx
00409253    pop     ecx
00409254    add     esp, 27h
00409257    pop     ebp
00409258    retn    8

That tangle of code actually does something very simple. I'll break it down:

Note that ebp+0x33 contains 0, which means that the first branch will always be taken.
esi swallows the function's return address.
The first push-pop pair puts the staging area's pointer in ebp+0x23.
The first byte in the staging area then serves as an index to some function table that starts at esi - now the address right after the current function's call.
On a return value different from -1 (which I can only imagine to be a failure of the looked-up function), eax will contain the difference between the values at ebp+0x23 and ebp+0x2F (Odd... didn't we say they contain the same value?)
Finally, the stack is unwound in such a way that, on executing ret, the flow returns to code_ptr1=0x409FC3.

The only visible side effects are whatever the function at the table did, and the result stored in eax. I have a feeling the two are connected.
So we have two questions now: 1) What's at 0x409FC3? and 2) What do the functions in that table do?
Starting with the second question, I'll pick for example the 5th entry in the table: 0x409308.

00409308    add     dword ptr [ebp+23h], 2
0040930C    retn
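To see how an entry like this one interacts with the dispatcher at 0x409210, here is a hedged Python model of both. Several things are assumptions of mine: the table base 0x408190 is derived from the `call` at 0x40818B being 5 bytes long, "5th entry" is taken to mean index 4, and the stored offset is back-computed so that the entry resolves to 0x409308 (I did not read it from the binary).

```python
def resolve_handler(read_dword, table_base, index):
    """Each table entry holds an offset relative to the end of the entry itself."""
    entry = table_base + 4 * index                        # lea eax, [esi+ecx*4]
    return (entry + read_dword(entry) + 4) & 0xFFFFFFFF   # add eax, [eax] ; add eax, 4


def dispatch(handler, staging_ptr):
    """Model of 0x409236..0x40924E: returns what ends up in eax."""
    cursor = staging_ptr               # [ebp+23h] starts as a copy of [ebp+2Fh]
    eax, cursor = handler(cursor)      # call eax
    if eax == 0xFFFFFFFF:              # cmp eax, 0FFFFFFFFh ; jz label2
        return eax
    return cursor - staging_ptr        # mov eax, [ebp+23h] ; sub eax, [ebp+2Fh]


def handler_409308(cursor):
    # add dword ptr [ebp+23h], 2 ; retn. eax is untouched, so it still holds
    # the handler's own address, which is never -1.
    return 0x409308, cursor + 2


TABLE_BASE = 0x408190               # assumption: 0x40818B + 5
entry = TABLE_BASE + 4 * 4
memory = {entry: 0x409308 - entry - 4}   # back-computed offset, for illustration only

target = resolve_handler(memory.__getitem__, TABLE_BASE, 4)
print(hex(target), dispatch(handler_409308, 0x40A074))
```

So the handler's only job is to advance the cursor at ebp+0x23, and the dispatcher turns that into a byte count.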

Well, combined with what we know the previous function does, this just results in eax containing 2 when the program's flow continues at 0x409FC3:

00409FC3    call    $+5
00409FC8    pop     edi
00409FC9    add     edi, 0ACh

Now edi points to the staging area.

00409FCF    xor     ecx, ecx
00409FD1    mov     ecx, 0Fh
00409FD6    sub     ecx, eax
00409FD8    sub     ecx, 2
00409FDB    add     edi, eax

Advance edi by look_up_func_result. And load ecx with 15-look_up_func_result-2.

00409FDD    push    ebx
00409FDE    mov     ebp, esp
00409FE0    mov     ebx, edi
00409FE2    mov     esp, ebx
00409FE4    pop     ebx
00409FE5    mov     bl, 0EBh
00409FE7    push    ebx

Store 0xEB at staging_area+look_up_func_result.

00409FE8    mov     esp, ebp
00409FEA    pop     ebx
00409FEB    push    ebx
00409FEC    mov     ebp, esp
00409FEE    mov     ebx, edi
00409FF0    inc     ebx
00409FF1    mov     esp, ebx
00409FF3    pop     ebx
00409FF4    mov     bl, cl
00409FF6    push    ebx

And store 15-look_up_func_result-2 at staging_area+look_up_func_result+1.
Very weird, let's see where this is going.

00409FF7    mov     esp, ebp
00409FF9    pop     ebx
00409FFA    jmp     0x40A00B (label1)
00409FFC    push    eax
00409FFD    pop     ebx
00409FFE    inc     ebx
00409FFF    push    ecx
0040A000    pop     edx
0040A001    inc     edx
0040A002    push    esi
0040A003    pop     edi
0040A004    inc     edi
0040A005    pop     esi
0040A006    pop     ebx
0040A007    pop     edx
0040A008    pop     eax
0040A009    leave
0040A00A    retn
0040A00B label1:
0040A00B    call    $+5
0040A010    pop     esi
0040A011    sub     esi, 14h
0040A014    add     esi, 0Eh
0040A017    inc     edi
0040A018    add     edi, ecx

So now esi=0x40A00A (which is the retn right before label1), and edi=staging_area+look_up_func_result+1+(15-look_up_func_result-2)=staging_area+14, which is the last byte of the staging area.

0040A01A loop:
0040A01A    push    ebx
0040A01B    mov     ebp, esp
0040A01D    mov     ebx, esi
0040A01F    mov     esp, ebx
0040A021    pop     edx
0040A022    mov     esp, ebp
0040A024    pop     ebx
0040A025    push    ebx
0040A026    mov     ebp, esp
0040A028    mov     ebx, edi
0040A02A    mov     esp, ebx
0040A02C    pop     ebx
0040A02D    mov     bl, dl
0040A02F    push    ebx
0040A030    mov     esp, ebp
0040A032    pop     ebx
0040A033    dec     esi
0040A034    dec     edi
0040A035    dec     cl
0040A037    jz      0x40A03B
0040A039    jmp     0x40A01A (loop)

This just copies backward ecx=15-look_up_func_result-2 bytes from 0x40A00A, to where edi points now in the staging area.

0040A03B    call    $+5
0040A040    pop     edi
0040A041    sub     edi, 2914h
0040A047    add     [edi], eax
0040A049    mov     ebx, [edi-24h]
0040A04C    mov     eax, [edi-24h]
0040A04F    mov     ecx, [edi-20h]
0040A052    mov     ebx, [edi-20h]
0040A055    mov     edx, [edi-14h]
0040A058    mov     ecx, [edi-1Ch]
0040A05B    mov     esi, [edi-0Ch]
0040A05E    mov     edx, [edi-18h]
0040A061    mov     ebp, [edi-8]
0040A064    mov     esi, [edi-14h]
0040A067    mov     ebp, [edi-0Ch]
0040A06A    mov     esp, [edi-8]
0040A06D    push    dword ptr [edi-4]
0040A070    popf
0040A071    mov     edi, [edi-10h]

If we calculate edi, then we'll see it's the address of vm_eip. Now it's easy to see that this just loads all the machine registers from the VM's registers.
And now we come, right in the middle of our program flow, to the staging area (?!).
So let's just pause right here, and try to think what the contents of the staging area should be at this point:

Originally, the staging area had 15 bytes of bytecode.
Then the first byte was translated via some table.
Then some value, X, was calculated based on running a function from some other table.
And then 0xEB and (13-X) were written in the middle.

This kind of looks like this:
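The four steps can be replayed in Python to see the resulting layout. In this sketch X is the instruction length, the filler bytes are the 15 one-byte instructions from the listing at 0x409FFC through 0x40A00A (encoded by hand from the disassembly), and the example instruction `31 C0` (`xor eax, eax`) is made up for the demonstration. I only exercise X up to 12 here, since I did not check how the native copy loop behaves when there is nothing to copy.

```python
STAGING = 0x40A074
STAGING_LEN = 15
JMP_REL8 = 0xEB

# push eax, pop ebx, inc ebx, push ecx, pop edx, inc edx, push esi, pop edi,
# inc edi, pop esi, pop ebx, pop edx, pop eax, leave, retn
FILLER = bytes.fromhex("505B43515A42565F475E5B5A58C9C3")


def patch_staging(staging, insn_len):
    """Write `jmp rel8` after the native instruction and pad the rest with filler."""
    rel = STAGING_LEN - insn_len - 2              # 15 - X - 2
    staging[insn_len] = JMP_REL8
    staging[insn_len + 1] = rel
    # The backward copy loop fills staging[X+2 .. 14] with the last `rel` filler bytes.
    staging[insn_len + 2:] = FILLER[len(FILLER) - rel:]
    return staging


def jmp_target(insn_len):
    """Address the `jmp rel8` lands on: end of the jmp plus its displacement."""
    rel = STAGING_LEN - insn_len - 2
    return STAGING + insn_len + 2 + rel


staging = patch_staging(bytearray(b"\x31\xC0" + bytes(13)), 2)
print(staging.hex(" "), hex(jmp_target(2)))
```

Whatever X is, the displacement is chosen so that the jump lands on the first byte after the 15-byte staging area, and the filler after the jump is never executed.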

Since this is now actually the code that gets executed, we can only guess that what we have here is a native instruction which was "hidden" in the bytecode, followed by 0xEB. If we look at the x86 opcode table, we see that 0xEB is the opcode for jmp rel8, and here the jump takes us exactly beyond the staging area, right here:

0040A083    push    edi
0040A084    pushf
0040A085    sub     esp, 800h
0040A08B    call    $+5
0040A090    pop     edi
0040A091    sub     edi, 2964h
0040A097    add     esp, 800h
0040A09D    pop     dword ptr [edi-4]
0040A0A0    pop     dword ptr [edi-10h]
0040A0A3    mov     [edi-24h], eax
0040A0A6    mov     eax, [edi-8]
0040A0A9    mov     [edi-20h], ebx
0040A0AC    mov     ebx, [edi-14h]
0040A0AF    mov     [edi-1Ch], ecx
0040A0B2    mov     ecx, [edi-0Ch]
0040A0B5    mov     [edi-18h], edx
0040A0B8    mov     edx, ecx
0040A0BA    mov     [edi-14h], esi
0040A0BD    mov     esi, eax
0040A0BF    mov     [edi-0Ch], ebp
0040A0C2    mov     ebp, [edi-24h]
0040A0C5    mov     [edi-8], esp
0040A0C8    jmp     0x40779A (start_of_vm_cycle)

Which stores the resulting native machine state into the VM's state, and goes on to process the next VM instruction.
Now we understand that the function table is just a crooked way to encode the length of native instructions based on their opcode.
So to summarize:

We have a virtual machine whose state (all the standard registers) is stored from 0x407708 for eax, to 0x40772C for eip.
The VM's bytecode is 0x1200 bytes long and is at 0x40CADE.
In each cycle, 15 bytes of bytecode are copied to a staging area at 0x40A074.
The first byte of the current instruction is XORed with 0x8A, and serves as a switch parameter to the instruction's handler.
At a certain point in the switch, if no match has been found yet, a mask corresponding to the current instruction (current VM eip + 0x1200) is tested to decide whether to continue descending the switch, or to fall to default.
The default handler is native execution of the bytecode, but the opcode must first be decoded by looking up the table at 0x407608, and the corresponding instruction length is calculated using the functions in the table that sits right after the call to 0x409210 (which should put it at 0x408190), with the flow then continuing at 0x409FC3.

You can find the disassembler code here. I'm warning you, it ain't pretty, but it gets the job done.
Now we can look at what the machine is trying to do. But that's for next time.


Monday, June 17, 2013

AthCon2013 RE challenge - part 1