Hello to all the computer enthusiasts out there!
In this article, we will look at how a simple C program, intended to perform one particular task, can be MADE to perform a task it was never supposed to perform. So, we will be trying to fool the computer!
Prerequisites:
- A computer which runs Linux.
- A curious mind which wants to know how stuff actually works!
We will be using GDB (the GNU Debugger) to understand the C program at the assembly level.
If GDB is not installed on your box, you can install it with this command (on Debian or Ubuntu):
sudo apt-get install gdb
Here is the source code of the program we will be dealing with. It is named overflow.c, and by the end of this article you will know why it was named so.
What does this program do?
a. In main(), printf() is called to print the string “Before function call”.
b. main() then simply calls another function, print_string().
c. In print_string(), we have an uninitialised character array a[30], so we have allocated 30 bytes of space for it.
d. print_string() then uses gets() to ask us to enter a string.
e. It simply prints the string that we have entered.
f. After print_string() has executed completely, control is transferred back to main().
g. At the end, another printf() statement prints “After function call”.
Think for a moment about why there are two printf() statements, one before and one after the function call.
Compile the program (normally) using gcc:
gcc overflow.c -o overflow
This gives an executable named “overflow”.
It is important to note the second warning (the one reported for overflow.c:(.text+0x39)). It says “the gets() function is dangerous and should not be used”.
You may have used gets() before. Have you ever thought about why this warning comes up?
Now run the executable with some random inputs and observe what happens.
Let us go step by step.
a. The first two runs are normal cases, because the number of characters was less than 30.
b. On the third run, the string is longer than 30 characters, but nothing weird happens.
c. On the last run, we get “stack smashing detected” and the program is terminated right there. Notice that the string “After function call” is not printed. This means the program was terminated BEFORE control got transferred back to main(). Why did this happen? One obvious guess is that our string was way longer than 30 bytes. But what exactly happened?
d. To analyse this, we will have to compile our code in the following manner:
gcc overflow.c -o overflow -fno-stack-protector -zexecstack -g
e. One more thing: we will have to give random inputs like the ones above to analyse the behavior of the executable, and typing “aaaaaaaaaaaaaa….” (100 a's, for example) every time is very cumbersome. We have a solution!
We will use Python to solve this problem. Execute the program in the following manner:
$ python3 -c "print('a' * 35)" | ./overflow
The 35 can be replaced by any number you want, as shown in the screenshot above.
Analysing the screenshot above:
a. Though the buffer can store only 30 bytes, in the first case the program accepts 35 bytes.
b. In the third case, something weird happened. It says “Illegal instruction”, and “After function call” was not printed. So, control didn't get transferred back to main().
c. In the last case, again something weird happened. It said “Segmentation fault” and the program was terminated right away.
We obviously have to investigate the weird cases and find out whether we can do something about them.
Finally, it is time to fire up your debugger!
$ gdb -q overflow
Follow the instructions in the screenshot.
a. (gdb) set disassembly-flavor intel: this means there are other flavors too, right? Research them.
b. (gdb) disass main: this dumps the assembly equivalent of the main() function written in C.
c. (gdb) disass print_string: this dumps the assembly equivalent of the print_string() function.
Analysis of main():
a. In main(), we had 3 main tasks: 1. printf(), 2. the function call, 3. another printf().
b. We can easily figure out that the call at main<+9> is printf(“\nBefore function call\n”), and the one at main<+29> is printf(“\nAfter function call\n”).
c. What does main<+19> do? It says <+19> call 0x40058f <print_string>. From this, it is clear that our print_string() function is called by the instruction at <+19> (the address of this instruction is 0x400579).
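Analysis of print_string():
a. The function starts with these three instructions:
push rbp
mov rbp,rsp
sub rsp,0x20
These save main()'s frame pointer (rbp) and make space on the stack to store our 30-byte string. Notice that sub rsp,0x20 actually reserves 0x20 = 32 bytes.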
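b. At print_string<+20>, gets() is called (important!).
c. At print_string<+32>, puts() is called.
After print_string() has finished executing, how does the computer know that control should return to main()? Notice the “ret” instruction at print_string<+39>: it takes an address from the top of the stack and jumps to it. But which address is that, and who put it there?
(Note: 0x0000000000400566 is the starting address of main() on my computer; it might be different on yours. On newer distributions, where gcc builds position-independent executables by default, your addresses will look quite different unless you also compile with -no-pie.)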
Let us note our observation points before running the program:
a. Let us stop before print_string() is called.
b. Let us stop after print_string() is called, and then go step by step.
We can stop using gdb's “break” command.
First, we set a breakpoint at main() with (gdb) break main, and another at print_string() with (gdb) break print_string. Now we run the program: (gdb) run
c. (gdb) ni means “next instruction”. Using ni, we can analyse each instruction as we go. At 0x400579, print_string() is called. This is important.
Finally, we have stopped at gets(a). This call has not been executed yet.
d. In the screenshot above, (gdb) x/32xw $rsp shows us the memory (the space on the stack) that the system has allocated to store the string we input. The command examines 32 words (4 bytes each) in hexadecimal, starting at the address held in rsp. Something is highlighted in the screenshot. Can you guess what it could be?
e. 0x0040057e is the address of the instruction right after “call print_string” in main() (go back to the assembly code of main() and check). This means that when print_string() is called, this address, 0x0040057e, is stored on the stack: it is the RETURN ADDRESS.
f. After a few ni's, gets(a) executes and we input our string. Then check the stack again with (gdb) x/32xw $rsp (refer to the screenshot above).
g. We will run it again with more a's and see what happens.
The space where the return address 0x0040057e is supposed to be has now become 0x00400061. 0x61 is the ASCII code of ‘a’, so this time our string has overwritten part of the return address.
The computer now thinks that the return address is 0x00400061, but that is an invalid address (an address that is not used by this particular program).
Check out the screenshot below.
Let us see what happens. After a few ni's, the string is printed and then we get “Segmentation fault”. Remember that we got the same error at the very beginning (before we started analysing with gdb).
Now we know why the error occurred: the address 0x00400061 was not accessible (it is not a valid address), so the program crashed and control was never transferred back to main().
Think about it for a moment: what if we overwrite the actual return address with a new, VALID address, where the instructions at that address are the ones WE want to execute?
This way, we can execute whichever instructions we want. The statement we made at the beginning (trying to fool the computer) is slowly coming to light, right?
h. OK, now that we know the trick, we can give any valid address and the instruction at that address will be executed.
What if we give the address of “print_string()”? Will it be executed twice, printing the input string twice? Let us see.
Note that the instruction which calls print_string() is at 0x0000000000400579.
Instead of Python, we will use another useful tool, the printf command.
printf "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\x00\x79\x05\x40"|./overflow
There are 39 a's, then a \x00 byte, and then the address of the instruction that calls print_string(), written in reverse byte order. (Why is that?)
And we have achieved it! print_string() was executed twice, silently (with no weirdness), and control was then returned to main(), again silently.
(Note: the number of a's I had to put in to reach the space where the return address is stored may or may not be the same as the number you need. And again, our addresses need not be the same.)
So, analyse the assembly code properly, make a note of all the important addresses, and then proceed.
There are a few things that were left unexplained:
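1. Why the source code was named “overflow.c”: what you have just done is a simple but authentic example of what is known as a “BUFFER OVERFLOW”. a[30] is the buffer we had to store the string. When the string length exceeded 30, the string is said to have overflowed the buffer.
2. When the program was compiled normally, we could not have done all of this analysis. Compilers such as gcc can add a stack protector (many distributions, Ubuntu for example, enable it by default): it places a guard value, called a canary, between the local buffers and the saved frame pointer and return address, and checks it just before the function returns. If the canary has been overwritten, the program is terminated with “stack smashing detected”, which is exactly what we saw in the very first experiment.
3. Find out why we added -fno-stack-protector and -zexecstack while compiling the source code.
4. Why did we type in the address in reverse order in the last part?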
And a few more for you to research.
There was one more thing: executing instructions that WE want. That is definitely possible, but it is beyond the scope of this article. Read about the actual implications of a buffer overflow. It is very interesting!
I know this is a lot to take in at once. Go through the article several times and understand each and every bit.
I hope you enjoyed the article and learnt something new from it. For any kind of suggestion, feedback or appreciation :P, leave a comment below.