Altiris-La-Vista: The Secrets Within…
By Kevin O'Reilly, 29 May 2014
Recently at Context we were asked by a client to perform an infrastructure test on an environment which made use of a deployment solution called Altiris by Symantec. One of the many facets of this wide-ranging solution is a Linux PXE boot image which is made available to the network, and one of its treasure troves from the perspective of a network penetration test is the presence of PWL files within the image. These are Microsoft password files which were originally used back in the days of Windows 9x, and aren't used on NT-based systems any more. But Altiris still uses them, although with their own implementation and encryption, and they contain credentials which are of interest to a ‘pentester’ or attacker. Unfortunately their contents are encrypted, but at some stage they will be decrypted so that they can be used...
So the question is what decrypts them in this environment? The answer is a command line tool also contained within the Linux PXE boot image called asmbmount. This tool takes a PWL file and mounts an SMB/CIFS share as per its command line arguments, so it must at some stage decrypt the PWL file's contents. What would be useful from the perspective of a network penetration tester or attacker is a tool which just displays the decoded credentials on standard output.
This is where my interest comes in. We could, of course, reverse engineer the decryption algorithm from this tool and re-implement it to do what we want. But is there an easier way? The ideal solution would be to find some simple modification we could make to the original binary so that it prints the decrypted credentials used to mount the share to stdout.
So what do we need? The reverse engineer's most prized tool: a disassembler. IDA Pro is the disassembler of choice and it will also allow us to do some remote debugging if needs be, connecting to the GNU debugger, gdb, via gdbserver. So after making sure gdbserver is installed, we load up our asmbmount binary into IDA and let the reversing begin.
Once we’ve loaded the executable up in IDA and let the disassembly process complete, the first task is to inspect the code to see if our aim is feasible and hopefully fairly straightforward. The most obvious place to start is to list the strings in the executable, to see if anything like ‘username’ or ‘password’ stands out.
To list the strings, we can go to ‘View’ in the menu bar, select ‘Open subviews’ then ‘Strings’ or use the shortcut Shift+F12. Once in the strings window we can search by hitting Alt-T and entering something like ‘username’ to perform the search.
We see that we arrive immediately at a very interesting string which contains both ‘username’ and ‘password’ as well as C style format specifiers for strings. This suggests a printf-like function which is just the sort of thing we are hoping for.
If we hit enter or double-click on this entry IDA takes us to the location of the string in the binary:
To find out where this string is used in the code, we need to look at any cross references to it. IDA has done the hard work for us here, and has even listed the only cross reference in the right-hand column. If we double click on it, we are taken to the actual code that uses this string:
We see that the string precedes a call to a function called asprintf. If we look up this function we find that it is in the same family as printf, with the additional letters being derived from the mnemonic “Allocating String Print Formatted”. It is a GNU extension, and not in the C or POSIX specification.
The description from gnu.org tells us:
“This function is similar to sprintf, except that it dynamically allocates a string to hold the output, instead of putting the output in a buffer you allocate in advance.”
It also gives us the function definition:
int asprintf (char **ptr, const char *template, ...)
“The ptr argument should be the address of a char * object, and a successful call to asprintf stores a pointer to the newly allocated string at that location.”
If we contrast this with the function definition for printf:
int printf (const char *template, ...)
We see the only difference is the additional argument which is the first, namely ptr, a pointer which is set to the address of the newly allocated buffer if the function is successful. So it would appear that if we were able to remove this first argument and then modify this function call to invoke printf instead of asprintf, we could achieve our goal of having the decrypted credentials printed to the screen instead of some internal memory buffer.
If we look a bit further down the code, we see that there is a call to the C library function write. This function has the following definition:
ssize_t write (int filedes, const void *buffer, size_t size)
In this code, the file descriptor filedes relates to a real temporary file (which is unfortunately deleted immediately after it has been used) to which the buffer allocated and printed to by asprintf is then written. But on the Linux platform, new processes are given a handle to stdout upon creation and this handle has the value 1. So perhaps we could switch the filedes argument from the real file to the value of 1 to have it write to stdout.
But how would we achieve either of these options by actually patching the binary? To answer this, we first need a bit of background in how calls to functions, or code in general, are encoded by the compiler in the executable binary in instructions the processor can understand and execute, and how this implementation relates to the original source code.
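To make the second option concrete, here is a minimal Python sketch of the format-then-write pattern. It is not the tool's code: the function names and the sample credentials are hypothetical, and only the format string comes from the binary. The point is simply that the destination of the output is decided by the descriptor number alone.

```python
import os

def format_credentials(username, password):
    # Same format string the binary hands to asprintf.
    return "username=%s\npassword=%s\n" % (username, password)

def write_credentials(fd, username, password):
    # Mirrors write(filedes, buffer, size): where the bytes end up
    # depends only on which descriptor is passed in.
    data = format_credentials(username, password).encode()
    return os.write(fd, data)

if __name__ == "__main__":
    # Descriptor 1 is stdout, so this prints rather than filling a file.
    write_credentials(1, "exampleuser", "examplepass")
```

Passing the descriptor of a temporary file instead of 1 reproduces what the unmodified tool does.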
Opcodes, Calling Conventions and Stack Frames
We need to understand how instructions and function calls are implemented at the level of the processor’s instruction set, for it is these instructions or ‘opcodes’ that are encoded with hexadecimal values in the binary. Each instruction can be interpreted as a human readable mnemonic, and the language of these mnemonics is known as assembly. The translation of the opcodes from binary form into assembly language is done by the disassembler.
When we look at code at the low level of assembly language, we enter the world of the processor’s instruction set, its inner storage spaces, or registers, and the stack. The registers are where values of variables are stored so they can be acted upon by instructions, or where addresses in memory are held as the processor accesses this memory. The stack is an area of memory which the processor uses for temporary storage of values or addresses, often prior to copying them into registers so they can be acted upon.
The way in which a call to a function is implemented in assembly is called the calling convention and this encapsulates where the arguments to the function are stored, in what order, when the function is called, as well as other details such as when any stack space that is used is ‘cleaned’ up once the function completes.
There are a multitude of calling conventions depending on the specific platform, architecture and compiler. The Win32 API typically uses the standard calling convention, denoted by the label stdcall. More often in the case of POSIX-compliant operating systems, at least where the source code is written in C, the C calling convention, denoted cdecl, is used. Both these calling conventions have in common the feature that the arguments to functions are passed on the stack (as opposed to the registers or a mix of the two) and that the order they are passed in is from right to left considering their representation at the level of C source code. Thus, a function call of the following form:
function(argument1, argument2, argument3);
might result in assembly code in the following form:
Here, the push instruction copies the 32-bit value of the argument to the ‘top’* of the stack, and shifts the stack pointer along one ‘slot’ accordingly. This is sometimes seen on 32-bit Windows platforms. An alternative which is equivalent and may be seen on Linux, due to the workings of the GNU compiler GCC, is the following:
*We note that the stack grows downwards in memory, so the ‘top’ of the stack is really the bottom in terms of size of memory address.
Here the mov instruction copies the argument to the stack (denoted by the address held in the ESP register which is known as the stack pointer) but the difference is it doesn’t change the value of the stack pointer as a push does. This is because GCC has already moved the stack pointer in advance, at the beginning of the current subroutine, to allow for all the arguments and local variables that are needed. This might be for efficiency reasons, or possibly because GCC supports multiple platforms and hence has more generic intermediate code-generation before translating into the processor-specific instruction set.
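As a rough illustration of why the two styles are interchangeable, the following Python sketch models the stack as a dictionary keyed by address. The starting stack pointer of 0x1000 and the function names are arbitrary choices for the example, and reserving the space inside the call helper is a simplification (GCC does it once at the start of the subroutine).

```python
def call_with_push(args, esp=0x1000):
    """Push arguments right to left; each push moves ESP down 4 bytes."""
    memory = {}
    for value in reversed(args):
        esp -= 4
        memory[esp] = value
    return esp, memory

def call_with_mov(args, esp=0x1000):
    """Reserve the space first, then store each argument at [esp+offset]."""
    memory = {}
    esp -= 4 * len(args)  # stack pointer adjusted in advance
    for index in reversed(range(len(args))):
        memory[esp + 4 * index] = args[index]
    return esp, memory

if __name__ == "__main__":
    args = ["argument1", "argument2", "argument3"]
    # Both styles leave argument1 at the lowest address, pointed to by ESP.
    print(call_with_push(args) == call_with_mov(args))
```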
Both of these methods result in the same layout of the stack when the call to the function is made. Since the arguments are held in a processor register, such as EAX, before being copied to the stack, it might be expected to see these instructions interspersed with other mov instructions that copy the arguments from somewhere in memory into a register, such that copying an argument from somewhere in the application’s heap memory to the stack is a two-step process:
x86 Addressing Modes
It’s probably worth looking at a quick overview of the addressing modes available on x86 processors. These cover what operations are allowed in terms of copying values from registers to memory addresses, registers to addresses pointed to by other registers, and all the various permutations.
In terms of register addressing modes, it’s pretty straightforward. As long as the source and destination registers are the same size, most combinations are possible. For example, all the following are possible:
The various memory addressing modes can be separated into five overall groups. The most simple of these is the displacement only addressing mode, where a register is loaded with the value located at an address in memory. This value is interpreted as that held in a size of memory that matches the register in question:
Then there is the register indirect addressing mode, where the processor allows memory to be accessed indirectly using the address held in a register:
Indexed addressing modes are similar to register indirect addressing with the addition of an offset to add to the base address held by a register:
If, instead of a constant value, another register value is added to the base address, we have based indexed addressing mode:
Finally the most complex is a combination of the two preceding modes, known as based indexed plus displacement mode:
This should give us more than enough theory to understand the instructions we find in the asmbmount binary in the area that we are interested in modifying.
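All of the memory addressing modes above are special cases of one address calculation, which can be sketched in a few lines of Python. The function name and the register values in the example are made up for illustration. The scale factor is not discussed above, but 32-bit x86 allows the index register to be multiplied by 1, 2, 4 or 8, so it is included for completeness.

```python
def effective_address(base=0, index=0, scale=1, displacement=0):
    """Compute base + index * scale + displacement, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only allows a scale of 1, 2, 4 or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF

if __name__ == "__main__":
    ebx, esi = 0x2000, 0x10  # hypothetical register contents
    print(hex(effective_address(displacement=0x1000)))        # displacement only
    print(hex(effective_address(base=ebx)))                   # register indirect
    print(hex(effective_address(base=ebx, displacement=8)))   # indexed
    print(hex(effective_address(base=ebx, index=esi)))        # based indexed
    print(hex(effective_address(base=ebx, index=esi, displacement=4)))
```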
If we look at the code from the disassembly of asmbmount, we can see code doing the same thing with the four arguments for asprintf:
Here I have used IDA’s facility to label variables for Username, Password and Buffer for ease of understanding.
This assembly code is the result of compiling the following statement in C:
asprintf(&Buffer, "username=%s\npassword=%s\n", Username, Password);
This code will result in a stack with the following layout just prior to the call to asprintf:
Similarly for the code which performs the write call:
This code results from compiling:
write(FileDescriptor, Buffer, BytesToWrite);
Just prior to this call the stack will look like:
In order to change the file descriptor argument to 1 for stdout, we would have to change the value of the ESI register just prior to it being copied to the stack. Instructions to do this would take at least 3 bytes, but unfortunately we don’t have room in the binary just before the write call to patch these in without overwriting other instructions we need. The reason we can’t just insert additional instructions is that the code contains lots of relative offsets for jumps and calls which would all need correcting if it was shifted, not to mention updating the ELF header and section tables. This would be a massive job. We could overwrite other instructions earlier relating to opening of the temporary file, or the copying of its descriptor into ESI, but I think it will be a more interesting challenge as well as more aesthetically pleasing to modify the asprintf call to a call to printf.
To do this, we will need to remove the first argument which is a buffer pointer, as well as change the actual call to the function, such that it becomes:
printf("username=%s\npassword=%s\n", Username, Password);
While this seems simple enough, remember that the arguments are written to the stack in right-to-left order, so we will need to shift the destination of the fourth, third then second arguments along one position on the stack, as well as prevent the final (first) argument being written to the stack at all, as we will have the second argument there already taking its place:
To achieve this, we will need to change the offsets relative to the stack pointer ESP of the fourth, then the third, then the second argument, subtracting 4 (the size in bytes of each stack entry as this is a 32-bit system) from each. So the instructions for the fourth parameter which is the pointer to the decrypted password should be changed from:
mov eax, ds:Password
mov [esp+0Ch], eax
to:
mov eax, ds:Password
mov [esp+8], eax
We only wish to modify the second of these two instructions, simply changing the offset relative to ESP that we wish to copy to from 0xC (or 12 in decimal) to 8. But how do we actually achieve this in the executable binary when we have hexadecimal opcodes? Well we will have to understand how this instruction is actually represented in ‘machine code’ or x86 opcodes. IDA tells us that the bytes for this instruction are:
89 44 24 0C
Now it’s beyond the scope of this piece to go into all the details of how Intel have designed the opcode syntax for the x86 series of processors. For those who are interested, this is described in detail in the Intel Architecture Software Developer’s Manual, a version of which can be found at http://www.cs.cmu.edu/~410/doc/intel-isr.pdf, and Sandpile may be a more approachable reference to start with (http://www.sandpile.org). But it is sufficient for our purposes to understand that 89 encodes a mov instruction that copies a register to a register or memory location, 44 and 24 encode the source as being the EAX register and the destination as being the address in ESP plus an 8-bit offset, and that offset is held in our final byte 0C. So we see it’s quite simple to modify this instruction to that which we want by changing this last byte from 0C to 08.
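For readers who want to see how those middle two bytes break down, here is a small Python sketch that decodes only this one instruction form. The function name is hypothetical and the decoder deliberately rejects anything else. The bit layout used (a ModRM byte split into mod, reg and rm fields, followed by a SIB byte split into scale, index and base) is the standard x86 one.

```python
REGISTERS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_mov_store(code):
    """Decode 89 /r with a SIB byte and an 8-bit displacement."""
    opcode, modrm, sib, disp8 = code
    if opcode != 0x89:
        raise ValueError("expected opcode 89 (mov r/m32, r32)")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm != 0b100:
        raise ValueError("only the SIB + 8-bit displacement form is handled")
    index, base = (sib >> 3) & 7, sib & 7
    if index != 0b100:
        raise ValueError("an index register is not handled in this sketch")
    # The displacement is treated as unsigned, which is fine for the
    # small positive stack offsets seen here.
    return "mov [%s+%Xh], %s" % (REGISTERS[base], disp8, REGISTERS[reg])

if __name__ == "__main__":
    original = bytes.fromhex("8944240C")
    patched = original[:3] + b"\x08"  # only the last byte changes
    print(decode_mov_store(original))
    print(decode_mov_store(patched))
```

Byte 44 splits into mod=01 (8-bit displacement follows), reg=000 (EAX) and rm=100 (a SIB byte follows), while byte 24 selects ESP as the base with no index register.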
The same is true for the third parameter which is the pointer to the decrypted username. We need to change:
mov [esp+8], eax
to:
mov [esp+4], eax
The change to the hexadecimal opcodes is similarly straightforward, changing the final instruction byte from 08 to 04:
89 44 24 08
to:
89 44 24 04
We wish to repeat this for the second original parameter which is the pointer to the string “username=%s\npassword=%s\n” containing the format specifiers for our decrypted username and password. But here the instruction is slightly more complicated as the whole operation is done in a single instruction, moving the pointer to the string from a location in memory straight to the stack+offset address:
mov [esp+4], offset <UsernamePasswordString>
The hexadecimal opcodes for this instruction are:
C7 44 24 04 70 E2 04 08
Here, C7 encodes a mov of an immediate 32-bit value to a register or memory location. The next two bytes 44 and 24 again encode the destination as ESP plus the offset held in the following byte, and the final four bytes are the immediate value itself, in this case the address of the string (0x0804E270) stored in little-endian byte order. So here, we can change the fourth byte 04 to change the offset relative to ESP from 4 to 0:
C7 44 24 00 70 E2 04 08
This gives us:
mov [esp+0], offset <UsernamePasswordString>
The zero here is superfluous and the same instruction could be encoded without it in fewer bytes, but here we are looking for the simplest and most efficient modifications we can make to this binary to achieve our goal. So far we have modified just three bytes.
Now we come to the last argument to be written to the stack, which is of course the first argument in the call to asprintf when written as a C statement, namely the buffer pointer which is to receive the address of the buffer allocated by asprintf. This is the argument we wish to do away with, as we have already pushed the second argument into its place. We cannot leave the instructions as they are:
lea eax, [ebp+Buffer]
mov [esp], eax
The second of these will overwrite the argument we have redirected to this address on the stack. But we can in fact leave the first instruction, as it does not matter that EAX is overwritten with a buffer pointer, as this register is not, by the C calling convention for IA32, a register that is either to be used or left unmodified by a call to a subroutine or function. So we need just to get rid of the second instruction, encoded with these three opcode bytes:
89 04 24
To do this, we can simply overwrite them with the opcode for ‘no operation’ or nop which is 90 in hexadecimal. We could alternatively choose a three-byte instruction that similarly has no effect, but three single-byte nops will do nicely.
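The stack-related patches described so far can be collected into a short Python sketch. The instruction bytes are the ones quoted above, but the helper names are hypothetical and the sample buffer is only those four instructions placed back to back, which is not how they sit in the real binary (other instructions lie between them). Each patch checks the original bytes before overwriting them, which is a cheap safeguard against patching the wrong place.

```python
# (original bytes, patched bytes) for each instruction quoted in the text.
PATCHES = [
    ("8944240C", "89442408"),                  # password:      [esp+0Ch] -> [esp+8]
    ("89442408", "89442404"),                  # username:      [esp+8]   -> [esp+4]
    ("C744240470E20408", "C744240070E20408"),  # format string: [esp+4]   -> [esp+0]
    ("890424", "909090"),                      # &Buffer store  -> three nops
]

def apply_patch(code, offset, expected, replacement):
    """Overwrite bytes in place, refusing if the original bytes differ."""
    if len(expected) != len(replacement):
        raise ValueError("a patch must not change the length of the code")
    if code[offset:offset + len(expected)] != expected:
        raise ValueError("unexpected bytes at offset %d" % offset)
    code[offset:offset + len(replacement)] = replacement

def patch_argument_setup(code):
    """Apply all patches and return how many bytes actually changed."""
    original = bytes(code)  # search the untouched copy so patches cannot chain
    for old_hex, new_hex in PATCHES:
        old, new = bytes.fromhex(old_hex), bytes.fromhex(new_hex)
        offset = original.find(old)
        if offset < 0:
            raise ValueError("instruction %s not found" % old_hex)
        apply_patch(code, offset, old, new)
    return sum(a != b for a, b in zip(original, code))

if __name__ == "__main__":
    sample = bytearray.fromhex("8944240C" "89442408" "C744240470E20408" "890424")
    print(patch_argument_setup(sample), sample.hex())
```

Searching with find() is only reasonable for this tiny sample; against a whole executable, short sequences such as 89 04 24 are likely to occur many times, so the file offsets reported by the disassembler should be used instead.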
So we see that with a change to just six bytes in our binary we will be able to change the arguments to asprintf from four to three and shift their positions on the stack to be set up for a call to printf instead. But to make it all work, we will need to change the actual call itself. If we look at IDA’s representation of the instruction address, the hexadecimal opcodes for the call, and the assembly mnemonic, we see:
.text:08049F82 E8 AD F0 FF FF call _asprintf
This can be interpreted as follows: the instruction lies in the .text section of the ELF binary at virtual address 0x8049F82 (this is the address that the instruction will have in memory). The E8 opcode byte encodes a near call which is a call to a memory location using an offset relative to the address of the next instruction, whose value is held in the following double word, or four bytes. Thus here the meaning is a call to an offset which is held in the four bytes AD F0 FF FF, but these bytes are in byte-swapped order since our target system is little endian. Thus the value of the offset is 0xFFFFF0AD which is equivalent to the signed value -0xF53. This means that the address of the called function is 0xF53 below the address of this next instruction in memory. Since this instruction is at address 0x8049F82 and is a 5-byte instruction, the next instruction from which this relative offset is to be calculated is 0x8049F82+5 which is of course 0x8049F87.
If we perform this subtraction on the address of the instruction we have:
0x8049F87 - 0xF53 = 0x8049034
If we follow this call in IDA, we are indeed taken to this address. There we notice that we are in a section in the binary called .plt. This section contains the jump-table for functions from shared libraries. We see this as a list of jmp <offset of jmp target> entries, where the instruction does not jump directly to an address, but instead it takes what's found at the address as a pointer. We note that the pointer used by the jmp instruction here is located at 0x8050774.
As a slight aside, if we go to the address of the pointer, we are taken to another section within the binary image, this time called .got.plt. This section contains the addresses for the actual target of the calls once the dynamic linker has done its job, either when the executable image is loaded, or later when the imported function is actually called in the case of 'lazy symbol binding'.
Anyway, our job here is simply to change the relative offset of the call instruction to point to the printf jump-table entry instead of that for asprintf. So we need to find the entry in the .plt section which corresponds to the printf function. If we look up printf in the imports list in IDA, and follow the cross reference back to the corresponding jump-table entry in the .plt section, we have our address, 0x8048F24.
So the address of the printf entry in our jump-table is lower still in memory than that of asprintf. We subtract this address from the asprintf entry (0x8049034 - 0x8048F24) and see that the difference is 0x110. So to modify the asprintf call instruction, we need to subtract a further 0x110 from the relative offset 0xFFFFF0AD. The result of this is 0xFFFFEF9D so we see that here we need a two byte patch of the call instruction from:
E8 AD F0 FF FF
to:
E8 9D EF FF FF
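The offset arithmetic is easy to get wrong by hand, so it is worth checking with a few lines of Python. In this sketch the function names are hypothetical, while the addresses and the original call bytes are the ones given above. The struct module does the little-endian conversion.

```python
import struct

def call_target(call_address, code):
    """Return the absolute target of a 5-byte E8 rel32 call."""
    if len(code) != 5 or code[0] != 0xE8:
        raise ValueError("expected a 5-byte E8 call instruction")
    (rel32,) = struct.unpack("<i", code[1:5])  # signed, little endian
    return (call_address + 5 + rel32) & 0xFFFFFFFF

def encode_call(call_address, target):
    """Build the E8 rel32 bytes that call target from call_address."""
    rel32 = (target - (call_address + 5)) & 0xFFFFFFFF
    return b"\xE8" + struct.pack("<I", rel32)

if __name__ == "__main__":
    call_site = 0x8049F82
    asprintf_plt = call_target(call_site, bytes.fromhex("E8ADF0FFFF"))
    print(hex(asprintf_plt))                       # the asprintf .plt entry
    print(encode_call(call_site, 0x8048F24).hex()) # call retargeted at printf
```

Encoding the call back to the asprintf entry reproduces the original five bytes, which is a useful sanity check before writing anything to the binary.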
We now have all we need to successfully modify the entire function call from:
asprintf(&Buffer, "username=%s\npassword=%s\n", username, password);
to:
printf("username=%s\npassword=%s\n", username, password);
This is our objective, and can be done in what seems like a nice concise series of patches totalling only 8 bytes.
Making The Changes
Having decided which bytes we wish to change, there are various ways in which we might write the changes to the binary. IDA offers a facility to do this itself, even offering to assemble instructions from their assembly mnemonics into hexadecimal opcodes, or alternatively to modify the hexadecimal opcode values directly.
But I am a traditionalist, and like to make my modifications in a tool that is specialised for the job: a hex editor. There are many options here, with a decent free example on Windows being HxD. But since our target platform is Linux, it will be simpler to make our changes using this operating system so we can immediately test our results. So I am going to use GHex which is basic but will do the job nicely.
We note that IDA gives us the offset to the instruction we have highlighted in the bottom left of the window:
So we go to the offset in our hex editor and make the relevant changes:
Once we’ve made all the patches we need, we just save and we’re ready to test. At this point it’s well worth launching the executable in the debugger.
One of the great facilities IDA allows us is to debug our executable remotely, so that we can put breakpoints on any area of code we choose, step through instructions, into or over calls to subroutines or functions, and view or modify memory or register values as we go. This is extremely useful to understand what is going on, as well as to diagnose any unforeseen problems our patches may have made.
On our target Linux machine, we launch the executable under gdbserver, which is a stub to allow IDA to connect to a local instance of the GNU debugger, gdb, from another machine on the network via TCP. We supply gdbserver with an IP address, or in this case just localhost, and a port number. The default port number that IDA uses is 23946 so we will just use that:
When we go to the Debugger menu in IDA, we are first prompted to choose our debugger, so we choose the remote GDB option.
We then just need to make sure we have the IP address of our target system selected under Process Options:
Now we are ready to launch, so we pick a location to set a breakpoint in the code, selecting the instruction and hitting F2, which highlights the instruction in red for us:
Then we launch by choosing Debugger->Start process or alternatively hit F9, and after a confirmation dialog, we are now debugging the executable:
The debugger automatically breaks at the entry point of the executable, which is the first instruction to actually be executed. At this point we can see the values of the registers, the stack, a raw hexadecimal view of the process’s memory as well as the main view showing us the instructions in disassembled form and their memory location, as well as optionally the opcode bytes which are particularly useful in our case.
Now we would like to advance to our previously selected breakpoint. So to do this, we just let the executable run by selecting Debugger->Continue process or again hitting F9. Unfortunately when we do, we hit upon a tiny snag:
The SIGCHLD signal is sent in POSIX systems such as Linux to a parent process when one of its child processes exits, is stopped or is resumed. But what child process is this? Well by analysing the disassembled code a bit further, we spot the following:
We see that there is a call to fork. This function creates a new child process which is an exact duplicate of the parent process with a few minor exceptions. We also note that we never got to our breakpoint. This is because the parent process, which is the one we were debugging, never executed the code we wanted to break upon.
If we look up the documentation for the fork function, we can understand why. If the call is successful, the return value for the parent process is the process identifier, or PID, of the child process. However in the newly created child process, which runs from the same point, the return value is zero. So at the conditional jump shown above, the test eax, eax instruction tests to see if the return value held in eax is zero, and the jnz instruction will jump to the address that follows if it is not (hence the mnemonic, jump if not zero). This means that the parent process will jump in the direction of the green arrow, while the child process (which we do not have control over in the debugger) will not jump and will follow the path of the red arrow.
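The same branching can be reproduced in a few lines of Python on a POSIX system, where os.fork wraps the C function. This is only a sketch of the pattern, not a reconstruction of the tool: the function names and the exit status of 42 are arbitrary, and jnz_taken stands in for the test eax, eax / jnz pair.

```python
import os

def jnz_taken(eax):
    """test eax, eax / jnz: the jump is taken when eax is non-zero."""
    return eax != 0

def run_fork_demo():
    """Fork once; the parent reports the exit status chosen by the child."""
    pid = os.fork()
    if jnz_taken(pid):
        # Parent: fork returned the child's PID, so the jump is taken.
        _, status = os.waitpid(pid, 0)
        return "parent", os.WEXITSTATUS(status)
    # Child: fork returned 0, so execution falls through to this path.
    # This is the side a debugger attached to the parent never sees.
    os._exit(42)

if __name__ == "__main__":
    print(run_fork_demo())
```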
Of course, the decryption functionality and the call to asprintf follow from the path the child process takes, along the red arrow, which is why we never reach our breakpoint in the debugger. But among the many useful things a debugger allows is the possibility to skip instructions by changing the instruction pointer EIP, so we can overcome this hurdle by skipping the fork function completely and moving the instruction pointer to the beginning of the path which follows the red arrow, which leads to the functionality we are interested in and where we have made our patches. It’s as easy as selecting the instruction we would like to jump to, right-clicking and selecting Set IP, or alternatively just hitting Ctrl-N.
Once we’ve done this, we can just hit F9 to continue and we will end up at our breakpoint where we can see the result of our patches:
We can see that the instructions copying the arguments to the stack have been updated as we wished, as well as the patched nop instructions. Finally we can see that the call has been modified to printf so everything looks good.
Tying Up Loose Ends
Although we have done what we set out to do, there remain a few niggling issues with the binary as it stands currently. To understand what these are, we need to remember the nature of the asprintf function:
"The function asprintf() allocates a string large enough to hold the output including the terminating null byte, and returns a pointer to it via the first parameter. This pointer should be passed to free() to release the allocated storage when it is no longer needed."
So asprintf allocates and then prints to a buffer. If we have changed the call to this function to call printf instead, we will no longer allocate this buffer. So what will happen when the executable subsequently tries to free a buffer that has not been allocated? Well we are bound to have an exception and the tool will crash. We might not care too much if the credentials are already printed to the screen, but let’s see if we can clean this binary up a bit so it runs smoothly without crashing.
If we look again at the disassembly around the asprintf function in IDA, we see it is followed by calls to write, close and free functions. Using our understanding of calling conventions, we are able to rewrite this code in C in a form likely to be very similar to the original source code:
We see that the buffer that is allocated and printed to is then freed, following the advice given in the asprintf documentation. So in order to prevent the application crashing when it attempts to free that buffer, we should get rid of that function call. We do so in the same way we got rid of our buffer argument: we simply replace the instruction opcodes with nop instructions encoded by the hexadecimal opcode 90.
E8 27 F0 FF FF call _free
We nop these bytes out taking our patch count up to 13 bytes, and resulting in an executable that runs, performs the task we wish, and doesn’t crash:
So you could say the task is complete: we now have a tool that decrypts and spits out the credentials from an Altiris PWL file. But if you're a perfectionist like me, you'll have noticed that the executable does not actually exit; instead we are left with a password prompt. Not only this, but we still have to supply useless dummy command line arguments to get it to run.
The reason for the password prompt can be understood by reading a bit further in the disassembled code:
We see that the way in which this tool mounts the remote share once it has decrypted the credentials is to launch the mount.cifs executable using execve, passing it the temporary file containing the decrypted credentials as an argument. To prevent this we could nop the entire section. But this seems unappealing and cumbersome. Instead we can use a two-byte patch to insert a short jump instruction, overwriting the first instruction in the above block, to the address of the conditional jump (jz) at the bottom, which is the address that is jumped to if the execve call is successful. This takes care of the password prompt nicely and the executable now exits cleanly.
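A two-byte relative jump on x86 is the EB opcode followed by a signed 8-bit displacement, measured from the address of the instruction after the jump. The Python sketch below computes those two bytes; the function name and both addresses in the example are hypothetical, since the real ones depend on where the block sits in the binary.

```python
def encode_short_jmp(source, target):
    """Encode EB rel8, a two-byte jump relative to the next instruction."""
    rel8 = target - (source + 2)
    if not -128 <= rel8 <= 127:
        raise ValueError("target is out of range for a short jump")
    return bytes([0xEB, rel8 & 0xFF])

if __name__ == "__main__":
    # Hypothetical addresses: first instruction of the block and the jz below it.
    print(encode_short_jmp(0x8049FF0, 0x804A020).hex())
```

The range check matters in practice: if the block to be skipped is longer than 127 bytes, a short jump cannot reach the end of it and a five-byte E9 jump would be needed instead.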
But we’re still stuck with having to enter superfluous dummy arguments. We note that if we don't supply them, we get given the 'usage' instructions:
All we really need is the -f argument for the PWL file. Perhaps the domain option -d might be useful, and there’s nothing wrong with the verbose option -b. But we definitely don't need the UNC path and local mount point arguments, and it would be really nice if we could get rid of these.
This usage text gives us a quick way in to finding the code that deals with the command line parameters. We go to the list of strings in IDA (Shift+F12) and search (Alt-T) for 'usage'. We then follow the entry that is found, which takes us to the location of this string in the binary. We then look up any cross-references to this string with Ctrl-X and see that there is only one. We follow it and we arrive at the code for the usage function in the binary.
We see a small function, but we really want to know where it is being called from. So we go to the beginning of the function, and again search for cross-references. This time we find two, quite near to each other. It is likely that these calls are from the same function, and when we follow them we find this is indeed the case.
This parent function calls getopt, which is a GNU C library function which shoulders some of the burden of having to deal with command line arguments, followed by a switch statement for the various command line parameters themselves, so we know we are in the right place.
We then notice a comparison involving the optind variable, followed by a conditional jump, one branch of which takes us to the usage function. The GNU documentation for getopt tells us that once all the options have been processed, optind holds the index in argv of the first non-option argument. In other words it counts the program name plus the ‘option arguments’, those with a switch like our -b or -f, as well as their respective arguments if they have them. We see that the other variable in the comparison comes from argc which is the total number of strings in the command line, which will include our two superfluous strings for the //server/share and /mount/point arguments. So we can infer that the comparison is saying if the total number of strings minus 2 is equal to optind, then all is well, otherwise print the usage text then exit. Well, without the superfluous strings, we need the total number of strings to equal optind, so all we need to do here is patch the -2 in this comparison to become a zero and we are able to drop our dummy arguments.
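The effect of that one-value patch can be modelled with Python's getopt module, which behaves like the C function for this purpose. This is a sketch under assumptions: the option string "f:d:b" (with -f and -d taking a value and -b being a flag) is a guess based on the usage text, the file and path names are invented, and optind is derived from the leftover arguments because the Python module does not expose it.

```python
import getopt

OPTION_STRING = "f:d:b"  # assumed: -f and -d take a value, -b is a flag

def optind_after_parsing(argv):
    """Return what C's optind would be once getopt has finished."""
    _, remaining = getopt.getopt(argv[1:], OPTION_STRING)
    return len(argv) - len(remaining)

def arguments_accepted(argv, required_extra=2):
    # Original check: argc - 2 == optind. Patched check: argc - 0 == optind.
    return len(argv) - required_extra == optind_after_parsing(argv)

if __name__ == "__main__":
    full = ["asmbmount", "-f", "creds.pwl", "//server/share", "/mount/point"]
    short = ["asmbmount", "-f", "creds.pwl"]
    print(arguments_accepted(full))                      # original binary, full command line
    print(arguments_accepted(short))                     # original binary, no dummy arguments
    print(arguments_accepted(short, required_extra=0))   # patched binary
```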
The final icing on the cake comes in the form of changing the strings and the name of the executable file from asmbmount, which describes a function that it no longer really performs, to something more suitable, and updating the usage instructions. Now we have a nice tool ready for easy use in infrastructure penetration tests.
I hope this description has been helpful to explain various aspects of how code and functions at the source level relate to their ‘machine’ code at the level of the processor with its registers and the stack, as well as how it can be possible to make changes to compiled binaries in just a few bytes using a disassembler and a hex editor with some understanding of how things work at this level.
Of course there are many different ways the end goal could have been achieved, perhaps with a patch of fewer bytes, or using different tools. But hopefully the value here was as much in the journey as the destination.
Additional Note for Altiris Customers
Context has disclosed the contents of this article to Symantec to ensure that they are aware of the repercussions of this issue, and they have published advice for customers at the following location:
In summary, Symantec Altiris customers using PXE boot images should limit the impact of the presence of Windows credentials within the images by restricting the privileges of the accounts concerned to the minimum required for the desired deployment.