Automating Removal of Java Obfuscation

In this post we detail a method for improving the analysis of Java code protected by a particular obfuscator. We document the process we followed and demonstrate the results of automating it. Obscurity will not stop an attacker: once the method is known, the process can be automated.

By David Klein

Security Consultant

16 Feb 2015

Obfuscation is the process of transforming an application during compilation so that its logic is difficult to follow. Vendors usually obfuscate their code to protect intellectual property, but obfuscation serves a dual purpose: it also slows down vulnerability discovery.

Obfuscation comes in many shapes and forms; in this post we focus on a particular subset: string obfuscation, and more specifically, encrypted string constants. Strings within an application are very useful for understanding its logic. For example, logging and exception-handling strings offer an excellent window into how an application handles particular states, and can greatly speed up efforts to understand its functionality.

For more information on what obfuscation is within the context of Java, see [0].

Note that this post assumes the reader has a rudimentary understanding of programming.

Decompilation

First, we load the Java archive (jar) into JD-GUI [1], a GUI tool that decompiles Java “.class” files, by dragging and dropping the target jar into the GUI window. The following is shown after scrolling down to an interesting-looking class:

Figure 1 - JD-GUI showing the output from the disassembly

The first observation is that JD-GUI has not successfully decompiled the entire class file. The obfuscator has intentionally modified the bytecode in ways that hamper decompilation.

If we follow the sequence of events in the first z function, we can see that it does not flow correctly: variables are used where they shouldn't be, and a non-existent variable is returned. The second z function, shown below, also looks very fishy; its last line indexes into an integer, which is definitely not allowed in Java.

Figure 2 - Showing the suspicious second 'z' function

Abandoning JD-GUI and picking up the trusty JAD [2] (a command-line Java .class decompiler) yields better, though still imperfect, results:

Figure 3 - Showing the output that JAD generates

Decompilation has clearly failed in places: where JAD cannot recover high-level Java source, it emits the raw JVM instructions instead, and it says as much in its command-line output. Fortunately, the failures appear to be limited to a consistent set of functions rather than the entire class. Second, the strings are not immediately readable; it is quite obvious that some form of encryption is in use. The decryption routine appears to be the function z, as it is called with the encrypted string as its input.

As shown in Figure 2, two functions share the name z. Java allows this (function overloading [3]) provided the parameter lists differ, and it is common for obfuscators to exploit it. It is, however, possible to tell the functions apart, and so determine the true order of the calls, by looking at the types or the number of their parameters. Since our first call to z passes a String as its parameter, we can derive the true order and better understand what each function does.
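At the bytecode level, the two overloads are distinguished by their method descriptors, which encode the parameter and return types. The Python sketch below splits a descriptor into its parts. The descriptors in the usage lines are what a z(String) returning char[] and a z(char[]) returning String would carry; they are our reading of Figures 4 and 5, not values copied from the class file.

```python
def parse_method_descriptor(descriptor: str) -> tuple[list[str], str]:
    """Split a JVM method descriptor such as "(I[C)V" into (parameters, return type)."""
    if not descriptor.startswith("("):
        raise ValueError(f"not a method descriptor: {descriptor!r}")
    params = []
    i = 1
    while descriptor[i] != ")":
        start = i
        while descriptor[i] == "[":      # each '[' adds one array dimension
            i += 1
        if descriptor[i] == "L":         # object types look like Ljava/lang/String;
            i = descriptor.index(";", i)
        i += 1                           # step past the base type character or the ';'
        params.append(descriptor[start:i])
    return params, descriptor[i + 1:]


# The two overloaded z functions differ only in their descriptors:
print(parse_method_descriptor("(Ljava/lang/String;)[C"))  # (['Ljava/lang/String;'], '[C')
print(parse_method_descriptor("([C)Ljava/lang/String;"))  # (['[C'], 'Ljava/lang/String;')
```

The same parser also reads the `()V' descriptor used by static initializers: no parameters, void return.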

We can see in Figure 4 (below) that the first z converts the input string ‘s’ to a character array. If the array holds a single character, that character is XORed with 0x4D before the array is returned; otherwise the array is returned as-is. JAD was unable to correctly decompile the function, but such a simple function is easy to analyse by hand.
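A small Python model of this first z, based only on the behaviour described above. The constant 0x4D is the one observed in this class; we make it a parameter because we have only seen it in this one file.

```python
def z_to_chars(s: str, single_char_key: int = 0x4D) -> list[str]:
    """Model of the first z: String -> char[].

    A one-character string has that character XORed with single_char_key;
    any other string is simply split into its characters. For BMP text,
    Python characters correspond one-to-one with Java chars.
    """
    chars = list(s)
    if len(chars) == 1:
        chars[0] = chr(ord(chars[0]) ^ single_char_key)
    return chars


print(z_to_chars("a"))   # [','] since 0x61 ^ 0x4D == 0x2C
print(z_to_chars("ab"))  # ['a', 'b']
```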

Figure 4 - Showing the first 'z' function

The second z function (seen in Figure 5 below) appears to be where the actual decryption is done.

Figure 5 - Second 'z' function, highlighting the interesting integer values

To follow what happens to the input, we need to remember that the JVM is a stack-based machine: instructions push values onto an operand stack, and later instructions pop them off, operate on them, and push the result.

The first important instruction sets the variable i to 0. We then see the instruction caload, which loads a character from an array at a given index. Although JAD has not decompiled this section, we can see that the index is the variable i and the array is the input character array ac (in fact, ac is pushed onto the stack at the very start of the function). Next comes a switch statement, which determines the value of byte0.

After the switch statement, byte0 is pushed onto the stack; on the first iteration its value is 0x51. The following operations perform a bitwise XOR between byte0 and the character in ac at index i. Then i is incremented and compared with the length of ac: if i is still less than the length of ac, the code jumps back to L3 and performs another iteration on the next index; otherwise, the ac array is converted to a string and returned.
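To make the operand stack concrete, here is a toy Python interpreter for a handful of JVM instructions, running one iteration of the XOR step. The instruction sequence (load ac and i, read ac[i], push the key, XOR, store back) is an illustrative assumption consistent with the description above, not a verbatim copy of the obfuscator's bytecode. The local variable slots 0 and 1 for ac and i are likewise assumed.

```python
def execute(program, local_vars):
    """Run a straight-line list of (mnemonic, operand) pairs; return the final stack."""
    stack = []
    for op, arg in program:
        if op in ("aload", "iload"):
            stack.append(local_vars[arg])       # push a local (array reference or int)
        elif op == "bipush":
            stack.append(arg)                   # push a small constant
        elif op == "dup2":
            stack.extend(stack[-2:])            # duplicate the top two values
        elif op == "caload":
            index = stack.pop()
            array = stack.pop()
            stack.append(array[index])          # push array[index]
        elif op == "ixor":
            right = stack.pop()
            left = stack.pop()
            stack.append(left ^ right)
        elif op == "i2c":
            stack.append(stack.pop() & 0xFFFF)  # narrow an int to a 16-bit char
        elif op == "castore":
            value = stack.pop()
            index = stack.pop()
            array = stack.pop()
            array[index] = value                # array[index] = value
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return stack


ac = [0x23, 0x53, 0x71]            # first three encrypted characters
local_vars = {0: ac, 1: 0}         # slot 0 = ac, slot 1 = i (assumed layout)
one_iteration = [
    ("aload", 0), ("iload", 1),    # stack: ac, i
    ("dup2", None),                # stack: ac, i, ac, i
    ("caload", None),              # stack: ac, i, ac[i]
    ("bipush", 0x51),              # stack: ac, i, ac[i], key
    ("ixor", None),                # stack: ac, i, ac[i] ^ key
    ("i2c", None),                 # stack: ac, i, (char)(ac[i] ^ key)
    ("castore", None),             # stack: empty; ac[i] now holds the plaintext
]
print(execute(one_iteration, local_vars), chr(ac[0]))  # [] r
```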

In summary, this z function loops over its input, XORing the character at each index with a key that depends on that index. The index is taken modulo 5, indicating that there are 5 possible keys (shown in red in Figure 5).

To neaten this up, we can convert the second z into a short Python script:

keys = [81, 54, 2, 113, 77]
# ciphertext below is the Java string literal "#Sq\0368#Ug\002b\"Oq\005(<\030r\003\"!Sp\005$4E"
ciphertext = [
    0x23, 0x53, 0x71, 0x1e, 0x38, 0x23, 0x55, 0x67,
    0x02, 0x62, 0x22, 0x4f, 0x71, 0x05, 0x28, 0x3c,
    0x18, 0x72, 0x03, 0x22, 0x21, 0x53, 0x70, 0x05,
    0x24, 0x34, 0x45,
]

# XOR each character with the key selected by its index modulo 5 (replacing the switch)
print("".join(chr(keys[i % len(keys)] ^ c) for i, c in enumerate(ciphertext)))

As the code shows, the function reduces to a simple loop that XORs each character of the input string with a key; we have replaced the switch with an index into the keys array.

Running the code prints the string "resources/system.properties": not at all an interesting string, but we have achieved decryption.

Problem analysis

With knowledge of the key and an understanding of the encryption algorithm, we should now be able to extract all the strings from every class file and decrypt them. Unfortunately, this approach fails because each class file within the Java archive uses a different XOR key. To decrypt the strings en masse, a different approach is required.

Ideally, we would programmatically extract the key from every class file and use it to decrypt the strings within that file. One approach would be to decompile each class with JAD, then write a script that uses regexes to extract the switch table (which holds the XOR key) and the strings.

This would be reasonably simple but error-prone, and regexes do not make for an elegant solution. The alternative is to write our own Java decompiler, which gives us a nicely abstracted way of performing program analysis. It requires a larger time investment, but it is certainly the more elegant solution.

We chose the second option. As it turns out, the JVM instruction set is simple to parse and well documented [4, 5, 6], so writing the disassembler was not difficult.
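To give a sense of how approachable the format is, here is a Python sketch (not our F# implementation) that reads the class file header and walks the constant pool, following the layout in the class file specification [5]. It keeps each entry's raw payload and stops at access_flags; fields, methods and attributes would be parsed from the returned offset. The tag table also includes a few tags added after Java 8.

```python
import struct

# Payload size in bytes for each fixed-size constant pool tag.
# Tag 1 (CONSTANT_Utf8) is variable-length and handled separately.
FIXED_SIZES = {
    3: 4, 4: 4, 5: 8, 6: 8,           # Integer, Float, Long, Double
    7: 2, 8: 2, 16: 2, 19: 2, 20: 2,  # Class, String, MethodType, Module, Package
    9: 4, 10: 4, 11: 4, 12: 4,        # Fieldref, Methodref, InterfaceMethodref, NameAndType
    15: 3, 17: 4, 18: 4,              # MethodHandle, Dynamic, InvokeDynamic
}


def parse_constant_pool(data: bytes):
    """Return (major_version, pool, offset) for a class file.

    pool maps constant pool index -> (tag, raw payload bytes); offset is where
    the access_flags field following the constant pool begins.
    """
    magic, _minor, major, count = struct.unpack_from(">IHHH", data, 0)
    if magic != 0xCAFEBABE:
        raise ValueError("missing 0xCAFEBABE magic number")
    offset = 10
    pool = {}
    index = 1                          # constant pool indices start at 1
    while index < count:               # valid indices are 1 .. count - 1
        tag = data[offset]
        offset += 1
        if tag == 1:                   # CONSTANT_Utf8: u2 length, then modified UTF-8
            (size,) = struct.unpack_from(">H", data, offset)
            offset += 2
        elif tag in FIXED_SIZES:
            size = FIXED_SIZES[tag]
        else:
            raise ValueError(f"unknown constant pool tag {tag} at index {index}")
        pool[index] = (tag, data[offset:offset + size])
        offset += size
        index += 2 if tag in (5, 6) else 1  # Long and Double take two slots
    return major, pool, offset


sample = (struct.pack(">IHHH", 0xCAFEBABE, 0, 52, 3)
          + b"\x01\x00\x02hi"          # #1 Utf8 "hi"
          + b"\x08\x00\x01"            # #2 String -> #1
          + b"\x00\x21")               # access_flags (start of the rest of the file)
print(parse_constant_pool(sample))
# (52, {1: (1, b'hi'), 2: (8, b'\x00\x01')}, 18)
```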

Parsing the class file - overview

First, we parse the class file format, extracting the constant pool, fields, interfaces, classes and methods. We then disassemble each method body by mapping its bytes to a set of opcodes; a snippet of the resulting disassembly is shown below:

Figure 6 - Showing the byte-to-opcode translation. Each entry is divided into a grouping (e.g. Constants, Loads, Maths, Stack), an operation (e.g. bipush) and an optional, instruction-dependent argument (such as ‘77’).

This is the tagged data produced by parsing the JVM bytecode into a list of opcodes with their associated data.
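A minimal Python sketch of the byte-to-opcode step, covering only a small, hand-picked subset of opcodes that appear in routines like the one above (values from the instruction set chapter [6]). Our tool also tags each instruction with a grouping such as Loads or Maths; that is omitted here. Any opcode outside the table, including tableswitch, raises an error rather than being guessed at.

```python
import struct

# opcode -> (mnemonic, struct format of its inline operand, or None)
OPCODES = {
    0x03: ("iconst_0", None),  0x08: ("iconst_5", None),
    0x10: ("bipush", ">b"),    0x11: ("sipush", ">h"),    0x12: ("ldc", ">B"),
    0x1B: ("iload_1", None),   0x2A: ("aload_0", None),   0x3C: ("istore_1", None),
    0x34: ("caload", None),    0x55: ("castore", None),   0xBE: ("arraylength", None),
    0x59: ("dup", None),       0x5C: ("dup2", None),      0x5F: ("swap", None),
    0x70: ("irem", None),      0x82: ("ixor", None),      0x92: ("i2c", None),
    0x84: ("iinc", ">Bb"),     0xA1: ("if_icmplt", ">h"), 0xA7: ("goto", ">h"),
    0xB0: ("areturn", None),   0xB1: ("return", None),
    0xB3: ("putstatic", ">H"), 0xB8: ("invokestatic", ">H"),
}


def disassemble(code: bytes):
    """Yield (pc, mnemonic, operand) for each instruction in a method's code array."""
    pc = 0
    while pc < len(code):
        opcode = code[pc]
        if opcode not in OPCODES:
            raise NotImplementedError(f"opcode 0x{opcode:02X} at pc {pc} is not in this sketch")
        mnemonic, fmt = OPCODES[opcode]
        operand = None
        length = 1
        if fmt is not None:
            values = struct.unpack_from(fmt, code, pc + 1)
            operand = values[0] if len(values) == 1 else values  # iinc has two operands
            length += struct.calcsize(fmt)
        yield pc, mnemonic, operand
        pc += length


for instruction in disassemble(bytes([0x10, 0x4D, 0x82, 0x92, 0x84, 0x01, 0x01, 0xB1])):
    print(instruction)
# (0, 'bipush', 77)
# (2, 'ixor', None)
# (3, 'i2c', None)
# (4, 'iinc', (1, 1))
# (7, 'return', None)
```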

Extracting encryption function

We are after the switch section of the disassembled code, as it contains the XOR key that we will use to decrypt the ciphertext. Based on the documentation, we can see that it maps to the tableswitch instruction [7], which is implemented as a jump table, as one would expect.

Now it is a matter of mapping over the opcodes to locate the tableswitch instruction. Below is the section of the opcode list we are interested in:

As you can see, the tableswitch instruction takes several arguments: the first is the default case (67), and the second is the jump table, which maps each 'case' to a jump. In this example, case 0 maps to the jump 48. The last argument (not in the screenshot) is the padding, which we discard.
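For reference, here is roughly how the raw tableswitch bytes can be decoded, sketched in Python from the instruction's description in the specification [7]. After the opcode come 0 to 3 padding bytes that align the operands to a multiple of four from the start of the method's code. These are followed by the signed 32-bit default, low and high values, then one signed 32-bit offset per case. Offsets are relative to the tableswitch opcode, so the sketch converts them to absolute positions. In the usage example, only default = 67 and case 0 = 48 echo the screenshot. The other offsets are made up, and the opcode sits at offset 0, so relative and absolute targets coincide.

```python
import struct


def decode_tableswitch(code: bytes, pc: int):
    """Decode the tableswitch whose opcode (0xAA) is at code[pc].

    Returns (default_target, {case_value: target}, next_pc); targets are absolute
    offsets into the method's code array.
    """
    if code[pc] != 0xAA:
        raise ValueError(f"no tableswitch at pc {pc}")
    operands = (pc + 4) & ~3                 # skip the 0-3 padding bytes after the opcode
    default, low, high = struct.unpack_from(">iii", code, operands)
    count = high - low + 1
    offsets = struct.unpack_from(f">{count}i", code, operands + 12)
    cases = {low + i: pc + offset for i, offset in enumerate(offsets)}
    return pc + default, cases, operands + 12 + 4 * count


# Opcode at pc 0, three padding bytes, then default=67, low=0, high=3 and four offsets.
example = bytes([0xAA, 0, 0, 0]) + struct.pack(">iii", 67, 0, 3) + struct.pack(">4i", 48, 53, 58, 63)
print(decode_tableswitch(example, 0))  # (67, {0: 48, 1: 53, 2: 58, 3: 63}, 32)
```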

Our algorithm for extracting this table is as follows:

  1. Detect if a control section contains a tableswitch.
  2. Extract the tableswitch.
  3. Extract the jumps from the tableswitch.
  4. Build a new jump table containing the original jump table with the default jump case appended on the end.
  5. We now have all the jumps to the keys.
  6. Map over the method body and resolve the jumps to values.
  7. We now have all the key values and the XOR function name.
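The steps above can be sketched in Python as follows. This is an illustration, not the F# function in Figure 7. It assumes instructions have already been decoded into (pc, mnemonic, operand) tuples, with a tableswitch operand of the form (default_target, {case: target}) using absolute targets. It accepts bipush, sipush or iconst_n as the instruction at each jump target, since javac typically pushes a small constant such as 2 with iconst_2 rather than bipush. Step 7's method name is not tracked here, and the listing in the usage example is invented to mirror the keys recovered earlier.

```python
# Constants pushed by the single-byte iconst_<n> instructions.
ICONST = {"iconst_m1": -1, **{f"iconst_{n}": n for n in range(6)}}


def extract_xor_keys(instructions):
    """Return the keys selected by the method's tableswitch (default key last), or None."""
    by_pc = {pc: (op, arg) for pc, op, arg in instructions}
    # Steps 1-2: detect and extract the tableswitch.
    switch = next((arg for _, op, arg in instructions if op == "tableswitch"), None)
    if switch is None:
        return None
    default_target, cases = switch
    # Steps 3-5: the jumps in case order, with the default jump appended.
    jumps = [cases[case] for case in sorted(cases)] + [default_target]
    # Step 6: resolve each jump to the constant pushed at its target.
    keys = []
    for target in jumps:
        op, arg = by_pc[target]
        if op in ("bipush", "sipush"):
            keys.append(arg)
        elif op in ICONST:
            keys.append(ICONST[op])
        else:
            raise ValueError(f"expected a constant push at pc {target}, found {op}")
    return keys


listing = [
    (10, "iload_1", None), (11, "iconst_5", None), (12, "irem", None),
    (13, "tableswitch", (63, {0: 44, 1: 49, 2: 54, 3: 58})),
    (44, "bipush", 81), (46, "goto", 65),
    (49, "bipush", 54), (51, "goto", 65),
    (54, "iconst_2", None), (55, "goto", 65),
    (58, "bipush", 113), (60, "goto", 65),
    (63, "bipush", 77), (65, "ixor", None),
]
print(extract_xor_keys(listing))  # [81, 54, 2, 113, 77]
```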

Figure 7 - Code (F#) showing the pattern-matching function that implements the algorithm to extract switch tables.

Figure 8 - Showing the resulting extracted XOR keys from the switch table

The next step is to locate the section of the class where the strings are held. In the case of this obfuscator, we have determined through multiple decompilations that the encrypted strings are stored within the static initialization section [8], which JAD generally does not handle effectively. At runtime, when the class is initialised, the strings are decrypted and the resulting plaintext is assigned to the respective variable.

Extracting the static initialization section is trivial: we map over the methods and find the one whose name is `<clinit>' [9] and whose descriptor is `()V', which denotes a method that takes no parameters and returns void [10].

Once we have extracted this, we resolve the 'private static' values, making sure to select only those where our decryption function is called (we know the function's name, as we saved it earlier). It is then just a matter of resolving the strings within the constant pool.
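A Python sketch of this step, under simplifying assumptions of our own. Methods are keyed by (name, descriptor), instructions are (mnemonic, operand) pairs, and constant pool lookups have been pre-resolved into two dictionaries: one from CONSTANT_String indices to their still-encrypted text, and one from Fieldref/Methodref indices to member names. It only recognises the direct pattern of ldc, one or more calls to the decryptor, then putstatic. A class that collects its strings into a static array would also need the aastore case. The field names in the usage example are invented.

```python
def find_encrypted_strings(methods, strings, members, decryptor):
    """Map each static field assigned in <clinit> to the ciphertext it is decrypted from."""
    clinit = methods.get(("<clinit>", "()V"))  # the static initializer
    if clinit is None:
        return {}
    found = {}
    pending = None      # ciphertext pushed by the most recent ldc
    decrypted = False   # whether that ciphertext has been passed to the decryptor
    for mnemonic, operand in clinit:
        if mnemonic in ("ldc", "ldc_w") and operand in strings:
            pending, decrypted = strings[operand], False
        elif mnemonic == "invokestatic" and members.get(operand) == decryptor:
            decrypted = pending is not None
        elif mnemonic == "putstatic":
            if decrypted:
                found[members[operand]] = pending
            pending, decrypted = None, False
    return found


methods = {("<clinit>", "()V"): [
    ("ldc", 2), ("invokestatic", 5), ("invokestatic", 6), ("putstatic", 7),
    ("ldc", 3), ("putstatic", 8), ("return", None),
]}
strings = {2: "#Sq\0368#Ug\002b\"Oq\005(<\030r\003\"!Sp\005$4E", 3: "not encrypted"}
members = {5: "z", 6: "z", 7: "CONFIG_PATH", 8: "VERSION"}
print(sorted(find_encrypted_strings(methods, strings, members, "z")))  # ['CONFIG_PATH']
```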

At this stage we have:

  1. the extracted decryption key;
  2. an implementation of the decryption algorithm (XOR); and
  3. the encrypted strings.

We can now decrypt the strings and replace the respective constant pool entries with the plaintext. Since the decryption is a basic bitwise XOR, the plaintext has the same number of characters as the ciphertext, which means we don't have to worry about truncation or about accidentally overwriting unrelated parts of the constant pool. Later, we plan to update the variable names throughout the classes and remove the decryption functions.
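A Python sketch of decrypting a string and patching its constant pool entry. It assumes the pool is a dictionary from index to (tag, raw bytes), and that the keys and the first z's single-character constant (0x4D in this class) have already been recovered. A CONSTANT_String entry (tag 8) only holds the index of a CONSTANT_Utf8 entry (tag 1), so it is the Utf8 entry that gets rewritten. The constant pool stores text as modified UTF-8, in which U+0000 is written as the two bytes C0 80 and characters above U+007F take two or three bytes. The character count is unchanged by XOR, but the byte count need not be, so the sketch re-encodes the whole entry instead of overwriting bytes in place.

```python
import struct


def decode_modified_utf8(raw: bytes) -> str:
    """Decode Java's modified UTF-8 into a str of UTF-16 code units (Java chars)."""
    # U+0000 is stored as C0 80; surrogates are stored individually (3 bytes each).
    return raw.replace(b"\xc0\x80", b"\x00").decode("utf-8", "surrogatepass")


def encode_modified_utf8(text: str) -> bytes:
    """Inverse of decode_modified_utf8."""
    return text.encode("utf-8", "surrogatepass").replace(b"\x00", b"\xc0\x80")


def decrypt(ciphertext: str, keys: list[int], single_char_key: int) -> str:
    """Apply both z functions: the single-character XOR, then the rolling-key XOR."""
    units = [ord(c) for c in ciphertext]
    if len(units) == 1:
        units[0] ^= single_char_key
    return "".join(chr(u ^ keys[i % len(keys)]) for i, u in enumerate(units))


def patch_pool(pool, string_indices, keys, single_char_key):
    """Return a copy of pool whose encrypted CONSTANT_Utf8 entries hold plaintext."""
    patched = dict(pool)
    for string_index in string_indices:
        tag, payload = pool[string_index]
        if tag != 8:
            raise ValueError(f"#{string_index} is not a CONSTANT_String")
        (utf8_index,) = struct.unpack(">H", payload)
        utf8_tag, raw = pool[utf8_index]     # read the original, never a patched entry
        if utf8_tag != 1:
            raise ValueError(f"#{utf8_index} is not a CONSTANT_Utf8")
        plaintext = decrypt(decode_modified_utf8(raw), keys, single_char_key)
        patched[utf8_index] = (1, encode_modified_utf8(plaintext))
    return patched


pool = {1: (1, b"#Sq\x1e8#Ug\x02b\"Oq\x05(<\x18r\x03\"!Sp\x05$4E"), 2: (8, b"\x00\x01")}
print(patch_pool(pool, [2], [81, 54, 2, 113, 77], 0x4D)[1])  # (1, b'resources/system.properties')
```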

Figure 9 - Example decryption, plaintext bytes, cipher bytes, and plaintext result.

The class file we chose to look at turned out not to contain any interesting strings, but we can now see exactly what it does. The next stage is to loop over all the class files, decrypt all the strings, and analyse the results in the hope of finding vulnerabilities, which is a story for another day.
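Looping over the archive needs nothing more than a zip reader, since a jar is a zip file. Here is a Python sketch using the standard zipfile module. The in-memory demo jar and its entry names are invented for illustration.

```python
import io
import zipfile


def iter_class_files(jar):
    """Yield (entry_name, class_bytes) for every .class entry in a jar.

    jar may be a path or a binary file-like object, as accepted by zipfile.ZipFile.
    """
    with zipfile.ZipFile(jar) as archive:
        for info in archive.infolist():
            if not info.is_dir() and info.filename.endswith(".class"):
                yield info.filename, archive.read(info)


# Demo: an in-memory jar holding one class file and a manifest.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
    demo.writestr("com/example/Config.class", b"\xca\xfe\xba\xbe")
print([name for name, _ in iter_class_files(buffer)])  # ['com/example/Config.class']
```

Each yielded class would then go through the per-class steps described above (key extraction, <clinit> scanning and constant pool patching).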

Conclusion

In conclusion, we have shown that by investing time in our reversing, we can have higher confidence in our understanding of the target application's functionality. By automating the recovery of obfuscated code, we have also shown that obfuscation alone is not an adequate protection mechanism, although it does slow an attacker down.

In addition to the automated recovery, we now have a skeleton Java decompiler, which will eventually be lifted into our static analysis tool.

Finally, we have shown that if you try hard enough, everything becomes a fun programming challenge.

Contact and Follow-Up

David is part of our Assurance team in Context's Australian office. See the Contact page for how to get in touch.

References

[0] http://www.excelsior-usa.com/articles/java-obfuscators.html

[1] http://jd.benow.ca/

[2] http://varaneckas.com/jad

[3] http://en.wikipedia.org/wiki/Function_overloading

[4] https://github.com/Storyyeller/Krakatau

[5] https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html

[6] http://docs.oracle.com/javase/specs/jvms/se8/html/jvms-6.html

[7] http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.tableswitch

[8] http://docs.oracle.com/javase/tutorial/java/javaOO/initial.html

[9] http://stackoverflow.com/questions/8517121/java-vmspec-what-is-difference-between-init-and-clinit

[10] http://stackoverflow.com/questions/14721852/difference-between-byte-code-initv-vs-initzv
