Reviving XOR encryption with neural networks

October 2022

The main purpose of encrypting our payloads is to hide them from AV engines and make detection harder for techniques based on binary matching or string patterns.

One of the most common ways to encrypt payloads is using the XOR technique. Here’s a typical implementation in C/C++.

Despite the absence of malicious code here, this set of instructions is easily traceable to malware XOR behaviour, and for this reason it is flagged by most EDR/AV vendors. This means that in a real engagement our injector would be caught immediately by static analysis alone.

To verify this, we can use ThreatCheck.

If we disassemble our PE file, we can see that the flagged instructions indeed correspond to the XOR decryption routine.

The need for dependency-free XOR

Static analysis is not limited to identifying explicit XOR instructions: it also tries to detect any behaviour that reproduces a XOR, such as the equivalent logical expression (a OR b) AND NOT (a AND b). It can even become sophisticated enough to find approximate XOR implementations that do not rely on logical operators at all.
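The logical identity above is easy to verify in a couple of lines:

```c
/* XOR rebuilt from OR, AND and NOT: (a | b) & ~(a & b) == a ^ b. */
unsigned char xor_alt(unsigned char a, unsigned char b) {
    return (unsigned char)((a | b) & ~(a & b));
}
```

This is precisely the kind of equivalent expression a pattern-based engine may still recognize, which motivates the approach below.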

Our intuition for evading this kind of analysis is to rely on a tool that performs a series of basic operations to achieve the XOR result in a way that is not easily explainable.

As it happens, neural networks are known both for their ability to approximate any nonlinear function and for the difficulty of directly interpreting the “reasoning” behind their output. We can thus try to train a simple neural network to approximate a XOR operation.

Even a dependency on libraries (e.g., math.h) may be flagged by an analysis engine, so we should rely only on basic operators, without calling complex mathematical functions, let alone using any out-of-the-box neural network library.

 

Context on neural networks

To give some context, we first need to outline the basic structure of a neural network and the operations involved, at least for the simplest case (which is all we need for the application at hand).

A fully connected, feedforward neural network consists of:

  • An input layer that accepts numeric values, sometimes called “features”.
  • An output layer, which produces one or more numeric values that (try to) achieve the result of the desired operation on the inputs. For example, this can be the approximation of some function, or a (quasi-)probabilistic estimate of the input belonging to certain categories.
  • Possibly, additional “hidden” layers between the input and output layers, which increase the flexibility of the approximations the network can make. In practice, this means the network can perform more complex operations, at the cost of a larger number of tunable parameters.

The hidden and output layers are made of perceptrons: simple operators that compute a weighted sum of their inputs plus a bias and feed the result to a so-called transfer function, a nonlinear function with a few desirable properties (which we won’t discuss here).

 

One of the most classical transfer functions, and the one we will use in this application, is the sigmoid.

A neural network gets the inputs and manipulates them through the layers to approximate the desired output. This approximation is achieved by properly tuning the weights and biases used for the sums performed by the perceptrons with a training process.

The (simplified) procedure for training a neural network involves:

  • A dataset, including inputs and the corresponding desired outputs. This can be divided into subsets for training, testing and validation.
  • A loss function used to compute the prediction error, that is, the error between the produced output and the desired one.
  • The backpropagation algorithm, which alters the network parameters in a way that minimizes the prediction error with a gradient descent approach.
  • A possibly large number of epochs, in each of which the parameter-tuning process is performed on all the samples in the training set, until convergence.

A PoC neural-network-based XOR implementation

As we only want to use basic mathematical operators to avoid library dependencies, we need to approximate the exponentiation operator e^x that appears in the sigmoid function. Luckily, there’s a mathematical tool especially designed for this task: Taylor series expansion. Without going into the theoretical details of this operation, we just mention that it allows us to approximate a function of interest to an arbitrary precision with a polynomial having a chosen (and possibly large) number of terms.

The exponential function has an easy expression for this:
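This is the standard Maclaurin expansion:

```latex
e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots
```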

We want the algorithm to take some time to execute while performing seemingly useful operations, in order to elude sandboxes or similar mechanisms, so we expand the generic exponential function instead, leveraging the identity a^x = e^(x ln a) and treating the base as generic rather than fixed to e.

We then also need to approximate the logarithm function, again with a Taylor series expansion.

We now return to the XOR application: we perform it bitwise, so the input layer consists of two values that can each be 0 or 1. We then add a hidden layer with 2 perceptrons, and finally an output layer with one perceptron. At inference time, this is followed by a threshold function that returns 1 if the network output exceeds 0.5, and 0 otherwise.

Our dataset is simply the XOR truth table:

  a  b  |  a XOR b
  0  0  |  0
  0  1  |  1
  1  0  |  1
  1  1  |  0

We perform the training procedure for, e.g., 1000 epochs, until full convergence (there is no overfitting phenomenon involved, as the dataset comprises all the possible inputs), and we save the network parameters in the payload decryption algorithm.

It is then sufficient to split the bytes of the obfuscated shellcode and the key into their constituent bits, perform the XOR bitwise, and reassemble the bytes to recover the original shellcode.

Analysing our injector again, we can now verify that it successfully passes static analysis, and that it is conveniently slow in its execution.

Conclusions

In this article we explored one of the techniques we developed to strengthen our Red Team activities, whether they be manual or automated by ZAIUX.

You can find the source code of a PoC implementation on GitHub: click here!