Posted by Daisuke

Pair Trading with Graphical Lasso [Application]

Continued from Introduction.

In the previous article, we introduced the idea of using two weapons, the Kullback-Leibler divergence (KL distance) and the graphical lasso, to detect the degree of abnormality in the price movements of individual stocks. In this article, we actually derive the KL distance between two probability distributions using a model called the graphical Gaussian model.

By calculating the KL distance analytically here, we will be able to implement the model in code such as Python. The practical part will be covered in the next article.

Graphical Gaussian Model

To begin with, what is a graph in graphical lasso? In this context, a graph is a tool for representing conditional dependencies among variables in a multivariate data set.

Definition of Graph

A graph $G$ is defined as a pair of two sets $V$ and $E$, where:

  1. Set $V$ of vertices (nodes): the basic elements of a graph, represented as points. Each vertex represents a specific element or object in the graph.

  2. Set $E$ of edges: an edge is represented as a line connecting two vertices, indicating a relationship or connection between them. An edge can be either directed (drawn as an arrow, with direction) or undirected (just a line, without direction).

In particular, when the dependencies among variables are estimated using a multivariate normal distribution, this graphical model is called the Graphical Gaussian Model (GGM).

Graph Creation and Meaning in GGM

  1. Use of precision matrix: In GGM, the relationship between variables is represented using a precision matrix (the inverse of the covariance matrix). Each element of this matrix indicates the strength of the relationship between the two corresponding variables.

  2. Interpretation of the relationship: If an element of the precision matrix is non-zero, there is a direct relationship between the two corresponding variables, and they are connected by an edge in the graph. If the element is zero, this is interpreted as no direct relationship, and no edge is drawn.

  3. Data Structure Visualization: This graph can be used to visualize complex relationships between variables in a data set. For example, if a variable is related to many other variables, it will have many edges and can be interpreted as likely to play an important role in the data. (A minimal code sketch of these steps follows this list.)
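
Here is that sketch, using scikit-learn's GraphicalLasso. The toy data, tickers, penalty `alpha`, and edge threshold are all illustrative assumptions, not part of the derivation.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
# Toy "returns" for four assets; ETH is made to co-move with BTC so that
# at least one conditional dependency survives the lasso penalty.
X = rng.normal(size=(500, 4))
X[:, 1] += 0.8 * X[:, 0]
names = ["BTC", "ETH", "SOL", "BNB"]

model = GraphicalLasso(alpha=0.1).fit(X)
Lam = model.precision_  # step 1: estimated precision matrix (inverse covariance)

# Step 2: an undirected edge (i, j) exists where the off-diagonal entry is nonzero.
edges = [(names[i], names[j])
         for i in range(len(names)) for j in range(i + 1, len(names))
         if abs(Lam[i, j]) > 1e-8]
print(edges)  # step 3: this edge list is what the graph visualizes
```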

Multivariate Normal Distribution in GGM

The multivariate normal distribution is generally written in the form $N(\mu, \Sigma)$, but in the GGM framework it is often written as $N(\bm{x}|\bm{\mu}, \Lambda^{-1})$ using the precision matrix $\Lambda$.

In particular, if the mean is a zero vector, the probability density function of the multivariate normal distribution becomes

$$p(\bm{x}) = \frac{1}{\sqrt{(2\pi)^k |\Lambda^{-1}|}} \exp\left(-\frac{1}{2} \bm{x}^{\top} \Lambda \bm{x}\right)$$

Precision matrices are often used in contexts such as graphical models because they represent direct dependencies between variables more clearly.
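
As a quick numerical illustration of this parameterization (not part of the derivation), the zero-mean density above can be evaluated directly from the precision matrix via $\log|\Lambda| = -\log|\Lambda^{-1}|$. The matrix and evaluation point are made up for the example, and scipy's covariance-based logpdf serves as a cross-check.

```python
import numpy as np
from scipy.stats import multivariate_normal

def logpdf_zero_mean(x, Lam):
    # Log of the zero-mean density above, parameterized by the precision matrix.
    k = x.shape[0]
    _, logdet = np.linalg.slogdet(Lam)  # log|Lam| = -log|Lam^{-1}|
    return 0.5 * (logdet - k * np.log(2.0 * np.pi) - x @ Lam @ x)

Lam = np.array([[2.0, -0.8],
                [-0.8, 1.5]])
x = np.array([0.3, -0.1])
print(logpdf_zero_mean(x, Lam))
# Cross-check against scipy's covariance parameterization:
print(multivariate_normal(mean=np.zeros(2), cov=np.linalg.inv(Lam)).logpdf(x))
```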

Analytically Deriving the KL Distance Between Variables in GGM

Here is where we start.

Let $\bm{z_i}$ be the vector of the variables in $\bm{x}$ other than the variable $x_i$. Since the conditional distribution of $x_i$ given $\bm{z_i}$ is our interest, we evaluate the KL distance between $p_A(x_i|\bm{z_i})$ and $p_B(x_i|\bm{z_i})$, averaged over the distribution $p_A(\bm{z_i})$. (With ETH, SOL, BNB, ARB, ... as $\bm{z_i}$, we measure the distance between the probability distributions of BTC's price movement at two points in time, $A$ and $B$.)

$$d^{AB}_i = \int d\bm{z_i}\, p_A(\bm{z_i}) \int dx_i\, p_A(x_i|\bm{z_i}) \ln \frac{p_A(x_i|\bm{z_i})}{p_B(x_i|\bm{z_i})}$$

First, the precision matrix $\Lambda_A$ and its inverse, the variance-covariance matrix $\Sigma_A$, are partitioned as follows, collecting the variables in $\bm{z_i}$ first and placing $x_i$ last (the same partition with subscript $B$ is used for $\Lambda_B$):

$$\Lambda_A = \begin{pmatrix} L_A & \bm{l_A} \\ \bm{l_A}^{\top} & \lambda_A \end{pmatrix}, \quad \Sigma_A \equiv \Lambda_A^{-1} = \begin{pmatrix} W_A & \bm{w_A} \\ \bm{w_A}^{\top} & \sigma_A \end{pmatrix}$$
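
In code, this partition is just index bookkeeping: for variable $i$, move $x_i$ to the last position and split the matrix into a block, a vector, and a scalar. A small sketch with a made-up $3 \times 3$ precision matrix (the helper name and index convention are our own assumptions):

```python
import numpy as np

def partition(M, i):
    # Split a symmetric matrix M around variable i:
    # returns (block, vector, scalar) = (L_A, l_A, lambda_A) when M is a
    # precision matrix, or (W_A, w_A, sigma_A) when M is its inverse.
    idx = [j for j in range(M.shape[0]) if j != i]
    return M[np.ix_(idx, idx)], M[idx, i], M[i, i]

Lam_A = np.array([[2.0, -0.5, 0.3],
                  [-0.5, 1.8, -0.4],
                  [0.3, -0.4, 1.2]])
L_A, l_A, lam_A = partition(Lam_A, i=2)
W_A, w_A, sig_A = partition(np.linalg.inv(Lam_A), i=2)
```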

Find the distribution of $p(x_i|\bm{z_i})$.

If $\bm{x}$ is an $M$-dimensional vector, then $\bm{z_i}$ is $(M-1)$-dimensional. Since a normal distribution divided by a normal distribution is again a normal distribution, only the terms involving $x_i$ need to be kept when expanding inside the exponential.

$$\begin{aligned} p(x_i|\bm{z_i}) &= \frac{p_A(\bm{x})}{p_A(\bm{z_i})} \propto \exp \left(-\frac{1}{2}\begin{pmatrix}\bm{z_i} \\ x_i\end{pmatrix}^{\top}\begin{pmatrix}L_A & \bm{l_A} \\ \bm{l_A}^{\top} & \lambda_A \end{pmatrix}\begin{pmatrix}\bm{z_i} \\ x_i\end{pmatrix}\right) \\ & \propto \exp \left\{-\frac{1}{2}\left(\lambda_A x_i^2+2 \bm{z_i}^{\top} \bm{l_A} x_i\right)\right\} \\ & \propto \exp \left\{-\frac{\lambda_A}{2}\left(x_i+\frac{\bm{z_i}^{\top} \bm{l_A}}{\lambda_A}\right)^2\right\} \end{aligned}$$

Here $p_A(\bm{z_i})$ and every term not involving $x_i$ are absorbed into the proportionality constant, and the last line completes the square in $x_i$.

and $x_i$ follows a normal distribution with mean $-\frac{\bm{z_i}^{\top}\bm{l_A}}{\lambda_A}$ and variance $\frac{1}{\lambda_A}$.

$$x_i \sim \mathcal{N}\left(-\frac{\bm{z_i}^{\top}\bm{l_A}}{\lambda_A},\frac{1}{\lambda_A}\right)$$

Also, the probability density function is,

$$p(x_i|\bm{z_i}) = \sqrt{\frac{\lambda_A}{2\pi}} \exp \left( -\frac{\lambda_A}{2} \left( x_i + \frac{\bm{z_i}^{\top} \bm{l_A}}{\lambda_A} \right)^2 \right)$$
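
As a rough Monte Carlo sanity check of this conditional (purely illustrative, with a made-up precision matrix): among samples whose $\bm{z_i}$ lands near a fixed value, $x_i$ should show approximately the mean and variance above.

```python
import numpy as np

rng = np.random.default_rng(1)
Lam = np.array([[2.0, -0.5, 0.3],
                [-0.5, 1.8, -0.4],
                [0.3, -0.4, 1.2]])
X = rng.multivariate_normal(np.zeros(3), np.linalg.inv(Lam), size=200_000)

i = 2                                  # condition on the first two variables
l_A, lam_A = Lam[:2, i], Lam[i, i]
z0 = np.array([0.5, -0.3])             # fixed value for z_i
mask = np.linalg.norm(X[:, :2] - z0, axis=1) < 0.1  # samples with z_i near z0

print(X[mask, i].mean(), -z0 @ l_A / lam_A)  # empirical vs. theoretical mean
print(X[mask, i].var(), 1.0 / lam_A)         # empirical vs. theoretical variance
```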

We will use this equation to solve for the KL distance.

Let's solve for $d^{AB}_i$.

Note that $\bm{x} = (\bm{z_i}, x_i)$ from the definition of $\bm{z_i}$, so that $p_A(\bm{z_i})\, p_A(x_i|\bm{z_i}) = p_A(\bm{x})$:

$$\begin{aligned} d^{AB}_i &= \int d\bm{z_i}\, p_A(\bm{z_i}) \int dx_i\, p_A(x_i|\bm{z_i}) \ln \frac{p_A(x_i|\bm{z_i})}{p_B(x_i|\bm{z_i})} \\ &= \int \ln \frac{p_A(x_i|\bm{z_i})}{p_B(x_i|\bm{z_i})}\, p_A(\bm{x})\, d\bm{x} \\ &= \int \ln \left( \sqrt{\frac{\lambda_A}{\lambda_B}} \exp \left( -\frac{\lambda_A}{2} \left(x_i + \frac{\bm{z_i}^{\top} \bm{l_A}}{\lambda_A} \right)^2 + \frac{\lambda_B}{2} \left(x_i + \frac{\bm{z_i}^{\top} \bm{l_B}}{\lambda_B} \right)^2 \right) \right) p_A(\bm{x})\, d\bm{x} \\ &=\frac{1}{2} \ln \frac{\lambda_A}{\lambda_B}+\int \left\{-\frac{1}{2 \lambda_A}\left(\bm{x}^{\top}\begin{pmatrix}\bm{l_A} \\ \lambda_A\end{pmatrix}\right)^2+\frac{1}{2 \lambda_B}\left(\bm{x}^{\top}\begin{pmatrix}\bm{l_B} \\ \lambda_B\end{pmatrix}\right)^2\right\} p_A(\bm{x})\, d\bm{x} \\ &=\frac{1}{2} \ln \frac{\lambda_A}{\lambda_B}+E_A\left[-\frac{1}{2 \lambda_A}\begin{pmatrix}\bm{l_A} \\ \lambda_A\end{pmatrix}^{\top} \bm{x}\, \bm{x}^{\top}\begin{pmatrix}\bm{l_A} \\ \lambda_A\end{pmatrix}+\frac{1}{2 \lambda_B}\begin{pmatrix}\bm{l_B} \\ \lambda_B\end{pmatrix}^{\top} \bm{x}\, \bm{x}^{\top}\begin{pmatrix}\bm{l_B} \\ \lambda_B\end{pmatrix}\right] \quad (1) \end{aligned}$$

where $\lambda_A \left( x_i + \frac{\bm{z_i}^{\top} \bm{l_A}}{\lambda_A} \right) = \bm{x}^{\top}\begin{pmatrix}\bm{l_A} \\ \lambda_A\end{pmatrix}$ is used to rewrite each squared term.

Since $E_A[\bm{x}\, \bm{x}^{\top}] = \Sigma_A = \begin{pmatrix} W_A & \bm{w_A} \\ \bm{w_A}^{\top} & \sigma_A \end{pmatrix}$, (1) becomes:

$$\begin{aligned} (1)& =\frac{1}{2} \ln \frac{\lambda_A}{\lambda_B}-\frac{1}{2 \lambda_A}\left(\bm{l_A}^{\top} W_A \bm{l_A}+2 \bm{w_A}^{\top} \bm{l_A} \lambda_A+\lambda_A^2 \sigma_A\right)+\frac{1}{2 \lambda_B} \left(\bm{l_B}^{\top} W_A \bm{l_B}+2 \bm{w_A}^{\top} \bm{l_B} \lambda_B+\lambda_B^2 \sigma_A\right) \\ & ={\bm{w_A}}^{\top}\left(\bm{l_B}-\bm{l_A}\right)+\frac{1}{2}\left(\frac{\bm{l_B}^{\top} W_A \bm{l_B}}{\lambda_B}-\frac{\bm{l_A}^{\top} W_A \bm{l_A}}{\lambda_A}\right)+\frac{1}{2}\left\{\ln \frac{\lambda_A}{\lambda_B}+\sigma_A\left(\lambda_B-\lambda_A\right)\right\}\end{aligned}$$

Finally, the KL distance $d^{AB}_i$ can be expressed as follows:

$$d^{AB}_i ={\bm{w_A}}^{\top}\left(\bm{l_B}-\bm{l_A}\right)+\frac{1}{2}\left(\frac{\bm{l_B}^{\top} W_A \bm{l_B}}{\lambda_B}-\frac{\bm{l_A}^{\top} W_A \bm{l_A}}{\lambda_A}\right)+\frac{1}{2}\left\{\ln \frac{\lambda_A}{\lambda_B}+\sigma_A\left(\lambda_B-\lambda_A\right)\right\}$$

By interchanging $A$ and $B$, $d^{BA}_i$ can be obtained as well.
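
With the derivation done, the formula transcribes directly into Python. Below is a sketch, assuming `Lam_A` and `Lam_B` are the two estimated precision matrices (e.g., from the graphical lasso) and taking $\Sigma_A = \Lambda_A^{-1}$; the function names are our own, and a Monte Carlo version of the defining integral is included as a cross-check.

```python
import numpy as np

def kl_i(Lam_A, Lam_B, i):
    # d_i^{AB} for variable i: direct transcription of the final formula.
    idx = [j for j in range(Lam_A.shape[0]) if j != i]
    l_A, lam_A = Lam_A[idx, i], Lam_A[i, i]
    l_B, lam_B = Lam_B[idx, i], Lam_B[i, i]
    Sigma_A = np.linalg.inv(Lam_A)
    W_A, w_A, sig_A = Sigma_A[np.ix_(idx, idx)], Sigma_A[idx, i], Sigma_A[i, i]
    return (w_A @ (l_B - l_A)
            + 0.5 * (l_B @ W_A @ l_B / lam_B - l_A @ W_A @ l_A / lam_A)
            + 0.5 * (np.log(lam_A / lam_B) + sig_A * (lam_B - lam_A)))

def mc_kl_i(Lam_A, Lam_B, i, n=500_000, seed=0):
    # Monte Carlo version of the defining integral: draw x ~ p_A and average
    # the log-ratio of the two 1-D conditionals p(x_i | z_i) derived above.
    rng = np.random.default_rng(seed)
    X = rng.multivariate_normal(np.zeros(Lam_A.shape[0]),
                                np.linalg.inv(Lam_A), size=n)
    idx = [j for j in range(Lam_A.shape[0]) if j != i]
    def cond_logpdf(Lam):
        l, lam = Lam[idx, i], Lam[i, i]
        mu = -X[:, idx] @ l / lam
        return 0.5 * np.log(lam / (2 * np.pi)) - 0.5 * lam * (X[:, i] - mu) ** 2
    return np.mean(cond_logpdf(Lam_A) - cond_logpdf(Lam_B))

Lam_A = np.array([[2.0, -0.5], [-0.5, 1.8]])
Lam_B = np.array([[2.0, -0.1], [-0.1, 1.5]])
print(kl_i(Lam_A, Lam_B, 1))     # closed form
print(mc_kl_i(Lam_A, Lam_B, 1))  # should agree up to Monte Carlo error
print(kl_i(Lam_A, Lam_A, 1))     # identical models -> 0.0
# d_i^{BA} is obtained by interchanging the arguments: kl_i(Lam_B, Lam_A, i)
```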

I referred to Detecting Correlation Anomalies by Learning Sparse Correlation Graphs for the derivation; the result above agrees with the formula in that paper. If there are any errors in my derivation process, please let me know via Twitter or in the comments.

Now, how can the equation consisting of the three terms obtained here be viewed qualitatively? Using the GGM framework introduced at the beginning of this article, each term can be interpreted as follows.

  • Term 1 - Anomaly Detection of Neighborhood Creation and Extinction: This measures how many other variables $x_i$ is directly related to, i.e., the degree (number of direct connections) of that variable. A neighborhood is the set of other variables directly linked to $x_i$. Since the number of nonzero elements in $\bm{l_A}$ equals the degree of $x_i$, this term serves as an indicator for detecting changes in the neighborhood of $x_i$, i.e., the creation of new connections or the disappearance of existing ones.

  • Term 2 - "Closeness" of the neighborhood graph: This is the strength of the relationships among the variables in the neighborhood of $x_i$, i.e., how "tightly" connected the edge weights in the graph are. If $x_i$ has just one edge to another variable $j$, this term is the difference between the correlation terms for $x_i$ and $j$ under the two models, each divided by the precision $\lambda_A$ or $\lambda_B$ associated with $x_i$. This measures how the strength of the correlations between variables varies.

  • Term 3 - Change in precision or variance of each variable: This term captures how the precision (or variance) of each variable changes, rather than changes in the correlations between variables. Precision is the inverse of the variance, so the higher the precision, the lower the uncertainty. This term is therefore a measure of how the uncertainty of individual variables varies.

In essence, these three terms capture different aspects of the GGM: the relationships among variables, the closeness of those relationships, and the uncertainty of individual variables. Through these indicators, it is possible to quantitatively assess structural changes in the network, changes in the strength of the relationships between variables, and changes in the certainty of individual variables.
