System Trade with Online Estimation
What is online estimation?
When you design trading algorithms, you need to set various parameters. Typical examples are the parameters of machine learning and statistical models, but also thresholds in market-making and arbitrage strategies.
How do you select these parameters? Ideally, you would use historical data and estimate values that are statistically valid. Or you might rely on a rule of thumb derived from historical data, such as executing an arbitrage whenever the spread reaches a certain level.
What about situations where there is not enough historical data, no historical data at all, or where such data is difficult to obtain? In those situations it becomes difficult to estimate parameters with traditional models.
In this article I would like to introduce online estimation, a class of methods that estimate parameters without relying on stored historical data. Because no history needs to be kept, it also saves memory when the algorithms are actually deployed, and it allows dynamic parameters that are updated as data arrives one observation at a time.
Offline and Online estimation
Estimation methods commonly used for parameter estimation, such as ordinary least squares (OLS), are called batch estimation (offline estimation), whereas this article deals with sequential estimation (online estimation).
When a model is built with batch estimation, it is trained once on historical data and then put into practice with the parameters obtained from that training. The parameters optimized in this way are essentially fixed (they are sometimes called static parameters).
With online estimation, the parameters are updated one after another as the data streams in.
Simple Example: Let's estimate the mean of a time series.
Suppose we have time series data $X_1, X_2, \dots, X_n$. If we want to find the mean $\mu_n$ of this in the usual way,

$$\mu_n = \frac{X_1 + \dots + X_n}{n}$$

But you already know the value of the average $\mu_{n-1}$ one period ago, at time $n-1$. If we try to use this to represent $\mu_n$,

$$\begin{aligned} \mu_n &= \frac{X_1 + \dots + X_n}{n} \\ &= \frac{\left(\frac{X_1 + \dots + X_{n-1}}{n-1}\right) \times (n-1) + X_n}{n} \\ &= \frac{\mu_{n-1} \times (n-1) + X_n}{n} \end{aligned}$$

so the mean can be found from the average one period ago, without using $X_1, \dots, X_{n-1}$ themselves. Likewise, when new data $X_{n+1}$ arrives, we can use $\mu_n$ to find $\mu_{n+1}$ without referring to the historical values. This is a simple example, but it shows how such recursive formulations can estimate the parameters you want more efficiently.
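As a minimal sketch, the same update can be written in a few lines of Python (class and variable names are mine):

class OnlineMean:
    def __init__(self):
        self.n = 0      # number of observations seen so far
        self.mu = 0.0   # running mean

    def update(self, x):
        # mu_n = (mu_{n-1} * (n-1) + x_n) / n, rearranged into an increment
        self.n += 1
        self.mu += (x - self.mu) / self.n
        return self.mu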
Various Parameter Estimation Methods
There are many different approaches to parameter estimation in different fields. The same is true for online estimation, where the appropriate model must be selected depending on the type of data used and the situation. The following is a brief introduction to these methods (leaving aside the explanation of specific formulas).
Sequential Least Squares
The sequential least squares method is a variant of the least squares method that allows sequential estimation. It works by expressing the relationship between the least-squares estimates at time $n-1$ and time $n$ as an incremental formula. Because it is a classical model, it is relatively easy to implement and runs fast.
Python Implementation of Sequential Least Squares
class SequentialLeastSquares:
    def __init__(self):
        self.theta = 0   # coefficient estimate
        self.P = 1000    # estimation-error variance (large start = vague prior)

    def update(self, x, y):
        # Gain: how much the new observation moves the estimate
        K = self.P * x / (x**2 * self.P + 1)
        # Correct the estimate by the gain times the prediction error
        self.theta += K * (y - x * self.theta)
        # Shrink the error variance now that more data has been seen
        self.P -= K * x * self.P

    def predict(self, x):
        return x * self.theta
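A quick usage sketch with synthetic data (the true coefficient and the noise level are arbitrary choices for illustration):

import numpy as np

np.random.seed(0)
model = SequentialLeastSquares()

# Stream (x, y) pairs one at a time; no history is stored
for _ in range(1000):
    x = np.random.randn()
    y = 1.5 * x + 0.1 * np.random.randn()
    model.update(x, y)

print(model.theta)  # converges toward the true coefficient 1.5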
Online Convex Optimization
Online Convex Optimization (OCO) is a sequential optimization framework used in machine learning to solve optimization problems from sequentially arriving data. It applies to a wide range of problems regardless of the shape of the data distribution, as long as the objective function is convex.
Although it may not be familiar to those in the statistics field, well-known examples include the online Newton method and the Adam family of algorithms often used in neural network training.
Python class for solving simple linear regression problems with the online Newton method
import numpy as np

class OnlineNewtonStep:
    def __init__(self, dim, eta, lambda_reg):
        self.theta = np.zeros(dim)         # coefficient estimates
        self.A = lambda_reg * np.eye(dim)  # regularized second-moment matrix
        self.eta = eta                     # step size
        self.lambda_reg = lambda_reg

    def update(self, x, y):
        # Gradient of the squared loss at the current estimate
        grad = -x * (y - np.dot(x, self.theta))
        # Accumulate the outer product of the features
        self.A += np.outer(x, x)
        # Newton-style step preconditioned by the inverse of A
        A_inv = np.linalg.inv(self.A)
        self.theta -= self.eta * np.dot(A_inv, grad)

    def predict(self, x):
        return np.dot(x, self.theta)
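One practical note: calling np.linalg.inv at every step costs O(dim³). Since each update changes A by a rank-one term, the inverse can instead be maintained incrementally with the Sherman-Morrison formula. A minimal sketch of that variant (same behavior, assuming A remains invertible):

class OnlineNewtonStepSM(OnlineNewtonStep):
    def __init__(self, dim, eta, lambda_reg):
        super().__init__(dim, eta, lambda_reg)
        self.A_inv = np.linalg.inv(self.A)  # maintained incrementally from here on

    def update(self, x, y):
        grad = -x * (y - np.dot(x, self.theta))
        # Sherman-Morrison: (A + x x^T)^{-1} = A^{-1} - (A^{-1} x)(A^{-1} x)^T / (1 + x^T A^{-1} x)
        Ax = self.A_inv @ x
        self.A_inv -= np.outer(Ax, Ax) / (1.0 + x @ Ax)
        self.theta -= self.eta * (self.A_inv @ grad)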
Bayesian estimation
Bayesian estimation is an estimation technique in statistics based on Bayes' theorem. It combines prior beliefs about unknown parameters (the prior) with newly obtained data (the likelihood) to update the parameter estimates (the posterior).
Bayesian estimation is characterized by treating the unknown parameters as probability distributions rather than mere point values. This allows a quantitative evaluation of the reliability of parameters and models. Bayesian methods have many other advantages, such as the ability to incorporate parameters obtained from historical data into the model as a prior distribution.
Simplest example: Bayesian estimation of the probability that a coin will turn up Heads.
import numpy as np
from scipy.stats import beta
import matplotlib.pyplot as plt

# Bayesian update for a Beta prior: each head adds 1 to a, each tail adds 1 to b
# (parameters renamed from alpha/beta to avoid shadowing scipy.stats.beta)
def bayesian_update(a, b, heads, tails):
    return a + heads, b + tails

# Initial parameters (both 1, i.e. a uniform prior, if there is no prior knowledge)
alpha_prior = 1
beta_prior = 1

# Coin tossing data (e.g., 1 is heads, 0 is tails)
coin_flips = [0, 1, 1, 0, 1, 1, 0, 1, 1, 1]

# Sequential Bayesian update:
# the parameters alpha and beta are updated each time new data arrives
for flip in coin_flips:
    heads, tails = int(flip == 1), int(flip == 0)
    alpha_prior, beta_prior = bayesian_update(alpha_prior, beta_prior, heads, tails)
    print(f"Updated alpha: {alpha_prior}, Updated beta: {beta_prior}")

# Draw the posterior probability distribution
x = np.linspace(0, 1, 100)
plt.plot(x, beta.pdf(x, alpha_prior, beta_prior))
plt.show()
We used the beta distribution here (the conjugate distribution for binary problems), but the same approach works with other distributions, such as the normal distribution, as sketched below.
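For instance, here is a minimal sketch of the conjugate update for the mean of normally distributed observations, assuming the observation variance obs_var is known (all numbers are illustrative):

# Conjugate Bayesian update for a Normal mean with known observation variance:
# prior N(mu0, var0), observation x ~ N(mu, obs_var)
def normal_update(mu0, var0, x, obs_var):
    var_post = 1.0 / (1.0 / var0 + 1.0 / obs_var)
    mu_post = var_post * (mu0 / var0 + x / obs_var)
    return mu_post, var_post

mu, var = 0.0, 100.0  # vague prior on the mean
for x in [1.2, 0.8, 1.1, 0.9]:
    mu, var = normal_update(mu, var, x, obs_var=0.25)
print(mu, var)  # posterior mean approaches the sample mean; variance shrinks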
Kalman Filter
One drawback of online estimation is sensitivity to noise: if the incoming data contain outliers, the parameters are pulled toward the bad data and can deteriorate sharply.
The Kalman filter is the best-known sequential estimation method that builds in a mechanism for dealing with such noise. By weighting each new observation with an error-correction factor called the Kalman gain, it remains robust to noisy inputs despite being a sequential method.
It is important to note that the parameters of the model used by the Kalman filter are not themselves estimated sequentially. The Kalman filter sequentially estimates the state variables of a state-space model; it is not directly used to estimate the model's own parameters. When those parameters must be estimated, it is therefore common to use the maximum likelihood method, which is not online estimation in the strict sense. (The log-likelihood of the model at given parameter values is maximized.)
In Python, implementations are available in pykalman and statsmodels. In practice, however, models are often written directly in their own state-space representation, so the filter is frequently implemented without a package.
The Kalman filter itself is a linear model, but there are variants for nonlinear settings, such as the extended Kalman filter and the unscented Kalman filter.
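As a minimal sketch, here is a one-dimensional Kalman filter tracking a level modeled as a random walk observed with noise (the variances q and r are illustrative, not calibrated):

class Kalman1D:
    def __init__(self, x0=0.0, p0=1000.0, q=1e-4, r=1.0):
        self.x = x0  # state estimate (e.g., the underlying price level)
        self.p = p0  # estimate variance (large start = vague prior)
        self.q = q   # process noise variance (random-walk step size)
        self.r = r   # observation noise variance

    def update(self, z):
        # Predict: under a random walk the mean is unchanged, variance grows
        self.p += self.q
        # Kalman gain: how much to trust the new observation z
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= (1.0 - k)
        return self.x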
Particle Filter
A particle filter is a state-space estimation method based on a Monte Carlo approach (i.e., stochastic simulation).
While the simpler Kalman filter is more often used in practice, the particle filter is common at the research level. It can estimate the state of nonlinear and non-Gaussian systems.
Samples called particles represent the probability distribution of the system state; each time new data arrives, the particles are resampled in proportion to their importance weights. With a sufficiently large number of particles, it can be more accurate than the extended or unscented Kalman filter, but its high computational cost is a drawback, and it is rarely used in production.
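As a minimal sketch, here is a bootstrap particle filter for the same random-walk-plus-noise model as the Kalman sketch above (the particle count and variances are illustrative):

import numpy as np

def particle_filter_step(particles, z, q=1e-4, r=1.0):
    # Propagate each particle through the random-walk transition
    particles = particles + np.sqrt(q) * np.random.randn(len(particles))
    # Weight particles by the Gaussian likelihood of the observation z
    w = np.exp(-0.5 * (z - particles) ** 2 / r)
    w /= w.sum()
    # Resample in proportion to the importance weights
    idx = np.random.choice(len(particles), size=len(particles), p=w)
    return particles[idx]

particles = np.random.randn(1000) * 10  # diffuse initial cloud
for z in [1.0, 1.1, 0.9, 1.05]:
    particles = particle_filter_step(particles, z)
print(particles.mean())  # state estimate after the observations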
Let's incorporate online estimation into your trading algorithms.
Here I will present some examples of applying online estimation to trading algorithms.
High-frequency Market Making
A typical setting where online estimation is used is a market-making strategy. Bayesian updating is useful for efficiently estimating parameters from data that arrives one observation after another.
Marcos López de Prado's famous book "Advances in Financial Machine Learning" also introduces the use of Bayesian updating when estimating PIN (the Probability of Informed Trading).
In its standard form, $\mathrm{PIN}_t = \frac{\alpha_t \mu}{\alpha_t \mu + 2\varepsilon}$, where $\alpha$ is the probability of an information event, $\mu$ is the arrival rate of informed traders, and $\varepsilon$ is the arrival rate of uninformed traders. This equation tells us that the critical factor determining the price range at which market makers provide liquidity is the probability of informed trading. The subscript $t$ indicates that the probabilities $\alpha_t$ and $\mathrm{PIN}_t$ are estimated at that point in time; the authors apply a Bayesian updating process to incorporate new information as each trade arrives at the market.
For high-frequency strategies, being able to estimate parameters such as volatility and PIN dynamically is a great advantage. The high computational efficiency and low memory footprint are a particular plus.
New Cryptocurrency Listings
Online estimation is also a powerful weapon in market-making strategies for newly listed tokens.
For example, suppose a token that is already listed on Binance is about to be listed on Bybit, and I want to run a market-making bot for that token on Bybit priced off the more liquid Binance market. Of course, there is no data yet, so we cannot tell how large the price difference between Binance and Bybit will be.
If we can obtain price data sequentially after listing and estimate the probability distribution of the price difference between the exchanges, that estimate can drive the market making of the newly listed token. With methods that let you set the variance itself as an initial value, such as an online convex optimization algorithm or Bayesian estimation, you can start with a deliberately large variance and then gradually narrow the bid-offer spread with each data update, as sketched below.
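A minimal sketch of that idea, using Welford's online mean/variance update (the prior variance and the shrinkage constant are assumptions for illustration, not calibrated values):

import numpy as np

class SpreadEstimator:
    """Online mean/variance of the Binance-Bybit price difference."""

    def __init__(self, prior_var=1.0):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # sum of squared deviations (Welford)
        self.prior_var = prior_var  # start wide: quote cautiously at first

    def update(self, diff):
        # Welford's update: no price history is stored
        self.n += 1
        delta = diff - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (diff - self.mean)

    def quote_width(self, k=2.0):
        # Shrink toward the wide prior while data is scarce
        var = self.m2 / self.n if self.n > 1 else self.prior_var
        w = self.n / (self.n + 10.0)  # arbitrary shrinkage constant
        return k * np.sqrt(w * var + (1 - w) * self.prior_var)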
Even outside new listings, incorporating price information into online estimation is also useful for market making with inventory models across multiple exchanges.
Trading Stablecoins on DEXs
Online estimation is also useful on decentralized exchanges (DEXs), where historical data is not readily available.
Suppose USDT and USDC are listed on a DEX. Since they are stablecoins, both prices should in theory sit at roughly $1.00, but the price difference is observed to open up occasionally. You want to build a bot that captures the difference when it widens past a certain point. So how should you set that threshold?
When price information is lacking, as with newly launched DEXs or new pools, this is where online estimation comes in. If you are estimating a distribution with only one variable, the online Newton method or the sequential least squares method is computationally cheap, since no matrix inversion is required. Bayesian estimation assuming a normal posterior distribution is also efficient.
Since there are no particularly complicated calculations, you can even build an on-chain bot with smart contracts. From the USDC-USDT spread observed in every block, the parameters (in this case, the distribution of the price difference) can be estimated without storing any historical data.
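A minimal sketch of such a bot's estimation logic, using exponentially weighted moving estimates so that only a few numbers need to be kept per block (the decay alpha and threshold multiplier k are illustrative assumptions):

class EwmaSpreadThreshold:
    """Constant-memory estimate of the USDC-USDT spread distribution."""

    def __init__(self, alpha=0.05, k=3.0):
        self.alpha = alpha  # decay: higher = adapt faster, forget sooner
        self.k = k          # stdevs the spread must open before trading
        self.mean = 0.0
        self.var = 1e-8     # tiny initial variance; widens with real data

    def update(self, spread):
        # Exponentially weighted mean and variance: O(1) memory per block
        delta = spread - self.mean
        self.mean += self.alpha * delta
        self.var = (1.0 - self.alpha) * (self.var + self.alpha * delta ** 2)

    def should_trade(self, spread):
        # Trade only when the spread is unusually wide vs. recent history
        return abs(spread - self.mean) > self.k * self.var ** 0.5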
Conclusion
I wrote this introductory article because I suspect there are many people who do not usually make use of these methods. Recently, and not only in finance, strategies built on neural networks and sheer machine power have become popular, and less importance tends to be placed on classical techniques. In that context, I believe online estimation, which pursues efficiency on shorter time frames, offers a different kind of advantage. Beyond the applications listed above, it can be applied to many other areas, such as estimating overall market trends or NFT floor prices, and it is easy to implement.