Figures for Alpha-AS-1 and Alpha-AS-2 are given in green if their value is higher than that of the Gen-AS model for the same day. Figures in parentheses are the number of days on which the Alpha-AS model in question was second best only to the other Alpha-AS model (and would therefore have scored another overall ‘win’ had it competed alone against the baseline and Gen-AS models). We performed a genetic search at the beginning of the experiment, aiming to obtain the values of the AS model parameters that yield the highest Sharpe ratio, working on the same orderbook data. γd is a discount factor (γd ∈ [0, 1]) by which future expected rewards are given less weight in the current Q-value than the latest observed reward. (γd is usually denoted simply as γ, but in this paper we reserve the latter to denote the risk aversion parameter of the AS procedure.) Likert-type scales are commonly used in both academia and industry to capture human feelings since they are user-friendly, easy to develop and easy to administer.

- To ameliorate this, a novel weakly consistent pure-jump market model is proposed that ensures that the price dynamics are consistent with the LOB dynamics with respect to direction and timing.
- “And then we show how to incorporate those tiers into the model,” says Barzykin.
- The proposed approach leverages the advantages of Monte Carlo backtesting and contributes to the line of research on market making under weakly consistent limit order book models.
- Are the related depths at which the market maker posts the limit orders.
- Table 6 compares the results of the Alpha-AS models, combined, against the two baseline models and Gen-AS.

An increment means that more buy market orders have arrived and been filled by sell orders, which causes larger spreads. For a fixed inventory level q and a representation of the asset volatility which are obtained from one simulation. Is the set of the admissible strategies, F and G are the instantaneous and terminal reward functions, respectively.

5) Why do you opt for a discretized large action space instead of simply using a continuous action space and an appropriate RL algorithm, especially given that there is a great selection of RL algorithms capable of tackling continuous action spaces?

## Views

Topics in stochastic control with applications to algorithmic trading. PhD Thesis, The London School of Economics and Political Science. And for the stock price dynamics which are provided in each model definition.


Indeed, this result is particularly noteworthy as the Avellaneda-Stoikov method sets as its goal precisely to minimise the inventory risk. Nevertheless, the flexibility that the Alpha-AS models are given to move and stretch the bid and ask price spread entails that the Alpha-AS models can, and sometimes do, operate locally with higher risk. Overall performance is more meaningfully obtained from the other indicators (Sharpe, Sortino and P&L-to-MAP), which show that, at the end of the day, the Alpha-AS models’ strategy pays off. Nevertheless, it is still interesting to note that Gen-AS performs much better on this indicator than on the others, relative to the Alpha-AS models. This means that, provided its parameter values describe the market environment closely enough, the pure AS model is guaranteed to output the bid and ask prices that minimise inventory risk, and any deviation from this strategy will entail a greater risk.

## Sortino ratio

It also leaves sufficient time to submit and execute orders before the next tick report. Besides, we find that the number of signals generated by the system can be used to rank stocks by their suitability for LOB trading. We test the system with simulation experiments and real data from the Chinese A-share market.

The original Avellaneda-Stoikov model was chosen as a starting point for our research. We plan to use such approximations in further tests with our RL approach. The performance results for the 30 days of testing of the two Alpha-AS models against the three baseline models are shown in Tables 2–5. All ratios are computed from Close P&L returns (Section 4.1.6), except P&L-to-MAP, for which the open P&L is used. Figures in bold are the best values among the five models for the corresponding test days.

## IEEE Transactions on Knowledge and Data Engineering

The ensuing deep reinforcement learning controller is compared to multiple market making benchmarks, with the results indicating its superior performance with respect to various risk-reward metrics, even under significant transaction costs. Data normalization for features and labeling for signals are required for classification. Instead of simply labeling the mid-price movement as in Kercheval and Zhang and in Tsantekidis et al., we consider the direct trading actions, including long, short, and none. This approach is inspired by the previous application of deep learning to trade signals in the context of VIX futures (Avellaneda et al., 2021). The signals are determined by the approximate wealth changes during a fixed and limited holding period, during which we set stop-loss and take-profit points.
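The labeling idea described above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the source: the function names `trade_return` and `label_signal`, the default horizon and thresholds, and the rule that a label is assigned only when the simulated trade reaches its take-profit point.

```python
from typing import Sequence


def trade_return(prices: Sequence[float], start: int, horizon: int,
                 direction: int, take_profit: float, stop_loss: float) -> float:
    """Relative return of a position opened at `start`, held at most `horizon` ticks.

    `direction` is +1 for a long position and -1 for a short position.
    """
    entry = prices[start]
    ret = 0.0
    for price in prices[start + 1:start + 1 + horizon]:
        ret = direction * (price - entry) / entry
        if ret >= take_profit or ret <= -stop_loss:
            break  # closed early at a take-profit or stop-loss point
    return ret


def label_signal(prices: Sequence[float], start: int, horizon: int = 5,
                 take_profit: float = 0.01, stop_loss: float = 0.005) -> str:
    """Hypothetical labeling rule: 'long', 'short' or 'none'."""
    if trade_return(prices, start, horizon, +1, take_profit, stop_loss) >= take_profit:
        return "long"
    if trade_return(prices, start, horizon, -1, take_profit, stop_loss) >= take_profit:
        return "short"
    return "none"
```

For instance, `label_signal([100.0, 100.4, 101.2, 101.0], 0)` yields `"long"` under these assumed thresholds, because the simulated long position reaches the 1% take-profit point within the holding period.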

Consequently, we support our findings by comparing the models proposed in this research with the stock price impact models existing in the literature. Last but not least, we have substantially improved the performance of a market maker with the proposed models. Table 13, which is obtained from all simulations, demonstrates that Model C, the stock price model with stochastic volatility, has a relatively larger expected return but also a relatively larger standard deviation. Meanwhile, the other stock price models in Table 13 produce higher Sharpe ratios. The Sharpe ratio is a measure of mean returns that penalises their volatility.
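As a rough illustration of that definition, the Sharpe ratio can be computed as the mean excess return divided by its standard deviation. The sketch below assumes a zero risk-free rate by default, uses the sample standard deviation and applies no annualisation; the text does not specify these choices.

```python
import statistics
from typing import Sequence


def sharpe_ratio(returns: Sequence[float], risk_free: float = 0.0) -> float:
    """Mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    volatility = statistics.stdev(excess)  # needs at least two observations
    if volatility == 0:
        raise ValueError("Sharpe ratio is undefined for zero volatility")
    return statistics.mean(excess) / volatility
```

Two return series with the same mean but different dispersion get different scores: the more volatile one is penalised with a lower ratio.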

## 2 Action space

It is demonstrated that Model d has a normal (Gaussian) distribution while the others are positively skewed. We relied on random forests to filter state-defining features based on their importance according to three indicators. Various techniques are worth exploring in future work for this purpose, such as PCA, autoencoders, Shapley values or Cluster Feature Importance.

- However, on 13 of those days Alpha-AS-1 achieved a better P&L-to-MAP score than Gen-AS, substantially so in many instances.
- Through repeated exploration the agent gradually learns the relationships between states, actions and rewards.
- In the training phase we fit our two Alpha-AS models with data from a full day of trading.
- An ε-greedy policy is followed to determine the action to take during the next 5-second window, choosing between exploration, with probability ε, and exploitation, with probability 1-ε.
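The ε-greedy choice in the last point can be sketched as follows. This is a generic illustration, not the authors' implementation: the function name `epsilon_greedy` and the representation of Q-values as a plain sequence indexed by action are assumptions.

```python
import random
from typing import Optional, Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: Optional[random.Random] = None) -> int:
    """Return an action index: random with probability epsilon, else greedy."""
    rng = rng or random.Random()
    if rng.random() < epsilon:
        # Exploration: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploitation: pick the action with the highest estimated Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With ε = 0 the agent always exploits, and with ε = 1 it always explores. In practice ε is often decreased as training progresses, although the text above does not state a schedule.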

As regards market making, the AS algorithm, or versions of it, have been used as benchmarks against which to measure the improved performance of the machine learning algorithms proposed, either working with simulated data or in backtests with real data. The literature on machine learning approaches to market making is extensive. Inventory management is therefore central to market making strategies, and particularly important in high-frequency algorithmic trading. In an influential paper, Avellaneda and Stoikov expounded a strategy addressing market maker inventory risk.
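For orientation, the closed-form approximation commonly quoted from the Avellaneda-Stoikov paper can be sketched as below. The symbols are those of that paper rather than of this text: `gamma` is the risk aversion parameter, `sigma` the volatility, `k` the order-book liquidity parameter and `time_left` the remaining horizon T − t. The function name and the symmetric placement of the quotes around the reservation price are illustrative assumptions, and this is not necessarily how the Alpha-AS or Gen-AS agents implement it.

```python
import math
from typing import Tuple


def avellaneda_stoikov_quotes(mid: float, inventory: float, gamma: float,
                              sigma: float, k: float,
                              time_left: float) -> Tuple[float, float]:
    """Return (bid, ask) from the closed-form approximation."""
    # Reservation price: the mid-price shifted against the current inventory.
    reservation = mid - inventory * gamma * sigma ** 2 * time_left
    # Total spread, placed symmetrically around the reservation price.
    spread = (gamma * sigma ** 2 * time_left
              + (2.0 / gamma) * math.log(1.0 + gamma / k))
    return reservation - spread / 2.0, reservation + spread / 2.0
```

With zero inventory the quotes are centred on the mid-price. A long inventory lowers both quotes, which makes selling more likely than buying and so pushes the inventory back towards zero.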

## 5 Training

With the same assumptions and quadratic utility function as in Case 1, the corresponding HJB equation can be obtained by applying the stochastic control approach.

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Although two reviewers consider the manuscript suitable for publication in its current state, one of the reviewers still has some concerns that need to be addressed before the manuscript can be accepted for publication. These concerns relate to the methodological part of the research and the writing style. However, I am sure that the author will be able to resolve these issues.

Double DQN is a deep RL approach, more specifically deep Q-learning, that relies on two neural networks, as we shall see shortly (in Section 4.1.7). In this paper we present a double DQN applied to the market-making decision process. The RL agents (Alpha-AS) developed to use the Avellaneda-Stoikov equations to determine their actions are described in Section 4.1. An agent that simply applies the Avellaneda-Stoikov procedure with fixed parameters (Gen-AS), and the genetic algorithm to obtain said parameters, are presented in Section 4.2. Random forest is an efficient and accurate classification model, which makes decisions by aggregating a set of trees, either by voting or by averaging class posterior probability estimates.

Problem: Stocks listed on the Argentine capital market have limited liquidity.

Solution: Development of an algorithmic trading bot with a market-making strategy based on the paper and mathematical derivation by Avellaneda & Stoikov.

Let's see what comes of it. https://t.co/Q0BrFLaJaK

— AVW @ ETHDenver (@vwandres) November 10, 2021

MWCVC is a very suitable infrastructure for energy-efficient link monitoring and virtual backbone formation. In this paper, we propose a novel metaheuristic algorithm for MWCVC construction in WANETs. Our algorithm is a population-based iterated greedy approach that is very effective against graph theoretical problems.

Avellaneda-Stoikov, can Google. Hummingbot has nice articles but should prolly read the original paper if you’re seriously interested

paper is academic perfect frictionless environment crap but still good baseline model go build from

— David Holt 🌴 (@IDrawCharts) April 15, 2022

At each training step the parameters of the prediction DQN are updated using gradient descent. An early stopping strategy is followed on 25% of the training sets to avoid overfitting. The architecture of the target DQN is identical to that of the prediction DQN, the parameters of the former being copied from the latter every 8 hours.
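The interplay of the two networks can be sketched as follows, using the standard double DQN target: the prediction network selects the next action and the target network evaluates it. The function names, the plain-list representation of Q-values and parameters, and the time-based trigger are illustrative assumptions; only the 8-hour copy period and the discount factor γd come from the text.

```python
from typing import List, Sequence


def double_dqn_target(reward: float, next_q_prediction: Sequence[float],
                      next_q_target: Sequence[float], gamma_d: float,
                      done: bool = False) -> float:
    """Target value for one transition under double DQN."""
    if done:
        return reward  # no future reward after a terminal state
    # The prediction network selects the best next action...
    best = max(range(len(next_q_prediction)), key=lambda a: next_q_prediction[a])
    # ...and the target network evaluates that action.
    return reward + gamma_d * next_q_target[best]


def maybe_sync(pred_params: List[float], target_params: List[float],
               seconds_since_sync: float,
               period_s: float = 8 * 3600) -> List[float]:
    """Return the target parameters, replaced by a copy of the prediction ones when due."""
    if seconds_since_sync >= period_s:
        return list(pred_params)
    return target_params
```

Keeping the target network fixed between copies means the regression target does not move at every gradient step, which is the usual motivation for this design.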