How do I design high-frequency trading systems and its architecture. Part I

In this article, I would like to talk about how to implement a low latency system, modules needed and its system architecture and be focus on the software development side.

There are other parts that need to be covered in order to have a complete system, and they might be things like network communications, switches/routers, specialized servers, FPGAs, system operating tweaking, like kernel-bypassing, etc. But, as I said, in this article I will be more focus on the software design and its architecture.

This is a basic approach, on what I consider best practices for this kind of trading systems. Depending on different factors, markets, compliances, this may vary, but some basics will remain the same for any trading system.

Of course, that the key factor here is being able to handle big amount of incoming information, response time to external events, internal response time, and capabilities to provide the highest throughput and lowest latency.

When I design low latency systems, I always found the following challenges:

  • The system must be able to handle multiple strategies, making sure that the system will not underperform as we add more strategies.
  • Order book reconstruction from different venues. If we take as an example like the forex market, there is tons of different venues and ECNs, all of them using different API and connectivity standards. The system must be able to handle those different sources and aggregate them into an internal and well-defined data structure.

 

 

  1. Data Feed Handler

The feed handler is one of the most important components of any algorithmic trading system. Without accurate market data, any high-frequency or algorithmic strategy won’t be able to make correct decisions. So, the idea of this module is to capture market data from different venues, to allow our strategies, generate correct decisions, and keep a representation of venues’ limit order book.

A large amount of market updates can be received per second for each market venue, and your internal representation of each limit order book should change accordingly.

This process should go like this:

_fig01

The link between venues and your system is usually made using FIX protocol, so I have to put a FIX engine, in order to communicate with them (receive and send messages) and provide the core infrastructure that interfaces and manages the connectivity into the venues using FIX protocol.

Receiving message will be receiving market updates, and sending, will be sending orders generated by my strategies.

There are two options to implement a FIX Engine: custom made, or a commercial option. To implement a custom FIX engine, you will need to put too many man-hours and you can make sure that you will optimize the communication for a low latency environment. Firms like large banks and institution will prefer their own FIX engine, so they can have ownership of the entire code.

I prefer to go with the commercial option, since there are too many good options out there, and you just need to plug their library, and you will be ready to go.

Note that besides of network and communication latency, there also will be a decode/parse latency, so I’ve to take care of it as well. Parsing is a string manipulation, hence very expensive in terms of CPU cycles and memory management.

 

There are venues that also started to provide connectivity through newer and better protocols since FIX is becoming too slow for nowadays algorithms. FAST, ITCH/OUTCH are binary protocols and I always try to use them as long they are available. They will use a different engine to communicate, but the concept remains the same.

 

 

  1. Limit Order Book

Once the system has its connectivity between the venues, I will need to update all the events reported by them: order updates/adds or deletions (trades if needed as well).

For each event received I will go into my internal order book and make the changes it needs to be made.

Usually, all venues send these updates using a unique identifier (EntryID) and the update type (update is an insert, update or delete) so you can accurately replicate their limit order book.

 

Here I show how the limit order book reconstruction works:

_fig02

One important thing to keep in mind here, is what type of data structure we want to use to hold the order book data. Remember that we can get several million of updates per seconds, so our data structure should be fast enough to find an entry or delete it.

 

From previous recommendations, I stated that the best data structure is to use plain arrays: one for the bids, one for the asks. They provide the best performance.

Moreover, pre-allocate as much memory as you can.

Dynamic allocation is expensive, should not be used in critical paths like updating a limit order book data structure and you will end up having a large amount of overhead per allocation.

But since we are building a low latency system, we need to pre-allocate our data structures. And I’m going to show you the difference between pre-allocating and dynamic allocation, and its performance impact.

On the system initialization phase, pre-allocate these arrays: bidsArray and askArrays, and let’s say we are going to store a 10-level depth of the book, hence, we will need to pre-allocate 10-element bid/ask array. And we are going to move/replace elements, NOT removing and creating since the allocation process will consume too much time.

 

The following code show how the dynamic allocation works:

_fig03

As you can see, we add or delete, and then I sort the entire array, so we can keep an ordered structure.

In the next section, I show how to pre-allocate and reuse.

_fig04

You define the array with 10 elements on it and you will reuse all those elements. You may notice that instead of deleting the element, I’m clearing its values, and once ordered, sent them at the beginning of the array.

 

Once I run a performance test between them, counting CPU cycles taken on each case on a sample of 100,000 iterations,  I get the following results.

_fig05

Wow, a 60% difference. Not a surprise for me, allocation/deallocation is expensive. And this was a very basic/simple example!!!

 

You can check this code example on my gist repository https://gist.github.com/silahian

 

To be continued …

 

Ariel Silahian

http://www.sisSoftwareFactory.com/quant

https://twitter.com/sisSoftware

Keywords: #hft #quants #forex #fx #risk $EURUSD $EURGBP $EURJPY

Advertisements

One thought on “How do I design high-frequency trading systems and its architecture. Part I

  1. Rahul Arora

    Why can’t we use pre-allocated hash-maps for this ?
    It will also save us time to find the element in case of deletes and updates.

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s