Best Practices for HFT Low-Latency Software

After several years developing high-performance trading systems, I have come up with some rules of thumb. When I talk about low-latency/high-frequency trading, I mean software that must make a buy or sell decision within 20 µs (microseconds).

To achieve this, I’ve learned that I need to forget everything about modern software engineering. You have to change your mindset entirely and forget everything you learned in this field: latency is king, no matter how ugly your code is.

Below, I summarize the obstacles I’ve found while developing these kinds of systems.

Programming Language: No, there is no perfect language for this kind of work, but choose it carefully. You have to not only understand how to use it but master it! Understand what each instruction does, how memory is managed each time you create or use an object, etc. If you are using C# or Java, you have to master the garbage collector, which can be a killer. My choice has always been C/C++.

Choose your types: tell me what types you are using, and I will tell you how slow you can be. Avoid strings, dates, BigDecimal, autoboxing, and complex data structures with unpredictable costs (e.g. an ArrayList that grows, stacks, Maps that rehash).
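As a rough illustration of this rule in C++, prices can be held as scaled integers instead of decimals, and short ticker symbols can be packed into a single integer instead of a string. This is only a sketch: the names `Price`, `Symbol`, `make_price` and `make_symbol`, the 4-decimal scale, and the 8-character symbol limit are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical fixed-point price: an integer count of 1/10000 units.
// The scale (4 decimal places) is an assumption for illustration.
struct Price {
    static constexpr std::int64_t kScale = 10000;
    std::int64_t ticks;
};

// Builds a price from a whole part and a fractional part in ticks,
// e.g. make_price(101, 2500) represents 101.25.
constexpr Price make_price(std::int64_t whole, std::int64_t frac_ticks) {
    return Price{whole * Price::kScale + frac_ticks};
}

constexpr bool operator<(Price a, Price b) { return a.ticks < b.ticks; }
constexpr bool operator==(Price a, Price b) { return a.ticks == b.ticks; }

// Hypothetical symbol packed into 8 bytes, so equality is one integer
// compare and no heap allocation is involved.
struct Symbol {
    std::uint64_t bits;
};

// Copies up to 8 characters of text into the symbol, zero padded.
// Longer input is truncated (an assumption of this sketch).
inline Symbol make_symbol(const char* text) {
    assert(text != nullptr);
    char buf[8] = {0};
    std::size_t n = std::strlen(text);
    if (n > sizeof(buf)) n = sizeof(buf);
    std::memcpy(buf, text, n);
    Symbol s{0};
    std::memcpy(&s.bits, buf, sizeof(buf));
    return s;
}

inline bool operator==(Symbol a, Symbol b) { return a.bits == b.bits; }
```

Both types are plain values of fixed size: they can live on the stack or inside a pre-allocated array, and comparing them never touches the heap.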

Avoid Exception Handling: YES, avoid it! It’s expensive. In my experience, exception handling adds at least 10-20% to execution time. Compilers need to add additional code and take care of additional stack management to handle exceptions. That costs time. And before somebody tells me that GCC uses the zero-cost model (which only promises no overhead on the path where nothing is thrown), I would say: please profile your system and measure it! Remember, each microsecond counts.
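One common alternative to throwing on the hot path is to return a status code and let the caller branch on it. The sketch below shows that shape only. `OrderStatus`, `Order`, `validate` and `try_send` are hypothetical names, and the counter stands in for a real send. Whether this is actually faster on your system is something to measure, not assume.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result codes for order validation.
enum class OrderStatus : std::uint8_t { Ok, BadQuantity, BadPrice };

// Hypothetical order with a fixed-point price and an integer quantity.
struct Order {
    std::int64_t price_ticks;
    std::int32_t quantity;
};

// Reports expected runtime failures through the return value.
// noexcept states that no exception ever leaves this function.
inline OrderStatus validate(const Order& o) noexcept {
    if (o.quantity <= 0) return OrderStatus::BadQuantity;
    if (o.price_ticks <= 0) return OrderStatus::BadPrice;
    return OrderStatus::Ok;
}

// The caller checks the status with an ordinary branch.
// sent_count is a stand-in for the real send operation.
inline bool try_send(const Order& o, std::int32_t& sent_count) noexcept {
    // assert is for programming errors only; it compiles out with NDEBUG.
    assert(sent_count >= 0);
    if (validate(o) != OrderStatus::Ok) return false;
    ++sent_count;
    return true;
}
```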

Threads: threads block and context-switch, the scheduler intervenes, and it is difficult to reason about performance when there are many threads. Understand how they behave on your OS. Understand how your hardware architecture works with threads. I know, it’s boring, but essential. You don’t need to design fancy thread systems (e.g. ring buffers). In most cases, the simpler, the better. My preferred approach: pin each thread to a core and use busy spinning so the core is always looking at the queue.
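To make the pinned, busy-spinning idea concrete, here is a minimal sketch assuming Linux with glibc (`pthread_setaffinity_np` is a GNU extension). The "queue" is reduced to a single-slot mailbox between one producer and one consumer. `Mailbox`, `publish`, `consume`, `pin_current_thread` and `run_demo` are hypothetical names, and pinning is treated as best effort because the chosen core may not be available to the process.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <pthread.h>
#include <sched.h>

// Pins the calling thread to one core (Linux/glibc only).
// Returns false if the affinity could not be set.
inline bool pin_current_thread(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
}

// Single-slot mailbox for exactly one producer and one consumer.
struct Mailbox {
    std::atomic<bool> full{false};
    std::int64_t value{0};
};

// Producer side: spin until the slot is free, then publish.
inline void publish(Mailbox& m, std::int64_t v) {
    while (m.full.load(std::memory_order_acquire)) { /* busy spin */ }
    m.value = v;
    m.full.store(true, std::memory_order_release);
}

// Consumer side: spin until a value is present, then take it.
inline std::int64_t consume(Mailbox& m) {
    while (!m.full.load(std::memory_order_acquire)) { /* busy spin */ }
    const std::int64_t v = m.value;
    m.full.store(false, std::memory_order_release);
    return v;
}

// Sends 1..count to a consumer thread pinned (best effort) to `core`
// and returns the sum the consumer computed.
inline std::int64_t run_demo(int core, int count) {
    assert(count >= 0);
    Mailbox box;
    std::int64_t sum = 0;
    std::thread consumer([&] {
        (void)pin_current_thread(core);  // ignore failure in this sketch
        for (int i = 0; i < count; ++i) sum += consume(box);
    });
    for (int i = 1; i <= count; ++i) publish(box, i);
    consumer.join();  // join makes the final sum visible to the caller
    return sum;
}
```

The consumer never sleeps or blocks in the kernel, so it burns a full core while idle. That is the trade being made: a dedicated core in exchange for not paying a wake-up when the next message arrives.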

Caches: access times range from roughly 1 ns for L1 cache, through about 100 ns for main memory, up to around 10 ms for disk. To be fast enough, you need to consider where your data is stored. Make sure that your algorithms and data structures take advantage of the L1 cache as much as you can.
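One way to act on this is to lay data out so that the hot loop touches as few cache lines as possible. The sketch below keeps the prices of one side of a small order book contiguous, so a price scan reads a single line. It assumes a 64-byte cache line (typical on x86-64, but check your hardware), and `BookSide`, `find_level` and `quantity_at` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 64;  // assumption: 64-byte cache lines
constexpr std::size_t kLevels = 8;      // 8 levels * 8 bytes = one line

// One side of a hypothetical order book. Prices and quantities are kept
// in separate contiguous arrays, each filling exactly one cache line.
struct alignas(kCacheLine) BookSide {
    std::array<std::int64_t, kLevels> price_ticks;
    std::array<std::int64_t, kLevels> quantity;
};

static_assert(sizeof(BookSide) == 2 * kCacheLine,
              "BookSide is expected to span exactly two cache lines");

// Linear scan over the price array only: one cache line is read,
// and the quantity line is not touched at all.
inline int find_level(const BookSide& side, std::int64_t price) {
    for (std::size_t i = 0; i < kLevels; ++i) {
        if (side.price_ticks[i] == price) return static_cast<int>(i);
    }
    return -1;  // price not present
}

// Reads the quantity line only once the level is known.
inline std::int64_t quantity_at(const BookSide& side, std::size_t level) {
    assert(level < kLevels);
    return side.quantity[level];
}
```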

Layers of abstraction: forget about encapsulation and making your code nice, clean and reusable. When data is passed from one layer to another, it is often copied. On top of that, the scheduler may de-prioritize our process to give other processes their “fair share”, meaning tons of CPU cycles lost!

Warming up the data: make sure you pre-allocate all your data structures before the main system starts. Also keep reusable objects around, so you don’t have to allocate them later. Remember, allocation is expensive.
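A simple way to pre-allocate and reuse objects is a fixed-capacity pool that is built once at startup and never calls the allocator afterwards. The following is a minimal single-threaded sketch. `Pool` and `PooledOrder` are hypothetical names, the capacity is a compile-time assumption, and the element type must be default-constructible.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical object type to be recycled.
struct PooledOrder {
    std::int64_t price_ticks = 0;
    std::int32_t quantity = 0;
};

// Fixed-capacity pool: all N objects exist from construction onward.
// Not thread-safe; intended for use from a single thread.
template <typename T, std::size_t N>
class Pool {
public:
    Pool() {
        for (std::size_t i = 0; i < N; ++i) free_[i] = &slots_[i];
        free_count_ = N;
    }

    // Hands out a pre-allocated object, or nullptr when exhausted.
    T* acquire() noexcept {
        if (free_count_ == 0) return nullptr;
        return free_[--free_count_];
    }

    // Returns an object to the pool so it can be reused.
    void release(T* p) noexcept {
        assert(p >= slots_.data() && p < slots_.data() + N);
        assert(free_count_ < N);
        free_[free_count_++] = p;
    }

    std::size_t available() const noexcept { return free_count_; }

private:
    std::array<T, N> slots_{};   // storage for every object
    std::array<T*, N> free_{};   // stack of currently free objects
    std::size_t free_count_ = 0;
};
```

Because the free list is a stack, the most recently released object is handed out next, which also makes it the one most likely to still be in cache.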

Ariel Silahian
http://www.sisSoftwareFactory.com/quant
https://twitter.com/sisSoftware
