Programming languages: Facebook open-sources its fast F14 hash table written in C++

Facebook has open-sourced one of the tools it uses to help manage an enormous amount of data on a daily basis.

The social media giant has added its F14 hash table to its Folly open source library of components written in the C++ programming language.

Hash tables are widely used by software to rapidly store and look-up data, and are prized for their ability to look-up data in a timely fashion, even as the volume of data stored in the table increases.

"Folly's F14 is widely used inside Facebook," write Facebook software engineers Xiao Shi and Nathan Bronson in a blog post, who go on to spell out how F14 has improved the efficiency of everyday hash table operations.

"F14 works well because its core algorithm leverages vector instructions to increase the load factor while reducing collisions, because it supports multiple memory layouts for different scenarios, and because we have paid attention to C++ overheads near the API. F14 is a good default choice - it delivers CPU and RAM efficiency that is robust across a wide variety of use cases."

As the volume of data stored by companies continues to increase, and with hash tables being so widely used, Facebook believes that releasing F14 - a 14-way probing hash table - could have a significant impact on software's capabilities.

"They are such a ubiquitous tool in computer science that even incremental improvements can have a large impact," the write.

SEE: 10 ways to prevent developer burnout (free PDF) (TechRepublic)

In particular, Facebook highlight how F14 uses a strategy called chunking to reduce the likelihood of collisions, where multiple keys in the hash table resolve to the same array index, which can slow down operations by requiring additional processing time and memory access.

Other performance gains are made possible by cutting the memory and CPU overhead when handling empty slots in the hash table, which can be particularly helpful for a hash table handling a lot of insert and erase workloads.

Memory use is trimmed in various ways - which in turn can improve performance by allowing more of a program's data to fit in cache - by increasing the efficiency of array use, as well as by reducing the metadata associated with allocating a table and the insert and rehash operations.

F14 uses a hybrid storage strategy called F14Fast, which choose the most efficient approach for storing data depending on the value types.

Facebook says the performance boost delivered by F14 also stems from using less code than some competing approaches, with F14 reducing the need for object construction and copies, both inside the hash table and the surrounding code.

When converting existing code to F14, Facebook encountered both unit test failures and production crashes, with the company's blog post detailing the nature of the failures it encountered and how it worked around them.

If you're a developer using C++ then check out the major new C++-related features in Microsoft's Visual Studio 2019 IDE, or if you want to learn the up and coming C++ alternative Rust, TechRepublic has rounded up the best free resources online.

To experiment with F14 yourself, check out the project page on GitHub.

Also see