How can we do better than our naive implementation? There is obvious scope for algorithmic improvement (we could implement any number of probing schemes), but if we're trying to compete with the standard library's HashMap, it's likely we have a specialized reason for doing so. A specialized reason implies we know something unique about our data, or are willing to accept a trade-off that a general-purpose data structure cannot. Broadly speaking, the main goals when building software at the limits of machine performance are as follows:
- Improve the underlying algorithm, reducing total work done.
- Improve cache locality of data accesses (see the sketch after this list). This may mean:
  - Keeping your working set in L1 cache
  - Compressing your data to fit better ...
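To make the probing and locality points concrete, here is a minimal sketch of open addressing with linear probing, written in Rust to match the standard library's HashMap. The `ProbeMap` name, the fixed power-of-two capacity, the `u64` keys, and the use of `DefaultHasher` are all illustrative assumptions rather than a proposed design; the sketch omits resizing and deletion and assumes the table never fills. The relevant property is that every slot lives in one contiguous allocation, so a colliding lookup steps to the neighbouring slot rather than chasing a pointer to a separate heap node.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical fixed-capacity map: open addressing with linear probing.
/// All slots sit in one contiguous Vec, so probe sequences walk adjacent
/// memory instead of chasing pointers. No resizing or deletion; assumes
/// the table never fills completely.
struct ProbeMap<V> {
    slots: Vec<Option<(u64, V)>>, // key stored inline next to its value
    mask: usize,                  // capacity - 1; capacity is a power of two
}

impl<V> ProbeMap<V> {
    fn with_capacity_pow2(cap: usize) -> Self {
        assert!(cap.is_power_of_two());
        Self { slots: (0..cap).map(|_| None).collect(), mask: cap - 1 }
    }

    // DefaultHasher is used purely for brevity; a serious replacement
    // would likely pick a faster, specialized hash function.
    fn slot_for(&self, key: u64) -> usize {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        (h.finish() as usize) & self.mask
    }

    fn insert(&mut self, key: u64, value: V) {
        let mut i = self.slot_for(key);
        // Linear probe: on collision with a different key, step to the
        // next slot, wrapping around via the mask.
        while matches!(&self.slots[i], Some((k, _)) if *k != key) {
            i = (i + 1) & self.mask;
        }
        self.slots[i] = Some((key, value)); // fill empty slot or overwrite same key
    }

    fn get(&self, key: u64) -> Option<&V> {
        let mut i = self.slot_for(key);
        loop {
            match &self.slots[i] {
                Some((k, v)) if *k == key => return Some(v),
                Some(_) => i = (i + 1) & self.mask, // collision: keep probing
                None => return None, // empty slot ends the probe chain
            }
        }
    }
}

fn main() {
    let mut m = ProbeMap::with_capacity_pow2(8);
    m.insert(1, "one");
    m.insert(2, "two");
    assert_eq!(m.get(1), Some(&"one"));
    assert_eq!(m.get(3), None);
}
```

A chaining map pays a pointer dereference, and likely a cache miss, per collision; here each extra probe is just `(i + 1) & self.mask`, an index bump into memory the prefetcher has probably already pulled in.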