Hash generation
The standard practice for most PL/Proxy implementations is to partition based on the hashtext of the field you're splitting on. This spreads rows fairly across a number of nodes without knowing the distribution of the dataset in advance. hashtext is an internal function provided by PostgreSQL that takes any text input and returns an integer hash code. If you bitwise AND that result with a mask that keeps only its lowest few bits, you get a very quick and fair way to distribute load among multiple systems. For this to work, the number of partitions needs to be a power of 2 (2, 4, 8, 16, and so on), and you bitwise AND against one less than that number. So for two partitions that's & 1, for four ...
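The masking step described above can be sketched outside the database. The following is a minimal illustration, not PL/Proxy's own code: hashtext is only available inside PostgreSQL, so the hypothetical helper stand_in_hash uses zlib.crc32 in its place and will not produce the same values as hashtext. The function name partition_for and the sample keys are also assumptions made for this example.

```python
import zlib
from collections import Counter


def stand_in_hash(key: str) -> int:
    """Hypothetical stand-in for PostgreSQL's hashtext().

    Assumption: any deterministic integer hash is enough to show the
    masking step. The values returned here do NOT match hashtext.
    """
    return zlib.crc32(key.encode("utf-8"))


def partition_for(key: str, num_partitions: int) -> int:
    """Pick a partition by keeping only the lowest bits of the hash."""
    # The mask trick only works when the partition count is a power of 2,
    # because only then is (count - 1) a run of low-order 1 bits.
    if num_partitions < 1 or (num_partitions & (num_partitions - 1)) != 0:
        raise ValueError("num_partitions must be a power of 2")
    mask = num_partitions - 1
    return stand_in_hash(key) & mask


if __name__ == "__main__":
    # Count how many sample keys land on each of four partitions.
    keys = [f"user{i}" for i in range(10_000)]
    counts = Counter(partition_for(k, 4) for k in keys)
    for partition in sorted(counts):
        print(partition, counts[partition])
```

Inside PostgreSQL the same idea would be applied to the result of hashtext rather than to a CRC. The power-of-2 check is there because a mask taken from any other partition count would leave some partitions unreachable.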