Design optimization covers several data layout and design strategies to improve performance.
Hive partitioning is one of the most effective methods to improve the query performance on larger tables. The query with partition filtering will only load the data in the specified partitions (subdirectories), so it can execute much faster than a normal query that filters by a non-partitioning field. The selection of partition key is always an important factor for performance. It should always be a low cardinal attribute to avoid many subdirectories overhead.
The following are some commonly used dimensions as partition keys: