162 Automated Physical Database Design and Tuning
even a very conservative cost fraction of 2% results in a large fraction
of column subsets being pruned (because many column subsets are not
referenced often enough or are mentioned only in cheap queries). To
further reduce the set of candidates, we rank column subsets by an
effectiveness metric. One possible definition of effectiveness for a column
subset is given by the VPC metric, which captures the fraction of the
scanned data in a partition that would be useful in answering queries
(VPC is short for vertical partitioning confidence, and it is discussed
in detail in the references provided in Section 9.6). We then take the
top-k highest-ranked candidate subsets for each table in the workload
and can ...
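
The pruning and ranking steps described above can be illustrated with a small sketch. It is not the implementation of any particular tuning tool, and it rests on several assumptions that are not taken from the text: a query "references" a column subset when it mentions every column in it, the cost fraction is measured against the total workload cost, and VPC is approximated as the width-weighted share of a partition's columns that each touching query actually needs. The names `Query`, `prune_by_cost`, `vpc`, and `top_k_per_table` are hypothetical; the precise definition of VPC is given in the references of Section 9.6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    table: str
    columns: frozenset  # columns of `table` referenced by the query
    cost: float         # estimated cost of the query (assumed given)


def prune_by_cost(subsets, queries, fraction=0.02):
    """Keep (table, columns) subsets whose referencing queries account
    for at least `fraction` of the total workload cost."""
    total = sum(q.cost for q in queries)
    kept = []
    for table, cols in subsets:
        # Assumption: a query references a subset if it mentions all of it.
        cost = sum(q.cost for q in queries
                   if q.table == table and cols <= q.columns)
        if cost >= fraction * total:
            kept.append((table, cols))
    return kept


def vpc(table, cols, queries, width):
    """Simplified VPC: useful bytes / scanned bytes over all queries
    that touch at least one column of the partition `cols`."""
    partition_width = sum(width[c] for c in cols)
    useful = scanned = 0
    for q in queries:
        if q.table != table:
            continue
        hit = cols & q.columns
        if hit:
            useful += sum(width[c] for c in hit)
            scanned += partition_width  # the whole partition is scanned
    return useful / scanned if scanned else 0.0


def top_k_per_table(subsets, queries, width, k):
    """Return the k highest-VPC subsets for each table."""
    by_table = {}
    for table, cols in subsets:
        score = vpc(table, cols, queries, width)
        by_table.setdefault(table, []).append((score, cols))
    return {
        table: [cols for _, cols in
                sorted(scored, key=lambda s: s[0], reverse=True)[:k]]
        for table, scored in by_table.items()
    }


# Toy workload over a single table T with columns a, b, c.
queries = [
    Query("T", frozenset({"a", "b"}), 100.0),
    Query("T", frozenset({"a"}), 50.0),
    Query("T", frozenset({"c"}), 1.0),
]
width = {"a": 4, "b": 4, "c": 8}  # column widths in bytes (made up)
subsets = [("T", frozenset({"a", "b"})),
           ("T", frozenset({"a"})),
           ("T", frozenset({"c"}))]

kept = prune_by_cost(subsets, queries, fraction=0.02)
best = top_k_per_table(kept, queries, width, k=1)
```

In this toy workload the subset {c} is pruned because it appears only in a cheap query, while {a} ranks above {a, b} because every query that scans the partition {a} uses all of it.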