Supported Versions: Current (17) / 16 / 15 / 14 / 13
Development Versions: devel
Unsupported versions: 12 / 11 / 10 / 9.6 / 9.5 / 9.4 / 9.3 / 9.2 / 9.1 / 9.0 / 8.4 / 8.3 / 8.2 / 8.1
This documentation is for an unsupported version of PostgreSQL.
You may want to view the same page for the current version, or one of the other supported versions listed above instead.

48.6. Index Cost Estimation Functions

The amcostestimate function is given a list of WHERE clauses that have been determined to be usable with the index. It must return estimates of the cost of accessing the index and the selectivity of the WHERE clauses (that is, the fraction of parent-table rows that will be retrieved during the index scan). For simple cases, nearly all the work of the cost estimator can be done by calling standard routines in the optimizer; the point of having an amcostestimate function is to allow index access methods to provide index-type-specific knowledge, in case it is possible to improve on the standard estimates.

Each amcostestimate function must have the signature:

void
amcostestimate (PlannerInfo *root,
                IndexOptInfo *index,
                List *indexQuals,
                Cost *indexStartupCost,
                Cost *indexTotalCost,
                Selectivity *indexSelectivity,
                double *indexCorrelation);

The first four parameters are inputs:

root

The planner's information about the query being processed.

index

The index being considered.

indexQuals

List of index qual clauses (implicitly ANDed); a NIL list indicates no qualifiers are available. Note that the list contains expression trees, not ScanKeys.

The last four parameters are pass-by-reference outputs:

*indexStartupCost

Set to cost of index start-up processing

*indexTotalCost

Set to total cost of index processing

*indexSelectivity

Set to index selectivity

*indexCorrelation

Set to correlation coefficient between index scan order and underlying table's order

Note that cost estimate functions must be written in C, not in SQL or any available procedural language, because they must access internal data structures of the planner/optimizer.

The index access costs should be computed in the units used by src/backend/optimizer/path/costsize.c: a sequential disk block fetch has cost 1.0, a nonsequential fetch has cost random_page_cost, and the cost of processing one index row should usually be taken as cpu_index_tuple_cost. In addition, an appropriate multiple of cpu_operator_cost should be charged for any comparison operators invoked during index processing (especially evaluation of the indexQuals themselves).

The access costs should include all disk and CPU costs associated with scanning the index itself, but not the costs of retrieving or processing the parent-table rows that are identified by the index.

The "start-up cost" is the part of the total scan cost that must be expended before we can begin to fetch the first row. For most indexes this can be taken as zero, but an index type with a high start-up cost might want to set it nonzero.

The indexSelectivity should be set to the estimated fraction of the parent table rows that will be retrieved during the index scan. In the case of a lossy index, this will typically be higher than the fraction of rows that actually pass the given qual conditions.

The indexCorrelation should be set to the correlation (ranging between -1.0 and 1.0) between the index order and the table order. This is used to adjust the estimate for the cost of fetching rows from the parent table.

Cost Estimation

A typical cost estimator will proceed as follows:

  1. Estimate and return the fraction of parent-table rows that will be visited based on the given qual conditions. In the absence of any index-type-specific knowledge, use the standard optimizer function clauselist_selectivity():

    *indexSelectivity = clauselist_selectivity(root, indexQuals,
                                               index->rel->relid, JOIN_INNER);
    
  2. Estimate the number of index rows that will be visited during the scan. For many index types this is the same as indexSelectivity times the number of rows in the index, but it might be more. (Note that the index's size in pages and rows is available from the IndexOptInfo struct.)

  3. Estimate the number of index pages that will be retrieved during the scan. This might be just indexSelectivity times the index's size in pages.

  4. Compute the index access cost. A generic estimator might do this:

        /*
         * Our generic assumption is that the index pages will be read
         * sequentially, so they have cost 1.0 each, not random_page_cost.
         * Also, we charge for evaluation of the indexquals at each index row.
         * All the costs are assumed to be paid incrementally during the scan.
         */
        cost_qual_eval(&index_qual_cost, indexQuals);
        *indexStartupCost = index_qual_cost.startup;
        *indexTotalCost = numIndexPages +
            (cpu_index_tuple_cost + index_qual_cost.per_tuple) * numIndexTuples;
    
  5. Estimate the index correlation. For a simple ordered index on a single field, this can be retrieved from pg_statistic. If the correlation is not known, the conservative estimate is zero (no correlation).

Examples of cost estimator functions can be found in src/backend/utils/adt/selfuncs.c.