Table AM modifications to accept column projection lists

From: Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: pchampion(at)vmware(dot)com, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Ashwin Agrawal <aagrawal(at)pivotal(dot)io>
Subject: Table AM modifications to accept column projection lists
Date: 2020-11-13 18:01:22
Message-ID: CAE-ML+9RmTNzKCNTZPQf8O3b-UjHWGFbSoXpQa3Wvuc8YBbEQw@mail.gmail.com
Lists: pgsql-hackers

Hello,

This patch introduces a set of changes to the table AM APIs, making them
accept a column projection list. This helps columnar table AMs: instead
of fetching all columns from disk, they can fetch only the ones actually
needed.

The set of changes in this patch is not exhaustive; there are many more
opportunities, discussed in the TODO section below. Before digging
deeper, we want to elicit early feedback on the API changes and the
column extraction logic.

The table AM APIs that have been modified are:

1. Sequential scan APIs
2. Index scan APIs
3. API to lock and return a row
4. API to fetch a single row

We have seen performance benefits in Zedstore for many of the optimized
operations [0]. This patch is extracted from the larger patch shared in
[0].

------------------------------------------------------------------------
Building the column projection set:

To build the column projection set needed by each of these APIs, this
patch builds on the scanCols patch [1], which Ashwin and Melanie started
earlier. As noted in [1], there are cases where the scanCols set is not
representative of the columns to be projected. For instance, in a
DELETE ... RETURNING query, there is typically a sequential scan and a
separate invocation of tuple_fetch_row_version() in order to satisfy the
RETURNING clause (see ExecDelete()). So for a query such as:

DELETE FROM foo WHERE i < 100 AND j < 1000 RETURNING k, l;

We need to pass the set (i, j) to the scan and (k, l) to the
tuple_fetch_row_version() invocation. This is why we had to introduce
the returningCols field.
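The split can be shown with a small C sketch. It assumes that foo has exactly the columns (i, j, k, l), with attnums 1 to 4. A plain uint64_t bitmask stands in for Bitmapset. The helper names (colset_from_attnums(), delete_scan_cols(), delete_returning_cols()) are illustrative only and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for PostgreSQL's Bitmapset: bit n set means attnum n
 * is a member. Only attnums 0..63 are representable in this sketch.
 */
typedef uint64_t ColSet;

static ColSet
colset_from_attnums(const int *attnums, int n)
{
    ColSet set = 0;

    for (int i = 0; i < n; i++)
    {
        assert(attnums[i] >= 0 && attnums[i] < 64);
        set |= (ColSet) 1 << attnums[i];
    }
    return set;
}

/* Assumed layout of foo: (i, j, k, l) with attnums 1, 2, 3, 4. */
enum { ATT_I = 1, ATT_J = 2, ATT_K = 3, ATT_L = 4 };

/* Columns referenced by the WHERE clause: what the scan must return. */
static ColSet
delete_scan_cols(void)
{
    static const int qual_atts[] = {ATT_I, ATT_J};

    return colset_from_attnums(qual_atts, 2);
}

/* Columns referenced by RETURNING k, l: what tuple_fetch_row_version() needs. */
static ColSet
delete_returning_cols(void)
{
    static const int returning_atts[] = {ATT_K, ATT_L};

    return colset_from_attnums(returning_atts, 2);
}
```

The two sets are disjoint for this query. Reusing the scan's set for the row fetch would miss k and l, and passing the union to both calls would over-fetch in each.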

In the same spirit, separate column projection sets are computed for
operations that involve an EPQ check (INSERT, DELETE, UPDATE, row-level
locking, etc.) and for the columns involved in an ON CONFLICT UPDATE,
among others.

Recognizing and collecting these sets of columns is done at various
stages (analyze and rewrite, planner, and executor), depending on the
type of operation for which the subset of columns is calculated. The
column bitmaps are also stored in different places: for example, the
ones for scans and RETURNING are stored in RangeTblEntry, whereas the
set of columns for ON CONFLICT UPDATE is stored in OnConflictSetState.

------------------------------------------------------------------------
Table AM API changes:

The changes made to the table AM API, introducing the column projection
set, come in different flavors. We would like feedback on which style we
should converge on, or whether we should use different styles depending
on the situation.

- A new function variant that takes a column projection list, such as:

TableScanDesc (*scan_begin) (Relation rel,
                             Snapshot snapshot,
                             int nkeys, struct ScanKeyData *key,
                             ParallelTableScanDesc pscan,
                             uint32 flags);
->

TableScanDesc (*scan_begin_with_column_projection) (Relation relation,
                                                    Snapshot snapshot,
                                                    int nkeys, struct ScanKeyData *key,
                                                    ParallelTableScanDesc parallel_scan,
                                                    uint32 flags,
                                                    Bitmapset *project_columns);

- Modifying the existing function to take a column projection list, such
as:

TM_Result (*tuple_lock) (Relation rel,
                         ItemPointer tid,
                         Snapshot snapshot,
                         TupleTableSlot *slot,
                         CommandId cid,
                         LockTupleMode mode,
                         LockWaitPolicy wait_policy,
                         uint8 flags,
                         TM_FailureData *tmfd);

->

TM_Result (*tuple_lock) (Relation rel,
                         ItemPointer tid,
                         Snapshot snapshot,
                         TupleTableSlot *slot,
                         CommandId cid,
                         LockTupleMode mode,
                         LockWaitPolicy wait_policy,
                         uint8 flags,
                         TM_FailureData *tmfd,
                         Bitmapset *project_cols);

- A new function index_fetch_set_column_projection() to be called after
index_beginscan() to set the column projection set, which will be used
later by index_getnext_slot().

void (*index_fetch_set_column_projection) (struct IndexFetchTableData *data,
                                           Bitmapset *project_columns);
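With the setter style, index_beginscan() keeps its existing signature. The set is attached to the fetch state afterwards and consulted on every subsequent fetch. Below is a minimal C sketch of that flow. The types and function names (ColumnarIndexFetch, fetch_begin(), fetch_set_column_projection(), fetch_tuple()) are hypothetical simplifications, not the patch's actual code, and a uint64_t bitmask stands in for Bitmapset. Falling back to reading every column when the setter is never called is also an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ColSet;   /* hypothetical stand-in for Bitmapset; bit n = attnum n */

#define COLSET_BIT(n) ((ColSet) 1 << (n))

/* Hypothetical, much-simplified per-scan fetch state of a columnar AM. */
typedef struct ColumnarIndexFetch
{
    bool   has_projection;  /* has the setter been called? */
    ColSet project_cols;    /* remembered projection set */
    int    ncols_read;      /* sketch-only counter of column reads */
} ColumnarIndexFetch;

/* Analogue of the AM's index fetch begin: no projection known yet. */
static void
fetch_begin(ColumnarIndexFetch *f)
{
    f->has_projection = false;
    f->project_cols = 0;
    f->ncols_read = 0;
}

/* Analogue of index_fetch_set_column_projection(): only store the set. */
static void
fetch_set_column_projection(ColumnarIndexFetch *f, ColSet cols)
{
    f->has_projection = true;
    f->project_cols = cols;
}

/*
 * Analogue of the per-tuple fetch behind index_getnext_slot(): consult the
 * stored set and read only the wanted columns out of natts. Assumption: if
 * the setter was never called, read all columns so that unconverted callers
 * keep working.
 */
static void
fetch_tuple(ColumnarIndexFetch *f, int natts)
{
    assert(natts < 64);
    for (int attnum = 1; attnum <= natts; attnum++)
    {
        bool wanted = !f->has_projection ||
            f->project_cols == COLSET_BIT(0) ||          /* {0}: all columns */
            (f->project_cols & COLSET_BIT(attnum)) != 0;

        if (wanted)
            f->ncols_read++;    /* a real AM would decode the column here */
    }
}
```

With this style, nothing forces a caller to invoke the setter. The AM therefore needs a safe default for callsites that have not been converted yet.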

The set of columns expected by the new/modified functions is represented
as a Bitmapset of attnums for a specific base relation. An empty/NULL
bitmap signals to the AM that no data columns are needed. A bitmap
containing the single element 0 indicates that we want all data columns
to be fetched.

The bitmaps do not include system columns.
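The convention can be captured in a single predicate, sketched below under a few assumptions. A uint64_t bitmask stands in for Bitmapset; PostgreSQL represents the empty set as a NULL pointer, while the sketch uses 0. Only the exact set {0} is treated as "all columns", and column_is_projected() is a hypothetical name. Using 0 as the marker works because attnum 0 never names a user column (in a Var, varattno 0 denotes a whole-row reference). System columns have negative attnums, which fits with them being left out of the bitmaps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for Bitmapset; bit n = attnum n. In PostgreSQL an
 * empty Bitmapset is a NULL pointer; here it is simply 0.
 */
typedef uint64_t ColSet;

#define COLSET_BIT(n) ((ColSet) 1 << (n))

/*
 * Should user column 'attnum' be fetched under projection set 'proj'?
 *   empty set -> no data columns are needed
 *   {0}       -> all data columns are needed
 *   otherwise -> exactly the listed attnums
 * System columns (negative attnums) never appear in the set.
 */
static bool
column_is_projected(ColSet proj, int attnum)
{
    assert(attnum >= 1 && attnum < 64);

    if (proj == COLSET_BIT(0))
        return true;
    return (proj & COLSET_BIT(attnum)) != 0;
}
```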

Additionally, the TupleTableSlots populated by functions such as
table_scan_getnextslot() need to be densely filled up to the highest
numbered column in the projection list (any column not in the projection
list should be populated with NULL). This is due to the implicit
assumptions of the slot_get_***() APIs.
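Below is a sketch of what dense filling could look like on the AM side. It uses a hypothetical SketchSlot whose fields mirror tts_values, tts_isnull and tts_nvalid of a real TupleTableSlot. A uint64_t bitmask stands in for Bitmapset, and an array stands in for the AM's stored column data. The sketch assumes callers may read any entry up to nvalid, so skipped columns are explicitly set to NULL rather than left uninitialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ColSet;   /* hypothetical stand-in for Bitmapset; bit n = attnum n */
typedef uintptr_t Datum;   /* stand-in for PostgreSQL's Datum */

#define COLSET_BIT(n) ((ColSet) 1 << (n))
#define SKETCH_MAX_ATTS 8

/* Hypothetical slot mirroring tts_values, tts_isnull and tts_nvalid. */
typedef struct SketchSlot
{
    Datum values[SKETCH_MAX_ATTS];   /* entry attnum - 1 */
    bool  isnull[SKETCH_MAX_ATTS];
    int   nvalid;                    /* attnums 1..nvalid are filled in */
} SketchSlot;

/* Highest attnum in the set, or 0 if it contains no user columns. */
static int
colset_max_attnum(ColSet set)
{
    int max = 0;

    for (int attnum = 1; attnum < 64; attnum++)
        if (set & COLSET_BIT(attnum))
            max = attnum;
    return max;
}

/*
 * Fill 'slot' densely from the AM's stored column values: every attnum up to
 * the highest projected one gets a defined entry, and columns that were not
 * projected are returned as NULL.
 */
static void
fill_slot_dense(SketchSlot *slot, ColSet proj, const Datum *stored, int natts)
{
    bool all = (proj == COLSET_BIT(0));      /* {0}: all data columns */
    int  upto = all ? natts : colset_max_attnum(proj);

    assert(natts <= SKETCH_MAX_ATTS && upto <= natts);
    for (int attnum = 1; attnum <= upto; attnum++)
    {
        bool wanted = all || (proj & COLSET_BIT(attnum)) != 0;

        slot->values[attnum - 1] = wanted ? stored[attnum - 1] : (Datum) 0;
        slot->isnull[attnum - 1] = !wanted;
    }
    slot->nvalid = upto;
}
```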

------------------------------------------------------------------------
TODOs:

- Explore opportunities to push the column extraction logic to the
planner or pre-planner stages from the executor stage (like scanCols and
returningCols), or at least elevate the column extraction logic to be
done once per executor run instead of once per tuple.

- As was requested in [1], we should guard column projection set
extraction logic with a table_scans_leverage_column_projection() call.
We wouldn't want a non-columnar AM to incur the overhead.

- Standardize the table AM API for passing columns.

- The optimization for DELETE RETURNING does not currently work for
views. We have to populate the list of columns for the base relation
beneath the view properly.

- Currently the benefit of passing in an empty projection set for ON
CONFLICT DO UPDATE (UPSERT) and ON CONFLICT DO NOTHING (see
ExecCheckTIDVisible()) is masked by a preceding call to
check_exclusion_or_unique_constraint() which has not yet been modified
to pass a column projection list to the index scan.

- Compute scanCols earlier than set_base_rel_sizes() and use that
information to produce better relation size estimates (relation size
will depend on the number of columns projected) in the planner.
Essentially, we need to absorb the work done by Pengzhou [2].

- Right now, we do not extract a set of columns for the call to
table_tuple_lock() within GetTupleForTrigger() as it may be hard to
determine the list of columns used in a trigger body [3].

- validateForeignKeyConstraint() should only need to fetch the
foreign key column.

- List of index scan callsites that will benefit from calling
index_fetch_set_column_projection():

-- table_index_fetch_tuple_check() does not need to fetch any
columns (we have to pass an empty column bitmap), fetching the tid
should be enough.

-- unique_key_recheck() performs a liveness check for which we do
not need to fetch any columns (we have to pass an empty column
bitmap).

-- check_exclusion_or_unique_constraint() needs to only fetch the
columns that are part of the exclusion or unique constraint.

-- IndexNextWithReorder() needs to only fetch columns being
projected along with columns in the index qual and columns in the
ORDER BY clause.

-- get_actual_variable_endpoint() only performs visibility checks,
so we don't need to fetch any columns (we have to pass an empty
column projection bitmap).

- BitmapHeapScans can benefit from a column projection list the same
way as an IndexScan and SeqScan can. We can possibly pass down scanCols
in ExecInitBitmapHeapScan(). We would have to modify the BitmapHeapScan
table AM calls to take a column projection bitmap.

- There may be more callsites where we can pass a column projection list.
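The IndexNextWithReorder() item and the proposed guard can be illustrated together. The C sketch below computes the projection set as the union of the projected, index-qual and ORDER BY columns. It is hypothetical throughout: reorder_scan_projection() and its attnum-array inputs are invented for the example. The am_uses_projection flag stands in for the proposed table_scans_leverage_column_projection() check, and a uint64_t bitmask replaces Bitmapset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ColSet;   /* hypothetical stand-in for Bitmapset; bit n = attnum n */

#define COLSET_BIT(n) ((ColSet) 1 << (n))
#define COLSET_ALL    COLSET_BIT(0)   /* {0}: all data columns */

static ColSet
colset_from_attnums(const int *attnums, int n)
{
    ColSet set = 0;

    for (int i = 0; i < n; i++)
    {
        assert(attnums[i] >= 1 && attnums[i] < 64);
        set |= COLSET_BIT(attnums[i]);
    }
    return set;
}

/*
 * Hypothetical column extraction for a reordering index scan: the union of
 * the columns projected by the node, the columns in the index quals and the
 * columns in the ORDER BY clause (bitwise OR removes duplicates).
 * 'am_uses_projection' stands in for the proposed
 * table_scans_leverage_column_projection() check; when it is false, no sets
 * are built and all columns are requested.
 */
static ColSet
reorder_scan_projection(bool am_uses_projection,
                        const int *target_atts, int ntarget,
                        const int *qual_atts, int nqual,
                        const int *orderby_atts, int norderby)
{
    if (!am_uses_projection)
        return COLSET_ALL;

    return colset_from_attnums(target_atts, ntarget) |
        colset_from_attnums(qual_atts, nqual) |
        colset_from_attnums(orderby_atts, norderby);
}
```

Because the guard is checked first, a non-columnar AM pays nothing for building the sets and keeps the existing fetch-everything behavior.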

Regards,

Soumyadeep & Jacob

[0] https://www.postgresql.org/message-id/CAE-ML%2B-HwY4X4uTzBesLhOotHF7rUvP2Ur-rvEpqz2PUgK4K3g%40mail.gmail.com
[1] https://www.postgresql.org/message-id/flat/CAAKRu_Yj%3DQ_ZxiGX%2BpgstNWMbUJApEJX-imvAEwryCk5SLUebg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAG4reAQc9vYdmQXh%3D1D789x8XJ%3DgEkV%2BE%2BfT9%2Bs9tOWDXX3L9Q%40mail.gmail.com
[3] https://www.postgresql.org/message-id/23194.1560618101%40sss.pgh.pa.us

Attachment: 0001-tableam-accept-column-projection-list.patch (text/x-patch, 55.3 KB)
