Row Visibility and Table Access Methods

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Row Visibility and Table Access Methods
Date: 2018-12-13 09:23:12
Message-ID: CANP8+jJnq=qnZQSJZ3OSCp5UkAXNf2g7rF3yPPwPX_dSUJZJqw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Currently, tables provide MVCC access semantics as the only option.

A more complete list of desirable semantics in applications are

* MVCC - provide latest consistent view
* Historical - keep all old row versions so that queries can access data as
it used to be
* TTL=Duration - keep committed rows around for at most Duration seconds
* Latest - show only the most current committed row version, at the cost of
inconsistency
There might be others

Since I see these options as semantics rather than physical, I think we
should separate these operations away from Table Access Methods. This
allows those semantics to be implemented in different ways for different
storage types.

"Historical" access has been discussed many times, so no need to revisit
here. Clearly, it is a very poopular idea, just not easily feasible with
the current heap access layer. We might want an option for row_visibility
retention. For tables with this option set, we would in later releases
allow historical queries according to the SQL Standard.

"TTL" or "Time To Live" - time-limited access to data is available in many
other databases. It is simple to implement and we could easily have this in
PG12. Implementation is 1) adding the option, 2) adding a time-component
into the visibility check for scan and vacuum. This option implies an
option exists to specify row_visibility retention.

"Latest" is similar to the way EvalPlanQual works, allowing UPDATE to see
the latest version of a row before update, and similar to the way catalog
scans work in that any access to a catalog entry sees the latest row based
upon an immediate snapshot, not the one taken at the start of a query. It
makes sense to allow this as an explicit table-level option, so any data
viewed can see the latest version, just as UPDATEs already do. This also
means that juggling bloat and old row versions becomes much less of an
issue for very high volume update applications such as booking systems or
stock tickers. (Clearly, better table access methods would improve on this
further and even they would benefit from this simplification of the main
issue around MVCC).
Logic for this type of visibility already exists in PG
via HeapTupleSatisfiesSelf(), so we are just exposing what is already there
to the user; no need to discuss semantics at length.
Again, patches to implement this option are simple and viable for PG12.

User interface are 2 new table-level options
* row_visibility = MVCC (default), TTL, Latest, Historical
* row_visibility_retention_interval = 'system' (default)
For MVCC, the only valid setting would be system, i.e. current MVCC
behavior (but this might be altered by specific storage plugin parameters)
For Latest, the only valid setting would be system
For TTL, the interval to retain data for, setting of 0 is not valid
For Historical, the interval to retain old row versions for, 0 means forever

Implementation summary
1. Add new table-level option for row_visibility and
row_visibility_retention_interval
2. Add option to heap_beginscan
3. Add option handling in heap prune
4. Add option to tqual

Thoughts?

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2018-12-13 09:41:39 'infinity'::Interval should be added
Previous Message Peter Eisentraut 2018-12-13 07:20:47 Re: Upgrading pg_statistic to handle collation honestly