From: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)heroku(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: UNDO and in-place update |
Date: | 2016-11-23 22:18:01 |
Message-ID: | CAEepm=275rxsvaM1pZTRg31=W7QBoWA_ShzH9StDmySC1B5emg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Nov 23, 2016 at 6:01 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> * Our behavior with many duplicates in secondary indexes is pretty bad
> in general, I suspect.
From the pie-in-the-sky department: I believe there are
snapshot-based systems that don't ever have more than one entry for a
logical row in a btree. Instead, they do UNDO-log based snapshot
reconstruction for btrees too, generalising what is being discussed
for heap tables in this thread. That means that whenever you descend
a btree, at each page you have to be prepared to go and look in the
UNDO log if necessary on a page-by-page basis. Alternatively, some
systems use a shadow table rather than an UNDO log for the old
entries, but still privilege queries that need the latest version by
keeping only those in the btree. I don't know the details but suspect
that both of those approaches might require CSN (= LSN?) based
snapshots, so that you can make page-level visibility determinations
while traversing the 'current' btree based on a page header. That
would reduce both kinds of btree bloat: the primary kind of being
entries for dead tuples that still exist in the heap, and the
secondary kind being the resultant permanent btree fragmentation that
remains even after vacuum removes them. Presumably once you have all
that, you can allow index-only-scans without a visibility map. Also,
I suspect that such a "3D time travelling" btree would be a
requirement for index ordered tables in a snapshot-based RDBMS.
--
Thomas Munro
http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Munro | 2016-11-23 22:47:09 | Re: amcheck (B-Tree integrity checking tool) |
Previous Message | Tom Lane | 2016-11-23 22:04:54 | Re: DROP FUNCTION of multiple functions |