Re: sample scans and predicate locking

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Kevin Grittner <kgrittn(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: sample scans and predicate locking
Date: 2019-05-19 01:57:42
Message-ID: CA+hUKGJQMyFQan=H+bHNeC4ov+JEcO6v3Z6N0FebSMr6U9UT4Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, May 19, 2019 at 8:31 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> While looking at fixing [1] on master, I noticed the following
> codeblock:
>
> static HeapScanDesc
> heap_beginscan_internal(Relation relation, Snapshot snapshot,
> int nkeys, ScanKey key,
> ParallelHeapScanDesc parallel_scan,
> bool allow_strat,
> bool allow_sync,
> bool allow_pagemode,
> bool is_bitmapscan,
> bool is_samplescan,
> bool temp_snap)
> ...
> /*
> * For a seqscan in a serializable transaction, acquire a predicate lock
> * on the entire relation. This is required not only to lock all the
> * matching tuples, but also to conflict with new insertions into the
> * table. In an indexscan, we take page locks on the index pages covering
> * the range specified in the scan qual, but in a heap scan there is
> * nothing more fine-grained to lock. A bitmap scan is a different story,
> * there we have already scanned the index and locked the index pages
> * covering the predicate. But in that case we still have to lock any
> * matching heap tuples.
> */
> if (!is_bitmapscan)
> PredicateLockRelation(relation, snapshot);
>
> As you can see this only tests for is_bitmapscan, *not* for
> is_samplescan. Which means we afaict currently every sample scan
> predicate locks the entire relation.

Right, I just tested that. That's not wrong though, is it? It's just
overly pessimistic.

> I think there's two possibilities here:
>
> 1) It's just the comment that's inaccurate, and it should really talk
> about both seqscans and sample scans. It should not be necessary to
> lock the whole relation, but I'm not sure the code otherwise takes
> enough care.
>
> 2) We should really not predicate lock the entire relation. In which
> case I think there might be missing PredicateLockTuple/Page calls.

Yeah, we could probably predicate-lock pages in
heapam_scan_sample_next_block() and tuples in
heapam_scan_sample_next_tuple(), instead of doing this. Seems like a
reasonable improvement for 13. But... hmm.... There *might* be a
theoretical argument about TABLESAMPLE(100) behaving differently if
done per page rather than if done at relation level, wrt new pages
added to the end later and therefore missed. And then by logical
extension, TABLESAMPLE of any percentage. I'm not sure.

I made a little list of small SERIALIZABLE projects for v13 and added
this, over here:

https://wiki.postgresql.org/wiki/SerializableToDo

--
Thomas Munro
https://enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sascha Kuhl 2019-05-19 02:46:34 Organisational structure
Previous Message Stephen Frost 2019-05-18 23:48:41 Re: Multivariate MCV stats can leak data to unprivileged users