Re: TABLESAMPLE patch is really in pretty sad shape

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: TABLESAMPLE patch is really in pretty sad shape
Date: 2015-07-25 01:06:38
Message-ID: 55B2E11E.1020007@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2015-07-25 00:36, Tom Lane wrote:
> I wrote:
>> Petr Jelinek <petr(at)2ndquadrant(dot)com> writes:
>>> The only major difference that I see so far and I'd like you to
>>> incorporate that into your patch is that I renamed the SampleScanCost to
>>> SampleScanGetRelSize because that reflects much better the use of it, it
>>> isn't really used for costing, but for getting the pages and tuples of
>>> the baserel.
>
>> Good suggestion. I was feeling vaguely uncomfortable with that name as
>> well, given what the functionality ended up being.
>
> After further thought it seemed like the right name to use is
> SampleScanGetSampleSize, so as to avoid confusion between the size of the
> relation and the size of the sample. (The planner is internally not
> making such a distinction right now, but that doesn't mean we should
> propagate that fuzzy thinking into the API spec.)
>

Right, sounds good to me.

> Attached is a more-or-less-final version of the proposed patch.
> Major changes since yesterday:
>
> * I worked over the contrib modules and docs.
>
> * I thought of a reasonably easy way to do something with nonrepeatable
> sampling methods inside join queries: we can simply wrap the SampleScan
> plan node in a Materialize node, which will guard it against being
> executed more than once. There might be a few corner cases where this
> doesn't work fully desirably, but in testing it seemed to do the right
> thing (and I added some regression tests about that). This seems
> certainly a better answer than either throwing an error or ignoring
> the problem.
>

Seems reasonable.

I was wondering if we should perhaps cache the output of GetTsmRoutine
as we call it up to 4 times in the planner now but it's relatively cheap
call (basically just makeNode) so it's probably not worth it. In any
case it's not something that should stop you from committing the patch.
Thanks for all your work on this.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kouhei Kaigai 2015-07-25 01:12:32 Re: fdw_scan_tlist for foreign table scans breaks EPQ testing, doesn't it?
Previous Message Mike Blackwell 2015-07-25 00:21:32 Re: [PROPOSAL] VACUUM Progress Checker.