Re: FETCH FIRST clause WITH TIES option

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Surafel Temesgen <surafel3000(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, andrew(at)tao11(dot)riddles(dot)org(dot)uk, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FETCH FIRST clause WITH TIES option
Date: 2019-01-15 11:51:29
Message-ID: c1430163-a5b0-2adb-c81c-29d1029cb8cf@2ndquadrant.com
Lists: pgsql-hackers

On 1/15/19 11:07 AM, Surafel Temesgen wrote:
>
>
> On Wed, Jan 2, 2019 at 6:19 PM Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> After looking at the "FETCH FIRST ... PERCENT" patch, I wonder if this
> patch should tweak estimates in some way. Currently, the cardinality
> estimate is the same as for plain LIMIT, using the requested number of
> rows. But let's say there are very few large groups - that will
> naturally increase the number of rows produced.
>
> As an example, let's say the subplan produces 1M rows, and there are
> 1000 groups (when split according to the ORDER BY clause).
>
> Can we reliably use the raw statistics of the ORDER BY column in the
> Limit node? It seems to me they can be affected by other operations
> in the subplan, like a filter condition.
>

What do you mean by "raw statistics"? Using stats from the underlying
table is not quite possible, because you might be operating on top of a
join or something like that.

IMHO the best thing you can do is call estimate_num_groups() and combine
that with the number of input rows. That will benefit from ndistinct
coefficients when available, etc. Given the unreliability of grouping
estimates, I've been thinking we should use a multiple of the average
group size (because there may be much larger groups), but that's quite
unprincipled and I'd much rather try without it first.
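
Something like this minimal sketch, say (assuming the PG11-era
estimate_num_groups() signature; the helper name and the padding rule
are just my assumptions, not actual patch code):

#include "postgres.h"
#include "utils/selfuncs.h"		/* estimate_num_groups() */

/*
 * Hypothetical helper: estimate how many rows LIMIT ... WITH TIES
 * returns, given the sort-key expressions and the subplan's row count.
 */
static double
estimate_with_ties_rows(PlannerInfo *root, List *sortExprs,
						double input_rows, double limit_count)
{
	double		ngroups;
	double		avg_group_size;

	ngroups = estimate_num_groups(root, sortExprs, input_rows, NULL);
	avg_group_size = input_rows / Max(ngroups, 1.0);

	/*
	 * WITH TIES emits the requested rows plus the remainder of the
	 * group containing the last row, so pad the plain LIMIT estimate
	 * by one average-sized group, capped at the subplan's output.
	 */
	return Min(input_rows, limit_count + avg_group_size);
}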

But maybe we can do better when there really is a single table to
consider, in which case we might look at MCV lists and estimate the
largest group. That would give us a much better idea of the worst case.
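
Roughly along these lines (again just a sketch, reusing the existing
examine_variable() / get_attstatsslot() machinery; the helper and the
assumption that the sort key is a single expression with stats are
hypothetical):

#include "postgres.h"
#include "catalog/pg_statistic.h"	/* STATISTIC_KIND_MCV */
#include "utils/lsyscache.h"		/* get_attstatsslot() */
#include "utils/selfuncs.h"			/* examine_variable() */

/*
 * Hypothetical helper: estimate the largest group size for a sort-key
 * expression from the most common value in its MCV list, falling back
 * to the average group size when no MCV stats are available.
 */
static double
estimate_largest_group(PlannerInfo *root, Node *sortExpr,
					   double input_rows, double avg_group_size)
{
	VariableStatData vardata;
	double		result = avg_group_size;

	examine_variable(root, sortExpr, 0, &vardata);

	if (HeapTupleIsValid(vardata.statsTuple))
	{
		AttStatsSlot sslot;

		if (get_attstatsslot(&sslot, vardata.statsTuple,
							 STATISTIC_KIND_MCV, InvalidOid,
							 ATTSTATSSLOT_NUMBERS))
		{
			double		max_freq = 0.0;
			int			i;

			/*
			 * The largest MCV frequency bounds the largest group.
			 * NB: frequencies are fractions of the whole table, so
			 * with filters above the scan this is an upper bound.
			 */
			for (i = 0; i < sslot.nnumbers; i++)
				max_freq = Max(max_freq, (double) sslot.numbers[i]);

			result = Max(result, max_freq * input_rows);
			free_attstatsslot(&sslot);
		}
	}

	ReleaseVariableStats(vardata);
	return result;
}

That it over-estimates under filtering is arguably fine here, since the
point is bounding the worst case.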

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
