Re: estimation problems for DISTINCT ON with FDW

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: estimation problems for DISTINCT ON with FDW
Date: 2020-06-30 17:13:43
Message-ID: 491032.1593537223@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> It doesn't cost out the plan of pushing the DISTINCT ON down to the foreign
> side, which is probably the best way to run the query. I guess it makes
> sense that FDW machinery in general doesn't want to try to push a
> PostgreSQL specific construct.

Well, that's an unimplemented feature anyway. But people hared off after
that without addressing your actual bug report:

> But much worse than that, it horribly misestmates the number of unique rows
> it will get back, having never asked the remote side for an estimate of
> that.

I poked into that and found that the problem is in estimate_num_groups,
which effectively just disregards any relation that has rel->tuples = 0.
That is the case for a postgres_fdw foreign table if use_remote_estimate
is true, because postgres_fdw never bothers to set any other value.
(On the other hand, if use_remote_estimate is false, it does fill in a
pretty-bogus value, mainly so it can use set_baserel_size_estimates.
See postgresGetForeignRelSize.)

It seems like we could make estimate_num_groups a bit more robust here;
it could just skip its attempts to clamp based on total size or
restriction selectivity, but still include the reldistinct value for the
rel into the total numdistinct. I wonder though if this is the only
problem caused by failing to fill in any value for rel->tuples ...
should we make postgres_fdw install some value for that?

(Note that the question of whether we should ask the remote server for
an estimate of ndistinct is kind of orthogonal to any of these points.
Even if we had obtained one that way, estimate_num_groups would not pay
any attention to it without a fix for the point at hand.)

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2020-06-30 17:49:04 SQL-standard function body
Previous Message Tomas Vondra 2020-06-30 16:53:37 Re: POC: postgres_fdw insert batching