estimation problems for DISTINCT ON with FDW

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: estimation problems for DISTINCT ON with FDW
Date: 2020-06-29 03:23:04
Message-ID: CAMkU=1xNo9cnan+Npxgz0eK7394xmjmKg-QEm8wYG9P5-CcaqQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

If I use the attached sql file to set up the database with loop-back
postgres_fdw, and then turn on use_remote_estimate for this query:

distinct on (id) id, z from fgn.priority order by id, priority desc,z

It issues two queries for the foreign estimate, one with a sort and one
without:

EXPLAIN SELECT id, priority, z FROM public.priority

EXPLAIN SELECT id, priority, z FROM public.priority ORDER BY id ASC NULLS
LAST, priority DESC NULLS FIRST, z ASC NULLS LAST

It doesn't cost out the plan of pushing the DISTINCT ON down to the foreign
side, which is probably the best way to run the query. I guess it makes
sense that FDW machinery in general doesn't want to try to push a
PostgreSQL specific construct.

But much worse than that, it horribly misestmates the number of unique rows
it will get back, having never asked the remote side for an estimate of
that.

Result (cost=100.51..88635.90 rows=1 width=16)
-> Unique (cost=100.51..88635.90 rows=1 width=16)
-> Foreign Scan on priority (cost=100.51..86135.90 rows=1000000
width=16)

Where does it come up with the idea that these 1,000,000 rows will
DISTINCT/Unique down to just 1 row? I can't find the place in the code
where that happens. I suspect it is happening somewhere in the core code
based on data fed into it by postgres_fdw, not in postgres_fdw itself.

This leads to horrible plans if the DISTINCT ON is actually in a subquery
which is joined to other tables, for example.

If you don't use the remote estimates, it at least comes up with a roughly
sane estimate of 200 distinct rows, which is enough to inhibit selection of
the worst plans. Why does an uninformative remote estimate do so much worse
than no remote estimate at all?

Of course I could just disable remote estimates for this table, but then
other queries that use the table without DISTINCT ON suffer. Another
solution is to ANALYZE the foreign table, but that opens up a can of worms
of its own.

I see this behavior in all supported or in-development versions.

Cheers,

Jeff

Attachment Content-Type Size
distinct_on_foreign_estimate.sql text/plain 695 bytes

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-06-29 03:34:25 Re: pgsql: Enable Unix-domain sockets support on Windows
Previous Message torikoshia 2020-06-29 03:01:55 Re: Creating a function for exposing memory usage of backend process