| From: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> | 
|---|---|
| To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | estimation problems for DISTINCT ON with FDW | 
| Date: | 2020-06-29 03:23:04 | 
| Message-ID: | CAMkU=1xNo9cnan+Npxgz0eK7394xmjmKg-QEm8wYG9P5-CcaqQ@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
If I use the attached sql file to set up the database with loop-back
postgres_fdw, and then turn on use_remote_estimate for this query:
distinct on (id) id, z from fgn.priority order by id, priority desc,z
It issues two queries for the foreign estimate, one with a sort and one
without:
EXPLAIN SELECT id, priority, z FROM public.priority
EXPLAIN SELECT id, priority, z FROM public.priority ORDER BY id ASC NULLS
LAST, priority DESC NULLS FIRST, z ASC NULLS LAST
It doesn't cost out the plan of pushing the DISTINCT ON down to the foreign
side, which is probably the best way to run the query.  I guess it makes
sense that FDW machinery in general doesn't want to try to push a
PostgreSQL specific construct.
But much worse than that, it horribly misestmates the number of unique rows
it will get back, having never asked the remote side for an estimate of
that.
 Result  (cost=100.51..88635.90 rows=1 width=16)
   ->  Unique  (cost=100.51..88635.90 rows=1 width=16)
         ->  Foreign Scan on priority  (cost=100.51..86135.90 rows=1000000
width=16)
Where does it come up with the idea that these 1,000,000 rows will
DISTINCT/Unique down to just 1 row?   I can't find the place in the code
where that happens.  I suspect it is happening somewhere in the core code
based on data fed into it by postgres_fdw, not in postgres_fdw itself.
This leads to horrible plans if the DISTINCT ON is actually in a subquery
which is joined to other tables, for example.
If you don't use the remote estimates, it at least comes up with a roughly
sane estimate of 200 distinct rows, which is enough to inhibit selection of
the worst plans. Why does an uninformative remote estimate do so much worse
than no remote estimate at all?
Of course I could just disable remote estimates for this table, but then
other queries that use the table without DISTINCT ON suffer.  Another
solution is to ANALYZE the foreign table, but that opens up a can of worms
of its own.
I see this behavior in all supported or in-development versions.
Cheers,
Jeff
| Attachment | Content-Type | Size | 
|---|---|---|
| distinct_on_foreign_estimate.sql | text/plain | 695 bytes | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Amit Kapila | 2020-06-29 03:34:25 | Re: pgsql: Enable Unix-domain sockets support on Windows | 
| Previous Message | torikoshia | 2020-06-29 03:01:55 | Re: Creating a function for exposing memory usage of backend process |