From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Why is DEFAULT_FDW_TUPLE_COST so insanely low? |
Date: | 2022-08-02 14:56:12 |
Message-ID: | CAApHDvopVjjfh5c1Ed2HRvDdfom2dEpMwwiu5-f1AnmYprJngA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Over on [1] I was complaining that I thought DEFAULT_FDW_TUPLE_COST,
which is defined as 0.01 was unrealistically low.
For comparison, cpu_tuple_cost, something we probably expect to be in
a CPU cache is also 0.01. We've defined DEFAULT_PARALLEL_TUPLE_COST
to be 0.1, which is 10x cpu_tuple_cost. That's coming from a shared
memory segment. So why do we think DEFAULT_FDW_TUPLE_COST should be
the same as cpu_tuple_cost when that's probably pulling a tuple from
some remote server over some (possibly slow) network?
I did a little experiment in the attached .sql file and did some maths
to try to figure out what it's really likely to be costing us. I tried
this with and without the attached hack to have the planner not
consider remote grouping just to see how much slower pulling a million
tuples through the FDW would cost.
I setup a loopback server on localhost (which has about the lowest
possible network latency) and found the patched query to the foreign
server took:
Execution Time: 530.000 ms
This is pulling all million tuples over and doing the aggregate locally.
Unpatched, the query took:
Execution Time: 35.334 ms
so about 15x faster.
If I take the seqscan cost for querying the local table, which is
14425.00 multiply that by 15 (the extra time it took to pull the 1
million tuples) then divide by 1 million to get the extra cost per
tuple, then that comes to about 0.216. So that says
DEFAULT_FDW_TUPLE_COST is about 21x lower than it should be.
I tried cranking DEFAULT_FDW_TUPLE_COST up to 0.5 to see what plans
would change in the postgres_fdw regression tests and quite a number
changed. Many seem to be pushing the sorts down to the remote server
where they were being done locally before. A few others just seem
weird. For example, the first one seems to be blindly adding a remote
sort when it does no good. I think it would take quite a bit of study
with a debugger to figure out what's going on with many of these.
Does anyone have any ideas why DEFAULT_FDW_TUPLE_COST was set so low?
Does anyone object to it being set to something more realistic?
David
Attachment | Content-Type | Size |
---|---|---|
disable_fdw_grouping_hack.patch | text/plain | 664 bytes |
fdw_experiment.sql | application/octet-stream | 909 bytes |
postgres_fdw.diff | text/plain | 11.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | David Rowley | 2022-08-02 14:59:05 | Re: Add proper planner support for ORDER BY / DISTINCT aggregates |
Previous Message | Noah Misch | 2022-08-02 14:37:27 | Re: Race between KeepFileRestoredFromArchive() and restartpoint |