From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Dimitrios Apostolou <jimis(at)gmx(dot)net>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Why is a hash join preferred when it does not fit in work_mem
Date: 2023-01-13 01:38:26
Message-ID: CAApHDvpPVydoNkEqLyBSbjWq8kq8M7YWdkA44rTeA2MNaO3jsw@mail.gmail.com
Lists: pgsql-general
On Fri, 13 Jan 2023 at 07:33, Dimitrios Apostolou <jimis(at)gmx(dot)net> wrote:
>
> I have a very simple NATURAL JOIN that does not fit in the work_mem. Why
> does the query planner prefer a hash join that needs 361s, while with a
> sort operation and a merge join it takes only 13s?
It's simply that the Hash Join plan appears cheaper based on the
costs that the planner has calculated.

A better question to ask would be: where are the costs inaccurate, and why?
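As an aside, one way to see what the planner thinks each plan costs,
without changing any data, is to disable the Hash Join path for the
session and compare the top-level cost estimates. A sketch, where
"workitems" stands in for the other table in your NATURAL JOIN (only
workitem_ids is visible in the plan you posted):

    -- Cost of the plan the planner currently prefers:
    EXPLAIN SELECT * FROM workitem_ids NATURAL JOIN workitems;

    -- Take the Hash Join path away and see what the Merge Join costs:
    SET enable_hashjoin = off;
    EXPLAIN SELECT * FROM workitem_ids NATURAL JOIN workitems;
    RESET enable_hashjoin;

The gap between the two total costs tells you how much cheaper the
planner believes the Hash Join is.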
One thing I noticed in your EXPLAIN ANALYZE output is that the Index
Scan on workitem_ids was costed higher than the Seq Scan, yet was
faster.
> -> Seq Scan on public.workitem_ids (cost=0.00..59780.19 rows=1373719 width=237) (actual time=0.026..1912.312 rows=1373737 loops=1)
> -> Index Scan using workitem_ids_pkey on public.workitem_ids (cost=0.43..81815.86 rows=1373719 width=237) (actual time=0.111..1218.363 rows=1373737 loops=1)
Perhaps the Seq scan is doing more actual I/O than the index scan is.
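One way to check is EXPLAIN (ANALYZE, BUFFERS), which reports buffer
hits and reads per plan node. Again a sketch, with "workitems" standing
in for the other table:

    -- "shared hit" blocks were found in shared_buffers; "read" blocks
    -- had to come from the kernel cache or disk. Compare the two scans.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM workitem_ids NATURAL JOIN workitems;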
> The low work_mem and the disabled memoization are set on purpose, in order
> to simplify a complex query, while reproducing the same problem that I
> experienced there. This result is the simplest query I could get, where
> the optimizer does not go for a faster merge join.
>
> From my point of view a merge join is clearly faster, because the hash
> table does not fit in memory and I expect a hash join to do a lot of
> random I/O. But the query planner does not see that, and increasing
> random_page_cost does not help either. In fact the opposite happens: the
> merge join gets a higher cost difference to the hash join, as I increase
> the random page cost!
I'd expect reducing random_page_cost to make the Merge Join cheaper,
as that's where the Index Scan is. I'm not quite sure where you think
the random I/O is coming from in a batched hash join.
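If anything, I'd try lowering it for the session and re-running the
EXPLAIN. A sketch (1.1 is just an illustrative SSD-ish value; the
default is 4.0):

    SET random_page_cost = 1.1;  -- cheaper random I/O favours the Index Scan
    EXPLAIN SELECT * FROM workitem_ids NATURAL JOIN workitems;
    RESET random_page_cost;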
It would be interesting to see the same plans with track_io_timing
enabled (SET track_io_timing = on;). It's possible that there's less
*actual* I/O going on with the Merge Join plan than with the Hash Join
plan. Since we do buffered I/O, without track_io_timing we don't know
whether a buffer read resulted in an actual disk read or a read from
the kernel buffers.
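Something like this (setting track_io_timing needs superuser, or a
role that's been allowed to set it) would add per-node "I/O Timings"
lines to the EXPLAIN output; the join is again sketched with a
hypothetical "workitems" table:

    SET track_io_timing = on;
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM workitem_ids NATURAL JOIN workitems;

If the I/O timings are small relative to each node's total time, the
reads were mostly being satisfied from the kernel's cache rather than
from disk.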
David