| From: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
|---|---|
| To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Poor memory context performance in large hash joins |
| Date: | 2017-02-23 22:13:19 |
| Message-ID: | CAMkU=1x1hvue1XYrZoWk_omG0Ja5nBvTdvgrOeVkkeqs71CV8g@mail.gmail.com |
| Lists: | pgsql-hackers |
When doing a hash join with large work_mem, you can have a large number of
chunks. Then if ExecHashIncreaseNumBatches gets called, those chunks are
walked through, moving the tuples to new chunks (or to disk, if they no
longer match the batch's bitmask), and freeing the old chunks.
The number of new chunks can be almost as large as the number of old
chunks, especially if there is a very popular value. The problem is that
every time an old chunk is freed, the code in aset.c around line 968 has to
walk over all the newly allocated chunks in the linked list before it can
find the old one being freed. This is an N^2 operation, and I think it has
horrible CPU cache hit rates as well.
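To convince myself the cost really is quadratic, here is a small standalone toy (not PostgreSQL code, just the same access pattern): new blocks are pushed onto the front of a singly linked list, and freeing an old block has to scan past all of the newer ones first.

```c
/*
 * Toy illustration, not PostgreSQL code: frees against a list that is
 * only reachable from its head.  New blocks are pushed on the front, so
 * freeing an old block must scan past every newer block first, and N
 * such frees cost about N*N/2 pointer chases in total.
 */
#include <stdio.h>
#include <stdlib.h>

typedef struct Block
{
    struct Block *next;
    int         id;
} Block;

static Block *head = NULL;

static Block *
push_block(int id)
{
    Block      *b = malloc(sizeof(Block));

    b->id = id;
    b->next = head;             /* new blocks always go on the front... */
    head = b;
    return b;
}

static long
free_block(Block *target)
{
    Block      *prev = NULL;
    Block      *cur = head;
    long        steps = 0;

    /* ...so finding an old block means walking past all the newer ones */
    while (cur != target)
    {
        prev = cur;
        cur = cur->next;
        steps++;
    }
    if (prev)
        prev->next = cur->next;
    else
        head = cur->next;
    free(cur);
    return steps;
}

int
main(void)
{
    enum { N = 10000 };
    Block     **old = malloc(N * sizeof(Block *));
    long        steps = 0;

    for (int i = 0; i < N; i++)
        old[i] = push_block(i);         /* the "old" chunks */

    /* walk the old chunks newest-first, replacing each with a new one */
    for (int i = N - 1; i >= 0; i--)
    {
        push_block(N + i);              /* allocate a replacement chunk */
        steps += free_block(old[i]);    /* scans past every new chunk so far */
    }

    printf("%d frees cost %ld list steps (about N*N/2)\n", N, steps);
    free(old);
    return 0;
}
```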
Is there a good solution to this? Could the new chunks be put in a
different memory context, with the old context destroyed and the new one
installed at the end of ExecHashIncreaseNumBatches? I couldn't find a destroy
method for memory contexts; it looks like you just reset the parent
instead, but I don't think that would work here.
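Something like the sketch below is what I have in mind (the chunkCxt field is invented, hashCxt is the existing whole-hash-join context in HashJoinTableData if I'm remembering the struct right, and maybe MemoryContextDelete() is the destroy method I was failing to find). This would not compile outside the backend; it is only meant to show the shape of the idea.

```c
/*
 * Sketch only, under the assumption that the dense chunks live in their
 * own child context (the chunkCxt field here is invented).  The new
 * generation of chunks would be allocated in a fresh context, and the
 * old context dropped in one operation, so no per-chunk pfree() ever has
 * to search the block list.
 */
MemoryContext oldCxt = hashtable->chunkCxt;     /* invented field */
MemoryContext newCxt = AllocSetContextCreate(hashtable->hashCxt,
                                             "HashBatchChunks",
                                             ALLOCSET_DEFAULT_SIZES);

hashtable->chunkCxt = newCxt;

/*
 * ... walk the old chunks as now, but allocate the surviving tuples into
 * chunks taken from newCxt, or dump them to their batch files ...
 */

/* then drop every old chunk at once instead of freeing them one by one */
MemoryContextDelete(oldCxt);
```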
Thanks,
Jeff