Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: b8flowerfire <b8flowerfire(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller
Date: 2014-06-10 14:58:34
Message-ID: CA+TgmoZWzuX-KgKCa7hH0p2Fq9EZ6FMx5gX6z1NnnzwGFQ+sFg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 10, 2014 at 10:46 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Jun 10, 2014 at 10:27 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I don't really recall any hard numbers being provided. I think if we
>>> looked at some results that said "here's the average gain, and here's
>>> the worst-case loss, and here's an estimate of how often you'd hit
>>> the worst case", then we could make a decision.
>
>> The worst case loss is that you have to rescan the entire inner
>> relation, so it's pretty darned bad. I'm not sure how to construct an
>> optimal worst case fot that being monumentally expensive, but making
>> the inner relation gigantic is probably a good start.
>
> "Rescan"? I'm pretty sure there are no cases where nodeHash reads the
> inner relation more than once. If you mean dumping to disk vs not dumping
> to disk, yeah, that's a big increment in the cost.

Sorry, that's what I meant.

>> If we could allow NTUP_PER_BUCKET to drop when the hashtable is
>> expected to fit in memory either way, perhaps with some safety margin
>> (e.g. we expect to use less than 75% of work_mem), I bet that would
>> make the people who have been complaining about this issue happy.
>
> Could be a good plan. We still need some test results though.

Sounds reasonable.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-06-10 15:04:52 Re: /proc/self/oom_adj is deprecated in newer Linux kernels
Previous Message Robert Haas 2014-06-10 14:56:48 Re: /proc/self/oom_adj is deprecated in newer Linux kernels