From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: DBT-3 with SF=20 got failed
Date: 2015-08-21 00:28:41
Message-ID: 9A28C8860F777E439AA12E8AEA7694F801136458@BPXM15GP.gisp.nec.co.jp
Lists: pgsql-hackers
> On 08/19/2015 03:53 PM, Tom Lane wrote:
> >
> > I don't see anything very wrong with constraining the initial
> > allocation to 1GB, or even less. That will prevent consuming insane
> > amounts of work_mem when the planner's rows estimate is too high
> > rather than too low. And we do have the ability to increase the hash
> > table size on the fly.
>
> Perhaps. Limiting the initial allocation to 1GB might help with lowering
> memory consumed in case of over-estimation, and it's high enough so that
> regular queries don't run into that.
>
> My initial thought was "if the memory consumption is a problem, lower
> the work_mem" but after thinking about it for a while I don't see a
> reason to limit the memory consumption this way, if possible.
>
> But what is the impact on queries that actually need more than 1GB of
> buckets? I assume we'd only limit the initial allocation and still allow
> the resize based on the actual data (i.e. the 9.5 improvement), so the
> queries would start with 1GB and then resize once finding out the
> optimal size (as done in 9.5). The resize is not very expensive, but
> it's not free either, and with so many tuples (requiring more than 1GB
> of buckets, i.e. ~130M tuples) it's probably just a noise in the total
> query runtime. But it'd be nice to see some proof of that ...
>
The problem here is that we cannot know the exact size unless the Hash node
reads the entire inner relation. All we can do is rely on the planner's
estimation, but it sometimes computes a crazy number of rows.
I think resizing the hash buckets on the fly is a reasonable compromise.
> Also, why 1GB and not 512MB (which is effectively the current limit on
> 9.4, because we can't allocate 1GB there so we end up with 1/2 of that)?
> Why not some small percentage of work_mem, e.g. 5%?
>
No clear reason at this moment.
> Most importantly, this is mostly orthogonal to the original bug report.
> Even if we do this, we still have to fix the palloc after the resize.
>
> So I'd say let's do a minimal bugfix of the actual issue, and then
> perhaps improve the behavior in case of significant overestimates. The
> bugfix should happen ASAP so that it gets into 9.5.0 (and backported).
> We already have patches for that.
>
> Limiting the initial allocation deserves more thorough discussion and
> testing, and I'd argue it's 9.6 material at this point.
>
Indeed, the original bug was caused by a plain MemoryContextAlloc().
The memory over-consumption issue is a new problem exposed by this
change; I didn't notice it two months ago.
> > The real problem here is the potential integer overflow in
> > ExecChooseHashTableSize. Now that we know there is one, that should
> > be fixed (and not only in HEAD/9.5).
>
> I don't think there's any integer overflow. The problem is that we end
> up with nbuckets so high that (nbuckets*8) overflows 1GB-1.
>
> There's a protection against integer overflow in place (it was there for
> a long time), but there never was any protection against the 1GB limit.
> We've been using much smaller work_mem values and NTUP_PER_BUCKET=10,
> so we could not actually reach it.
>
> > But I believe Kaigai-san withdrew his initial proposed patch, and we
> > don't have a replacement as far as I saw.
>
> I believe the patch proposed by KaiGai-san is the right one to fix the
> bug discussed in this thread. My understanding is KaiGai-san withdrew
> the patch as he wants to extend it to address the over-estimation issue.
>
> I don't think we should do that - IMHO that's an unrelated improvement
> and should be addressed in a separate patch.
>
OK, it is not something we have to conclude within a few days, just
before the beta release.
Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>