From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: DBT-3 with SF=20 got failed
Date: 2015-08-21 00:28:41
Message-ID: 9A28C8860F777E439AA12E8AEA7694F801136458@BPXM15GP.gisp.nec.co.jp
Lists: pgsql-hackers
> On 08/19/2015 03:53 PM, Tom Lane wrote:
> >
> > I don't see anything very wrong with constraining the initial
> > allocation to 1GB, or even less. That will prevent consuming insane
> > amounts of work_mem when the planner's rows estimate is too high
> > rather than too low. And we do have the ability to increase the hash
> > table size on the fly.
>
> Perhaps. Limiting the initial allocation to 1GB might help with lowering
> memory consumed in case of over-estimation, and it's high enough so that
> regular queries don't run into that.
>
> My initial thought was "if the memory consumption is a problem, lower
> the work_mem" but after thinking about it for a while I don't see a
> reason to limit the memory consumption this way, if possible.
>
> But what is the impact on queries that actually need more than 1GB of
> buckets? I assume we'd only limit the initial allocation and still allow
> the resize based on the actual data (i.e. the 9.5 improvement), so the
> queries would start with 1GB and then resize once finding out the
> optimal size (as done in 9.5). The resize is not very expensive, but
> it's not free either, and with so many tuples (requiring more than 1GB
> of buckets, i.e. ~130M tuples) it's probably just a noise in the total
> query runtime. But it'd be nice to see some proof of that ...
>
The problem here is that we cannot know the exact size unless the Hash node
reads the entire inner relation. All we can do is rely on the planner's
estimation, but it sometimes computes a crazy number of rows.
I think resizing the hash buckets on the fly is a reasonable compromise.
> Also, why 1GB and not 512MB (which is effectively the current limit on
> 9.4, because we can't allocate 1GB there so we end up with 1/2 of that)?
> Why not some small percentage of work_mem, e.g. 5%?
>
No clear reason at this moment.
> Most importantly, this is mostly orthogonal to the original bug report.
> Even if we do this, we still have to fix the palloc after the resize.
>
> So I'd say let's do a minimal bugfix of the actual issue, and then
> perhaps improve the behavior in case of significant overestimates. The
> bugfix should happen ASAP so that it gets into 9.5.0 (and backported).
> We already have patches for that.
>
> Limiting the initial allocation deserves more thorough discussion and
> testing, and I'd argue it's 9.6 material at this point.
>
Indeed, the original bug was caused by a plain MemoryContextAlloc().
The memory over-consumption issue is a new problem exposed by this
change; I didn't notice it two months ago.
> > The real problem here is the potential integer overflow in
> > ExecChooseHashTableSize. Now that we know there is one, that should
> > be fixed (and not only in HEAD/9.5).
>
> I don't think there's any integer overflow. The problem is that we end
> up with nbuckets so high that (nbuckets*8) overflows 1GB-1.
>
> There's a protection against integer overflow in place (it was there for
> a long time), but there never was any protection against the 1GB limit.
> We've been using much smaller work_mem values and NTUP_PER_BUCKET=10,
> so we could not actually reach it.
>
> > But I believe Kaigai-san withdrew his initial proposed patch, and we
> > don't have a replacement as far as I saw.
>
> I believe the patch proposed by KaiGai-san is the right one to fix the
> bug discussed in this thread. My understanding is KaiGai-san withdrew
> the patch as he wants to extend it to address the over-estimation issue.
>
> I don't think we should do that - IMHO that's an unrelated improvement
> and should be addressed in a separate patch.
>
OK, it is not something we have to conclude within a few days, just
before the beta release.
Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>