From: | Hubert Zhang <hzhang(at)pivotal(dot)io> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Replace hashtable growEnable flag |
Date: | 2019-05-16 09:57:17 |
Message-ID: | CAB0yrenCNrA7GGg-JySPDHDyoahSFzCz=pUqOZxSiNSL5WNiYA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Thanks Tomas.
I will follow this problem on your thread. This thread could be terminated.
On Thu, May 16, 2019 at 3:58 AM Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:
> On Wed, May 15, 2019 at 06:19:38PM +0800, Hubert Zhang wrote:
> >Hi all,
> >
> >When we build hash table for a hash join node etc., we split tuples into
> >different hash buckets. Since tuples could not all be held in memory.
> >Postgres splits each bucket into batches, only the current batch of bucket
> >is in memory while other batches are written to disk.
> >
> >During ExecHashTableInsert(), if the memory cost exceeds the operator
> >allowed limit(hashtable->spaceAllowed), batches will be split on the fly
> by
> >calling ExecHashIncreaseNumBatches().
> >
> >In past, if data is distributed unevenly, the split of batch may
> failed(All
> >the tuples falls into one split batch and the other batch is empty) Then
> >Postgres will set hashtable->growEnable to false. And never expand batch
> >number any more.
> >
> >If tuples become diverse in future, spliting batch is still valuable and
> >could avoid the current batch become too big and finally OOM.
> >
> >To fix this, we introduce a penalty on hashtable->spaceAllowed, which is
> >the threshold to determine whether to increase batch number.
> >If batch split failed, we increase the penalty instead of just turn off
> the
> >growEnable flag.
> >
> >Any comments?
> >
>
> There's already another thread discussing various issues with how hashjoin
> increases the number of batches, including various issues with how/when we
> disable adding more batches.
>
> https://commitfest.postgresql.org/23/2109/
>
> In general I think you're right something like this is necessary, but I
> think we may need to rethink growEnable a bit more.
>
> For example, the way you implemented it, after reaching the increased
> limit, we just increase the number of batches just like today, and then
> decide whether it actually helped. But that means we double the number of
> BufFile entries, which uses more and more memory (because each is 8kB and
> we need 1 per batch). I think in this case (after increasing the limit) we
> should check whether increasing batches makes sense or not. And only do it
> if it helps. Otherwise we'll double the amount of memory for BufFile(s)
> and also the work_mem. That's not a good idea.
>
> But as I said, there are other issues discussed on the other thread. For
> example we only disable the growth when all rows fall into the same batch.
> But that's overly strict.
>
>
> regards
>
> --
> Tomas Vondra
> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.2ndQuadrant.com&d=DwIBAg&c=lnl9vOaLMzsy2niBC8-h_K-7QJuNJEsFrzdndhuJ3Sw&r=lz-kpGdw_rtpgYV2ho3DjDSB5Psxis_b-3VZKON7K7c&m=y2bI6_b4EPRd9aTQGv9Pio3c_ZtCWs_jzKd4t8CtJEI&s=XHLORM8U7I6XR_EDkgSFtJDvhxIVd2rDA7r-xvJa278&e=
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
--
Thanks
Hubert Zhang
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Munro | 2019-05-16 11:08:26 | Re: PSA: New intel MDS vulnerability mitigations cause measurable slowdown |
Previous Message | Fabien COELHO | 2019-05-16 08:43:08 | RE: psql - add SHOW_ALL_RESULTS option |