| From: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | Cory Tucker <cory(dot)tucker(at)gmail(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
| Subject: | Re: [GENERAL] Query Using Massive Temp Space |
| Date: | 2017-11-21 21:18:40 |
| Message-ID: | CAEepm=3e9RKr3DCv0=nGWiNTPs7_SqfqLt=pc83M03kN0KuBkA@mail.gmail.com |
| Lists: | pgsql-general |
On Wed, Nov 22, 2017 at 7:04 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Now, there's definitely something busted here; it should not have gone as
> far as 2 million batches before giving up on splitting.

I had been meaning to discuss this. We only give up when we reach the
point where a batch is entirely kept or entirely sent to a new batch
(i.e. splitting the batch resulted in one batch with the whole contents
and another empty batch). If you have about 2 million evenly
distributed keys and an ideal hash function, and then you also have 42
billion keys that are the same (and exceed work_mem), we won't detect
extreme skew until the 2 million well-behaved keys have been spread so
thin that the 42 billion keys are isolated in a batch on their own,
which we should expect to happen somewhere around 2 million batches.

I have wondered if our extreme skew detector needs to go off sooner.
I don't have a specific suggestion, but it could just be something
like 'you threw out or kept more than X% of the tuples'.
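
To make the arithmetic concrete, here is a toy model in Python. It is only a sketch and not PostgreSQL's hash join code: the function name, the parameters, and the choice to count work_mem in tuples are all made up for illustration, and it follows only the batch that holds the hot key. With threshold=1.0 it mimics the current all-or-nothing rule; a lower threshold stands in for the 'more than X%' idea.

```python
def batches_until_giveup(light_hashes, heavy_hash, heavy_copies,
                         work_mem_tuples, threshold=1.0, max_doublings=40):
    """Toy model: return the batch count at which splitting is abandoned.

    Hypothetical helper, not PostgreSQL code. Each doubling of the batch
    count looks at one more bit of the hash value, and we follow only the
    batch that contains the hot key.
    """
    light = list(light_hashes)   # well-behaved keys sharing the hot batch
    nbatch = 1
    for bit in range(max_doublings):
        in_memory = len(light) + heavy_copies
        if in_memory <= work_mem_tuples:
            return nbatch        # the batch fits, no further split needed

        nbatch *= 2
        heavy_moves = (heavy_hash >> bit) & 1
        # Light keys that land on the same side of the split as the hot key.
        same_side = [h for h in light if ((h >> bit) & 1) == heavy_moves]
        other_side = len(light) - len(same_side)

        # Tuples leaving the old batch versus tuples staying in it.
        moved = heavy_copies + len(same_side) if heavy_moves else other_side
        kept = in_memory - moved

        light = same_side        # keep following the hot key's batch
        # Skew detector: fires when one side received at least `threshold`
        # of the tuples (threshold=1.0 means "all or nothing").
        if max(moved, kept) >= threshold * in_memory:
            return nbatch
    return nbatch


# 1024 evenly spread keys plus one key repeated 100000 times.
current_rule = batches_until_giveup(range(1024), 1024, 100000, 1000)
x_percent_rule = batches_until_giveup(range(1024), 1024, 100000, 1000,
                                      threshold=0.95)
print(current_rule, x_percent_rule)
```

In this model the all-or-nothing rule keeps doubling until the batch count has passed the number of well-behaved keys, while the 95% rule stops at the first split. The numbers are scaled down from the case in this thread and the 95% figure is arbitrary.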
--
Thomas Munro
http://www.enterprisedb.com