From: | "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: POC: GROUP BY optimization |
Date: | 2022-01-20 05:05:57 |
Message-ID: | dc933c66-41b9-8907-327e-f6361a690654@postgrespro.ru |
Lists: | pgsql-hackers |
I keep working on this patch. Here are intermediate results.
On 7/22/21 3:58 AM, Tomas Vondra wrote:
> in the first loop. Which seems pretty bogus - why would there be just
> two groups? When processing the first expression, it's as if there was
> one big "prev group" with all the tuples, so why not to just use nGroups
> as it is?
I think the heapsort costing code looks strange. Look at the fallback case: it is based on the output_tuples value. Maybe we should use the nGroups value here, but derive it from the number of output_tuples?
> 1) I looked at the resources mentioned as sources the formulas came
> from, but I've been unable to really match the algorithm to them. The
> quicksort paper is particularly "dense", the notation seems to be very
> different, and none of the theorems seem like an obvious fit. Would be
> good to make the relationship clearer in comments etc.
Fixed (See attachment).
> 3) I'm getting a bit skeptical about the various magic coefficients that
> are meant to model higher costs with non-uniform distribution. But
> consider that we do this, for example:
>
> tuplesPerPrevGroup = ceil(1.5 * tuplesPerPrevGroup / nGroups);
>
> but then in the next loop we call estimate_num_groups_incremental and
> pass this "tweaked" tuplesPerPrevGroup value to it. I'm pretty sure this
> may have various strange consequences - we'll calculate the nGroups
> based on the inflated value, and we'll calculate tuplesPerPrevGroup from
> that again - that seems susceptible to "amplification".
>
> We inflate tuplesPerPrevGroup by 50%, which means we'll get a higher
> nGroups estimate in the next loop - but not linearly. An then we'll
> calculate the inflated tuplesPerPrevGroup and estimated nGroup ...
The weighting coefficient '1.5' expresses our desire to minimize the number of
comparison operations on each subsequent attribute of the pathkeys list. By
increasing this coefficient, we increase the chance that the planner will
order the pathkeys by decreasing uniqueness.
I think it's OK.
--
regards,
Andrey Lepikhov
Postgres Professional
Attachment | Content-Type | Size |
---|---|---|
fixes.txt | text/plain | 7.3 KB |