From: | Paul Jungwirth <pj(at)illuminatedcomputing(dot)com> |
---|---|
To: | jian he <jian(dot)universality(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Improve rowcount estimate for UNNEST(column) |
Date: | 2023-12-07 06:32:07 |
Message-ID: | 166ae46a-9f77-4a72-987c-54fc3e63b3c6@illuminatedcomputing.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello,
On 11/26/23 12:22, Tom Lane wrote:
> Yes, this regression test is entirely unacceptable; the numbers will
> not be stable enough. Even aside from the different-settings issue,
> you can't rely on ANALYZE deriving exactly the same stats every time.
> Usually what we try to do is devise a query where the plan shape
> changes because of the better estimate.
Here is a patch with an improved test. With the old "10" estimate we get a Merge Join, but now that
the planner can see there are only ~4 elements per array, we get a Nested Loop.
It was actually hard to get a new plan, since all our regress tables' arrays have around 5 elements
average, which isn't so far from 10. Adding a table with 1- or 2- element arrays would be more
dramatic. So I resorted to tuning the query with `WHERE seqno <= 50`. Hopefully that's not cheating
too much.
I thought about also adding a test where the old code *underestimates*, but then I'd have to add a
new table with big arrays. If it's worth it let me know.
On 11/26/23 23:05, jian he wrote:
> I found using table array_op_test test more convincing.
True, arrtest is pretty small. The new test uses array_op_test instead.
On 11/29/23 20:35, jian he wrote:
> I did a minor change. change estimate_array_length return type to double
I'm not sure I want to change estimate_array_length from returning ints to returning doubles, since
it's called in many places. I can see how that might give plans that are more accurate yet, so maybe
it's worth it? It feels like it ought to be a separate patch to me, but if others want me to include
it here please let me know.
I did add clamp_row_est since it's essentially free and maybe gives some safety.
Rebased onto d43bd090a8.
Yours,
--
Paul ~{:-)
pj(at)illuminatedcomputing(dot)com
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Use-statitics-for-estimating-UNNEST-column-rows.patch | text/x-patch | 8.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | David Rowley | 2023-12-07 06:33:13 | Re: catalog access with reset GUCs during parallel worker startup |
Previous Message | David Rowley | 2023-12-07 06:29:51 | Re: catalog access with reset GUCs during parallel worker startup |