From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: James Coleman <jtc331(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
Date: 2021-04-08 21:32:13
Message-ID: ab5b8dc0-3270-e13c-859f-463b9b9237c3@enterprisedb.com
Lists: pgsql-hackers
On 4/8/21 2:00 PM, David Rowley wrote:
> On Thu, 8 Apr 2021 at 22:54, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>>
>> I think the changes in the patch are fairly isolated and the test
>> coverage is now pretty good. I'm planning on looking at the patch
>> again now and will consider pushing it for PG14.
>
> I pushed this with some minor cleanup from the v6 patch I posted earlier.
>
I ran the same set of benchmarks on the v6, which I think should be
mostly identical to what was committed. I extended the test a bit to
test tables with 0, 1, 5 and 1000 rows, and also with int and text
values, to see how it works with more expensive comparators.
I built binaries with gcc 9.2.0 and clang 11.0.0, the full results are
attached. There's a bit of difference between gcc and clang, but the
general behavior is about the same, so I'll only present gcc results to
keep this simple. I'll only show the throughput comparison to master, so >1.0
means good, <1.0 means bad. If you're interested in actual tps, see the
full results.
For the v5 patch (actually v4-0001 + v5-0002) and v6, the results are
(columns are array lengths, measured with the prepared (/p) and simple
(/s) query protocols):
integer column / v5
===================
rows 10/p 100/p 16/p 10/s 100/s 16/s
-----------------------------------------------------------
0 97% 97% 97% 107% 126% 108%
1 95% 82% 94% 108% 132% 110%
5 95% 83% 95% 108% 132% 110%
1000 129% 481% 171% 131% 382% 165%
integer column / v6
===================
rows 10/p 100/p 16/p 10/s 100/s 16/s
-----------------------------------------------------------
0 97% 97% 97% 98% 98% 98%
1 96% 84% 95% 97% 97% 98%
5 97% 85% 96% 98% 97% 97%
1000 129% 489% 172% 128% 330% 162%
text column / v5
================
rows 10/p 100/p 16/p 10/s 100/s 16/s
-----------------------------------------------------------
0 100% 100% 100% 106% 119% 108%
1 96% 81% 95% 107% 120% 109%
5 97% 82% 96% 107% 121% 109%
1000 291% 1622% 402% 255% 1092% 337%
text column / v6
================
rows 10/p 100/p 16/p 10/s 100/s 16/s
-----------------------------------------------------------
0 101% 101% 101% 98% 99% 99%
1 98% 82% 96% 98% 96% 97%
5 100% 84% 98% 98% 96% 98%
1000 297% 1645% 408% 255% 1000% 336%
Overall, the behavior for integer and text columns is the same, for both
patches. There are a couple of interesting observations:
1) For the "simple" query mode, v5 helped quite a bit (20-30% speedup),
but v6 does not seem to help at all - it's either the same as or slower
than unpatched master.
I wonder why that is, and whether we could get some of that speedup with v6.
At first I thought that maybe v5 is not building the hash table in cases
where v6 does, but that shouldn't be faster than master.
2) For the "prepared" mode, there's a clear performance hit the longer
the array is (for both v5 and v6). For 100 elements it's about 15%,
which is not great.
I think the reason is fairly simple - building the hash table is not
free, and with few rows it's not worth it - it'd be faster to just
search the array directly. Unfortunately, the logic that makes the
decision to switch to hashing looks only at the array length, and
ignores the number of rows entirely. So I think if we want to address
this, convert_saop_to_hashed_saop needs to compare
hash_build_cost + nrows * hash_lookup_cost
and
nrows * linear_lookup_cost
to make a reasonable decision.
I was thinking that maybe we can ignore this, because people probably
have much larger tables in practice. But I'm not sure that's really
true, because there may be other quals and it's possible the preceding
ones are quite selective, filtering most of the rows.
I'm not sure how much of the necessary information we have available in
convert_saop_to_hashed_saop_walker, though :-( I suppose we know the
number of input rows for that plan node, but I'm not sure about the
selectivity of the other quals.
It's also a bit strange that we get speedup for "simple" protocol, while
for "prepared" it gets slower. That seems counter-intuitive, because why
should we see opposite outcomes in those cases? I'd assume that we'll
see either speedup or slowdown in both cases, with the relative change
being more significant in the "prepared" mode.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
saop gcc.ods (application/vnd.oasis.opendocument.spreadsheet, 34.9 KB)
saop llvm.ods (application/vnd.oasis.opendocument.spreadsheet, 26.4 KB)