From: | Yuya Watari <watari(dot)yuya(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Making empty Bitmapsets always be NULL |
Date: | 2023-03-16 01:30:57 |
Message-ID: | CAJ2pMkYmPti=8V46+tmQtGcqGXAeoaiRBLS=ekF8nC6OQO42sQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello,
On Tue, Mar 7, 2023 at 1:07 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> I adjusted the patch to remove the invariant checks and fixed up a
> couple of things I'd missed. The 0002 patch changes the for loops
> into do while loops. I wanted to see if we could see any performance
> gains from doing this.
I noticed that these patches caused significant degradation while
working on improving planning performance in another thread [1].
In the experiment, I used the query attached to this email. This
workload consists of eight tables, each of which is split into n
partitions. The "query.sql" measures the planning time of a query that
joins these tables. You can quickly reproduce my experiment using the
following commands.
=====
psql -f create-tables.sql
psql -f query.sql
=====
I show the result in the following tables. I refer to David's patches
in [2] as the "trailing-zero" patch. When n was large, the
trailing-zero patch showed significant degradation. This is due to too
many calls of repalloc(). With this patch, we cannot reuse spaces
after the last non-zero bitmapword, so we need to call repalloc() more
frequently than before. When n is 256, repalloc() was called only 670
times without the patch, while it was called 5694 times with the
patch.
Table 1: Planning time (ms)
-----------------------------------------------------------------
n | Master | Patched (trailing-zero) | Patched (bitwise-OR)
-----------------------------------------------------------------
1 | 37.639 | 37.330 | 36.979
2 | 36.066 | 35.646 | 36.044
4 | 37.958 | 37.349 | 37.842
8 | 42.397 | 42.994 | 39.779
16 | 54.565 | 67.713 | 44.186
32 | 89.220 | 100.828 | 65.542
64 | 227.854 | 269.059 | 150.398
128 | 896.513 | 1279.965 | 577.671
256 | 4241.994 | 8220.508 | 2538.681
-----------------------------------------------------------------
Table 2: Planning time speedup (higher is better)
------------------------------------------------------
n | Patched (trailing-zero) | Patched (bitwise-OR)
------------------------------------------------------
1 | 100.8% | 101.8%
2 | 101.2% | 100.1%
4 | 101.6% | 100.3%
8 | 98.6% | 106.6%
16 | 80.6% | 123.5%
32 | 88.5% | 136.1%
64 | 84.7% | 151.5%
128 | 70.0% | 155.2%
256 | 51.6% | 167.1%
------------------------------------------------------
On Fri, Mar 3, 2023 at 10:52 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> The patch also optimizes sub-optimal newly added code which calls
> bms_is_empty_internal() when we have other more optimal means to
> determine if the set is empty or not.
However, I agree with David's opinion regarding the
bms_is_empty_internal() calls, which is quoted above. I have
implemented this optimization in a slightly different way than David.
My patch is attached to this email. The difference between my patch
and David's is in the determination method of whether the result is
empty: David's patch records the last index of non-zero bitmapword to
minimize the Bitmapset. If the index is -1, we can conclude that the
result is empty. In contrast, my patch uses a more lightweight
operation. I show my changes as follows.
=====
@@ -263,6 +261,7 @@ bms_intersect(const Bitmapset *a, const Bitmapset *b)
const Bitmapset *other;
int resultlen;
int i;
+ bitmapword bitwise_or = 0;
/* Handle cases where either input is NULL */
if (a == NULL || b == NULL)
@@ -281,9 +280,17 @@ bms_intersect(const Bitmapset *a, const Bitmapset *b)
/* And intersect the longer input with the result */
resultlen = result->nwords;
for (i = 0; i < resultlen; i++)
- result->words[i] &= other->words[i];
+ {
+ bitmapword w = (result->words[i] &= other->words[i]);
+
+ /*
+ * Compute bitwise OR of all bitmapwords to determine if the result
+ * is empty
+ */
+ bitwise_or |= w;
+ }
/* If we computed an empty result, we must return NULL */
- if (bms_is_empty_internal(result))
+ if (bitwise_or == 0)
{
pfree(result);
return NULL;
@@ -711,30 +718,6 @@ bms_membership(const Bitmapset *a)
return result;
}
=====
My idea is to compute the bitwise OR of all bitmapwords of the result
Bitmapset. The bitwise OR can be represented as a single operation in
the machine code and does not require any conditional branches. If the
bitwise ORed value is zero, we can conclude the result Bitmapset is
empty. The costs related to this operation can be almost negligible;
it is significantly cheaper than calling bms_is_empty_internal() and
less expensive than using a conditional branch such as 'if.'
In the tables above, I called my patch the "bitwise-OR" patch. The
patch is much faster than the master when n is large. Its speed up
reached 167.1%. I think just adopting this optimization is worth
considering.
[1] https://www.postgresql.org/message-id/CAJ2pMkY10J_PA2jpH5M-VoOo6BvJnTOO_-t_znu_pOaP0q10pA@mail.gmail.com
[2] https://www.postgresql.org/message-id/CAApHDvq9eq0W_aFUGrb6ba28ieuQN4zM5Uwqxy7+LMZjJc+VGg@mail.gmail.com
--
Best regards,
Yuya Watari
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Remove-bms_is_empty_internal-function-calls.patch | application/octet-stream | 4.6 KB |
create-tables.sql | application/octet-stream | 915 bytes |
query.sql | application/octet-stream | 525 bytes |
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Smith | 2023-03-16 01:49:34 | Re: Add macros for ReorderBufferTXN toptxn |
Previous Message | Michael Paquier | 2023-03-16 01:10:51 | Re: recovery modules |