From: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Speeding Up Bitmapset |
Date: | 2023-06-26 00:55:04 |
Message-ID: | CAEudQAqArnSDczkz7a_wq9NCQFQ_r=6kHx_LODrOP+VVG0ywVA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Em dom., 25 de jun. de 2023 às 17:49, David Rowley <dgrowleyml(at)gmail(dot)com>
escreveu:
> There's no reason in the world
> that those will speed up Bitmapsets, so why include them?
>
Of course optimization is the most important thing,
but since you're going to touch the source, why not make it more readable.
>
> > v7 outperforms v4 by 29% (Query A)
>
> I tested v7 with query-a and I also see additional gains. However,
> it's entirely down to your changes to bms_is_subset(). It seems, by
> chance, with the given Bitmapsets that looping backwards for the given
> sets is able to determine the result more quickly
>
I redid the tests and it seems that most of the difference comes from
bms_subset.
>
> Here's some results from "perf top"
>
> query-a
> v4
>
> 30.08% postgres [.] bms_is_subset
> 15.84% postgres [.] create_join_clause
> 13.54% postgres [.] bms_equal
> 11.03% postgres [.] get_eclass_for_sort_expr
> 8.53% postgres [.] generate_implied_equalities_for_column
> 3.11% postgres [.] generate_join_implied_equalities_normal
> 1.03% postgres [.] add_child_rel_equivalences
> 0.82% postgres [.] SearchCatCacheInternal
> 0.73% postgres [.] AllocSetAlloc
> 0.53% postgres [.] find_ec_member_matching_expr
> 0.40% postgres [.] hash_search_with_hash_value
> 0.36% postgres [.] palloc
> 0.36% postgres [.] palloc0
>
> latency average = 452.480 ms
>
> v7
> 20.51% postgres [.] create_join_clause
> 15.33% postgres [.] bms_equal
> 14.17% postgres [.] get_eclass_for_sort_expr
> 12.05% postgres [.] bms_is_subset
> 10.40% postgres [.] generate_implied_equalities_for_column
> 3.90% postgres [.] generate_join_implied_equalities_normal
> 1.34% postgres [.] add_child_rel_equivalences
> 1.06% postgres [.] AllocSetAlloc
> 1.00% postgres [.] SearchCatCacheInternal
> 0.72% postgres [.] find_ec_member_matching_expr
> 0.58% postgres [.] palloc0
> 0.49% postgres [.] palloc
> 0.47% postgres [.] hash_search_with_hash_value
> 0.44% libc.so.6 [.] __memmove_avx_unaligned_erms
>
>
> latency average = 350.543 ms
>
> modified v7's bms_is_subset to go forwards then I get latency average
> = 445.987 ms.
>
> If I add some debugging to bms_is_subset to have it record how many
> words it checks, I see:
>
> postgres=# select sum(nwords) from forward;
> sum
> -----------
> 181490660
> (1 row)
>
> postgres=# select sum(nwords) from backwards;
> sum
> ----------
> 11322564
> (1 row)
>
> So, it took about 181 million loops in bms_is_member to plan query-a
> when looping forward, but only 11 million when looping backwards.
>
> I think unless you've got some reason that you're able to justify why
> we're always more likely to have to perform fewer word checks in
> bms_is_subset() by looping backwards instead of forwards then I think
> the performance gains you're showing here only happen to show better
> results due to the given workload. It's just as easy to imagine that
> you'll apply the equivalent slowdown for some other workload where the
> initial words differ but all the remaining words all match.
>
Have you seen bms_compare?
For some reason someone thought it would be better to loop backward the
array.
Since bms_subset performed very well with backward, I think that's a good
reason to leave it as bms_compare.
As in most cases, the array size is small, in general in both modes the
performance will be equivalent.
Anyway, I made a new patch v8, based on v4 but with some changes that I
believe improve it.
Windows 64 bits
msvc 2019 64 bits
== Query A ==
psql -U postgres -f c:\postgres_bench\tmp\bitmapset\create-tables-a.sql
psql -U postgres -f c:\postgres_bench\tmp\bitmapset\query-a.sql
=============
patched v4:
Time: 2305,445 ms (00:02,305)
Time: 2185,972 ms (00:02,186)
Time: 2177,434 ms (00:02,177)
Time: 2169,883 ms (00:02,170)
patched v8:
Time: 2143,532 ms (00:02,144)
Time: 2140,313 ms (00:02,140)
Time: 2138,481 ms (00:02,138)
Time: 2130,290 ms (00:02,130)
== Query B ==
psql -U postgres -f c:\postgres_bench\tmp\bitmapset\create-tables-b.sql
psql -U postgres -f c:\postgres_bench\tmp\bitmapset\query-b.sql
patched v4:
Time: 2684,360 ms (00:02,684)
Time: 2482,571 ms (00:02,483)
Time: 2452,699 ms (00:02,453)
Time: 2465,223 ms (00:02,465)
patched v8:
Time: 2493,281 ms (00:02,493)
Time: 2490,090 ms (00:02,490)
Time: 2432,515 ms (00:02,433)
Time: 2426,860 ms (00:02,427)
linux Ubuntu 64 bit
gcc 64 bits
== Query A ==
/usr/local/pgsql/bin/psql -U postgres -f
/home/ranier/Documentos/benchmarks/bitmapset/create-tables-a.sql
/usr/local/pgsql/bin/psql -U postgres -f
/home/ranier/Documentos/benchmarks/bitmapset/query-a.sql
=============
patched v4:
Time: 933,181 ms
Time: 931,520 ms
Time: 906,496 ms
Time: 872,446 ms
patched v8:
Time: 937,079 ms
Time: 930,408 ms
Time: 865,548 ms
Time: 865,382 ms
== Query B ==
/usr/local/pgsql/bin/psql -U postgres -f
/home/ranier/Documentos/benchmarks/bitmapset/create-tables-b.sql
/usr/local/pgsql/bin/psql -U postgres -f
/home/ranier/Documentos/benchmarks/bitmapset/query-b.sql
patched v4:
Time: 1581,317 ms (00:01,581)
Time: 1568,371 ms (00:01,568)
Time: 1468,036 ms (00:01,468)
Time: 1445,698 ms (00:01,446)
patched v8:
Time: 1437,997 ms (00:01,438)
Time: 1437,435 ms (00:01,437)
Time: 1440,422 ms (00:01,440)
Time: 1436,112 ms (00:01,436)
regards,
Ranier Vilela
Attachment | Content-Type | Size |
---|---|---|
speeding-up-bitmapset-v8.patch | application/octet-stream | 18.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tatsuo Ishii | 2023-06-26 01:05:20 | Re: Row pattern recognition |
Previous Message | Alena Rybakina | 2023-06-25 23:39:50 | Re: eqjoinsel_semi still sucks ... |