From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Date: 2019-03-19 01:51:43
Message-ID: CAH2-Wz=WY-ghVTdjtp7RxQgrz3bTkEUxveOv-ZM8+RcY9m4wSw@mail.gmail.com
Lists: pgsql-hackers
On Mon, Mar 18, 2019 at 5:12 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Smarter choices on page splits pay off with higher client counts
> because they reduce contention at likely hot points. It's kind of
> crazy that the code in _bt_check_unique() sometimes has to move right,
> while holding an exclusive buffer lock on the original page and a
> shared buffer lock on its sibling page at the same time. It then has
> to hold a third buffer lock concurrently, this time on any heap pages
> it is interested in.
Actually, by the time we get to 16 clients, this workload does make
the indexes and tables smaller. Here is pg_buffercache output
collected after the first 16 client case:
Master
======
         relname         │ relforknumber │ size_main_rel_fork_blocks │ buffer_count │   avg_buffer_usg
─────────────────────────┼───────────────┼───────────────────────────┼──────────────┼────────────────────
 pgbench_history         │             0 │                   123,484 │      123,484 │ 4.9989715266755207
 pgbench_accounts        │             0 │                    34,665 │       10,682 │ 4.4948511514697622
 pgbench_accounts_pkey   │             0 │                     5,708 │        1,561 │ 4.8731582319026265
 pgbench_tellers         │             0 │                       489 │          489 │ 5.0000000000000000
 pgbench_branches        │             0 │                       284 │          284 │ 5.0000000000000000
 pgbench_tellers_pkey    │             0 │                        56 │           56 │ 5.0000000000000000
....
....
Patch
=====
         relname         │ relforknumber │ size_main_rel_fork_blocks │ buffer_count │   avg_buffer_usg
─────────────────────────┼───────────────┼───────────────────────────┼──────────────┼────────────────────
 pgbench_history         │             0 │                   127,864 │      127,864 │ 4.9980447975974473
 pgbench_accounts        │             0 │                    33,933 │        9,614 │ 4.3517786561264822
 pgbench_accounts_pkey   │             0 │                     5,487 │        1,322 │ 4.8857791225416036
 pgbench_tellers         │             0 │                       204 │          204 │ 4.9803921568627451
 pgbench_branches        │             0 │                       198 │          198 │ 4.3535353535353535
 pgbench_tellers_pkey    │             0 │                        14 │           14 │ 5.0000000000000000
....
....
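(The exact query wasn't shown in this message, but since pg_buffercache keys buffers by relfilenode, a query along these lines would produce output shaped like the above -- the column aliases here just mirror that output, so treat it as a sketch rather than the query actually used:)

```sql
SELECT c.relname,
       b.relforknumber,
       pg_relation_size(c.oid) / current_setting('block_size')::int
           AS size_main_rel_fork_blocks,
       count(*) AS buffer_count,
       avg(b.usagecount) AS avg_buffer_usg
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.reldatabase = (SELECT oid FROM pg_database
                       WHERE datname = current_database())
GROUP BY c.relname, b.relforknumber
ORDER BY buffer_count DESC;
```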
The main fork for pgbench_history is larger with the patch, but that's
expected, and good -- pgbench_history grows by one row per transaction,
so a larger history table just means that more transactions completed.
pgbench_accounts_pkey is about 4% smaller, which is probably the most
interesting observation that can be made here, but the tables are also
smaller. pgbench_accounts itself is ~2% smaller, pgbench_branches is
~30% smaller, and pgbench_tellers is ~58% smaller. Of course, the
smaller tables were already very small, so maybe that isn't important.
I think that this is due to more effective pruning, possibly because we
get better lock arbitration as a consequence of better splits, and also
because duplicates are kept in heap TID order. I haven't observed this
effect with larger databases, which have been my focus.
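(For what it's worth, those percentages can be double-checked directly
from the block counts in the two tables above:)

```sql
-- Block counts copied from the Master/Patch pg_buffercache output above
SELECT relname,
       master_blocks,
       patch_blocks,
       round(100.0 * (master_blocks - patch_blocks) / master_blocks, 1)
           AS pct_smaller
FROM (VALUES ('pgbench_accounts',      34665, 33933),
             ('pgbench_accounts_pkey',  5708,  5487),
             ('pgbench_branches',        284,   198),
             ('pgbench_tellers',         489,   204)
     ) AS t(relname, master_blocks, patch_blocks);
```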
It isn't surprising that shared_buffers doesn't contain all of the
pgbench_accounts blocks, since the access pattern is highly skewed by
design -- most of the table's blocks were never accessed at all.
This effect seems to be robust, at least with this workload. The
second round of benchmarks (which have their own pgbench -i
initialization) show very similar amounts of bloat at the same point.
It may not be that significant, but it's also not a fluke.
--
Peter Geoghegan