From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Use simplehash.h instead of dynahash in SMgr
Date: 2021-04-26 06:43:48
Message-ID: CAApHDvqK3XF2fowu22UYOyuyiJFrEpRtwTaZBSy33j_vygqaew@mail.gmail.com
Lists: pgsql-hackers
On Mon, 26 Apr 2021 at 05:03, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:
> If your test is so sensitive to hash function speed, then I'd suggest
> trying something even simpler:
>
> static inline uint32
> relfilenodebackend_hash(RelFileNodeBackend *rnode)
> {
> 	uint32	h = 0;
>
> #define step(x) h ^= (uint32) (x) * 0x85ebca6b; h = pg_rotate_right32(h, 11); h *= 9;
>
> 	step(rnode->node.relNode);
> 	step(rnode->node.spcNode);	/* spcNode could be different for the same
> 								 * relNode only during table movement.
> 								 * Does it pay to hash it? */
> 	step(rnode->node.dbNode);
> 	step(rnode->backend);		/* does it matter to hash backend? It equals
> 								 * InvalidBackendId for non-temporary
> 								 * relations, and temporary relations in the
> 								 * same database never have the same relNode
> 								 * (do they?). */
> 	return murmurhash32(h);
> }
I tried that and got a median result of 113.795 seconds over 14 runs
of this recovery benchmark test.
LOG: size: 4096, members: 2032, filled: 0.496094, total chain: 1014,
max chain: 6, avg chain: 0.499016, total_collisions: 428,
max_collisions: 3, avg_collisions: 0.210630
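(An aside on the step() macro above, which doesn't affect the numbers:
it expands to three statements, so it would misbehave after an unbraced
if. Wrapping it in the usual do { } while (0) idiom makes it safe
anywhere a single statement is expected:

#define step(x) \
	do { \
		h ^= (uint32) (x) * 0x85ebca6b; \
		h = pg_rotate_right32(h, 11); \
		h *= 9; \
	} while (0)
)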
I also tried the following hash function, just to see how much
additional performance a faster hash might buy:
static inline uint32
relfilenodebackend_hash(RelFileNodeBackend *rnode)
{
	uint32	h;

	h = pg_rotate_right32((uint32) rnode->node.relNode, 16) ^
		((uint32) rnode->node.dbNode);
	return murmurhash32(h);
}
I got a median of 112.685 seconds over 14 runs with:
LOG: size: 4096, members: 2032, filled: 0.496094, total chain: 1044,
max chain: 7, avg chain: 0.513780, total_collisions: 438,
max_collisions: 3, avg_collisions: 0.215551
So it looks like there might not be much left to gain, given that v2
was 113.375 seconds (median over 10 runs).
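For reference, murmurhash32() is just the 32-bit murmur finalizer
(quoting src/include/common/hashfn.h), so there's very little
arithmetic left to shave off:

static inline uint32
murmurhash32(uint32 data)
{
	uint32		h = data;

	h ^= h >> 16;
	h *= 0x85ebca6b;
	h ^= h >> 13;
	h *= 0xc2b2ae35;
	h ^= h >> 16;
	return h;
}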
> I'd like to see the benchmark code. It's quite interesting that this
> place became measurable at all.
Sure.
$ cat recoverybench_insert_hash.sh
#!/bin/bash

pg_ctl stop -D pgdata -m smart
pg_ctl start -D pgdata -l pg.log -w
psql -f setup1.sql postgres > /dev/null
psql -c "create table log_wal (lsn pg_lsn not null);" postgres > /dev/null
psql -c "insert into log_wal values(pg_current_wal_lsn());" postgres > /dev/null
# load 100 million rows; this generates the WAL that recovery will replay
psql -c "insert into hp select x,0 from generate_series(1,100000000) x;" postgres > /dev/null
psql -c "insert into log_wal values(pg_current_wal_lsn());" postgres > /dev/null
psql -c "select 'Used ' || pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), lsn)) || ' of WAL' from log_wal limit 1;" postgres
# stop without a shutdown checkpoint so the next start performs crash recovery
pg_ctl stop -D pgdata -m immediate -w
echo Starting Postgres...
pg_ctl start -D pgdata -l pg.log
$ cat setup1.sql
drop table if exists hp;
create table hp (a int primary key, b int not null) partition by hash(a);
select 'create table hp'||x||' partition of hp for values with (modulus 1000, remainder '||x||');' from generate_series(0,999) x;
\gexec
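(\gexec runs each row of the query's result as an SQL statement, so this
creates the 1,000 hash partitions: create table hp0 partition of hp for
values with (modulus 1000, remainder 0); and so on.)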
config:
shared_buffers = 10GB
checkpoint_timeout = 60min
max_wal_size = 20GB
min_wal_size = 20GB
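(The large max_wal_size and long checkpoint_timeout hold off checkpoints
during the load, so crash recovery has to replay the WAL for the entire
100-million-row insert; min_wal_size just keeps the recycled WAL
segments around between runs.)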
For subsequent runs, if you apply the patch that does the PANIC at the
end of recovery, you'll just need to start the database up again to
perform the same recovery again. You can then tail -f the postgres log
and watch for the "redo done" message, which will show you the time
spent doing recovery.
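Assuming the log file name from the script above, something like
"tail -f pg.log | grep -E 'redo (starts|done)'" works; a "redo starts
at ..." line is logged when recovery begins and "redo done" when it
finishes.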
David.