Quick Links

Re: Reducing the size of BufferTag & remodeling forks

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Reducing the size of BufferTag & remodeling forks
Date:	2015-07-03 17:16:03
Message-ID:	20150703171603.GK3291@awork2.anarazel.de
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 2015-07-03 13:59:07 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
>
> > 2) Replace relation forks, with the exception of the init fork which is
> > special anyway, with separate relfilenodes. Stored in seperate
> > columns in pg_class.
>
> Different AMs have different fork needs; for heaps you want one main
> fork, one VM, one fsm. But for indexes, the VM fork is not necessary,
> and some AMs might want different ones. For instance, GIN would benefit
> from having separate forks to store the internal indexes, and BRIN would
> benefit from a separate fork for the revmap.
>
> What I'm saying is that I'm not sure it's okay to store the forks in
> pg_class columns

Right. Part of the point of this design is that you could easily add
further forks without system wide knowledge and that it is *not*
required anymore that all relfilenodes are in one column. I think it'd
probably make sense to have at least _vm and _fsm in pg_class, but we
could easily add further ones elsewhere.

> instead perhaps we should have a separate catalog on
> which each relation can have many forks, or perhaps have the pg_class
> entry store an array (ick). Or perhaps rather than "relvmfork" (the
> pg_class column for the relfilenode for the VM fork) we could store
> relfilenode1, relfilenode2 where the value for each N fork is
> AM-specific. (so for heaps 1 is main, 2 is FSM, 3 is VM; for BRIN 1 is
> main, 2 is revmap; and so forth).

None of these sound particularly pretty to me. An array of relfilenodes
would probably be the least ugly one.

> FWIW the whole idea seems reasonable to me. I worry about concurrent
> traffic into the pg_relfilenode shared table -- if temp table creation
> is common across many databases, is it going to become a contention
> point?

I don't think it'll be. It's essentially just inserts into a tiny table
with a single index, right? We can do a bootload of inserts into one of
these.

In an *assert enabled* build:
CREATE TABLE pg_relfilenode() WITH OIDS;
ALTER TABLE pg_relfilenode ADD CONSTRAINT pg_relfilenode_pkey PRIMARY KEY(oid);

which is pretty much how pg_relfilenode would look like. Although we'd
not go through the whole executor for the inserts.

pgbench -h localhost -p 5440 postgres -n -f /tmp/pg_relfilenode.sql -j 16 -c 16 -T 20 -P 1
cat /tmp/pg_relfilenode.sql:
INSERT INTO pg_relfilenode DEFAULT VALUES;
progress: 5.0 s, 32168.4 tps, lat 0.495 ms stddev 1.728
progress: 6.0 s, 33719.6 tps, lat 0.473 ms stddev 0.773

andres(at)awork2:~/src/postgresql$ pgbench -h localhost -p 5440 postgres -n -f /tmp/temptable.sql -j 16 -c 16 -T 20 -P 1
CREATE TEMPORARY TABLE blarg() ON COMMIT DROP;
progress: 6.0 s, 5018.2 tps, lat 3.185 ms stddev 3.671
progress: 7.0 s, 4890.9 tps, lat 3.272 ms stddev 4.346

and that's with zero actual columns. If you instead add some:
CREATE TEMPORARY TABLE blarg(id serial primary key, data text, blarg int) ON COMMIT DROP;
progress: 7.0 s, 974.1 tps, lat 16.462 ms stddev 9.058
progress: 8.0 s, 999.9 tps, lat 16.045 ms stddev 7.011

So no, I don't think this'll be a relevant problem. We do so many
inserts for a single temp table that a single insert into another one
completely vanishes in comparison.

In response to

Re: Reducing the size of BufferTag & remodeling forks at 2015-07-03 16:59:07 from Alvaro Herrera

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andres Freund	2015-07-03 17:21:21	Re: WAL logging problem in 9.4.3?
Previous Message	Martijn van Oosterhout	2015-07-03 17:14:26	Re: WAL logging problem in 9.4.3?