From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Reducing the size of BufferTag & remodeling forks |
Date: | 2015-07-03 17:16:03 |
Message-ID: | 20150703171603.GK3291@awork2.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2015-07-03 13:59:07 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
>
> > 2) Replace relation forks, with the exception of the init fork which is
> > special anyway, with separate relfilenodes. Stored in seperate
> > columns in pg_class.
>
> Different AMs have different fork needs; for heaps you want one main
> fork, one VM, one fsm. But for indexes, the VM fork is not necessary,
> and some AMs might want different ones. For instance, GIN would benefit
> from having separate forks to store the internal indexes, and BRIN would
> benefit from a separate fork for the revmap.
>
> What I'm saying is that I'm not sure it's okay to store the forks in
> pg_class columns
Right. Part of the point of this design is that you could easily add
further forks without system wide knowledge and that it is *not*
required anymore that all relfilenodes are in one column. I think it'd
probably make sense to have at least _vm and _fsm in pg_class, but we
could easily add further ones elsewhere.
> instead perhaps we should have a separate catalog on
> which each relation can have many forks, or perhaps have the pg_class
> entry store an array (ick). Or perhaps rather than "relvmfork" (the
> pg_class column for the relfilenode for the VM fork) we could store
> relfilenode1, relfilenode2 where the value for each N fork is
> AM-specific. (so for heaps 1 is main, 2 is FSM, 3 is VM; for BRIN 1 is
> main, 2 is revmap; and so forth).
None of these sound particularly pretty to me. An array of relfilenodes
would probably be the least ugly one.
> FWIW the whole idea seems reasonable to me. I worry about concurrent
> traffic into the pg_relfilenode shared table -- if temp table creation
> is common across many databases, is it going to become a contention
> point?
I don't think it'll be. It's essentially just inserts into a tiny table
with a single index, right? We can do a bootload of inserts into one of
these.
In an *assert enabled* build:
CREATE TABLE pg_relfilenode() WITH OIDS;
ALTER TABLE pg_relfilenode ADD CONSTRAINT pg_relfilenode_pkey PRIMARY KEY(oid);
which is pretty much how pg_relfilenode would look like. Although we'd
not go through the whole executor for the inserts.
pgbench -h localhost -p 5440 postgres -n -f /tmp/pg_relfilenode.sql -j 16 -c 16 -T 20 -P 1
cat /tmp/pg_relfilenode.sql:
INSERT INTO pg_relfilenode DEFAULT VALUES;
progress: 5.0 s, 32168.4 tps, lat 0.495 ms stddev 1.728
progress: 6.0 s, 33719.6 tps, lat 0.473 ms stddev 0.773
andres(at)awork2:~/src/postgresql$ pgbench -h localhost -p 5440 postgres -n -f /tmp/temptable.sql -j 16 -c 16 -T 20 -P 1
CREATE TEMPORARY TABLE blarg() ON COMMIT DROP;
progress: 6.0 s, 5018.2 tps, lat 3.185 ms stddev 3.671
progress: 7.0 s, 4890.9 tps, lat 3.272 ms stddev 4.346
and that's with zero actual columns. If you instead add some:
CREATE TEMPORARY TABLE blarg(id serial primary key, data text, blarg int) ON COMMIT DROP;
progress: 7.0 s, 974.1 tps, lat 16.462 ms stddev 9.058
progress: 8.0 s, 999.9 tps, lat 16.045 ms stddev 7.011
So no, I don't think this'll be a relevant problem. We do so many
inserts for a single temp table that a single insert into another one
completely vanishes in comparison.
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2015-07-03 17:21:21 | Re: WAL logging problem in 9.4.3? |
Previous Message | Martijn van Oosterhout | 2015-07-03 17:14:26 | Re: WAL logging problem in 9.4.3? |