Re: Column as arrays.. more efficient than columns?

From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
To: "Ron Johnson" <ron(dot)l(dot)johnson(at)cox(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Column as arrays.. more efficient than columns?
Date: 2007-09-07 12:49:47
Message-ID: b42b73150709070549m2b919184i80e04b1a2e5fa557@mail.gmail.com
Lists: pgsql-general

On 9/7/07, Ron Johnson <ron(dot)l(dot)johnson(at)cox(dot)net> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 09/06/07 20:53, Merlin Moncure wrote:
> [snip]
> >
> > arrays are interesting and have some useful properties. however, we
> > must first discuss the problems...first and foremost, if you need to
> > read any particular item off the array, you must read the entire
> > array from disk, and you must write all items back to disk for writes.
>
> Reads and writes are done at the page level, so I'm not sure this is
> valid.

sure it is...since the denormalized record is much larger (especially
in array scenarios), the tuple is much larger, so each page fills up
much more quickly. And since every update writes a complete new row
version, large tuples also mean more dead space, more pages, and more
vacuuming. Besides that, the server has to do some work presenting
the array as part of the read, which is overhead. I didn't go into a
lot of detail, but the reasoning is sound. Here is a quick example
showing the problem.


create table denormalized
(
data int[]
);

create table normalized
(
id int primary key,
datum int
);

insert into normalized select v, v from generate_series(1, 100) v;
insert into denormalized select array(select generate_series(1,100));

create sequence rotator maxvalue 100 cycle;
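
A quick sanity check (this query is an addition, not part of the
original setup) that the cycle option wraps the sequence back to 1
after 100:

-- not in the original post: confirm the rotator wraps after 100
select nextval('rotator') from generate_series(1, 101);
-- returns 1..100, then 1 again because of "cycle"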

-- bench denormalized (d.sql) --
update denormalized set data[n] = data[n] + 1
  from (select nextval('rotator') as n) q;
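
Even though only one element changes, each such update rewrites the
entire 100-element row as a new version. A quick way to see the stored
size of that row, using pg_column_size (an addition, not part of the
original benchmark):

-- not in the original post: the whole array is stored in one tuple
select pg_column_size(data) from denormalized;
-- roughly 400+ bytes for 100 int4 elements, vs ~4 bytes per normalized datum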

merlin(at)mernix:~$ pgbench -c 4 -t 1000 -f d.sql
starting vacuum...end.
transaction type: Custom query
scaling factor: 1
number of clients: 4
number of transactions per client: 1000
number of transactions actually processed: 4000/4000
tps = 2452.188456 (including connections establishing)
tps = 2465.262905 (excluding connections establishing)

INFO: "normalized": found 0 removable, 100 nonremovable row versions
in 38 pages

-- bench normalized (n.sql) --
update normalized set datum = datum + 1 where id = (select nextval('rotator'));

merlin(at)mernix:~$ pgbench -c 4 -t 1000 -f n.sql
starting vacuum...end.
transaction type: Custom query
scaling factor: 1
number of clients: 4
number of transactions per client: 1000
number of transactions actually processed: 4000/4000
tps = 6494.402637 (including connections establishing)
tps = 6594.087741 (excluding connections establishing)

INFO: "denormalized": found 0 removable, 1 nonremovable row versions
in 223 page
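
The page counts tell the story: after the same number of updates, the
denormalized table needs 223 pages to hold a single logical row, while
the normalized table holds 100 rows in 38 pages. A direct way to
compare the on-disk footprint (this query is an addition, not shown in
the original run):

-- not in the original post: compare on-disk footprint directly
select pg_relation_size('denormalized') as denorm_bytes,
       pg_relation_size('normalized') as norm_bytes;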

merlin
