Quick Links

Re: Data Warehouse Reevaluation - MySQL vs Postgres --

From:	Joe Conway <mail(at)joeconway(dot)com>
To:	Iain <iain(at)mst(dot)co(dot)jp>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: Data Warehouse Reevaluation - MySQL vs Postgres --
Date:	2004-09-16 04:07:34
Message-ID:	41491186.1060708@joeconway.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Iain wrote:
>>That's exactly what we're doing, but using inherited tables instead of a
>>union view. With inheritance, there is no need to rebuild the view each
>>time a table is added or removed. Basically, in our application, tables
>>are partitioned by either month or week, depending on the type of data
>>involved, and queries are normally date qualified.
>
> That sounds interesting. I have to admit that I havn't touched iheritance in
> pg at all yet so I find it hard to imagine how this would work. If you have
> a chance, would you mind elaborating on it just a little?

OK, see below:
=====================

create table foo(f1 int, f2 date, f3 float8);

create table foo_2004_01() inherits (foo);
create table foo_2004_02() inherits (foo);
create table foo_2004_03() inherits (foo);

create index foo_2004_01_idx1 on foo_2004_01(f2);
create index foo_2004_02_idx1 on foo_2004_02(f2);
create index foo_2004_03_idx1 on foo_2004_03(f2);

insert into foo_2004_02 values(1,'2004-feb-15',3.14);

-- needed just for illustration since these are toy tables
set enable_seqscan to false;
explain analyze select * from foo where f2 = '2004-feb-15';

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=100000000.00..100000061.32 rows=16 width=16) (actual
time=0.224..0.310 rows=1 loops=1)
-> Append (cost=100000000.00..100000061.32 rows=16 width=16)
(actual time=0.214..0.294 rows=1 loops=1)
-> Seq Scan on foo (cost=100000000.00..100000022.50 rows=5
width=16) (actual time=0.004..0.004 rows=0 loops=1)
Filter: (f2 = '2004-02-15'::date)
-> Index Scan using foo_2004_01_idx1 on foo_2004_01 foo
(cost=0.00..17.07 rows=5 width=16) (actual time=0.101..0.101 rows=0 loops=1)
Index Cond: (f2 = '2004-02-15'::date)
-> Index Scan using foo_2004_02_idx1 on foo_2004_02 foo
(cost=0.00..4.68 rows=1 width=16) (actual time=0.095..0.101 rows=1 loops=1)
Index Cond: (f2 = '2004-02-15'::date)
-> Index Scan using foo_2004_03_idx1 on foo_2004_03 foo
(cost=0.00..17.07 rows=5 width=16) (actual time=0.066..0.066 rows=0 loops=1)
Index Cond: (f2 = '2004-02-15'::date)
Total runtime: 0.582 ms
(11 rows)

create table foo_2004_04() inherits (foo);
create index foo_2004_04_idx1 on foo_2004_04(f2);

explain analyze select * from foo where f2 = '2004-feb-15';

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=100000000.00..100000078.38 rows=21 width=16) (actual
time=0.052..0.176 rows=1 loops=1)
-> Append (cost=100000000.00..100000078.38 rows=21 width=16)
(actual time=0.041..0.159 rows=1 loops=1)
-> Seq Scan on foo (cost=100000000.00..100000022.50 rows=5
width=16) (actual time=0.004..0.004 rows=0 loops=1)
Filter: (f2 = '2004-02-15'::date)
-> Index Scan using foo_2004_01_idx1 on foo_2004_01 foo
(cost=0.00..17.07 rows=5 width=16) (actual time=0.012..0.012 rows=0 loops=1)
Index Cond: (f2 = '2004-02-15'::date)
-> Index Scan using foo_2004_02_idx1 on foo_2004_02 foo
(cost=0.00..4.68 rows=1 width=16) (actual time=0.016..0.022 rows=1 loops=1)
Index Cond: (f2 = '2004-02-15'::date)
-> Index Scan using foo_2004_03_idx1 on foo_2004_03 foo
(cost=0.00..17.07 rows=5 width=16) (actual time=0.008..0.008 rows=0 loops=1)
Index Cond: (f2 = '2004-02-15'::date)
-> Index Scan using foo_2004_04_idx1 on foo_2004_04 foo
(cost=0.00..17.07 rows=5 width=16) (actual time=0.095..0.095 rows=0 loops=1)
Index Cond: (f2 = '2004-02-15'::date)
Total runtime: 0.443 ms
(13 rows)

For loading data, we COPY into foo, and have a trigger that redirects
the rows to the appropriate partition.

Notice that the partitions which do not contain any data of interest are
still probed for data, but since they have none it is very quick. In a
real life example I got the following results just this afternoon:

- aggregate row count = 471,849,665
- total number inherited tables = 216
(many are future dated and therefore contain no data)
- select one month's worth of data for one piece of equipment by serial
number (49,257 rows) = 526.015 ms

Not too bad -- quick enough for my needs. BTW, this is using NFS mounted
storage (NetApp NAS).

Joe

In response to

Re: Data Warehouse Reevaluation - MySQL vs Postgres -- at 2004-09-15 04:54:18 from Iain

Responses

Re: Data Warehouse Reevaluation - MySQL vs Postgres -- at 2004-09-16 05:08:34 from Iain

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Steve Atkins	2004-09-16 04:17:03	Re: Data Warehouse Reevaluation - MySQL vs Postgres --
Previous Message	Greg Stark	2004-09-16 03:55:24	Re: Partitioning