From: | Scara Maccai <m_lists(at)yahoo(dot)it> |
---|---|
To: | pgsql-general <pgsql-general(at)postgresql(dot)org> |
Subject: | Performance problem with low correlation data |
Date: | 2009-07-06 17:32:29 |
Message-ID: | 730660.52675.qm@web24615.mail.ird.yahoo.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
I have a problem with the method that PG uses to access my data.
Data into testinsert is inserted every 15 minutes.
ne_id varies from 1 to 20000.
CREATE TABLE testinsert
(
ne_id integer NOT NULL,
t timestamp without time zone NOT NULL,
v integer[],
CONSTRAINT testinsert_pk PRIMARY KEY (ne_id, t)
)
CREATE UNIQUE INDEX testinsert_time_key
ON testinsert
USING btree
(t, ne_id);
This table has, then, a t correlation of 1, and a ne_id correlation close to 0.
I query this table using another table:
CREATE TABLE idtable
(
id integer NOT NULL,
groupname varchar(50)
CONSTRAINT idtable_pk PRIMARY KEY (id, groupname)
)
CREATE INDEX idtable_group_idx
ON idtable
USING btree
(groupname);
where each id is associated with a group:
select * from idtable left outer join testinsert on id=ne_id where groupname='a group name' and time between $a_date and $another_date
PG usually choose a nested loop join over all the ne_ids found for groupname='a group name'.
BUT, given the correlation in the table, this is a horrible data access: the table (15GB) gets read randomly, since data for one ne_id is scattered all over the table;
The "best" way to read the table would still be a nested loop, but a loop on the "t" values, not on the ne_id values, since data for the same timestamp is "close".
Or, even better, something like this would be very nice:
Bitmap Heap Scan
for each id found in idtable where groupname='a group name'
BitmapOr
BitmapIndexScan using ne_id and time between $a_date and $another_date
That is: I understand why PG is using that access method to fetch the indexes, but I would like it to fetch the heaps only after ALL the indexes have been read, so that it could reorder them...
So, given that:
How can I tell to PG to use an algorithm such as:
fetch the heap
for each quarter
for each id found where groupname='a group name'
fetch all the indexes
instead of:
for each id found where groupname='a group name'
fetch the heap
fetch all the indexes where ne_id=id time between $a_date and $another_date
????
(
some other infos:
1) results clustering the table are x10-x20 faster, but I can't cluster the table (it gets written every 15 minutes and read pretty often)
2) I know all the "t" values that I'm going to query, since there won't be more than 1 t per ne_id per 15 minutes;
so I could use a generate_series($a_date, $another_date, 15 minutes) if that could help somehow:
select * from
idtable
cross join generate_series($a_date, $another_date, 15 minutes) as myt
left outer join testinsert on id=ne_id and myt=t
where groupname='a group name'
but it doesn't help...
)
From | Date | Subject | |
---|---|---|---|
Next Message | Stuart McGraw | 2009-07-06 17:40:47 | Re: Windows installer for pg-8.4 confusion |
Previous Message | Andreas Wenk | 2009-07-06 17:22:58 | Re: Windows installer for pg-8.4 confusion |