From: Scara Maccai <m_lists(at)yahoo(dot)it>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Performance problem with low correlation data
Date: 2009-07-07 07:16:55
Message-ID: 357794.55023.qm@web24604.mail.ird.yahoo.com
Lists: pgsql-general
> But that would be a different query -- there's no restrictions on
> the t values in this one.
There is a restriction on the t values:
select *
from idtable
left outer join testinsert on id = ne_id
where groupname = 'a group name'
  and time between $a_date and $another_date;
> Have you tried something using IN or EXISTS instead of a join?
I still get a nested loop join on the ne_id column...
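
For reference, this is roughly the shape of the rewrite I mean (just a
sketch, using the same columns as the query above; it also drops the
outer join and the idtable columns, which is part of the problem):

select * from testinsert
where time between $a_date and $another_date
  and ne_id in (select id from idtable where groupname='a group name');

-- or, the EXISTS form:
select * from testinsert ti
where time between $a_date and $another_date
  and exists (select 1 from idtable
              where groupname='a group name' and id = ti.ne_id);

Either way I still end up with the nested loop driven into the ne_id
index.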
> The algorithm you describe doesn't work for the join because it has
> to produce a record which includes the matching group columns.
Yeah, I thought about that.
Basically I guess the "perfect" algorithm would be something like:

Hash Join   <---- this is needed to join values from both relations
  -> Bitmap Heap Scan
       for each id found in idtable where groupname='a group name':
         BitmapOr
           BitmapIndexScan using ne_id and time between $a_date and $another_date
  -> select id from idtable where groupname='a group name'
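
The bitmap part is the same thing you'd get by writing the ids out by
hand (just a sketch with made-up ids 1, 2, 3, and assuming an index on
testinsert (ne_id, time)):

select * from testinsert
where time between $a_date and $another_date
  and (ne_id = 1 or ne_id = 2 or ne_id = 3);

and the hash join on top would bring back the idtable columns that the
IN/EXISTS forms lose.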
> Actually I wonder if doing a sequential scan with a hash join
> against the group list wouldn't be a better option.
The table is pretty big (60M rows), and sequential scans are the reason my queries are so slow: since the correlation on the ne_id column is so bad, the planner chooses seq scans whenever the query covers most of the "t" values, even if the number of "ne_id" values is low.
For the moment I've found this solution:
whenever too many "t" values are selected, which would lead the planner towards a seq scan (or a very poor bitmap index scan if I disable seq scans), I create a temporary table:
create temporary table alldata as
  select *
  from generate_series(mydatestart, mydateend, '15 minutes'::interval) as t
  cross join idtable
  where groupname = 'a group name'
  order by t, id;

analyze alldata;

select * from alldata left outer join testinsert using (ne_id, t);
Basically I'm doing what I'd like PG to do:
since the correlation on the "t" column is good and the correlation on the "id" column is bad, probe the index in the right order: "t" first, then "id" (which is what the "order by t, id" at creation of the temp table gives me).
I would like PG to do that for me. Since it knows that an index scan looping on ne_id would be wrong, I'd like it to build a "materialized" table where the data is ordered by "t" first, instead of going for the seq scan.
This would give a 10x - 100x improvement in query time.
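
(In case anyone wants to reproduce it, the two plans can be compared
directly with something like the following, using the same placeholders
as above:

explain analyze
select * from idtable left outer join testinsert on id = ne_id
where groupname = 'a group name'
  and time between $a_date and $another_date;

explain analyze
select * from alldata left outer join testinsert using (ne_id, t);

the first one is the plan that degenerates into the seq scan / poor
bitmap scan, the second one is the ordered access that gives the
speedup.)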