From: | Ken Geis <kgeis(at)speakeasy(dot)org> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | plan problem |
Date: | 2004-04-06 20:25:54 |
Message-ID: | 40731252.2030002@speakeasy.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
I am trying to find an efficient way to draw a random sample from a
complex query. I also want it to be easy to use within my application.
So I've defined a view that encapsulates the query. The id in the
"driving" table is exposed, and I run a query like:
select * from stats_record_view
where id in (select id from driver_stats
order by random()
limit 30000);
driver_stats.id is unique, the primary key. The problem I'm having is
that neither the ORDER BY nor the LIMIT change the uniqueness of that
column, but the planner doesn't know that. It does a HashAggregate to
make sure the results are unique. It thinks that 200 rows will come out
of that operation, and then 200 rows is small enough that it thinks a
Nested Loop is the best way to proceed from there.
I can post more query plan, but I don't think it would be that very
helpful. I'm considering just making a sample table and creating an
analogous view around that. I'd like to be able to keep this as simple
as possible though.
Ken
From | Date | Subject | |
---|---|---|---|
Next Message | Chris Browne | 2004-04-06 20:57:02 | Re: Raw devices vs. Filesystems |
Previous Message | Aaron Werman | 2004-04-06 20:22:46 | Re: possible improvement between G4 and G5 |