From: | "Peter Koczan" <pjkoczan(at)gmail(dot)com> |
---|---|
To: | pgsql-performance <pgsql-performance(at)postgresql(dot)org> |
Subject: | sequence query performance issues |
Date: | 2007-09-27 22:04:35 |
Message-ID: | 4544e0330709271504g364fd042u161966fa448934f6@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hello,
I have a weird performance issue with a query I'm testing. Basically,
I'm trying to port a function that generates user uids, and since
postgres offers a sequence generator function, I figure I'd take
advantage of that. Basically, I generate our uid range, filter out
those which are in use, and randomly pick however many I need.
However, when I run it it takes forever (>10 minutes and I get nothing
so I cancelled the query) and cpu usage on the server is maxed out.
Here's my query (I'll post the explain output later so as not to
obscure my question):
=> select a.uid from generate_series(1000, 32767) as a(uid) where
a.uid not in (select uid from people) order by random() limit 1;
I thought that nulls were a problem, so I tried:
=> select a.uid from generate_series(1000, 32767) as a(uid) where
a.uid not in (select coalesce(uid,0) from people) order by random()
limit 1;
And that finished in less than a second.
I then tried:
=> select a.uid from generate_series(1000, 32767) as a(uid) where
a.uid not in (select coalesce(uid,0) from people where uid is not
null) order by random() limit 1;
And we're back to taking forever.
So I have 2 questions:
- Is there a better query for this purpose? Mine works when coalesced,
but it seems a little brute-force and the random() sorting, while
kinda nice, is slow.
- Is this in any way expected? I know that nulls sometimes cause
problems, but why is it taking forever even when trying to filter
those out?
Thanks.
Peter
The gory details:
- There is an btree index on people(uid), and there are ~6300 rows, of
which ~1300 have null uids.
- EXPLAIN output (I couldn't get EXPLAIN ANALYZE output from the first
two queries since they took too long):
=> explain select a.uid from generate_series(1000, 32767) as a(uid)
where a.uid not in (select uid from people) order by random() limit 1;
QUERY PLAN
------------------------------------------------------------------------------------------
Limit (cost=40025.57..40025.60 rows=10 width=4)
-> Sort (cost=40025.57..40026.82 rows=500 width=4)
Sort Key: random()
-> Function Scan on generate_series a
(cost=693.16..40003.16 rows=500 width=4)
Filter: (NOT (subplan))
SubPlan
-> Materialize (cost=693.16..756.03 rows=6287 width=2)
-> Seq Scan on people (cost=0.00..686.87
rows=6287 width=2)
(8 rows)
=> explain select a.uid from generate_series(1000, 32767) as a(uid)
where a.uid not in (select uid from people where uid is not null)
order by random() limit 1;
QUERY PLAN
------------------------------------------------------------------------------------------
Limit (cost=31486.71..31486.73 rows=10 width=4)
-> Sort (cost=31486.71..31487.96 rows=500 width=4)
Sort Key: random()
-> Function Scan on generate_series a
(cost=691.79..31464.29 rows=500 width=4)
Filter: (NOT (subplan))
SubPlan
-> Materialize (cost=691.79..741.00 rows=4921 width=2)
-> Seq Scan on people (cost=0.00..686.87
rows=4921 width=2)
Filter: (uid IS NOT NULL)
(9 rows)
=> explain select a.uid from generate_series(1000, 32767) as a(uid)
where a.uid not in (select coalesce(uid, 0) from people) order by
random() limit 1;
QUERY PLAN
----------------------------------------------------------------------------------------
Limit (cost=756.97..756.99 rows=10 width=4)
-> Sort (cost=756.97..758.22 rows=500 width=4)
Sort Key: random()
-> Function Scan on generate_series a (cost=718.30..734.55
rows=500 width=4)
Filter: (NOT (hashed subplan))
SubPlan
-> Seq Scan on people (cost=0.00..702.59 rows=6287 width=2)
(7 rows)
=> explain analyze select a.uid from generate_series(1000, 32767) as
a(uid) where a.uid not in (select coalesce(uid, 0) from people) order
by random() limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=756.97..756.99 rows=10 width=4) (actual
time=370.444..370.554 rows=10 loops=1)
-> Sort (cost=756.97..758.22 rows=500 width=4) (actual
time=370.434..370.472 rows=10 loops=1)
Sort Key: random()
-> Function Scan on generate_series a (cost=718.30..734.55
rows=500 width=4) (actual time=70.018..199.540 rows=26808 loops=1)
Filter: (NOT (hashed subplan))
SubPlan
-> Seq Scan on people (cost=0.00..702.59 rows=6287
width=2) (actual time=0.023..29.167 rows=6294 loops=1)
Total runtime: 372.224 ms
(8 rows)
From | Date | Subject | |
---|---|---|---|
Next Message | Richard Huxton | 2007-09-28 08:06:11 | Re: sequence query performance issues |
Previous Message | Ron Mayer | 2007-09-27 18:07:48 | Re: Searching for the cause of a bad plan |