From: | Harald Fuchs <hari(dot)fuchs(at)googlemail(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | generate_series woes |
Date: | 2008-04-14 09:21:58 |
Message-ID: | puhce43i6h.fsf@srv.protecting.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
I think there's something sub-optimal with generate_series.
In the following, "documents" is a table with more than 120000 rows,
vacuumed and analyzed before the queries.
EXPLAIN ANALYZE
SELECT count (d.id), floor (s.val / 5000)
FROM generate_series (1::INT, 5009) AS s (val)
LEFT JOIN documents d ON d.id = s.val
GROUP BY 2
ORDER BY 2;
This returns:
Sort (cost=4231.52..4232.02 rows=200 width=8) (actual time=41.886..41.887 rows=2 loops=1)
Sort Key: (floor(((s.val / 5000))::double precision))
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=4219.88..4223.88 rows=200 width=8) (actual time=41.843..41.846 rows=2 loops=1)
-> Nested Loop Left Join (cost=0.00..4214.88 rows=1000 width=8) (actual time=1.274..38.193 rows=5009 loops=1)
-> Function Scan on generate_series s (cost=0.00..12.50 rows=1000 width=4) (actual time=1.209..3.102 rows=5009 loops=1)
-> Index Scan using documents_pkey on documents d (cost=0.00..4.18 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=5009)
Index Cond: (d.id = s.val)
Total runtime: 42.218 ms
Now let's wrap generate_series into an SQL function:
CREATE FUNCTION genser (int, int) RETURNS SETOF int AS $$
SELECT * FROM generate_series ($1, $2) AS g(x);
$$ LANGUAGE sql;
EXPLAIN ANALYZE
SELECT count (d.id), floor (s.val / 5000)
FROM genser (1::INT, 5009) AS s (val)
LEFT JOIN documents d ON d.id = s.val
GROUP BY 2
ORDER BY 2;
Not surprisingly, this returns the same plan:
Sort (cost=4479.02..4479.52 rows=200 width=8) (actual time=43.606..43.607 rows=2 loops=1)
Sort Key: (floor(((s.val / 5000))::double precision))
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=4467.38..4471.38 rows=200 width=8) (actual time=43.559..43.561 rows=2 loops=1)
-> Nested Loop Left Join (cost=0.00..4462.38 rows=1000 width=8) (actual time=3.564..39.740 rows=5009 loops=1)
-> Function Scan on genser s (cost=0.00..260.00 rows=1000 width=4) (actual time=3.503..5.435 rows=5009 loops=1)
-> Index Scan using documents_pkey on documents d (cost=0.00..4.18 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=5009)
Index Cond: (d.id = s.val)
Total runtime: 44.047 ms
(9 rows)
But look what happens if we tell PostgreSQL how many rows "genser"
will return:
CREATE FUNCTION genser (int, int) RETURNS SETOF int AS $$
SELECT * FROM generate_series ($1, $2) AS g(x);
$$ LANGUAGE sql ROWS 5009;
EXPLAIN ANALYZE
SELECT count (d.id), floor (s.val / 5000)
FROM genser (1::INT, 5009) AS s (val)
LEFT JOIN documents d ON d.id = s.val
GROUP BY 2
ORDER BY 2;
Now we get a better plan:
Sort (cost=15545.54..15546.04 rows=200 width=8) (actual time=27.857..27.859 rows=2 loops=1)
Sort Key: (floor(((s.val / 5000))::double precision))
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=15533.89..15537.89 rows=200 width=8) (actual time=27.817..27.819 rows=2 loops=1)
-> Merge Right Join (cost=1610.15..15508.85 rows=5009 width=8) (actual time=7.714..24.133 rows=5009 loops=1)
Merge Cond: (d.id = s.val)
-> Index Scan using documents_pkey on documents d (cost=0.00..13472.20 rows=125518 width=4) (actual time=0.045..6.112 rows=5010 loops=1)
-> Sort (cost=1610.15..1622.67 rows=5009 width=4) (actual time=7.651..9.501 rows=5009 loops=1)
Sort Key: s.val
Sort Method: quicksort Memory: 427kB
-> Function Scan on genser s (cost=0.00..1302.34 rows=5009 width=4) (actual time=3.559..5.262 rows=5009 loops=1)
Total runtime: 28.445 ms
(12 rows)
Since generate_series is a builtin function, can't it tell how many
rows it will return?
From | Date | Subject | |
---|---|---|---|
Next Message | imageguy | 2008-04-14 12:09:52 | Re: Number or parameters for functions - limited to 32 ? |
Previous Message | Richard Huxton | 2008-04-14 08:37:28 | Re: The default text search configuration will be set to "simple" ? |