| From: | Harald Fuchs <hari(dot)fuchs(at)googlemail(dot)com> | 
|---|---|
| To: | pgsql-general(at)postgresql(dot)org | 
| Subject: | generate_series woes | 
| Date: | 2008-04-14 09:21:58 | 
| Message-ID: | puhce43i6h.fsf@srv.protecting.net | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-general | 
I think there's something sub-optimal with generate_series.
In the following, "documents" is a table with more than 120000 rows,
vacuumed and analyzed before the queries.
  EXPLAIN ANALYZE
  SELECT count (d.id), floor (s.val / 5000)
  FROM generate_series (1::INT, 5009) AS s (val)
  LEFT JOIN documents d ON d.id = s.val
  GROUP BY 2
  ORDER BY 2;
This returns:
 Sort  (cost=4231.52..4232.02 rows=200 width=8) (actual time=41.886..41.887 rows=2 loops=1)
   Sort Key: (floor(((s.val / 5000))::double precision))
   Sort Method:  quicksort  Memory: 25kB
   ->  HashAggregate  (cost=4219.88..4223.88 rows=200 width=8) (actual time=41.843..41.846 rows=2 loops=1)
         ->  Nested Loop Left Join  (cost=0.00..4214.88 rows=1000 width=8) (actual time=1.274..38.193 rows=5009 loops=1)
               ->  Function Scan on generate_series s  (cost=0.00..12.50 rows=1000 width=4) (actual time=1.209..3.102 rows=5009 loops=1)
               ->  Index Scan using documents_pkey on documents d  (cost=0.00..4.18 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=5009)
                     Index Cond: (d.id = s.val)
 Total runtime: 42.218 ms
Now let's wrap generate_series into an SQL function:
  CREATE FUNCTION genser (int, int) RETURNS SETOF int AS $$
    SELECT * FROM generate_series ($1, $2) AS g(x);
  $$ LANGUAGE sql;
  EXPLAIN ANALYZE
  SELECT count (d.id), floor (s.val / 5000)
  FROM genser (1::INT, 5009) AS s (val)
  LEFT JOIN documents d ON d.id = s.val
  GROUP BY 2
  ORDER BY 2;
Not surprisingly, this returns the same plan:
 Sort  (cost=4479.02..4479.52 rows=200 width=8) (actual time=43.606..43.607 rows=2 loops=1)
   Sort Key: (floor(((s.val / 5000))::double precision))
   Sort Method:  quicksort  Memory: 25kB
   ->  HashAggregate  (cost=4467.38..4471.38 rows=200 width=8) (actual time=43.559..43.561 rows=2 loops=1)
         ->  Nested Loop Left Join  (cost=0.00..4462.38 rows=1000 width=8) (actual time=3.564..39.740 rows=5009 loops=1)
               ->  Function Scan on genser s  (cost=0.00..260.00 rows=1000 width=4) (actual time=3.503..5.435 rows=5009 loops=1)
               ->  Index Scan using documents_pkey on documents d  (cost=0.00..4.18 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=5009)
                     Index Cond: (d.id = s.val)
 Total runtime: 44.047 ms
(9 rows)
But look what happens if we tell PostgreSQL how many rows "genser"
will return:
  CREATE FUNCTION genser (int, int) RETURNS SETOF int AS $$
    SELECT * FROM generate_series ($1, $2) AS g(x);
  $$ LANGUAGE sql ROWS 5009;
  EXPLAIN ANALYZE
  SELECT count (d.id), floor (s.val / 5000)
  FROM genser (1::INT, 5009) AS s (val)
  LEFT JOIN documents d ON d.id = s.val
  GROUP BY 2
  ORDER BY 2;
Now we get a better plan:
 Sort  (cost=15545.54..15546.04 rows=200 width=8) (actual time=27.857..27.859 rows=2 loops=1)
   Sort Key: (floor(((s.val / 5000))::double precision))
   Sort Method:  quicksort  Memory: 25kB
   ->  HashAggregate  (cost=15533.89..15537.89 rows=200 width=8) (actual time=27.817..27.819 rows=2 loops=1)
         ->  Merge Right Join  (cost=1610.15..15508.85 rows=5009 width=8) (actual time=7.714..24.133 rows=5009 loops=1)
               Merge Cond: (d.id = s.val)
               ->  Index Scan using documents_pkey on documents d  (cost=0.00..13472.20 rows=125518 width=4) (actual time=0.045..6.112 rows=5010 loops=1)
               ->  Sort  (cost=1610.15..1622.67 rows=5009 width=4) (actual time=7.651..9.501 rows=5009 loops=1)
                     Sort Key: s.val
                     Sort Method:  quicksort  Memory: 427kB
                     ->  Function Scan on genser s  (cost=0.00..1302.34 rows=5009 width=4) (actual time=3.559..5.262 rows=5009 loops=1)
 Total runtime: 28.445 ms
(12 rows)
Since generate_series is a builtin function, can't it tell how many
rows it will return?
| From | Date | Subject | |
|---|---|---|---|
| Next Message | imageguy | 2008-04-14 12:09:52 | Re: Number or parameters for functions - limited to 32 ? | 
| Previous Message | Richard Huxton | 2008-04-14 08:37:28 | Re: The default text search configuration will be set to "simple" ? |