Re: planer picks a bad plan (seq-scan instead of index)

From: Russell Smith <mr-russ(at)pws(dot)com(dot)au>
To: "Thomas H(dot)" <me(at)alternize(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: planer picks a bad plan (seq-scan instead of index)
Date: 2006-11-09 11:31:10
Message-ID: 4553117E.4070107@pws.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thomas H. wrote:
> hi list.
>
> as soon as i left-join an additional table, the query takes 24sec
> instead of 0.2sec, although the added fields have no impact on the
> resultset:
>
> --------------------
> SELECT * FROM shop.dvds
> LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id
> WHERE (lower(mov_name) LIKE '%superman re%' OR lower(dvd_name) like
> '%superman re%' OR lower(dvd_edition) LIKE '%superman re%')
> --------------------
> Hash Left Join (cost=8402.16..10733.16 rows=39900 width=1276) (actual
> time=260.712..260.722 rows=2 loops=1)
> Hash Cond: (dvds.dvd_mov_id = movies.mov_id)
> Filter: ((lower((movies.mov_name)::text) ~~ '%superman re%'::text) OR
> (lower((dvds.dvd_edition)::text) ~~ '%superman re%'::text))
> -> Seq Scan on dvds (cost=0.00..1292.00 rows=39900 width=1062)
> (actual time=0.036..23.594 rows=20866 loops=1)
> -> Hash (cost=8194.93..8194.93 rows=82893 width=214) (actual
> time=168.121..168.121 rows=37417 loops=1)
> -> Seq Scan on movies (cost=0.00..8194.93 rows=82893
> width=214) (actual time=0.024..131.401 rows=37417 loops=1)
> Total runtime: 264.193 ms
> 2 rows fetched
> --------------------
>
1. You MUST sequence scan dvds, as there is no way to do an index search
on a like with % at the beginning.
2. You are asking for a left join on dvds, which means you want all
records, so you must sequence scan dvds. The filters are all OR, so you
can't say that a records is excluded until AFTER the join is completed.
3. The join estimates that 39900 rows will come out, but only 2 do,
thats out by a large factor 10^4, which means that it's not going to
join movies after the filter is applied.

> now, an additional table (containing 600k records) is added through a
> left join. all the sudden the query takes 24sec. although there are
> indices on both dvds.dvd_ean and data_soundmedia.sm_info_ean, the
> planer does not make use of the indices but rather chooses to do 2
> seq-scans.
The 2 items from the first query are still true. You just have a lot
more records to play with now by joining in a 3rd table. Which means
there is no way to reduce the possible output set before the join.

-> Sort (cost=286162.37..287781.38 rows=647601 width=660) (actual
time=19336.011..20328.247 rows=646633 loops=1)
This sort here is where nearly all of the 24 seconds goes. I am assuming
at this point that the Merge Join is chosen because of the stats problem
listed next;

> Merge Left Join (cost=309190.05..313899.09 rows=159086 width=1722)
(actual time=19876.552..21902.007 rows=20866 loops=1)
This estimate is out by a factor of 10. I'd increase the statistics on
the join columns to see if it decides to use a different join method.

If the estimate for the Merge join is fixed to be closed, then it's
likely an index scan would be chosen, 159000 is about 25% of the table.
Assuming it's small on disk then it's faster to do a seq_scan than all
the random I/O to read the index, then the heap to produce results.

How big is data_soundmedia? Mb size, not rows.
How much is your sort_mem/work_mem?
>
> --------------------
> SELECT * FROM shop.dvds
> LEFT JOIN oldtables.movies ON mov_id = dvd_mov_id
> LEFT JOIN shop.data_soundmedia ON sm_info_ean = dvd_ean
> WHERE (lower(mov_name) LIKE '%superman re%' OR lower(dvd_name) like
> '%superman re%' OR lower(dvd_edition) LIKE '%superman re%')
> --------------------
> Hash Left Join (cost=317592.21..326882.92 rows=159086 width=1936)
> (actual time=21021.023..22242.253 rows=2 loops=1)
> Hash Cond: (dvds.dvd_mov_id = movies.mov_id)
> Filter: ((lower((movies.mov_name)::text) ~~ '%superman re%'::text) OR
> (lower((dvds.dvd_name)::text) ~~ '%superman re%'::text) OR
> (lower((dvds.dvd_edition)::text) ~~ '%superman re%'::text))
> -> Merge Left Join (cost=309190.05..313899.09 rows=159086
> width=1722) (actual time=19876.552..21902.007 rows=20866 loops=1)
> Merge Cond: ("outer"."?column20?" = "inner"."?column29?")
> -> Sort (cost=23027.68..23127.43 rows=39900 width=1062)
> (actual time=507.886..520.143 rows=20866 loops=1)
> Sort Key: (dvds.dvd_ean)::text
> -> Seq Scan on dvds (cost=0.00..1292.00 rows=39900
> width=1062) (actual time=0.047..100.415 rows=20866 loops=1)
> -> Sort (cost=286162.37..287781.38 rows=647601 width=660)
> (actual time=19336.011..20328.247 rows=646633 loops=1)
> Sort Key: (data_soundmedia.sm_info_ean)::text
> -> Seq Scan on data_soundmedia (cost=0.00..31080.01
> rows=647601 width=660) (actual time=0.074..2834.831 rows=647601 loops=1)
> -> Hash (cost=8194.93..8194.93 rows=82893 width=214) (actual
> time=177.033..177.033 rows=37417 loops=1)
> -> Seq Scan on movies (cost=0.00..8194.93 rows=82893
> width=214) (actual time=0.118..129.716 rows=37417 loops=1)
> Total runtime: 24419.939 ms
> 2 rows fetched
> --------------------
>
> shouldn't the planer join the additional table *after* filtering? even
> if it does first joining then filtering, why isn't the existing index
> not used?
I would fix the above anomalies before asking these questions.

Regards

Russell Smith

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Richard Huxton 2006-11-09 11:34:40 Re: autovacuum blues
Previous Message Richard Huxton 2006-11-09 11:28:49 Re: RULE - special variables?