From: Peter Wilson <petew(at)yellowhawk(dot)co(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: feeding big script to psql
Date: 2005-08-03 19:34:47
Message-ID: dcr68o$29n5$1@news.hub.org
Lists: pgsql-general
Tom Lane wrote:
> Peter Wilson <petew(at)yellowhawk(dot)co(dot)uk> writes:
>> Tom Lane wrote:
>>>> Oh?  Could you provide a test case for this?  I can certainly believe
>>>> that the planner might choose a bad plan if it has no statistics, but
>>>> it shouldn't take a long time to do it.
> 
>> On investigation the problems occurs on 'EXPLAIN ANALYZE' - which is
>> what pgadminIII does when you press the explain button.
> 
> Ah.  Well, this is an ideal example of why you need statistics ---
> without 'em, the planner is more or less flying blind about the number
> of matching rows.  The original plan had
> 
>>                ->  Index Scan using ca_pk on contact_att subb  (cost=0.00..6.01 rows=1 width=8) (actual time=0.207..234.423 rows=3 loops=2791)
>>                      Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text))
>>                      Filter: ((contact_id)::numeric = 3854.000000)
> 
> while your "after a vacuum" (I suppose really a vacuum analyze) plan has
> 
>>                      ->  Index Scan using ca_pk on contact_att subb  (cost=0.00..1433.95 rows=78 width=8) (actual time=0.367..259.617 rows=3 loops=1)
>>                            Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text))
>>                            Filter: ((contact_id)::numeric = 3854.000000)
> 
> This is the identical scan plan ... but now that the planner realizes
> it's going to be pretty expensive, it arranges the join in a way that
> requires only one scan of contact_att and not 2791 of 'em.
> 
> The key point here is that the index condition on instance/client_id
> is not selective --- it'll pull out a lot of rows.  All but 3 of 'em are
> then discarded by the contact_id condition, but the damage in terms
> of runtime was already done.  With stats, the planner can realize this
> --- without stats, it has no chance.
> 
> Looking at your table definition, I suppose you were expecting the
> contact_id condition to be used with the index, but since contact_id is
> bigint, comparing it to a numeric-type constant is not considered indexable.
> You want to lose the ".000000" in the query.
> 
> 			regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
> 
Thanks for that Tom - especially the bit about removing the .000000 from
the numbers. I'm pretty new to some of this database stuff - even newer at
trying to optimise queries and 'think like the planner'. It never occurred
to me that the number format would have that effect.

Removing the zeroes actually knocked a few ms off the execution times in
real-life queries :-)
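In case it helps anyone reading the archive later, here's a minimal sketch
of the literal-type point on a toy table (the demo table, index and values
below are my own invention - only the column names and types are inferred
from the plan quoted above):

  -- Toy stand-in for contact_att; contact_id is bigint, as in the real table.
  CREATE TABLE contact_att_demo (
      instance   char(1),
      client_id  varchar(32),
      contact_id bigint
  );
  CREATE INDEX ca_demo_idx ON contact_att_demo (instance, client_id, contact_id);

  -- Make the planner consider the index even though the toy table is empty.
  SET enable_seqscan = off;

  -- Numeric literal: the bigint column gets cast to numeric, so this condition
  -- shows up as a Filter rather than as part of the Index Cond (as in the
  -- plans quoted above):
  EXPLAIN SELECT * FROM contact_att_demo
   WHERE instance = '0' AND client_id = 'gadget'
     AND contact_id = 3854.000000;

  -- Plain integer literal: the same condition can go into the index scan.
  EXPLAIN SELECT * FROM contact_att_demo
   WHERE instance = '0' AND client_id = 'gadget'
     AND contact_id = 3854;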
Just out of interest - is there an opportunity for the planner to realise
that the sub-select is basically invariant for the outer query and execute
it once, regardless of stats? It seems like the loop-invariant optimisation
in a 'C' compiler. If you have to do something once v. doing it 2791 times
then I'd plump for the once!
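To be clear about the shape of thing I mean (a paraphrase of the sort of
query involved, not my actual SQL - only contact_att and the subb alias come
from the plan above, and the contact table here is hypothetical):

  -- The sub-select doesn't reference anything from the outer query, so it
  -- only ever has one answer. In principle it could be evaluated once and
  -- the result reused, rather than the inner relation being re-scanned for
  -- every outer row (the loops=2791 in the "before" plan above).
  SELECT c.contact_id
  FROM   contact c
  WHERE  c.contact_id IN (SELECT subb.contact_id
                          FROM   contact_att subb
                          WHERE  subb.instance  = '0'
                            AND  subb.client_id = 'gadget');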
Thanks again Tom - that little nugget is much appreciated.
Pete
--
Peter Wilson. YellowHawk Ltd, http://www.yellowhawk.co.uk