Re: feeding big script to psql

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Wilson <petew(at)yellowhawk(dot)co(dot)uk>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: feeding big script to psql
Date: 2005-08-02 21:44:30
Message-ID: 24774.1123019070@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Peter Wilson <petew(at)yellowhawk(dot)co(dot)uk> writes:
> Tom Lane wrote:
>>> Oh? Could you provide a test case for this? I can certainly believe
>>> that the planner might choose a bad plan if it has no statistics, but
>>> it shouldn't take a long time to do it.

> On investigation the problems occurs on 'EXPLAIN ANALYZE' - which is
> what pgadminIII does when you press the explain button.

Ah. Well, this is an ideal example of why you need statistics ---
without 'em, the planner is more or less flying blind about the number
of matching rows. The original plan had

> -> Index Scan using ca_pk on contact_att subb (cost=0.00..6.01 rows=1 width=8) (actual time=0.207..234.423 rows=3 loops=2791)
> Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text))
> Filter: ((contact_id)::numeric = 3854.000000)

while your "after a vacuum" (I suppose really a vacuum analyze) plan has

> -> Index Scan using ca_pk on contact_att subb (cost=0.00..1433.95 rows=78 width=8) (actual time=0.367..259.617 rows=3 loops=1)
> Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text))
> Filter: ((contact_id)::numeric = 3854.000000)

This is the identical scan plan ... but now that the planner realizes
it's going to be pretty expensive, it arranges the join in a way that
requires only one scan of contact_att and not 2791 of 'em.

The key point here is that the index condition on instance/client_id
is not selective --- it'll pull out a lot of rows. All but 3 of 'em are
then discarded by the contact_id condition, but the damage in terms
of runtime was already done. With stats, the planner can realize this
--- without stats, it has no chance.

Looking at your table definition, I suppose you were expecting the
contact_id condition to be used with the index, but since contact_id is
bigint, comparing it to a numeric-type constant is not considered indexable.
You want to lose the ".000000" in the query.

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Edmund Dengler 2005-08-02 21:52:25 Re: Failure to use indexes (fwd)
Previous Message Scott Marlowe 2005-08-02 21:22:09 Re: Failure to use indexes (fwd)