Re: feeding big script to psql

From: Peter Wilson <petew(at)yellowhawk(dot)co(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: feeding big script to psql
Date: 2005-08-02 20:19:15
Message-ID: dcokg3$31d5$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Tom Lane wrote:
> Peter Wilson <petew(at)yellowhawk(dot)co(dot)uk> writes:
>> I found a while ago that after inserting a lot of rows into a clean
>> Postgres table it would take several minutes just to analyse a command,
>> not even starting the execution.
>
> Oh? Could you provide a test case for this? I can certainly believe
> that the planner might choose a bad plan if it has no statistics, but
> it shouldn't take a long time to do it.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

Hi Tom,
I've spent half an hour restoring the database from the backup I took. This was back in December '04. The database dump is about 95Mbytes, none of the
tables are particularly big.

On investigation the problems occurs on 'EXPLAIN ANALYZE' - which is what pgadminIII does when you press the explain button. According to the manual,
this actually executes the command and gives actual times (I think pgadmin is wrong here - if you ask it to explain a command you want the plan it's
going to use - not wait for it to finish!)

That said - I'll include the details here, at least partly because the tables are small in database terms, and I don't understand how this query even
with lots of scanning can take 100% CPU for over 10 minutes (654 seconds for explain/analyze 651 seconds for execute).

OK - the query is:

SELECT DISTINCT c.client_id, c.instance, c.contact_id, c.uname FROM contacts c
WHERE c.client_id = 'gadget'
AND c.instance = '0'
AND ( c.type = 'COMMUNITY'
OR c.type = 'OU'
OR c.type = 'INDIVIDUAL'
)
AND c.contact_id in (
SELECT subb.community_id FROM contact_att subb
WHERE subb.client_id = 'gadget'
AND subb.instance = '0'
AND subb.contact_id = 3854.000000
) ;

By itself the sub-select executes in 235ms and yields a set that does not vary depending on the outer select - it's a constant set in effect. It would
seem that in the worst case assuming no index or size information, the planner should spot the invariance of the subselect and a sequential scan of
the 'contacts' table would be the worst result I would expect.

There are two tables involved in this query.

'contacts' contains 3728 records.
'contact_att' contains 19217 records.

The query plan yields :

Unique (cost=12.05..12.06 rows=1 width=90) (actual time=654491.967..654491.984 rows=3 loops=1)
-> Sort (cost=12.05..12.05 rows=1 width=90) (actual time=654491.958..654491.959 rows=3 loops=1)
Sort Key: c.client_id, c.instance, c.contact_id, c.uname
-> Nested Loop IN Join (cost=0.00..12.04 rows=1 width=90) (actual time=577763.884..654491.864 rows=3 loops=1)
Join Filter: ("outer".contact_id = "inner".community_id)
-> Index Scan using cos_uk on contacts c (cost=0.00..6.02 rows=1 width=90) (actual time=0.227..123.862 rows=2791 loops=1)
Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text))
Filter: ((("type")::text = 'COMMUNITY'::text) OR (("type")::text = 'OU'::text) OR (("type")::text = 'INDIVIDUAL'::text))
-> Index Scan using ca_pk on contact_att subb (cost=0.00..6.01 rows=1 width=8) (actual time=0.207..234.423 rows=3 loops=2791)
Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text))
Filter: ((contact_id)::numeric = 3854.000000)
Total runtime: 654492.320 ms

Definitions for the two relevant tables are:

create table contacts (
INSTANCE CHARACTER (1) NOT NULL ,
client_id varchar (50) not null ,
contact_id bigint default nextval('contacts_contact_id_seq'::text),
UNAME VARCHAR (32) NOT NULL ,
TYPE VARCHAR (20) NOT NULL DEFAULT 'INDIVIDUAL',
parent bigint,
NAME VARCHAR (240) ,
PHONE VARCHAR (32) ,
FAX VARCHAR (32) ,
EMAIL VARCHAR (240) ,
BASIC_PW VARCHAR (128) ,
DESCRIPTION VARCHAR (240),
custom_data varchar(8192),
CONSTRAINT COS_PK PRIMARY KEY(INSTANCE,CLIENT_ID,CONTACT_ID)
)

create table contact_att(
instance character(1),
client_id varchar(50) not null,
contact_id bigint,
community_id bigint,
inherited smallint,
CONSTRAINT CA_PK PRIMARY KEY(INSTANCE,CLIENT_ID,CONTACT_ID,COMMUNITY_ID)
)

CREATE INDEX CO_PA_IND ON CONTACTS (PARENT);
CREATE INDEX CO_TY_IND ON CONTACTS (TYPE);

--
Hope you find that useful! If there is anything else I can provide you with then please do let me know.

Pete

------------------------------------------------------------------------
Peter Wilson - YellowHawk Ltd, http://www.yellowhawk.co.uk

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Peter Wilson 2005-08-02 20:24:56 Re: feeding big script to psql
Previous Message Jaime Casanova 2005-08-02 19:57:40 Re: indexes are fucked