RE: Performance in subquery

From: "Culberson, Philip" <philip(dot)culberson(at)dat(dot)com>
To: "'Brian Haney'" <brian(at)ibsystems(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: RE: Performance in subquery
Date: 2000-04-17 21:01:55
Message-ID: A95EFC3B707BD311986C00A0C9E95B6A9DE68C@datmail03.dat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Two things come to mind.

1) Make sure you have run VACUUM ANALYZE on your database.

2) Use a join rather than a sub-select, e.g.

SELECT
p.prod_name
FROM
products p, prod_attr pa
WHERE
p.prod_id = pa.prod_id AND
pa.prod_attr_text LIKE '%Linux%';

A word of caution though... Because you are using a wildcard in the first
position of your search string ('%Linux%'), you preclude the use of any
index and force a sequential scan!

Hope this helps,

Phil Culberson
DAT Services

-----Original Message-----
From: Brian Haney [mailto:brian(at)ibsystems(dot)com]
Sent: Monday, April 17, 2000 12:52 PM
To: pgsql-general(at)postgresql(dot)org
Subject: [GENERAL] Performance in subquery

I have multivendor product catalog and want to be able to search for
arbitrary text in a table of product attributes. Two of the several tables
are:

Table = products
+----------------------------------+----------------------------------+-----
--+
| Field | Type |
Length|
+----------------------------------+----------------------------------+-----
--+
| prod_id | int4 not null default nextval |
4 |
| vendor_id | int4 |
4 |
| category_id | int4 |
4 |
| prod_name | text not null |
var |
| prod_description | text not null |
var |
| prod_price_low | numeric | 30.6
|
| prod_price_high | numeric | 30.6
|
| prod_promo | text |
var |
| prod_promo_url | text |
var |
| prod_datasht_url | text |
var |
| prod_buynow_url | text |
var |
| prod_asp_url | text |
var |
| prod_active | bool default 'false' |
1 |
| last_modified | timestamp |
4 |
| prod_demo_url | text |
var |
| prod_ephone | text |
var |
+----------------------------------+----------------------------------+-----
--+
Index: products_pkey

Table = prod_attr
+----------------------------------+----------------------------------+-----
--+
| Field | Type |
Length|
+----------------------------------+----------------------------------+-----
--+
| attr_id | int4 not null |
4 |
| prod_id | int4 not null |
4 |
| prod_attr_text | text not null |
var |
+----------------------------------+----------------------------------+-----
--+
Indices: prod_attr_pkey
prod_attr_text_idx

The products table has 538 records and the prod_attr table has 7870.

When I enter the query:

SELECT prod_id FROM prod_attr WHERE prod_attr_text LIKE '%Linux%';

it performs quite well to give me a list of the product IDs almost
instantaneously.

But when I query:

select prod_name from products where prod_id in ( SELECT prod_id FROM
prod_attr
WHERE prod_attr_text LIKE '%Linux%');

It takes over 30 seconds to get the results. Here are the EXPLAINs:

First query:

Seq Scan on prod_attr (cost=317.84 rows=1 width=4)

Second query:

Seq Scan on products (cost=31.75 rows=538 width=12)
SubPlan
-> Seq Scan on prod_attr (cost=317.84 rows=1 width=4)

As you can see, I have created and index for prod_attr
(prod_attr_text), but it has had no discernable effect.

The explains imply to me that the real cost of the second query is
scanning the 7870 records, but then I would expect the second query to
take only slightly longer than the first. I suspect something else is
causing the second query to be so doggone slow.

Any help would be greatly appreciated.

--
Brian Haney VP Engineering/CTO
brian(at)ibsystems(dot)com Internet Business Systems, Inc.

Browse pgsql-general by date

  From Date Subject
Next Message Chris Carbaugh 2000-04-17 21:20:26 RE: excell to postgres
Previous Message Oelkers, Phil 2000-04-17 20:49:24 test nomail