Exclusion List

From: "Michael Richards" <michael(at)fastmail(dot)ca>
To: pgsql-sql(at)postgresql(dot)org
Subject: Exclusion List
Date: 2001-08-01 15:28:44
Message-ID: 3B68202C.0001A3.93146@frodo.searchcanada.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

I've got 2 tables, one with a list of users (has only about 5000
entries) and another with a list of userids that have already been
sent letters. I'm trying to efficiently join these two so I get every
user who hasn't been sent a letter. The problem is, coupled with the
5 other joins on the user table the query can take about 5 minutes to
run. It's performing the joins and then reducing the list.

The reduction of the list doesn't seem to be terribly efficient. Here
are some strategies I've been looking at:

select id from users WHERE
id not in (select userid from sentletters where lettertype=1) AND
aclgroup IN (1,2);

Seq Scan on users (cost=0.00..217751.39 rows=5369 width=4)
SubPlan
-> Seq Scan on sentletters (cost=0.00..81.06 rows=4405 width=4)

select id from users WHERE not exists
(select userid from sentletters where lettertype=1 AND userid=id)
AND aclgroup IN (1,2);

Seq Scan on users (cost=0.00..10980.07 rows=1 width=4)
SubPlan
-> Index Scan using sentletters_userid_key on sentletters
(cost=0.00..2.02 rows=1 width=4)

select id from users AS u
LEFT JOIN sentletters AS sl ON (lettertype=1 AND sl.userid=u.id)
where sl.userid IS NULL AND u.aclgroup IN (1,2);

Hash Join (cost=81.06..550.18 rows=5322 width=12)
-> Seq Scan on users u (cost=0.00..152.53 rows=5322 width=4)
-> Hash (cost=70.05..70.05 rows=4405 width=8)
-> Seq Scan on sentletters sl (cost=0.00..70.05 rows=4405
width=8)

All but the last which is an odd way to do it have nasty query plan
and this isn't even talking about the joins.

I then need to join these results with a table that has about 200,000
rows and is joined on the userid and some other key elements.

Any suggestions on this? The full query causing the problem is:
select u.id,u.firstname,u.surname,f2.strval,f3.strval,f4.strval,
f5.strval,u2.firstname,u2.surname,m.strval
from
users as u
JOIN dft_formdata as f1 ON
(u.id=f1.userid AND u.aclgroup IN (1,2) AND f1.formid=1 AND
f1.fieldid=1)
LEFT JOIN dft_formdata as f2 ON
(u.id=f2.userid AND f2.formid=1 AND f2.fieldid=2)
JOIN dft_formdata as f3 on
(u.id=f3.userid AND f3.formid=1 AND f3.fieldid=3)
JOIN dft_formdata as f4 on
(u.id=f4.userid AND f4.formid=1 AND f4.fieldid=5)
JOIN relations as r on
(u.id=r.relatedto AND r.type=2)
JOIN users as u2 on
(u2.id=r.userid AND u2.aclgroup=200)
JOIN dft_formdata as f5 on
(u.id=f5.userid AND f5.formid=1 AND f5.fieldid=4)
JOIN dft_formmvl as m on
(m.id=f5.intval and m.mvlid=1)
JOIN ft_formdata as f6 on
(u.id=f6.userid AND f6.formid=1 AND f6.fieldid=155 AND
f6.intval=3)
WHERE not exists
(select userid from sentletters WHERE userid=u.id);

Nested Loop (cost=0.00..11280.10 rows=1 width=164)
-> Nested Loop (cost=0.00..11276.36 rows=1 width=160)
-> Nested Loop (cost=0.00..11274.33 rows=1 width=144)
-> Nested Loop (cost=0.00..11270.59 rows=1 width=124)
-> Nested Loop (cost=0.00..11268.56 rows=1 width=96)
-> Nested Loop (cost=0.00..10981.55 rows=1 width=88)
-> Nested Loop (cost=0.00..10977.82 rows=1 width=72)
-> Nested Loop (cost=0.00..10974.10 rows=1 width=56)
-> Nested Loop (cost=0.00..10970.37 rows=1 width=32)
-> Seq Scan on users u (cost=0.00..10966.65 rows=1
width=28)
SubPlan
-> Index Scan using sentletters_userid_key on
sentletters (cost=0.00..2.01 rows=1 width=4)
-> Index Scan using dft_formdata_userid_field on
dft_formdata f1 (cost=0.00..3.71 rows=1 width=4)
-> Index Scan using dft_formdata_userid_field on
dft_formdata f2 (cost=0.00..3.71 rows=1 width=24)
-> Index Scan using dft_formdata_userid_field on dft_formdata
f3 (cost=0.00..3.71 rows=1 width=16)
-> Index Scan using dft_formdata_userid_field on dft_formdata
f4 (cost=0.00..3.71 rows=1 width=16)
-> Seq Scan on relations r (cost=0.00..185.43 rows=5079
width=8)
-> Index Scan using users_pkey on users u2 (cost=0.00..2.02
rows=1 width=28)
-> Index Scan using dft_formdata_userid_field on dft_formdata f5
(cost=0.00..3.71 rows=1 width=20)
-> Index Scan using dft_formmvl_pkey on dft_formmvl m
(cost=0.00..2.02 rows=1 width=16)
-> Index Scan using dft_formdata_userid_field on dft_formdata f6
(cost=0.00..3.71 rows=1 width=4)

Yes I know the query itself is really nasty but I think 5 minutes is
a little excessive.

-Michael
_________________________________________________________________
http://fastmail.ca/ - Fast Free Web Email for Canadians
>From pgsql-sql-owner(at)postgresql(dot)org Wed Aug 1 11:51:27 2001
Received: from davinci.ethosmedia.com (davinci.ethosmedia.com [209.10.40.250])
by postgresql.org (8.11.3/8.11.1) with ESMTP id f71FpQf87538
for <pgsql-sql(at)postgresql(dot)org>; Wed, 1 Aug 2001 11:51:26 -0400 (EDT)
(envelope-from josh(at)agliodbs(dot)com)
Received: from [209.10.40.250] (account <josh(at)agliodbs(dot)com>)
by davinci.ethosmedia.com (CommuniGate Pro WebUser 3.4.1)
with HTTP id 95041 for <pgsql-sql(at)postgresql(dot)org>; Wed, 01 Aug 2001 08:51:21 -0700
From: "Josh Berkus" <josh(at)agliodbs(dot)com>
Subject: Re: Converting epoch to timestamp?
To: pgsql-sql(at)postgresql(dot)org
X-Mailer: CommuniGate Pro Web Mailer v.3.4.1
Date: Wed, 01 Aug 2001 08:51:21 -0700
Message-ID: <web-95041(at)davinci(dot)ethosmedia(dot)com>
In-Reply-To: <9309(dot)996674573(at)sss(dot)pgh(dot)pa(dot)us>
MIME-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit
X-Archive-Number: 200108/6
X-Sequence-Number: 4227

Tom,

> regression=# select timestamp(996673954);
> timestamp
> ------------------------
> 2001-08-01 09:52:34-04
> (1 row)
>
> (This last didn't use to work, but it seems fine in 7.0 and 7.1. It
> will fail in 2038 when timestamps stop looking like int4, but by then
> hopefully we'll have changed things...)

Yeah, sure. That's what my father said in 1964 when they talked about
the potential problems with 2-digit dates on the UNIVAC II ...

;-)

-Josh

______AGLIO DATABASE SOLUTIONS___________________________
Josh Berkus
Complete information technology josh(at)agliodbs(dot)com
and data management solutions (415) 565-7293
for law firms, small businesses fax 621-2533
and non-profit organizations. San Francisco

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Tom Lane 2001-08-01 16:23:16 Re: Exclusion List
Previous Message Gunnar Rønning 2001-08-01 14:14:29 Re: When PostgreSQL compliant JDBC 2.0?