Re: Poor Performance with Distinct Subqueries with EXISTS and EXCEPT

From: Thomas F(dot)O'Connell <tfo(at)sitening(dot)com>
To: Pierre-Frédéric Caillaud <lists(at)boutiquenumerique(dot)com>
Cc: "PgSql General" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Poor Performance with Distinct Subqueries with EXISTS and EXCEPT
Date: 2004-12-04 07:58:26
Message-ID: 474D5EBE-45CA-11D9-B555-000D93AE0944@sitening.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Interestingly, I tried the new version with and without enable_seqscan
on, and the version without indexes performs better because, I think,
it returns more rows than an index lookup would enhance.

Thanks again for your help. This is certainly an improvement over my
original version.

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Dec 2, 2004, at 6:42 AM, Pierre-Frédéric Caillaud wrote:

> Let's re-take your query from the start. At each step you should
> explain analyze the query to check if it runs smoothly.
>
> 1. You want the messages which have no actions. Rather than a
> subselect, I'd use a LEFT JOIN :
>
> untested syntax :
> SELECT m.id FROM message m LEFT JOIN message_action ma ON
> m.id=ma.messages_id WHERE ma.messages_id IS NULL;
>
> On my machine, I have a zones table with 3000 rows and a cities table
> with 2 million rows, each place having a zone_id :
>
> EXPLAIN ANALYZE SELECT z.zone_id FROM geo.zones z LEFT JOIN geo.cities
> c ON c.zone_id=z.zone_id WHERE c.id IS NULL;
> Merge Left Join (cost=0.00..142063.06 rows=3663 width=4) (actual
> time=8726.203..8726.203 rows=0 loops=1)
> Merge Cond: ("outer".zone_id = "inner".zone_id)
> Filter: ("inner".id IS NULL)
> -> Index Scan using zones_pkey on zones z (cost=0.00..99.10
> rows=3663 width=4) (actual time=15.027..43.987 rows=3663 loops=1)
> -> Index Scan using cities_zones_idx on cities c
> (cost=0.00..116030.55 rows=2073935 width=8) (actual
> time=25.164..5823.496 rows=2073935 loops=1)
> Total runtime: 8726.327 ms
> (6 lignes)
>
> 8 seconds, this gives you an idea with that many records.
> You should check your indexes are used !
>
> Now you have the messages which have no actions, you must get the
> user email domains :
>
> SELECT split_part( u.email, '@', 2 ) as domain
> FROM users u, message m
> LEFT JOIN message_action ma ON m.id=ma.messages_id
> WHERE u.id=m.user_id
> AND ma.messages_id IS NULL;
>
> Can you time this query ? Are the indexes used ?
> Now, let's remove the duplicates :
>
> SELECT split_part( u.email, '@', 2 ) as domain
> FROM users u, message m
> LEFT JOIN message_action ma ON m.id=ma.messages_id
> WHERE u.id=m.user_id
> AND ma.messages_id IS NULL
> GROUP By domain;
>
> GROUP BY is faster than DISTINCT (in some cases).
>
> How does it go ?

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Együd Csaba 2004-12-04 07:58:52 Postgres8 win2k server autovacuum
Previous Message OpenMacNews 2004-12-04 07:35:30 [SOLVED] Re: initdb error: "could not identify current directory" (or, what have i done now?)