From: | Thomas F(dot)O'Connell <tfo(at)sitening(dot)com> |
---|---|
To: | Pierre-Frédéric Caillaud <lists(at)boutiquenumerique(dot)com> |
Cc: | "PgSql General" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Poor Performance with Distinct Subqueries with EXISTS and EXCEPT |
Date: | 2004-12-04 07:58:26 |
Message-ID: | 474D5EBE-45CA-11D9-B555-000D93AE0944@sitening.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Interestingly, I tried the new version with and without enable_seqscan
on, and the version without indexes performs better because, I think,
it returns more rows than an index lookup would enhance.
Thanks again for your help. This is certainly an improvement over my
original version.
-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
On Dec 2, 2004, at 6:42 AM, Pierre-Frédéric Caillaud wrote:
> Let's re-take your query from the start. At each step you should
> explain analyze the query to check if it runs smoothly.
>
> 1. You want the messages which have no actions. Rather than a
> subselect, I'd use a LEFT JOIN :
>
> untested syntax :
> SELECT m.id FROM message m LEFT JOIN message_action ma ON
> m.id=ma.messages_id WHERE ma.messages_id IS NULL;
>
> On my machine, I have a zones table with 3000 rows and a cities table
> with 2 million rows, each place having a zone_id :
>
> EXPLAIN ANALYZE SELECT z.zone_id FROM geo.zones z LEFT JOIN geo.cities
> c ON c.zone_id=z.zone_id WHERE c.id IS NULL;
> Merge Left Join (cost=0.00..142063.06 rows=3663 width=4) (actual
> time=8726.203..8726.203 rows=0 loops=1)
> Merge Cond: ("outer".zone_id = "inner".zone_id)
> Filter: ("inner".id IS NULL)
> -> Index Scan using zones_pkey on zones z (cost=0.00..99.10
> rows=3663 width=4) (actual time=15.027..43.987 rows=3663 loops=1)
> -> Index Scan using cities_zones_idx on cities c
> (cost=0.00..116030.55 rows=2073935 width=8) (actual
> time=25.164..5823.496 rows=2073935 loops=1)
> Total runtime: 8726.327 ms
> (6 lignes)
>
> 8 seconds, this gives you an idea with that many records.
> You should check your indexes are used !
>
> Now you have the messages which have no actions, you must get the
> user email domains :
>
> SELECT split_part( u.email, '@', 2 ) as domain
> FROM users u, message m
> LEFT JOIN message_action ma ON m.id=ma.messages_id
> WHERE u.id=m.user_id
> AND ma.messages_id IS NULL;
>
> Can you time this query ? Are the indexes used ?
> Now, let's remove the duplicates :
>
> SELECT split_part( u.email, '@', 2 ) as domain
> FROM users u, message m
> LEFT JOIN message_action ma ON m.id=ma.messages_id
> WHERE u.id=m.user_id
> AND ma.messages_id IS NULL
> GROUP By domain;
>
> GROUP BY is faster than DISTINCT (in some cases).
>
> How does it go ?
From | Date | Subject | |
---|---|---|---|
Next Message | Együd Csaba | 2004-12-04 07:58:52 | Postgres8 win2k server autovacuum |
Previous Message | OpenMacNews | 2004-12-04 07:35:30 | [SOLVED] Re: initdb error: "could not identify current directory" (or, what have i done now?) |