From: | Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com> |
---|---|
To: | panam <panam(at)gmx(dot)net> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Hash Anti Join performance degradation |
Date: | 2011-05-26 18:13:22 |
Message-ID: | BANLkTimqewBM=G47Qdp+6D7nsBQG8FN7kA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-performance |
2011/5/26 panam <panam(at)gmx(dot)net>:
> Hi all,
>
>
> Cédric Villemain-3 wrote:
>>
>> without explaining further why the antijoin has bad performance
>> without cluster, I wonder why you don't use this query :
>>
>> SELECT b.id,
>> max(m.id)
>> FROM box b, message m
>> WHERE m.box_id = b.id
>> GROUP BY b.id;
>>
>> looks similar and fastest.
>>
> I actually did use a similar strategy in the meantime (during my problem
> with the "left join" query we are talking about here all the time) for
> mitigation.
> It was
> SELECT MAX(e.id) FROM event_message e WHERE e.box_id = id
> and it performed worse in comparison to the "left join" query in the general
> case (i.e. before my problems began).
> At the end of this post is an explanation why I think I cannot use the
> solution you suggested above.
>
>
> Kevin Grittner wrote:
>>
>> Each connection can allocate work_mem, potentially several times.
>> On a machines without hundreds of GB of RAM, that pair of settings
>> could cause severe swapping.
>>
> Indeed, thanks for the warning. These settings are not for production but to
> exclude a performance degradation because of small cache sizes.
>
>
> Kevin Grittner wrote:
>>
>> I think you would need a left join to actually get identical
>> results:
>>
>> SELECT b.id, max(m.id)
>> FROM box b
>> LEFT JOIN message m ON m.box_id = b.id
>> GROUP BY b.id;
>>
>> But yeah, I would expect this approach to be much faster. Rather
>> easier to understand and harder to get wrong, too.
>>
>>
> Correct, it is much faster, even with unclustered ids.
> However, I think I cannot use it because of the way that query is generated
> (by hibernate).
> The (simplyfied) base query is just
>
> SELECT b.id from box
>
> the subquery
>
> (SELECT m1.id FROM message m1
> LEFT JOIN message m2
> ON (m1.box_id = m2.box_id AND m1.id < m2.id )
> WHERE m2.id IS NULL AND m1.box_id = b.id) as lastMessageId
>
> is due to a hibernate formula (containing more or less plain SQL) to
> determine the last message id for that box. It ought to return just one row,
> not multiple. So I am constrained to the subquery in all optimization
> attemps (I cannot combine them as you did), at least I do not see how. If
> you have an idea for a more performant subquery though, let me know, as this
> can easily be replaced.
In production, if you have a decent IO system, you can lower
random_page_cost and it may be faster using index (by default, with
the use case you provided it choose a seqscan). It can be a bit tricky
if you have to lower random_page_cost so much that it destroy others
query plan but increase the perf of the current one. if it happens,
post again :) (sometime need to change other cost parameters but it
needs to be handle with care)
I am not an hibernate expert, but I'll surprised if you can not drive
hibernate to do what you want.
>
> Thanks for your help and suggestions
> panam
>
> --
> View this message in context: http://postgresql.1045698.n5.nabble.com/Hash-Anti-Join-performance-degradation-tp4420974p4429125.html
> Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
--
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
From | Date | Subject | |
---|---|---|---|
Next Message | Joshua D. Drake | 2011-05-26 18:21:42 | #PgWest 2011: CFP now open |
Previous Message | Kevin Grittner | 2011-05-26 18:05:55 | Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum |
From | Date | Subject | |
---|---|---|---|
Next Message | Tory M Blue | 2011-05-26 22:34:55 | Performance block size. |
Previous Message | panam | 2011-05-26 18:04:35 | Re: Hash Anti Join performance degradation |