From: | panam <panam(at)gmx(dot)net> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Hash Anti Join performance degradation |
Date: | 2011-05-26 16:08:07 |
Message-ID: | 1306426087061-4429125.post@n5.nabble.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-performance |
Hi all,
Cédric Villemain-3 wrote:
>
> without explaining further why the antijoin has bad performance
> without cluster, I wonder why you don't use this query :
>
> SELECT b.id,
> max(m.id)
> FROM box b, message m
> WHERE m.box_id = b.id
> GROUP BY b.id;
>
> looks similar and fastest.
>
I actually did use a similar strategy in the meantime (during my problem
with the "left join" query we are talking about here all the time) for
mitigation.
It was
SELECT MAX(e.id) FROM event_message e WHERE e.box_id = id
and it performed worse in comparison to the "left join" query in the general
case (i.e. before my problems began).
At the end of this post is an explanation why I think I cannot use the
solution you suggested above.
Kevin Grittner wrote:
>
> Each connection can allocate work_mem, potentially several times.
> On a machines without hundreds of GB of RAM, that pair of settings
> could cause severe swapping.
>
Indeed, thanks for the warning. These settings are not for production but to
exclude a performance degradation because of small cache sizes.
Kevin Grittner wrote:
>
> I think you would need a left join to actually get identical
> results:
>
> SELECT b.id, max(m.id)
> FROM box b
> LEFT JOIN message m ON m.box_id = b.id
> GROUP BY b.id;
>
> But yeah, I would expect this approach to be much faster. Rather
> easier to understand and harder to get wrong, too.
>
>
Correct, it is much faster, even with unclustered ids.
However, I think I cannot use it because of the way that query is generated
(by hibernate).
The (simplyfied) base query is just
SELECT b.id from box
the subquery
(SELECT m1.id FROM message m1
LEFT JOIN message m2
ON (m1.box_id = m2.box_id AND m1.id < m2.id )
WHERE m2.id IS NULL AND m1.box_id = b.id) as lastMessageId
is due to a hibernate formula (containing more or less plain SQL) to
determine the last message id for that box. It ought to return just one row,
not multiple. So I am constrained to the subquery in all optimization
attemps (I cannot combine them as you did), at least I do not see how. If
you have an idea for a more performant subquery though, let me know, as this
can easily be replaced.
Thanks for your help and suggestions
panam
--
View this message in context: http://postgresql.1045698.n5.nabble.com/Hash-Anti-Join-performance-degradation-tp4420974p4429125.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2011-05-26 16:14:38 | Re: "errno" not set in case of "libm" functions (HPUX) |
Previous Message | Euler Taveira de Oliveira | 2011-05-26 16:00:05 | Re: BUG #6041: Unlogged table was created bad in slave node |
From | Date | Subject | |
---|---|---|---|
Next Message | Dave Johansen | 2011-05-26 16:21:45 | Re: LIMIT and UNION ALL |
Previous Message | Merlin Moncure | 2011-05-26 16:02:16 | Re: The shared buffers challenge |