From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>, panam <panam(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [PERFORM] Hash Anti Join performance degradation |
Date: | 2011-06-01 11:40:27 |
Message-ID: | BANLkTim-DqDC2AbVJ_1t-XAS4NYq2tQYZg@mail.gmail.com |
Lists: | pgsql-hackers pgsql-performance |
On Tue, May 31, 2011 at 11:47 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> With respect to the root of the issue (why does the anti-join take so
>> long?), my first thought was that perhaps the OP was very unlucky and
>> had a lot of values that hashed to the same bucket. But that doesn't
>> appear to be the case.
>
> Well, yes it is. Notice what the subquery is doing: for each row in
> "box", it's pulling all matching "box_id"s from message and running a
> self-join across those rows. The hash join condition is a complete
> no-op. And some of the box_ids have hundreds of thousands of rows.
>
> I'd just write it off as being a particularly stupid way to find the
> max(), except I'm not sure why deleting just a few thousand rows
> improves things so much. It looks like it ought to be an O(N^2)
> situation, so the improvement should be noticeable but not amazing.
Yeah, this is what I was getting at, though perhaps I didn't say it
well. If the last 78K rows were particularly pathological in some
way, that might explain something, but as far as one can see they are
not a whole heck of a lot different from the rest of the data.
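
For anyone following along, here is a rough sketch of the O(N^2) behavior
Tom describes. It is not the OP's query or the executor's actual code. It
assumes the subquery is an anti-join that keeps the row for which no other
row in the same box has a larger id, and that every row for one box_id lands
in the same hash bucket, so the hash condition filters nothing. The function
names are made up for illustration.

```python
def max_by_anti_join(ids):
    """Model an anti-join used to find the max within one box_id.

    Every candidate row is probed against the whole bucket, because all
    rows share the same box_id and the hash condition is a no-op.
    Returns (surviving_rows, comparisons).
    """
    comparisons = 0
    survivors = []
    for candidate in ids:
        for other in ids:
            comparisons += 1
            if other > candidate:
                # An anti-join can stop probing at the first match.
                break
        else:
            # No larger id was found, so this row survives the anti-join.
            survivors.append(candidate)
    return survivors, comparisons


def max_by_scan(ids):
    """Find the same row with a single pass, as max() would.

    Returns (max_id, comparisons).
    """
    comparisons = 0
    best = None
    for value in ids:
        comparisons += 1
        if best is None or value > best:
            best = value
    return best, comparisons


if __name__ == "__main__":
    ids = list(range(1, 1001))  # one hypothetical box with 1000 messages
    print(max_by_anti_join(ids))
    print(max_by_scan(ids))
```

In this toy model the number of comparisons depends on the order in which
the bucket entries are visited: the worst case is quadratic in the rows per
box_id, while a plain scan stays linear.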
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company