From: | Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com> |
---|---|
To: | panam <panam(at)gmx(dot)net> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Hash Anti Join performance degradation |
Date: | 2011-05-24 11:16:44 |
Message-ID: | BANLkTi=EB-LzgBaFy7BEzKzuaDNmrVX9uw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-performance |
2011/5/24 panam <panam(at)gmx(dot)net>:
> Hi,
>
> In my DB, there is a large table containing messages and one that contains
> message boxes.
> Messages are assigned to boxes via a child parent link m->b.
> In order to obtain the last message for a specific box, I use the following
> SQL:
>
> SELECT m1.id FROM message m1 LEFT JOIN message m2 ON (m1.box_id = m2.box_id
> AND m1.id < m2.id) WHERE m2.id IS NULL AND m1.box_id = id;
>
> This worked quite well for a long time. But now, suddenly the performance of
> this query drastically degraded as new messages were added.
> If these new messages are removed again, everything is back to normal. If
> other messages are removed instead, the problem remains, so it does not seem
> to be a memory issue. I fear I have difficulties to understand what is going
> wrong here.
We need more information here. The case is in fact interesting.
What's the PostgreSQL version, and did you have log of vacuum and
checkpoint activity ? (no vacuum full/cluster or such thing running ?)
Obvisouly, Craig suggestion to read
http://wiki.postgresql.org/wiki/SlowQueryQuestions is relevant as it
helps to have all common information required to analyze the issue.
>
> This is the query plan when everything is fine:
>
> "Seq Scan on public.box this_ (cost=0.00..10467236.32 rows=128 width=696)
> (actual time=0.169..7683.978 rows=128 loops=1)"
> " Output: this_.id, this_.login, (SubPlan 1)"
> " Buffers: shared hit=188413 read=94635 written=135, temp read=22530
> written=22374"
> " SubPlan 1"
> " -> Hash Anti Join (cost=41323.25..81775.25 rows=20427 width=8)
> (actual time=59.571..59.877 rows=1 loops=128)"
> " Output: m1.id"
> " Hash Cond: (m1.box_id = m2.box_id)"
> " Join Filter: (m1.id < m2.id)"
> " Buffers: shared hit=188412 read=94633 written=135, temp
> read=22530 written=22374"
> " -> Bitmap Heap Scan on public.message m1 (cost=577.97..40212.28
> rows=30640 width=16) (actual time=3.152..9.514 rows=17982 loops=128)"
> " Output: m1.id, m1.box_id"
> " Recheck Cond: (m1.box_id = $0)"
> " Buffers: shared hit=131993 read=9550 written=23"
> " -> Bitmap Index Scan on "message_box_Idx"
> (cost=0.00..570.31 rows=30640 width=0) (actual time=2.840..2.840 rows=18193
> loops=128)"
> " Index Cond: (m1.box_id = $0)"
> " Buffers: shared hit=314 read=6433 written=23"
> " -> Hash (cost=40212.28..40212.28 rows=30640 width=16) (actual
> time=26.840..26.840 rows=20014 loops=115)"
> " Output: m2.box_id, m2.id"
> " Buckets: 4096 Batches: 4 (originally 2) Memory Usage:
> 5444kB"
> " Buffers: shared hit=56419 read=85083 written=112, temp
> written=7767"
> " -> Bitmap Heap Scan on public.message m2
> (cost=577.97..40212.28 rows=30640 width=16) (actual time=2.419..20.007
> rows=20014 loops=115)"
> " Output: m2.box_id, m2.id"
> " Recheck Cond: (m2.box_id = $0)"
> " Buffers: shared hit=56419 read=85083 written=112"
> " -> Bitmap Index Scan on "message_box_Idx"
> (cost=0.00..570.31 rows=30640 width=0) (actual time=2.166..2.166 rows=20249
> loops=115)"
> " Index Cond: (m2.box_id = $0)"
> " Buffers: shared hit=6708"
> "Total runtime: 7685.202 ms"
>
> This is the plan when the query gets sluggish:
>
> "Seq Scan on public.box this_ (cost=0.00..10467236.32 rows=128 width=696)
> (actual time=0.262..179333.086 rows=128 loops=1)"
> " Output: this_.id, this_.login, (SubPlan 1)"
> " Buffers: shared hit=189065 read=93983 written=10, temp read=22668
> written=22512"
> " SubPlan 1"
> " -> Hash Anti Join (cost=41323.25..81775.25 rows=20427 width=8)
> (actual time=1264.700..1400.886 rows=1 loops=128)"
> " Output: m1.id"
> " Hash Cond: (m1.box_id = m2.box_id)"
> " Join Filter: (m1.id < m2.id)"
> " Buffers: shared hit=189064 read=93981 written=10, temp read=22668
> written=22512"
> " -> Bitmap Heap Scan on public.message m1 (cost=577.97..40212.28
> rows=30640 width=16) (actual time=3.109..9.850 rows=18060 loops=128)"
> " Output: m1.id, m1.box_id"
> " Recheck Cond: (m1.box_id = $0)"
> " Buffers: shared hit=132095 read=9448"
> " -> Bitmap Index Scan on "message_box_Idx"
> (cost=0.00..570.31 rows=30640 width=0) (actual time=2.867..2.867 rows=18193
> loops=128)"
> " Index Cond: (m1.box_id = $0)"
> " Buffers: shared hit=312 read=6435"
> " -> Hash (cost=40212.28..40212.28 rows=30640 width=16) (actual
> time=27.533..27.533 rows=20102 loops=115)"
> " Output: m2.box_id, m2.id"
> " Buckets: 4096 Batches: 4 (originally 2) Memory Usage:
> 5522kB"
> " Buffers: shared hit=56969 read=84533 written=10, temp
> written=7811"
> " -> Bitmap Heap Scan on public.message m2
> (cost=577.97..40212.28 rows=30640 width=16) (actual time=2.406..20.492
> rows=20102 loops=115)"
> " Output: m2.box_id, m2.id"
> " Recheck Cond: (m2.box_id = $0)"
> " Buffers: shared hit=56969 read=84533 written=10"
> " -> Bitmap Index Scan on "message_box_Idx"
> (cost=0.00..570.31 rows=30640 width=0) (actual time=2.170..2.170 rows=20249
> loops=115)"
> " Index Cond: (m2.box_id = $0)"
> " Buffers: shared hit=6708"
> "Total runtime: 179334.310 ms"
>
>
> So from my limited experience, the only significant difference I see is that
> the Hash Anti Join takes a lot more time in plan 2, but I do not understand
> why.
> Ideas somebody?
>
> Thanks
> panam
>
>
> --
> View this message in context: http://postgresql.1045698.n5.nabble.com/Hash-Anti-Join-performance-degradation-tp4420974p4420974.html
> Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
--
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
From | Date | Subject | |
---|---|---|---|
Next Message | Emanuel Calvo | 2011-05-24 11:30:27 | Re: Error compiling sepgsql in PG9.1 |
Previous Message | Kohei KaiGai | 2011-05-24 10:57:26 | Re: sepgsql: fix relkind handling on foreign tables |
From | Date | Subject | |
---|---|---|---|
Next Message | panam | 2011-05-24 14:34:57 | Re: Hash Anti Join performance degradation |
Previous Message | Vitalii Tymchyshyn | 2011-05-24 09:12:34 | Re: reducing random_page_cost from 4 to 2 to force index scan |