From: | "Kevin Viraud" <kevin(dot)viraud(at)rocket-internet(dot)de> |
---|---|
To: | <pgsql-performance(at)postgresql(dot)org> |
Subject: | Weird CASE WHEN behaviour causing query to be suddenly very slow |
Date: | 2015-03-31 08:53:35 |
Message-ID: | 00a001d06b90$2d1bbe80$87533b80$@rocket-internet.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hi,
I have an issue with a rather large CASE WHEN and I cannot figure out why
it is so slow...
First, here is my test query :
SELECT CASE WHEN dwh_company_id = 1
THEN CASE
WHEN wv.source ~ '^$' THEN 'Not tracked'
WHEN wv.source ~ '^1$' THEN 'Not tracked1'
WHEN wv.source ~ '^2$' THEN 'Not tracked2'
WHEN wv.source ~ '^3$' THEN 'Not tracked3'
WHEN wv.source ~ '^4$' THEN 'Not tracked4'
WHEN wv.source ~ '^5$' THEN 'Not tracked5'
WHEN wv.source ~ '^6$' THEN 'Not tracked6'
WHEN wv.source ~ '^7$' THEN 'Not tracked7'
WHEN wv.source ~ '^8$' THEN 'Not tracked8'
WHEN wv.source ~ '^9$' THEN 'Not tracked9'
WHEN wv.source ~ '^10$' THEN 'Not tracked10'
WHEN wv.source ~ '^11$' THEN 'Not tracked11'
WHEN wv.source ~ '^12$' THEN 'Not tracked12'
WHEN wv.source ~ '^13$' THEN 'Not tracked13'
WHEN wv.source ~ '^14$' THEN 'Not tracked14'
WHEN wv.source ~ '^15$' THEN 'Not tracked15'
WHEN wv.source ~ '^16$' THEN 'Not tracked16'
WHEN wv.source ~ '^17$' THEN 'Not tracked17'
WHEN wv.source ~ '^18$' THEN 'Not tracked18'
WHEN wv.source ~ '^19$' THEN 'Not tracked19'
WHEN wv.source ~ '^20$' THEN 'Not tracked20'
WHEN wv.source ~ '^21$' THEN 'Not tracked21'
WHEN wv.source ~ '^22$' THEN 'Not tracked22'
WHEN wv.source ~ '^23$' THEN 'Not tracked23'
WHEN wv.source ~ '^24$' THEN 'Not tracked24'
WHEN wv.source ~ '^25$' THEN 'Not tracked25'
WHEN wv.source ~ '^26$' THEN 'Not tracked26'
WHEN wv.source ~ '^27$' THEN 'Not tracked27'
WHEN wv.source ~ '^28$' THEN 'Not tracked28'
--WHEN wv.source ~ '^29$' THEN 'Not tracked29'
WHEN wv.source ~ '^30$' THEN 'Not tracked30'
WHEN wv.source ~ '^31$' THEN 'Not tracked31'
WHEN wv.source ~ '^32$' THEN 'Not tracked32'
END
ELSE
'Others'
END as channel
FROM (
SELECT wv.id,
wv.ga_id,
split_part(wv.ga_source_medium, ' / ',
1) as source,
ga.dwh_source_id,
s.dwh_company_id
FROM marketing.web_visits wv
INNER JOIN dwh_metadata.google_analytics ga
ON ga.ga_id = wv.ga_id
INNER JOIN dwh_manager.sources s ON
ga.dwh_source_id =s.dwh_source_id
--WHERE s.dwh_company_id = 1
LIMIT 100000
) wv
This is a pretty simple case, my subquery (or CTE when using WITH
statement) should return 5 fields with more or less this structure :
Id : character(32)
Ga_id : bigint
Source : character(32)
Medium : character(32)
dwh_company_id : bigint
On top of which I apply a case when statement.
Now the weird thing is, using this query I notice a significant drop in
performance as the "case when" is getting bigger. If I run the query as if,
I get the following exec plain and execution time:
Subquery Scan on wv (cost=6.00..29098.17 rows=100000 width=36) (actual
time=0.828..22476.917 rows=100000 loops=1)
Buffers: shared hit=3136
-> Limit (cost=6.00..11598.17 rows=100000 width=58) (actual
time=0.209..133.429 rows=100000 loops=1)
Buffers: shared hit=3136
-> Hash Join (cost=6.00..1069811.24 rows=9228690 width=58)
(actual time=0.208..119.297 rows=100000 loops=1)
Hash Cond: (wv_1.ga_id = ga.ga_id)
Buffers: shared hit=3136
-> Seq Scan on web_visits wv_1 (cost=0.00..877005.78
rows=20587078 width=50) (actual time=0.004..18.412 rows=100000 loops=1)
Buffers: shared hit=3133
-> Hash (cost=5.50..5.50 rows=40 width=12) (actual
time=0.184..0.184 rows=111 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 5kB
Buffers: shared hit=3
-> Hash Join (cost=1.88..5.50 rows=40 width=12)
(actual time=0.056..0.148 rows=111 loops=1)
Hash Cond: (ga.dwh_source_id = s.dwh_source_id)
Buffers: shared hit=3
-> Seq Scan on google_analytics ga
(cost=0.00..2.89 rows=89 width=8) (actual time=0.007..0.028 rows=111
loops=1)
Buffers: shared hit=2
-> Hash (cost=1.39..1.39 rows=39 width=8)
(actual time=0.042..0.042 rows=56 loops=1)
Buckets: 1024 Batches: 1 Memory Usage:
3kB
Buffers: shared hit=1
-> Seq Scan on sources s (cost=0.00..1.39
rows=39 width=8) (actual time=0.005..0.020 rows=56 loops=1)
Buffers: shared hit=1
Planning time: 0.599 ms
Execution time: 22486.216 ms
Then try commenting out only one line in the case when and the query run 10x
faster :
Subquery Scan on wv (cost=6.00..28598.17 rows=100000 width=36) (actual
time=0.839..2460.002 rows=100000 loops=1)
Buffers: shared hit=3136
-> Limit (cost=6.00..11598.17 rows=100000 width=58) (actual
time=0.210..112.043 rows=100000 loops=1)
Buffers: shared hit=3136
-> Hash Join (cost=6.00..1069811.24 rows=9228690 width=58)
(actual time=0.209..99.513 rows=100000 loops=1)
Hash Cond: (wv_1.ga_id = ga.ga_id)
Buffers: shared hit=3136
-> Seq Scan on web_visits wv_1 (cost=0.00..877005.78
rows=20587078 width=50) (actual time=0.004..14.048 rows=100000 loops=1)
Buffers: shared hit=3133
-> Hash (cost=5.50..5.50 rows=40 width=12) (actual
time=0.184..0.184 rows=111 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 5kB
Buffers: shared hit=3
-> Hash Join (cost=1.88..5.50 rows=40 width=12)
(actual time=0.058..0.146 rows=111 loops=1)
Hash Cond: (ga.dwh_source_id = s.dwh_source_id)
Buffers: shared hit=3
-> Seq Scan on google_analytics ga
(cost=0.00..2.89 rows=89 width=8) (actual time=0.007..0.025 rows=111
loops=1)
Buffers: shared hit=2
-> Hash (cost=1.39..1.39 rows=39 width=8)
(actual time=0.042..0.042 rows=56 loops=1)
Buckets: 1024 Batches: 1 Memory Usage:
3kB
Buffers: shared hit=1
-> Seq Scan on sources s (cost=0.00..1.39
rows=39 width=8) (actual time=0.006..0.021 rows=56 loops=1)
Buffers: shared hit=1
Planning time: 0.583 ms
Execution time: 2467.484 ms
Why this drop in performance for only one (in this simple example) condition
? I do not really understand it. If I add more conditions to the query (let
say 1 or 2) it is also getting slower. And it's not a few ms, it is around 5
sec or so. (which is huge considering I only take in my example 1/500 of my
data with LIMIT.
Before we deviate from the problem I have (which is why the sudden drop of
performance) let me clarify a few things about this query :
- The purpose is not to rewrite it, with a join or whatever, the
case when actually comes from a function which is auto-generated by another
app we have
- My example is pretty simple and regex expressions could be
replaced by equals, the real case when query contains way more complicated
regex
- This is subset of my CASE WHEN, it is much bigger, I cut it at
the "bottleneck" point for this post.
Thanks a lot.
Best Regards,
Kevin
From | Date | Subject | |
---|---|---|---|
Next Message | Pavel Stehule | 2015-03-31 09:08:58 | Re: Weird CASE WHEN behaviour causing query to be suddenly very slow |
Previous Message | Tom Lane | 2015-03-30 15:52:09 | Re: views much slower in 9.3 than 8.4 |