Re: pg12 partitions show bad performance vs pg96

From: Mariel Cherkassky <mariel(dot)cherkassky(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: pg12 partitions show bad performance vs pg96
Date: 2020-03-09 10:15:50
Message-ID: CA+t6e1mx_KiR735zbPFmkG-HBmLOOGRX3-ZMWNok7F-ieTdB3A@mail.gmail.com
Lists: pgsql-performance

*8 ms seems pretty slow to planning that query. Does the planning time drop
if you execute this multiple times in the same session? Does the time change
if you try again without any foreign keys?*

No one is using the system besides me, so after running the query once
most of the data is in the cache... If I run it multiple times, the query
time drops:
Planning Time: 0.361 ms
Execution Time: 0.110 ms
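
(For example, this is what I mean by running it multiple times - the same
statement re-run in one psql session, comparing the two reported "Planning
Time" lines:)

explain analyze
select *
from iot_data da, iot_device de
where de.name in ('50a')
  and de.id = da.device
  and da.metadata = 50
  and lower(da.data) like '50';
-- run the exact same statement again in the same session;
-- the second "Planning Time" is the lower number quoted above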

*Can you share the results of that?*
Sure. I ran the same procedure, but this time I inserted 100m records
instead of 10m. This time the results were far worse on pg12:

PG12:
postgres=# explain analyze select * from iot_data da,iot_device de where de.name in ('50a') and de.id = da.device and da.metadata=50 and lower(da.data) like '50';
                                                                       QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1002.26..1563512.35 rows=10 width=44) (actual time=95161.056..95218.764 rows=0 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Hash Join  (cost=2.26..1562511.35 rows=4 width=44) (actual time=94765.860..94765.861 rows=0 loops=3)
         Hash Cond: (da_2.device = de.id)
         ->  Parallel Append  (cost=0.00..1562506.90 rows=814 width=37) (actual time=94765.120..94765.120 rows=0 loops=3)
               ->  Parallel Seq Scan on iot_data_2 da_2  (cost=0.00..584064.14 rows=305 width=37) (actual time=36638.829..36638.829 rows=0 loops=3)
                     Filter: ((metadata = 50) AND (lower(data) ~~ '50'::text))
                     Rows Removed by Filter: 12460009
               ->  Parallel Seq Scan on iot_data_1 da_1  (cost=0.00..504948.69 rows=262 width=36) (actual time=43990.427..43990.427 rows=0 loops=2)
                     Filter: ((metadata = 50) AND (lower(data) ~~ '50'::text))
                     Rows Removed by Filter: 16158316
               ->  Parallel Seq Scan on iot_data_0 da  (cost=0.00..473490.00 rows=247 width=37) (actual time=86396.665..86396.665 rows=0 loops=1)
                     Filter: ((metadata = 50) AND (lower(data) ~~ '50'::text))
                     Rows Removed by Filter: 30303339
         ->  Hash  (cost=2.25..2.25 rows=1 width=7) (never executed)
               ->  Seq Scan on iot_device de  (cost=0.00..2.25 rows=1 width=7) (never executed)
                     Filter: (name = '50a'::text)
 Planning Time: 45.724 ms
 Execution Time: 95252.712 ms
(20 rows)

PG96:
postgres=# explain analyze select * from iot_data da,iot_device de where de.name in ('50a') and de.id = da.device and da.metadata=50 and lower(da.data) like '50';
                                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..2583361.51 rows=20 width=44) (actual time=18345.229..18345.229 rows=0 loops=1)
   Join Filter: (da.device = de.id)
   ->  Seq Scan on iot_device de  (cost=0.00..2.25 rows=1 width=7) (actual time=0.022..0.037 rows=1 loops=1)
         Filter: (name = '50a'::text)
         Rows Removed by Filter: 99
   ->  Seq Scan on iot_data da  (cost=0.00..2583334.84 rows=1954 width=37) (actual time=18345.187..18345.187 rows=0 loops=1)
         Filter: ((metadata = 50) AND (lower(data) ~~ '50'::text))
         Rows Removed by Filter: 100000000
 Planning time: 35.450 ms
 Execution time: 18345.301 ms
(10 rows)

*The most basic guidelines for table partitioning are, don't partition your
tables unless it's a net win. If partitioning was always faster, we'd just
have designed Postgres to implicitly partition all of your tables for you.
There are some other guidelines in [1].*

Shouldn't partitioning reduce the execution time? I mean, instead of
scanning a table with 10m records, I can scan a partition with 3m records.
Doesn't less data mean better performance for simple queries like the one I
used?
I read the best practices in the docs, and I think that I met most of them
- I chose the right partition key (in this case it was device).
Regarding the number of partitions - I chose 3 just to test the results. I
didn't create a lot of partitions, and my logic tells me that querying a
table with 3m records should be faster than querying one with 10m records.
Am I missing something?
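
For reference, the layout I am describing is roughly this (just a sketch -
the exact column types and partition method in my test may differ, I am
assuming hash partitioning on device here):

-- sketch of the assumed setup: iot_data partitioned by hash(device) into 3 partitions
create table iot_device (
    id   int primary key,
    name text
);

create table iot_data (
    id       int,
    data     text,
    metadata int,
    device   int references iot_device(id)
) partition by hash (device);

create table iot_data_0 partition of iot_data for values with (modulus 3, remainder 0);
create table iot_data_1 partition of iot_data for values with (modulus 3, remainder 1);
create table iot_data_2 partition of iot_data for values with (modulus 3, remainder 2);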
