Re: force partition pruning

From: Vijaykumar Jain <vijaykumarjain(dot)github(at)gmail(dot)com>
To: Niels Jespersen <NJN(at)dst(dot)dk>
Cc: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: force partition pruning
Date: 2021-05-10 12:47:02
Message-ID: CAM+6J96NB=w33TpMc_ikMEtU1R716rjYNyweA1Zei+abZF=gtA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I do not know how to put this in words,
but see below when the predicate is explicitly applied to the main table
with partition.

postgres=# \d+ prt1
Partitioned table "public.prt1"
Column | Type | Collation | Nullable | Default | Storage |
Stats target | Description
--------+-------------------+-----------+----------+---------+----------+--------------+-------------
a | integer | | not null | | plain |
|
b | integer | | | | plain |
|
c | character varying | | | | extended |
|
Partition key: RANGE (a)
Partitions: prt1_p1 FOR VALUES FROM (0) TO (250),
prt1_p2 FOR VALUES FROM (250) TO (500),
prt1_p3 FOR VALUES FROM (500) TO (600)

(failed reverse-i-search)`': ^C
postgres=# \d+ b
Table "public.b"
Column | Type | Collation | Nullable | Default | Storage | Stats target
| Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
id | integer | | not null | | plain |
|
Indexes:
"b_id_idx" btree (id)
Access method: heap

postgres=# table b;
id
-----
200
400
(2 rows)

-- basically if the table is joined and predicate can be applied to the
outer table which has constraints matching,
partition pruning takes place.

I do not know the theory, or even what i did is correct, but just FYI.

postgres=# explain analyze select prt1.* from prt1 where a in ( select id
from b where id in (1, 100, 200) );
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=1.05..9.36 rows=2 width=13) (actual
time=0.034..0.074 rows=1 loops=1)
Hash Cond: (prt1.a = b.id)
-> Append (cost=0.00..7.50 rows=300 width=13) (actual
time=0.006..0.043 rows=300 loops=1)
-> Seq Scan on prt1_p1 prt1_1 (cost=0.00..2.25 rows=125
width=13) (actual time=0.005..0.013 rows=125 loops=1)
-> Seq Scan on prt1_p2 prt1_2 (cost=0.00..2.25 rows=125
width=13) (actual time=0.003..0.009 rows=125 loops=1)
-> Seq Scan on prt1_p3 prt1_3 (cost=0.00..1.50 rows=50 width=13)
(actual time=0.002..0.004 rows=50 loops=1)
-> Hash (cost=1.03..1.03 rows=2 width=4) (actual time=0.005..0.005
rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on b (cost=0.00..1.03 rows=2 width=4) (actual
time=0.003..0.003 rows=1 loops=1)
Filter: (id = ANY ('{1,100,200}'::integer[]))
Rows Removed by Filter: 1
Planning Time: 0.181 ms
Execution Time: 0.089 ms
(13 rows)

postgres=# explain analyze select prt1.* from prt1 where a in ( select id
from b where b.id = prt1.a) and a in (1, 100, 200);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Index Scan using iprt1_p1_a on prt1_p1 prt1 (cost=0.14..14.03 rows=2
width=13) (actual time=0.024..0.025 rows=1 loops=1)
Index Cond: (a = ANY ('{1,100,200}'::integer[]))
Filter: (SubPlan 1)
Rows Removed by Filter: 1
SubPlan 1
-> Seq Scan on b (cost=0.00..1.02 rows=1 width=4) (actual
time=0.003..0.003 rows=0 loops=2)
Filter: (id = prt1.a)
Rows Removed by Filter: 1
Planning Time: 0.120 ms
Execution Time: 0.041 ms
(10 rows)

postgres=# explain analyze select prt1.* from prt1 where exists ( select 1
from b where b.id = prt1.a) and a in (1, 100, 200);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=1.04..3.79 rows=2 width=13) (actual
time=0.024..0.028 rows=1 loops=1)
Hash Cond: (prt1.a = b.id)
-> Seq Scan on prt1_p1 prt1 (cost=0.00..2.72 rows=3 width=13) (actual
time=0.011..0.017 rows=2 loops=1)
Filter: (a = ANY ('{1,100,200}'::integer[]))
Rows Removed by Filter: 123
-> Hash (cost=1.02..1.02 rows=2 width=4) (actual time=0.004..0.004
rows=2 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on b (cost=0.00..1.02 rows=2 width=4) (actual
time=0.001..0.002 rows=2 loops=1)
Planning Time: 0.192 ms
Execution Time: 0.043 ms
(10 rows)

postgres=# explain analyze select prt1.* from prt1 inner join b on prt1.a
= b.id where a in (1, 100, 200);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Hash Join (cost=1.04..3.79 rows=2 width=13) (actual time=0.024..0.028
rows=1 loops=1)
Hash Cond: (prt1.a = b.id)
-> Seq Scan on prt1_p1 prt1 (cost=0.00..2.72 rows=3 width=13) (actual
time=0.012..0.018 rows=2 loops=1)
Filter: (a = ANY ('{1,100,200}'::integer[]))
Rows Removed by Filter: 123
-> Hash (cost=1.02..1.02 rows=2 width=4) (actual time=0.003..0.004
rows=2 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on b (cost=0.00..1.02 rows=2 width=4) (actual
time=0.001..0.002 rows=2 loops=1)
Planning Time: 0.181 ms
Execution Time: 0.043 ms
(10 rows)

On Mon, 10 May 2021 at 17:09, Niels Jespersen <NJN(at)dst(dot)dk> wrote:

> Hi all
>
>
>
> I need a litte advice on how to
>
>
>
> Postgres 13.2
>
>
>
> A metadata query pulls partition keys:
>
>
>
> select m.period_version from register.register_metadata m where
> m.current_version and m.period between '201712' and '201912' ;
>
>
>
> A query using these in an in-list easily makes the planner do partition
> pruning.
>
>
>
> select * from register.register d where d.period_version in ('201712_1',
> '201812_1', '201912_1');
>
>
>
> However combining the metadataquery into the dataquery makes the planner
> decide to scan all partitions.
>
>
>
> select * from register.register d where d.period_version in (select
> m.period_version from register.register_metadata m where m.current_version
> and m.period between '201712' and '201912');
>
>
>
> I am quite aware that the latter query requires partition pruning to take
> place during execution not during planning.
>
>
>
> My question here is how do I package the two-step proces into an interface
> that analysts can actually use?
>
>
>
> One possibility is to have a prepare step that creates a temporary view
> with the hard-coded values built-in. And then query data via the temp view.
> This works ok, but there is an issue with possible naming conflicts on the
> temp view (not that this could not be worked around).
>
>
>
> Ideally I would like a function to figure out the query and then return
> the data from that dynamically executed query. Complicating matters is the
> fact that there are more than one set of data/metatable tables and each
> datatable has a different set of columns. This excludes a table returning
> function since that must list the columns present.
>
>
>
>
>
> Best regards
>
>
>
>
>
> *Niels Jespersen*
>
> Chief Adviser
>
> IT Center
>
> Mobile phone:+45 42 42 93 73
> Email: njn(at)dst(dot)dk
>
> Statistics Denmark, Sejrøgade 11, DK-2100 Copenhagen
>
> www.dst.dk/en | Twitter <https://twitter.com/dstdk> | LinkedIn
> <https://www.linkedin.com/company/statistics-denmark/> | Facebook
> <https://www.facebook.com/danmarksstatistik>
>
>
>
>
>
>
>
>
>
>
>
>
>

--
Thanks,
Vijay
Mumbai, India

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Michael Nolan 2021-05-10 16:08:19 Re: Copyright vs Licence
Previous Message cen 2021-05-10 11:44:36 Re: Copyright vs Licence