Re: query against single partition uses index, against master table does seq scan

From: Mike Broers <mbroers(at)gmail(dot)com>
To: Ganesh Kannan <ganesh(dot)kannan(at)weatheranalytics(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: query against single partition uses index, against master table does seq scan
Date: 2016-09-21 17:37:19
Message-ID: CAB9893h+dWjufZLATAJmpmF1kc91cy_TyxHCy6+g=Aoz1H=3cg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thanks for your response - Is 'selectively choosing what partition'
different than utilizing each partitions index when scanning each
partition? To clarify, I expect to find results in each partition, but to
have postgres use each partitions index instead of full table scans. It
seems redundant to add a where clauses to match each exclusion criteria but
i will try that and report back - thank you for the suggestion.

On Wed, Sep 21, 2016 at 12:15 PM, Ganesh Kannan <
ganesh(dot)kannan(at)weatheranalytics(dot)com> wrote:

> Postgres does not have capability to selectively choose child tables
> unless the query's "WHERE" clause is simple, and it matches (exactly) the
> CHECK constraint definition. I have resolved similar issue by explicitly
> adding check constraint expression in every SQL against the master table.
> This is also determined by the constraint_exclusion setting value. Check
> the manual (9.5): https://www.postgresql.org/docs/current/static/ddl-
> partitioning.html.
>
>
> I would try tweaking WHERE clause to match Check constraint definition.
> Global partitioning index (like in Oracle) would help, but its just my wish.
>
>
>
> Regards,
> Ganesh Kannan
>
>
>
> ------------------------------
> *From:* pgsql-performance-owner(at)postgresql(dot)org <pgsql-performance-owner@
> postgresql.org> on behalf of Mike Broers <mbroers(at)gmail(dot)com>
> *Sent:* Wednesday, September 21, 2016 12:53 PM
> *To:* pgsql-performance(at)postgresql(dot)org
> *Subject:* [PERFORM] query against single partition uses index, against
> master table does seq scan
>
> Hello, I am curious about the performance of queries against a master
> table that seem to do seq scans on each child table. When the same query
> is issued at a partition directly it uses the partition index and is very
> fast.
>
> The partition constraint is in the query criteria. We have non
> overlapping check constraints and constraint exclusion is set to partition.
>
> Here is the master table
> Column Type
> Modifiers
> aggregate_id bigint not null default
> nextval('seq_aggregate'::regclass)
> landing_id integer not null
> client_program_id integer
> sequence_number bigint
> start_datetime timestamp without time zone not null
> end_datetime timestamp without time zone not null
> body jsonb not null
> client_parsing_status_code character(1)
> validation_status_code character(1)
> client_parsing_datetime timestamp without time zone
> validation_datetime timestamp without time zone
> latest_flag_datetime timestamp without time zone
> latest_flag boolean not null
> Indexes:
> "pk_aggregate" PRIMARY KEY, btree (aggregate_id)
> "ix_aggregate_landing_id_aggregate_id_parsing_status" btree
> (landing_id, aggregate_id, client_parsing_status_code)
> "ix_aggregate_landing_id_start_datetime" btree (landing_id,
> start_datetime)
> "ix_aggregate_latest_flag" btree (latest_flag_datetime) WHERE
> latest_flag = false
> "ix_aggregate_validation_status_code" btree (validation_datetime)
> WHERE validation_status_code = 'P'::bpchar AND latest_flag = true
> Check constraints:
> "ck_aggregate_client_parsing_status_code" CHECK
> (client_parsing_status_code IS NULL OR (client_parsing_status_code = ANY
> (ARRAY['P'::bpchar, 'F'::bpchar, 'I'::bpchar])))
> "ck_aggregate_validation_status_code" CHECK (validation_status_code
> IS NULL OR (validation_status_code = ANY (ARRAY['P'::bpchar, 'F'::bpchar,
> 'I'::bpchar])))
> Foreign-key constraints:
> "fk_aggregate_client_program" FOREIGN KEY (client_program_id)
> REFERENCES client_program(client_program_id)
> "fk_aggregate_landing" FOREIGN KEY (landing_id) REFERENCES
> landing(landing_id)
> Number of child tables: 17 (Use \d+ to list them.)
>
> and here is a child table showing a check constraint
> Table "stage.aggregate__00007223"
> Column Type
> Modifiers
> ────────────────────────── ───────────────────────────
> aggregate_id bigint not null default
> nextval('seq_aggregate'::regclass)
> landing_id integer not null
> client_program_id integer
> sequence_number bigint
> start_datetime timestamp without time zone not null
> end_datetime timestamp without time zone not null
> body jsonb not null
> client_parsing_status_code character(1)
> validation_status_code character(1)
> client_parsing_datetime timestamp without time zone
> validation_datetime timestamp without time zone
> latest_flag_datetime timestamp without time zone
> latest_flag boolean not null
> Indexes:
> "pk_aggregate__00007223" PRIMARY KEY, btree (aggregate_id), tablespace
> "archive"
> "ix_aggregate__00007223_landing_id_aggregate_id_parsing_status" btree
> (landing_id, aggregate_id, client_parsing_status_code), tablespace "archive"
> "ix_aggregate__00007223_landing_id_start_datetime" btree (landing_id,
> start_datetime), tablespace "archive"
> "ix_aggregate__00007223_latest_flag" btree (latest_flag_datetime)
> WHERE latest_flag = false, tablespace "archive"
> "ix_aggregate__00007223_validation_status_code" btree
> (validation_datetime) WHERE validation_status_code = 'P'::bpchar AND
> latest_flag = true, tablespace "archive"
> Check constraints:
> "ck_aggregate__00007223_landing_id" CHECK (landing_id >= 7223 AND
> landing_id < 9503)
> "ck_aggregate_client_parsing_status_code" CHECK
> (client_parsing_status_code IS NULL OR (client_parsing_status_code = ANY
> (ARRAY['P'::bpchar, 'F'::bpchar, 'I'::bpchar])))
> "ck_aggregate_validation_status_code" CHECK (validation_status_code
> IS NULL OR (validation_status_code = ANY (ARRAY['P'::bpchar, 'F'::bpchar,
> 'I'::bpchar])))
> Inherits: aggregate
> Tablespace: "archive"
>
> Here is an example of the query explain plan against the master table:
>
> select landing_id from landing L
> where exists
> (
> select landing_id
> from stage.aggregate A
> WHERE (A.body#>>'{Cost}')::BIGINT >= 1000000000
> and L.landing_id = A.Landing_id
> )
> and L.source_id = 36
>
>
> Hash Join (cost=59793745.91..59793775.14 rows=28 width=4)
> Hash Cond: (a.landing_id = l.landing_id)
> -> HashAggregate (cost=59792700.41..59792721.46 rows=2105 width=4)
> Group Key: a.landing_id
> -> Append (cost=0.00..59481729.32 rows=124388438 width=4)
> -> Seq Scan on aggregate a (cost=0.00..0.00 rows=1 width=4)
> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
> 1000000000)
> -> Seq Scan on aggregate__00000000 a_1
> (cost=0.00..1430331.50 rows=2105558 width=4)
> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
> 1000000000)
> -> Seq Scan on aggregate__00000470 a_2
> (cost=0.00..74082.10 rows=247002 width=4)
> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
> 1000000000)
> -> Seq Scan on aggregate__00001435 a_3
> (cost=0.00..8174909.44 rows=17610357 width=4)
> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
> 1000000000)
> -> Seq Scan on aggregate__00001685 a_4
> (cost=0.00..11011311.44 rows=23516624 width=4)
> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
> 1000000000)
> -> Seq Scan on aggregate__00003836 a_5
> (cost=0.00..5833050.44 rows=13102557 width=4)
> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
> 1000000000)
> -> Seq Scan on aggregate__00005638 a_6
> (cost=0.00..5950768.16 rows=12342003 width=4)
> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
> 1000000000)
> -> Seq Scan on aggregate__00007223 a_7
> (cost=0.00..6561806.24 rows=13203237 width=4)
> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
> 1000000000)
> -> Seq Scan on aggregate__00009503 a_8
> (cost=0.00..5420961.64 rows=10931794 width=4)
> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
> 1000000000)
> -> Seq Scan on aggregate__00011162 a_9
> (cost=0.00..4262902.64 rows=8560011 width=4)
> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
> 1000000000)
> -> Seq Scan on aggregate__00012707 a_10
> (cost=0.00..4216271.28 rows=9077921 width=4)
> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
> 1000000000)
> -> Seq Scan on aggregate__00014695 a_11
> (cost=0.00..3441205.72 rows=7674495 width=4)
> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
> 1000000000)
> -> Seq Scan on aggregate__00016457 a_12
> (cost=0.00..688010.74 rows=1509212 width=4)
> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
> 1000000000)
> -> Seq Scan on aggregate__00016805 a_13
> (cost=0.00..145219.14 rows=311402 width=4)
> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
> 1000000000)
> -> Seq Scan on aggregate__00016871 a_14 (cost=0.00..21.40
> rows=190 width=4)
> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
> 1000000000)
> -> Seq Scan on aggregate__00016874 a_15
> (cost=0.00..478011.62 rows=1031110 width=4)
> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
> 1000000000)
> -> Seq Scan on aggregate__00017048 a_16 (cost=0.00..21.40
> rows=190 width=4)
> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
> 1000000000)
> -> Seq Scan on aggregate__00017049 a_17
> (cost=0.00..1792844.42 rows=3164774 width=4)
> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
> 1000000000)
> -> Hash (cost=1042.69..1042.69 rows=225 width=4)
> -> Seq Scan on landing l (cost=0.00..1042.69 rows=225 width=4)
> Filter: (source_id = 36)
>
> And here is an example of the query using the index when ran against a
> partition directly
>
> select landing_id from landing L
> where exists
> (
> select landing_id
> from stage.aggregate__00007223 A
> WHERE (A.body#>>'{Cost}')::BIGINT >= 1000000000
> and L.landing_id = A.Landing_id
> )
> and L.source_id = 36
>
> Nested Loop Semi Join (cost=0.56..3454.75 rows=5 width=4)
> -> Seq Scan on landing l (cost=0.00..1042.69 rows=225 width=4)
> Filter: (source_id = 36)
> -> Index Scan using ix_aggregate__00007223_landing_id_start_datetime
> on aggregate__00007223 a (cost=0.56..359345.74 rows=36173 width=4)
> Index Cond: (landing_id = l.landing_id)
> Filter: (((body #>> '{Cost}'::text[]))::bigint >= 1000000000)
>
>
> The parent table never had rows, and pg_class had relpages=0. I saw a
> suggestion in a different thread about updating this value to greater than
> 0 so I tried that but didnt get a different plan. We have
> autovacuum/analyze enabled and also run nightly vacuum/analyze on the
> database to keep stats up to date.
>
> I'm new to troubleshooting partition query performance and not sure what I
> am missing here. Any advice is appreciated.
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Mike Broers 2016-09-21 21:00:31 Re: query against single partition uses index, against master table does seq scan
Previous Message Ganesh Kannan 2016-09-21 17:15:05 Re: query against single partition uses index, against master table does seq scan