Re: query against single partition uses index, against master table does seq scan

From: Mike Broers <mbroers(at)gmail(dot)com>
To: Ganesh Kannan <ganesh(dot)kannan(at)weatheranalytics(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: query against single partition uses index, against master table does seq scan
Date: 2016-09-21 21:00:31
Message-ID: CAB9893gM+cS7pDqiibbjCBncUG5epWfFFYjV=ArdW51wNbQHzw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

When I excluded the non indexed search criteria the query on aggregate used
the indexes on each partition, without specifying the constraint exclusion
criteria. When I added the constraint exclusion criteria to the non
indexed criteria, it still used seq scans.

I ended up getting an acceptable plan by using a subquery on the indexed
partition and using those results to scan for the unindexed value.

On Wed, Sep 21, 2016 at 12:37 PM, Mike Broers <mbroers(at)gmail(dot)com> wrote:

> Thanks for your response - Is 'selectively choosing what partition'
> different than utilizing each partitions index when scanning each
> partition? To clarify, I expect to find results in each partition, but to
> have postgres use each partitions index instead of full table scans. It
> seems redundant to add a where clauses to match each exclusion criteria but
> i will try that and report back - thank you for the suggestion.
>
> On Wed, Sep 21, 2016 at 12:15 PM, Ganesh Kannan <ganesh.kannan@
> weatheranalytics.com> wrote:
>
>> Postgres does not have capability to selectively choose child tables
>> unless the query's "WHERE" clause is simple, and it matches (exactly) the
>> CHECK constraint definition. I have resolved similar issue by explicitly
>> adding check constraint expression in every SQL against the master table.
>> This is also determined by the constraint_exclusion setting value. Check
>> the manual (9.5): https://www.postgresql.org/docs/current/static/ddl-pa
>> rtitioning.html.
>>
>>
>> I would try tweaking WHERE clause to match Check constraint definition.
>> Global partitioning index (like in Oracle) would help, but its just my wish.
>>
>>
>>
>> Regards,
>> Ganesh Kannan
>>
>>
>>
>> ------------------------------
>> *From:* pgsql-performance-owner(at)postgresql(dot)org <
>> pgsql-performance-owner(at)postgresql(dot)org> on behalf of Mike Broers <
>> mbroers(at)gmail(dot)com>
>> *Sent:* Wednesday, September 21, 2016 12:53 PM
>> *To:* pgsql-performance(at)postgresql(dot)org
>> *Subject:* [PERFORM] query against single partition uses index, against
>> master table does seq scan
>>
>> Hello, I am curious about the performance of queries against a master
>> table that seem to do seq scans on each child table. When the same query
>> is issued at a partition directly it uses the partition index and is very
>> fast.
>>
>> The partition constraint is in the query criteria. We have non
>> overlapping check constraints and constraint exclusion is set to partition.
>>
>> Here is the master table
>> Column Type
>> Modifiers
>> aggregate_id bigint not null default
>> nextval('seq_aggregate'::regclass)
>> landing_id integer not null
>> client_program_id integer
>> sequence_number bigint
>> start_datetime timestamp without time zone not null
>> end_datetime timestamp without time zone not null
>> body jsonb not null
>> client_parsing_status_code character(1)
>> validation_status_code character(1)
>> client_parsing_datetime timestamp without time zone
>> validation_datetime timestamp without time zone
>> latest_flag_datetime timestamp without time zone
>> latest_flag boolean not null
>> Indexes:
>> "pk_aggregate" PRIMARY KEY, btree (aggregate_id)
>> "ix_aggregate_landing_id_aggregate_id_parsing_status" btree
>> (landing_id, aggregate_id, client_parsing_status_code)
>> "ix_aggregate_landing_id_start_datetime" btree (landing_id,
>> start_datetime)
>> "ix_aggregate_latest_flag" btree (latest_flag_datetime) WHERE
>> latest_flag = false
>> "ix_aggregate_validation_status_code" btree (validation_datetime)
>> WHERE validation_status_code = 'P'::bpchar AND latest_flag = true
>> Check constraints:
>> "ck_aggregate_client_parsing_status_code" CHECK
>> (client_parsing_status_code IS NULL OR (client_parsing_status_code = ANY
>> (ARRAY['P'::bpchar, 'F'::bpchar, 'I'::bpchar])))
>> "ck_aggregate_validation_status_code" CHECK (validation_status_code
>> IS NULL OR (validation_status_code = ANY (ARRAY['P'::bpchar, 'F'::bpchar,
>> 'I'::bpchar])))
>> Foreign-key constraints:
>> "fk_aggregate_client_program" FOREIGN KEY (client_program_id)
>> REFERENCES client_program(client_program_id)
>> "fk_aggregate_landing" FOREIGN KEY (landing_id) REFERENCES
>> landing(landing_id)
>> Number of child tables: 17 (Use \d+ to list them.)
>>
>> and here is a child table showing a check constraint
>> Table "stage.aggregate__00007223"
>> Column Type
>> Modifiers
>> ────────────────────────── ───────────────────────────
>> aggregate_id bigint not null default
>> nextval('seq_aggregate'::regclass)
>> landing_id integer not null
>> client_program_id integer
>> sequence_number bigint
>> start_datetime timestamp without time zone not null
>> end_datetime timestamp without time zone not null
>> body jsonb not null
>> client_parsing_status_code character(1)
>> validation_status_code character(1)
>> client_parsing_datetime timestamp without time zone
>> validation_datetime timestamp without time zone
>> latest_flag_datetime timestamp without time zone
>> latest_flag boolean not null
>> Indexes:
>> "pk_aggregate__00007223" PRIMARY KEY, btree (aggregate_id),
>> tablespace "archive"
>> "ix_aggregate__00007223_landing_id_aggregate_id_parsing_status"
>> btree (landing_id, aggregate_id, client_parsing_status_code), tablespace
>> "archive"
>> "ix_aggregate__00007223_landing_id_start_datetime" btree
>> (landing_id, start_datetime), tablespace "archive"
>> "ix_aggregate__00007223_latest_flag" btree (latest_flag_datetime)
>> WHERE latest_flag = false, tablespace "archive"
>> "ix_aggregate__00007223_validation_status_code" btree
>> (validation_datetime) WHERE validation_status_code = 'P'::bpchar AND
>> latest_flag = true, tablespace "archive"
>> Check constraints:
>> "ck_aggregate__00007223_landing_id" CHECK (landing_id >= 7223 AND
>> landing_id < 9503)
>> "ck_aggregate_client_parsing_status_code" CHECK
>> (client_parsing_status_code IS NULL OR (client_parsing_status_code = ANY
>> (ARRAY['P'::bpchar, 'F'::bpchar, 'I'::bpchar])))
>> "ck_aggregate_validation_status_code" CHECK (validation_status_code
>> IS NULL OR (validation_status_code = ANY (ARRAY['P'::bpchar, 'F'::bpchar,
>> 'I'::bpchar])))
>> Inherits: aggregate
>> Tablespace: "archive"
>>
>> Here is an example of the query explain plan against the master table:
>>
>> select landing_id from landing L
>> where exists
>> (
>> select landing_id
>> from stage.aggregate A
>> WHERE (A.body#>>'{Cost}')::BIGINT >= 1000000000
>> and L.landing_id = A.Landing_id
>> )
>> and L.source_id = 36
>>
>>
>> Hash Join (cost=59793745.91..59793775.14 rows=28 width=4)
>> Hash Cond: (a.landing_id = l.landing_id)
>> -> HashAggregate (cost=59792700.41..59792721.46 rows=2105 width=4)
>> Group Key: a.landing_id
>> -> Append (cost=0.00..59481729.32 rows=124388438 width=4)
>> -> Seq Scan on aggregate a (cost=0.00..0.00 rows=1
>> width=4)
>> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
>> 1000000000)
>> -> Seq Scan on aggregate__00000000 a_1
>> (cost=0.00..1430331.50 rows=2105558 width=4)
>> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
>> 1000000000)
>> -> Seq Scan on aggregate__00000470 a_2
>> (cost=0.00..74082.10 rows=247002 width=4)
>> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
>> 1000000000)
>> -> Seq Scan on aggregate__00001435 a_3
>> (cost=0.00..8174909.44 rows=17610357 width=4)
>> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
>> 1000000000)
>> -> Seq Scan on aggregate__00001685 a_4
>> (cost=0.00..11011311.44 rows=23516624 width=4)
>> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
>> 1000000000)
>> -> Seq Scan on aggregate__00003836 a_5
>> (cost=0.00..5833050.44 rows=13102557 width=4)
>> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
>> 1000000000)
>> -> Seq Scan on aggregate__00005638 a_6
>> (cost=0.00..5950768.16 rows=12342003 width=4)
>> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
>> 1000000000)
>> -> Seq Scan on aggregate__00007223 a_7
>> (cost=0.00..6561806.24 rows=13203237 width=4)
>> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
>> 1000000000)
>> -> Seq Scan on aggregate__00009503 a_8
>> (cost=0.00..5420961.64 rows=10931794 width=4)
>> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
>> 1000000000)
>> -> Seq Scan on aggregate__00011162 a_9
>> (cost=0.00..4262902.64 rows=8560011 width=4)
>> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
>> 1000000000)
>> -> Seq Scan on aggregate__00012707 a_10
>> (cost=0.00..4216271.28 rows=9077921 width=4)
>> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
>> 1000000000)
>> -> Seq Scan on aggregate__00014695 a_11
>> (cost=0.00..3441205.72 rows=7674495 width=4)
>> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
>> 1000000000)
>> -> Seq Scan on aggregate__00016457 a_12
>> (cost=0.00..688010.74 rows=1509212 width=4)
>> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
>> 1000000000)
>> -> Seq Scan on aggregate__00016805 a_13
>> (cost=0.00..145219.14 rows=311402 width=4)
>> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
>> 1000000000)
>> -> Seq Scan on aggregate__00016871 a_14 (cost=0.00..21.40
>> rows=190 width=4)
>> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
>> 1000000000)
>> -> Seq Scan on aggregate__00016874 a_15
>> (cost=0.00..478011.62 rows=1031110 width=4)
>> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
>> 1000000000)
>> -> Seq Scan on aggregate__00017048 a_16 (cost=0.00..21.40
>> rows=190 width=4)
>> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
>> 1000000000)
>> -> Seq Scan on aggregate__00017049 a_17
>> (cost=0.00..1792844.42 rows=3164774 width=4)
>> Filter: (((body #>> '{Cost}'::text[]))::bigint >=
>> 1000000000)
>> -> Hash (cost=1042.69..1042.69 rows=225 width=4)
>> -> Seq Scan on landing l (cost=0.00..1042.69 rows=225 width=4)
>> Filter: (source_id = 36)
>>
>> And here is an example of the query using the index when ran against a
>> partition directly
>>
>> select landing_id from landing L
>> where exists
>> (
>> select landing_id
>> from stage.aggregate__00007223 A
>> WHERE (A.body#>>'{Cost}')::BIGINT >= 1000000000
>> and L.landing_id = A.Landing_id
>> )
>> and L.source_id = 36
>>
>> Nested Loop Semi Join (cost=0.56..3454.75 rows=5 width=4)
>> -> Seq Scan on landing l (cost=0.00..1042.69 rows=225 width=4)
>> Filter: (source_id = 36)
>> -> Index Scan using ix_aggregate__00007223_landing_id_start_datetime
>> on aggregate__00007223 a (cost=0.56..359345.74 rows=36173 width=4)
>> Index Cond: (landing_id = l.landing_id)
>> Filter: (((body #>> '{Cost}'::text[]))::bigint >= 1000000000)
>>
>>
>> The parent table never had rows, and pg_class had relpages=0. I saw a
>> suggestion in a different thread about updating this value to greater than
>> 0 so I tried that but didnt get a different plan. We have
>> autovacuum/analyze enabled and also run nightly vacuum/analyze on the
>> database to keep stats up to date.
>>
>> I'm new to troubleshooting partition query performance and not sure what
>> I am missing here. Any advice is appreciated.
>>
>
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Jim Nasby 2016-09-21 21:42:04 Re: Strange nested loop for an INSERT
Previous Message Mike Broers 2016-09-21 17:37:19 Re: query against single partition uses index, against master table does seq scan